2. API Function

2.1. Driver

This section describes the Driver Initialization and Version.

TOPS_PUBLIC_API topsError_t topsInit(unsigned int flags)

Explicitly initializes the TOPS runtime.

Most TOPS APIs implicitly initialize the TOPS runtime. This API provides control over the timing of the initialization.

TOPS_PUBLIC_API topsError_t topsDriverGetVersion(int *driverVersion)

Returns the approximate TOPS driver version.

The version is returned as (1000 * major + 10 * minor). For example, topsrider 2.2 is represented by 2020.

Parameters

driverVersion[out]

Returns

topsSuccess, topsErrorInvalidValue
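
The encoded value can be unpacked with integer arithmetic. The following is a minimal sketch, assuming the encoding is 1000 * major + 10 * minor as the 2.2 → 2020 example implies, and assuming minor stays below 100. The helper names decode_major, decode_minor and encode_version are hypothetical and are not part of the TOPS API.

```c
#include <assert.h>

/* Hypothetical helpers (not part of the TOPS API).
 * They assume version == 1000 * major + 10 * minor with minor < 100. */
int decode_major(int version) {
    assert(version >= 0);
    return version / 1000;
}

int decode_minor(int version) {
    assert(version >= 0);
    return (version % 1000) / 10;
}

/* Inverse of the two decoders above. */
int encode_version(int major, int minor) {
    assert(major >= 0 && minor >= 0 && minor < 100);
    return 1000 * major + 10 * minor;
}
```

With a real runtime, the value written by topsDriverGetVersion or topsRuntimeGetVersion would be passed to the decoders in place of the literal used in the example.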

TOPS_PUBLIC_API topsError_t topsRuntimeGetVersion(int *runtimeVersion)

Returns the approximate TOPS Runtime version.

The version is returned as (1000 * major + 10 * minor). For example, topsrider 2.2 is represented by 2020.

Parameters

runtimeVersion[out]

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsDeviceGet(topsDevice_t *device, int ordinal)

Returns a handle to a compute device.

Parameters
  • device[out]

  • ordinal[in]

Returns

topsSuccess, topsErrorInvalidDevice

TOPS_PUBLIC_API topsError_t topsDeviceComputeCapability(int *major, int *minor, topsDevice_t device)

Returns the compute capability of the device.

Parameters
  • major[out]

  • minor[out]

  • device[in]

Returns

topsSuccess, topsErrorInvalidDevice

TOPS_PUBLIC_API topsError_t topsDeviceGetName(char *name, int len, topsDevice_t device)

Returns an identifier string for the device.

Warning

these versions are ignored.

Parameters
  • name[out]

  • len[in]

  • device[in]

Returns

topsSuccess, topsErrorInvalidDevice

TOPS_PUBLIC_API topsError_t topsDeviceGetPCIBusId(char *pciBusId, int len, int device)

Returns a PCI Bus Id string for the device, overloaded to take int device ID.

Parameters
  • pciBusId[out]

  • len[in]

  • device[in]

Returns

topsSuccess, topsErrorInvalidDevice

TOPS_PUBLIC_API topsError_t topsDeviceGetByPCIBusId(int *device, const char *pciBusId)

Returns a handle to a compute device.

Parameters
  • device[out] handle

  • pciBusId[in]

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsDeviceTotalMem(size_t *bytes, topsDevice_t device)

Returns the total amount of memory on the device.

Parameters
  • bytes[out]

  • device[in]

Returns

topsSuccess, topsErrorInvalidDevice

2.2. Device

This section describes the device management functions of TOPS runtime API.

TOPS_PUBLIC_API topsError_t topsDeviceSynchronize(void)

Waits on all active streams on current device.

When this command is invoked, the host thread blocks until all commands in the streams associated with the device have completed. TOPS does not yet support multiple blocking modes.

Returns

topsSuccess

TOPS_PUBLIC_API topsError_t topsDeviceReset(void)

The state of current device is discarded and updated to a fresh state.

Calling this function deletes all streams created, memory allocated, kernels running, events created. Make sure that no other thread is using the device or streams, memory, kernels, events associated with the current device.

Returns

topsSuccess

TOPS_PUBLIC_API topsError_t topsSetDevice(int deviceId)

Set default device to be used for subsequent tops API calls from this thread.

Sets device as the default device for the calling host thread. Valid device IDs are 0…(topsGetDeviceCount()-1).

Many TOPS APIs implicitly use the “default device”:

  • Any device memory subsequently allocated from this host thread (using topsMalloc) will be allocated on device.

  • Any streams or events created from this host thread will be associated with device.

  • Any kernels launched from this host thread (using topsLaunchKernel) will be executed on device (unless a specific stream is specified, in which case the device associated with that stream will be used).

This function may be called from any host thread. Multiple host threads may use the same device. This function does no synchronization with the previous or new device, and has very little runtime overhead. Applications can use topsSetDevice to quickly switch the default device before making a TOPS runtime call which uses the default device.

The default device is stored in thread-local-storage for each thread. Thread-pool implementations may inherit the default device of the previous thread. A good practice is to always call topsSetDevice at the start of a TOPS coding sequence to establish a known standard device.

Parameters

deviceId[in] Valid device in range 0…(topsGetDeviceCount()-1).

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorDeviceAlreadyInUse
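
The per-thread default device and the valid ID range described above can be modelled in plain C. This is only a sketch of the documented behavior, not the TOPS implementation: the model_ functions, the modelError_t codes and MODEL_DEVICE_COUNT are stand-ins invented for illustration so that the code compiles without the TOPS headers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in error codes for illustration; real values come from the TOPS
 * headers. */
typedef enum {
    modelSuccess = 0,
    modelErrorInvalidValue,
    modelErrorInvalidDevice
} modelError_t;

/* Assumed number of devices in this model. */
enum { MODEL_DEVICE_COUNT = 2 };

/* One default device per host thread, kept in thread-local storage. */
static _Thread_local int g_default_device = 0;

/* Models topsSetDevice: accepts only IDs in 0..(count - 1). */
modelError_t model_set_device(int device_id) {
    if (device_id < 0 || device_id >= MODEL_DEVICE_COUNT) {
        return modelErrorInvalidDevice;
    }
    g_default_device = device_id;
    return modelSuccess;
}

/* Models topsGetDevice: writes the calling thread's default device. */
modelError_t model_get_device(int *device_id) {
    if (device_id == NULL) {
        return modelErrorInvalidValue;
    }
    *device_id = g_default_device;
    return modelSuccess;
}

/* Usage: pin a known default device at the start of a sequence. */
void example_begin_sequence(void) {
    int current = -1;
    modelError_t err = model_set_device(0);
    assert(err == modelSuccess);
    err = model_get_device(&current);
    assert(err == modelSuccess && current == 0);
}
```

Because the state is thread-local, a value set in one thread is not visible in another, which is why the text recommends setting the device explicitly at the start of each sequence.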

TOPS_PUBLIC_API topsError_t topsGetDevice(int *deviceId)

Return the default device id for the calling host thread.

TOPS maintains a default device for each thread using thread-local-storage. This device is used implicitly for TOPS runtime APIs called by this thread. topsGetDevice returns in *deviceId the default device for the calling host thread.

Parameters

deviceId[out] *deviceId is written with the default device

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsGetDeviceCount(int *count)

Return number of compute-capable devices.

Returns in *count the number of devices that have ability to run compute commands. If there are no such devices, then topsGetDeviceCount will return topsErrorNoDevice. If 1 or more devices can be found, then topsGetDeviceCount returns topsSuccess.

Parameters

count[out] Returns the number of compute-capable devices.

Returns

topsSuccess, topsErrorNoDevice

TOPS_PUBLIC_API topsError_t topsDeviceGetAttribute(int *pi, topsDeviceAttribute_t attr, int deviceId)

Query for a specific device attribute.

Parameters
  • pi[out] pointer to value to return

  • attr[in] attribute to query

  • deviceId[in] which device to query for information

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsGetDeviceProperties(topsDeviceProp_t *prop, int deviceId)

Returns device properties.

Populates prop with information for the specified device.

Parameters
  • prop[out] written with device properties

  • deviceId[in] which device to query for information

Returns

topsSuccess, topsErrorInvalidDevice

TOPS_PUBLIC_API topsError_t topsResourceReservationGetAttribute(void *data, topsResourceReservationAttribute_t attr, int clusterId)

Query for an attribute of the current device’s resource reservation.

Parameters
  • data[out] pointer to value to return

  • attr[in] attribute to query

  • clusterId[in] which cluster to query for information

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsDeviceGetLimit(size_t *pValue, enum topsLimit_t limit)

Get Resource limits of current device.

Parameters
  • pValue[out]

  • limit[in]

Returns

topsSuccess, topsErrorUnsupportedLimit, topsErrorInvalidValue

Note: Currently, only topsLimitMallocHeapSize is available.

TOPS_PUBLIC_API topsError_t topsGetDeviceFlags(unsigned int *flags)

Gets the flags set for current device.

Parameters

flags[out]

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsSetDeviceFlags(unsigned flags)

The current device behavior is changed according to the flags passed.

topsDeviceScheduleSpin : TOPS runtime will actively spin in the thread which submitted the work until the command completes. This offers the lowest latency, but will consume a CPU core and may increase power.

topsDeviceScheduleYield : The TOPS runtime will yield the CPU to system so that other tasks can use it. This may increase latency to detect the completion but will consume less power and is friendlier to other tasks in the system.

topsDeviceScheduleBlockingSync : This is a synonym for topsDeviceScheduleYield.

topsDeviceScheduleAuto : Use a heuristic to select between Spin and Yield modes. If the number of TOPS contexts is greater than the number of logical processors in the system, use Spin scheduling. Else use Yield scheduling.

topsDeviceMapHost : Allow mapping host memory. On GCU, this is always allowed and the flag is ignored.

topsDeviceLmemResizeToMax :

Warning

GCU silently ignores this flag.

Parameters

flags[in] The schedule flags impact how TOPS waits for the completion of a command running on a device.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorSetOnActiveProcess

TOPS_PUBLIC_API topsError_t topsChooseDevice(int *device, const topsDeviceProp_t *prop)

Device which matches topsDeviceProp_t is returned.

Parameters
  • device[out] The device ID

  • prop[in] The device properties pointer

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsIpcGetMemHandle(topsIpcMemHandle_t *handle, void *devPtr)

Gets an interprocess memory handle for an existing device memory allocation.

Takes a pointer to the base of an existing device memory allocation created with topsMalloc and exports it for use in another process. This is a lightweight operation and may be called multiple times on an allocation without adverse effects.

If a region of memory is freed with topsFree and a subsequent call to topsMalloc returns memory with the same device address, topsIpcGetMemHandle will return a unique handle for the new memory.

Parameters
  • handle – Pointer to user allocated topsIpcMemHandle to return the handle in.

  • devPtr – Base pointer to previously allocated device memory

Returns

topsSuccess, topsErrorInvalidHandle, topsErrorOutOfMemory, topsErrorMapFailed

TOPS_PUBLIC_API topsError_t topsIpcOpenMemHandle(void **devPtr, topsIpcMemHandle_t handle, unsigned int flags)

Opens an interprocess memory handle exported from another process and returns a device pointer usable in the local process.

Maps memory exported from another process with topsIpcGetMemHandle into the current device address space. For contexts on different devices topsIpcOpenMemHandle can attempt to enable peer access between the devices as if the user called topsDeviceEnablePeerAccess. This behavior is controlled by the topsIpcMemLazyEnablePeerAccess flag. topsDeviceCanAccessPeer can determine if a mapping is possible.

Contexts that may open topsIpcMemHandles are restricted in the following way. topsIpcMemHandles from each device in a given process may only be opened by one context per device per other process.

Memory returned from topsIpcOpenMemHandle must be freed with topsIpcCloseMemHandle.

Calling topsFree on an exported memory region before calling topsIpcCloseMemHandle in the importing context will result in undefined behavior.

Note

No guarantees are made about the address returned in *devPtr. In particular, multiple processes may not receive the same address for the same handle.

Parameters
  • devPtr – Returned device pointer

  • handle – topsIpcMemHandle to open

  • flags – Flags for this operation. Currently only flag 0 is supported.

Returns

topsSuccess, topsErrorMapFailed, topsErrorInvalidHandle, topsErrorTooManyPeers

TOPS_PUBLIC_API topsError_t topsIpcCloseMemHandle(void *devPtr)

Close memory mapped with topsIpcOpenMemHandle.

Unmaps memory returned by topsIpcOpenMemHandle. The original allocation in the exporting process as well as imported mappings in other processes will be unaffected.

Any resources used to enable peer access will be freed if this is the last mapping using them.

Parameters

devPtr – Device pointer returned by topsIpcOpenMemHandle

Returns

topsSuccess, topsErrorMapFailed, topsErrorInvalidHandle

TOPS_PUBLIC_API topsError_t topsIpcGetEventHandle(topsIpcEventHandle_t *handle, topsEvent_t event)

Gets an opaque interprocess handle for an event.

This opaque handle may be copied into other processes and opened with topsIpcOpenEventHandle. Then topsEventRecord, topsEventSynchronize, topsEventQuery may be used in remote processes. The topsStreamWaitEvent is called in local processes only. Operations on the imported event after the exported event has been freed with topsEventDestroy will result in undefined behavior.

Parameters
  • handle[out] Pointer to topsIpcEventHandle to return the opaque event handle

  • event[in] Event allocated with topsEventInterprocess and topsEventDisableTiming flags

Returns

topsSuccess, topsErrorInvalidConfiguration, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsIpcOpenEventHandle(topsEvent_t *event, topsIpcEventHandle_t handle)

Opens an interprocess event handle.

Opens an interprocess event handle exported from another process with topsIpcGetEventHandle. The returned topsEvent_t behaves like a locally created event with the topsEventDisableTiming flag specified. This event must be freed with topsEventDestroy. Operations on the imported event after the exported event has been freed with topsEventDestroy will result in undefined behavior. If the function is called within the same process where handle is returned by topsIpcGetEventHandle, it will return topsErrorInvalidContext.

Parameters
  • event[out] Pointer to topsEvent_t to return the event

  • handle[in] The opaque interprocess handle to open

Returns

topsSuccess, topsErrorInvalidValue, topsErrorInvalidContext

TOPS_PUBLIC_API topsError_t topsIpcOpenEventHandleExt(topsEvent_t *event, topsIpcEventHandle_t handle, topsTopologyMapType map, int port)

Opens an interprocess event handle with topsTopologyMapType.

Opens an interprocess event handle exported from another process with topsIpcGetEventHandle. The returned topsEvent_t behaves like a locally created event with the topsEventDisableTiming flag specified. This event must be freed with topsEventDestroy. Operations on the imported event after the exported event has been freed with topsEventDestroy will result in undefined behavior. If the function is called within the same process where handle is returned by topsIpcGetEventHandle, it will return topsErrorInvalidContext.

Note. This function is only supported when the event is exported for a remote process on a different device. If the event is exported for a remote process on the same device, please use topsIpcOpenEventHandle instead.

Parameters
  • event[out] Pointer to topsEvent_t to return the event

  • handle[in] The opaque interprocess handle to open

  • map[in] The link that is expected to pass

  • port[in] ESL port id

Returns

topsSuccess, topsErrorInvalidValue, topsErrorInvalidContext

TOPS_PUBLIC_API topsError_t topsOpenEventHandle(topsEvent_t *event, topsEvent_t handle)

Opens an event for the same process.

Opens a same-process event handle exported from another device with topsEventCreateWithFlags. The returned topsEvent_t behaves like a locally created event with the topsEventDisableTiming flag specified. This event must be freed with topsEventDestroy. Operations on the imported event after the exported event has been freed with topsEventDestroy will result in undefined behavior.

Parameters
  • event[out] Pointer to topsEvent_t to return the event

  • handle[in] The event handle from another device to open

Returns

topsSuccess, topsErrorInvalidValue, topsErrorInvalidContext

TOPS_PUBLIC_API topsError_t topsOpenEventHandleExt(topsEvent_t *event, topsEvent_t handle, topsTopologyMapType map, int port)

Opens an event for the same process with topsTopologyMapType.

Opens a same-process event handle exported from another device with topsEventCreateWithFlags. The returned topsEvent_t behaves like a locally created event with the topsEventDisableTiming flag specified. This event must be freed with topsEventDestroy. Operations on the imported event after the exported event has been freed with topsEventDestroy will result in undefined behavior.

Parameters
  • event[out] Pointer to topsEvent_t to return the event

  • handle[in] The event handle from another device to open

  • map[in] The link that is expected to pass

  • port[in] ESL port id

Returns

topsSuccess, topsErrorInvalidValue, topsErrorInvalidContext

2.3. Error

This section describes the error handling functions of TOPS runtime API.

TOPS_PUBLIC_API topsError_t topsGetLastError(void)

Return last error returned by any TOPS runtime API call and resets the stored error code to topsSuccess.

Returns the last error that has been returned by any of the runtime calls in the same host thread, and then resets the saved error to topsSuccess.

Returns

Return code from the last TOPS call made from the active host thread

TOPS_PUBLIC_API topsError_t topsPeekAtLastError(void)

Return last error returned by any TOPS runtime API call.

Returns the last error that has been returned by any of the runtime calls in the same host thread. Unlike topsGetLastError, this function does not reset the saved error code.

Returns

topsSuccess
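
The difference between the two calls is whether the saved error is cleared. The sketch below models only that difference. All names (model_save_error, model_peek_at_last_error, model_get_last_error, modelError_t) are stand-ins invented for illustration, and the sketch does not model whether a later successful runtime call overwrites the saved value, since the text does not say.

```c
#include <assert.h>

/* Stand-in error codes for illustration only. */
typedef enum { modelSuccess = 0, modelErrorInvalidValue = 1 } modelError_t;

/* Saved per host thread, matching "in the same host thread" in the text. */
static _Thread_local modelError_t g_last_error = modelSuccess;

/* Stand-in for a runtime call storing its result. */
void model_save_error(modelError_t err) {
    g_last_error = err;
}

/* Models topsPeekAtLastError: read without clearing. */
modelError_t model_peek_at_last_error(void) {
    return g_last_error;
}

/* Models topsGetLastError: read, then reset the saved value to success. */
modelError_t model_get_last_error(void) {
    modelError_t err = g_last_error;
    g_last_error = modelSuccess;
    return err;
}

/* Usage: peeking leaves the saved error in place, getting clears it. */
void example_last_error(void) {
    modelError_t peeked, first, second;
    model_save_error(modelErrorInvalidValue);
    peeked = model_peek_at_last_error();
    first = model_get_last_error();
    second = model_get_last_error();
    assert(peeked == modelErrorInvalidValue);
    assert(first == modelErrorInvalidValue);
    assert(second == modelSuccess);
}
```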

TOPS_PUBLIC_API const char *topsGetErrorName(topsError_t tops_error)

Return name of the specified error code in text form.

Parameters

tops_error – Error code to convert to name.

Returns

const char pointer to the NULL-terminated error name

TOPS_PUBLIC_API const char *topsGetErrorString(topsError_t topsError)

Return handy text string message to explain the error which occurred.

Warning

This function returns the name of the error (same as topsGetErrorName).

Parameters

topsError – Error code to convert to string.

Returns

const char pointer to the NULL-terminated error string
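
A common way to use an error-name function is a small checking helper that reports the failing call site. The following is a hedged sketch of that pattern: model_get_error_name stands in for topsGetErrorName, and modelError_t, model_check and MODEL_CHECK are hypothetical names introduced here so the code compiles without the TOPS headers.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in error codes for illustration only. */
typedef enum { modelSuccess = 0, modelErrorInvalidValue = 1 } modelError_t;

/* Stand-in for topsGetErrorName: maps a code to its identifier text. */
const char *model_get_error_name(modelError_t err) {
    switch (err) {
    case modelSuccess:
        return "modelSuccess";
    case modelErrorInvalidValue:
        return "modelErrorInvalidValue";
    }
    return "modelErrorUnknown";
}

/* Prints the error name and call site on failure.
 * Returns 1 when err is success and 0 otherwise. */
int model_check(modelError_t err, const char *file, int line) {
    assert(file != NULL);
    if (err == modelSuccess) {
        return 1;
    }
    fprintf(stderr, "%s:%d: %s\n", file, line, model_get_error_name(err));
    return 0;
}

/* Wraps a call so the reported location is the caller's. */
#define MODEL_CHECK(call) model_check((call), __FILE__, __LINE__)
```

With the real runtime, the same shape would wrap calls that return topsError_t and print the string from topsGetErrorName or topsGetErrorString.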

2.4. Stream

This section describes the stream management functions of TOPS runtime API.

typedef void (*topsStreamCallback_t)(topsStream_t stream, topsError_t status, void *userData)

Stream callback function type.

TOPS_PUBLIC_API topsError_t topsStreamCreate(topsStream_t *stream)

Create an asynchronous stream.

Create a new asynchronous stream. stream returns an opaque handle that can be used to reference the newly created stream in subsequent topsStream* commands. The stream is allocated on the heap and will remain allocated even if the handle goes out-of-scope. To release the memory used by the stream, application must call topsStreamDestroy.

Parameters

stream[inout] Pointer to new stream

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsStreamCreateWithFlags(topsStream_t *stream, unsigned int flags)

Create an asynchronous stream.

Create a new asynchronous stream. stream returns an opaque handle that can be used to reference the newly created stream in subsequent topsStream* commands. The stream is allocated on the heap and will remain allocated even if the handle goes out-of-scope. To release the memory used by the stream, application must call topsStreamDestroy. Flags controls behavior of the stream. See topsStreamDefault, topsStreamNonBlocking.

Parameters
  • stream[inout] Pointer to new stream

  • flags[in] to control stream creation.

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsStreamDestroy(topsStream_t stream)

Destroys the specified stream.

If commands are still executing on the specified stream, some may complete execution before the queue is deleted.

The queue may be destroyed while some commands are still inflight, or may wait for all commands queued to the stream before destroying it.

Note: Stream resources should be released before the process exits.

Parameters

stream[in] stream to destroy

Returns

topsSuccess, topsErrorInvalidHandle

TOPS_PUBLIC_API topsError_t topsStreamQuery(topsStream_t stream)

Return topsSuccess if all of the operations in the specified stream have completed, or topsErrorNotReady if not.

This is thread-safe and returns a snapshot of the current state of the queue. However, if other host threads are sending work to the stream, the status may change immediately after the function is called. It is typically used for debug.

Parameters

stream[in] stream to query

Returns

topsSuccess, topsErrorNotReady, topsErrorInvalidHandle

TOPS_PUBLIC_API topsError_t topsStreamSynchronize(topsStream_t stream)

Wait for all commands in stream to complete.

This command is host-synchronous : the host will block until the specified stream is empty.

This command follows standard null-stream semantics. Specifically, specifying the null stream will cause the command to wait for other streams on the same device to complete all pending operations.

This command honors the topsDeviceLaunchBlocking flag, which controls whether the wait is active or blocking.

Parameters

stream[in] stream identifier.

Returns

topsSuccess, topsErrorInvalidHandle

TOPS_PUBLIC_API topsError_t topsStreamWaitEvent(topsStream_t stream, topsEvent_t event, unsigned int flags)

Make the specified compute stream wait for an event.

This function inserts a wait operation into the specified stream. All future work submitted to stream will wait until event reports completion before beginning execution.

This function only waits for commands in the current stream to complete. Notably, this function does not implicitly wait for commands in the default stream to complete, even if the specified stream is created with topsStreamNonBlocking = 0.

Parameters
  • stream[in] stream to make wait.

  • event[in] event to wait on

  • flags[in] control operation [must be 0]

Returns

topsSuccess, topsErrorInvalidHandle

TOPS_PUBLIC_API topsError_t topsStreamAddCallback(topsStream_t stream, topsStreamCallback_t callback, void *userData, unsigned int flags)

Adds a callback to be called on the host after all currently enqueued items in the stream have completed. For each topsStreamAddCallback call, a callback will be executed exactly once. The callback will block later work in the stream until it is finished.

Parameters
  • stream[in] - Stream to add callback to

  • callback[in] - The function to call once preceding stream operations are complete

  • userData[in] - User specified data to be passed to the callback function

  • flags[in] - topsStreamDefault: non-blocking stream execution; topsStreamCallbackBlocking: stream blocks until callback is completed.

Returns

topsSuccess, topsErrorInvalidHandle, topsErrorNotSupported
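
A callback passed to topsStreamAddCallback must match the topsStreamCallback_t signature (stream, status, userData). The sketch below shows one possible callback body. To stay compilable without the TOPS headers it declares stand-in types (modelStream_t, modelError_t, modelStreamCallback_t) with the same shape; CallbackStats and on_stream_done are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without the TOPS headers. */
typedef struct modelStream *modelStream_t;
typedef enum { modelSuccess = 0, modelErrorLaunchFailure = 1 } modelError_t;

/* Same shape as topsStreamCallback_t. */
typedef void (*modelStreamCallback_t)(modelStream_t stream,
                                      modelError_t status, void *userData);

/* Hypothetical host-side state handed to the callback through userData. */
typedef struct {
    int completed; /* number of times the callback ran */
    int failed;    /* number of runs that reported an error status */
} CallbackStats;

/* Callback body: runs on the host and only updates host-side state. */
void on_stream_done(modelStream_t stream, modelError_t status,
                    void *userData) {
    CallbackStats *stats = (CallbackStats *)userData;
    (void)stream; /* unused in this sketch */
    assert(stats != NULL);
    stats->completed += 1;
    if (status != modelSuccess) {
        stats->failed += 1;
    }
}
```

With the real runtime, a function of this shape and a pointer to the state object would be passed as the callback and userData arguments of topsStreamAddCallback. The state object must outlive the callback, since the callback runs only after the preceding stream work completes.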

TOPS_PUBLIC_API topsError_t topsStreamWriteValue32(topsDeviceptr_t dst, int value, unsigned int flags)

Write a value to local device memory.

Parameters
  • dst[in] - The device address to write to.

  • value[in] - The value to write.

  • flags[in] - Reserved for future expansion; must be 0.

Returns

topsSuccess, topsErrorInvalidDevicePointer

TOPS_PUBLIC_API topsError_t topsStreamWriteValue32Async (topsDeviceptr_t dst, int value, unsigned int flags, topsStream_t stream __dparm(0))

Write a value to local device memory async.

Parameters
  • dst[in] - The device address to write to.

  • value[in] - The value to write.

  • flags[in] - Reserved for future expansion; must be 0.

  • stream[in] - The stream to synchronize on the memory location.

Returns

topsSuccess, topsErrorInvalidDevicePointer

TOPS_PUBLIC_API topsError_t topsStreamWaitValue32(topsDeviceptr_t dst, int value, unsigned int flags)

Wait on a memory location.

Parameters
  • dst[in] - The memory location to wait on.

  • value[in] - The value to compare with the memory location.

  • flags[in] - Reserved for future expansion; must be 0.

Returns

topsSuccess, topsErrorInvalidDevicePointer

TOPS_PUBLIC_API topsError_t topsStreamWaitValue32Async (topsDeviceptr_t dst, int value, unsigned int flags, topsStream_t stream __dparm(0))

Wait on a memory location async.

Parameters
  • dst[in] - The memory location to wait on.

  • value[in] - The value to compare with the memory location.

  • flags[in] - Reserved for future expansion; must be 0.

  • stream[in] - The stream to synchronize on the memory location.

Returns

topsSuccess, topsErrorInvalidDevicePointer

2.5. Event

This section describes the event management functions of TOPS runtime API.

TOPS_PUBLIC_API topsError_t topsEventCreateWithFlags(topsEvent_t *event, unsigned flags)

Create an event object with the specified flags.

Creates an event object for the current device with the specified flags. Valid values include:

  • topsEventDefault: Default event creation flag. The event will use active synchronization and will support timing. Active synchronization provides the lowest possible latency at the expense of dedicating a CPU to poll on the event.

  • topsEventBlockingSync: Specifies that the event should use blocking synchronization. A host thread that uses topsEventSynchronize() to wait on an event created with this flag will block until the event actually completes.

  • topsEventDisableTiming: Specifies that the created event does not need to record timing data. Events created with this flag specified and the topsEventBlockingSync flag not specified will provide the best performance when used with topsStreamWaitEvent() and topsEventQuery().

  • topsEventInterprocess: Specifies that the created event may be used as an interprocess event by topsIpcGetEventHandle(). topsEventInterprocess must be specified along with topsEventDisableTiming.

Note

  • Note that this function may also return error codes from previous, asynchronous launches.

Parameters
  • event[inout] Returns the newly created event.

  • flags[in] Flags to control event behavior.

Returns

topsSuccess, topsErrorNotInitialized, topsErrorInvalidValue, topsErrorLaunchFailure, topsErrorOutOfMemory
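
The one flag-combination rule stated above (topsEventInterprocess requires topsEventDisableTiming) can be checked before creating an event. This is a sketch under assumptions: the MODEL_EVENT_ bit values are hypothetical and chosen only for illustration, the real constants are defined by the TOPS headers and may differ, and the helper names are not part of the TOPS API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration; the real constants are defined
 * by the TOPS headers and may differ. */
enum {
    MODEL_EVENT_DEFAULT = 0x0,
    MODEL_EVENT_BLOCKING_SYNC = 0x1,
    MODEL_EVENT_DISABLE_TIMING = 0x2,
    MODEL_EVENT_INTERPROCESS = 0x4
};

/* Checks the single documented rule: an interprocess event must also
 * disable timing. Other combinations are accepted by this sketch. */
bool model_event_flags_valid(unsigned flags) {
    if ((flags & MODEL_EVENT_INTERPROCESS) &&
        !(flags & MODEL_EVENT_DISABLE_TIMING)) {
        return false;
    }
    return true;
}

/* Usage: flags for an event that will be exported to another process. */
unsigned model_ipc_event_flags(void) {
    unsigned flags = MODEL_EVENT_INTERPROCESS | MODEL_EVENT_DISABLE_TIMING;
    assert(model_event_flags_valid(flags));
    return flags;
}
```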

TOPS_PUBLIC_API topsError_t topsEventCreate(topsEvent_t *event)

Create an event object.

Creates an event object for the current device using topsEventDefault.

Parameters

event[inout] Returns the newly created event.

Returns

topsSuccess, topsErrorNotInitialized, topsErrorInvalidValue, topsErrorLaunchFailure, topsErrorOutOfMemory

TOPS_PUBLIC_API topsError_t topsEventRecord(topsEvent_t event, topsStream_t stream)

Record an event in the specified stream.

Captures in event the contents of stream at the time of this call. event and stream must be on the same TOPS context. Calls such as topsEventQuery() or topsStreamWaitEvent() will then examine or wait for completion of the work that was captured. Uses of stream after this call do not modify event.

topsEventRecord() can be called multiple times on the same event and will overwrite the previously captured state. Other APIs such as topsStreamWaitEvent() use the most recently captured state at the time of the API call, and are not affected by later calls to topsEventRecord(). Before the first call to topsEventRecord(), an event represents an empty set of work, so for example topsEventQuery() would return topsSuccess.

topsEventQuery() or topsEventSynchronize() must be used to determine when the event transitions from “recording” (after topsEventRecord() is called) to “recorded” (when timestamps are set, if requested).

Events which are recorded in a non-NULL stream will transition from “recording” to “recorded” state when they reach the head of the specified stream, after all previous commands in that stream have completed executing.

If topsEventRecord() has been previously called on this event, then this call will overwrite any existing state in event.

If this function is called on an event that is currently being recorded, results are undefined: either outstanding recording may save state into the event, and the order is not guaranteed.

Parameters
  • event[in] event to record.

  • stream[in] stream in which to record event.

Returns

topsSuccess, topsErrorInvalidValue, topsErrorNotInitialized, topsErrorInvalidHandle, topsErrorLaunchFailure

TOPS_PUBLIC_API topsError_t topsEventDestroy(topsEvent_t event)

Destroy the specified event.

Releases memory associated with the event. An event may be destroyed before it is complete (i.e., while topsEventQuery() would return topsErrorNotReady). If the event is recording but has not completed recording when topsEventDestroy() is called, the function will return immediately and any associated resources will automatically be released asynchronously at completion.

Note

Use of the handle after this call is undefined behavior.

Parameters

event[in] Event to destroy.

Returns

topsSuccess, topsErrorNotInitialized, topsErrorInvalidValue, topsErrorLaunchFailure

TOPS_PUBLIC_API topsError_t topsEventSynchronize(topsEvent_t event)

Wait for an event to complete.

This function will block until the event is ready, waiting for all previous work in the stream specified when event was recorded with topsEventRecord().

If topsEventRecord() has not been called on event, this function returns immediately.

Note: This function needs to support the topsEventBlockingSync parameter.

Parameters

event[in] Event on which to wait.

Returns

topsSuccess, topsErrorInvalidValue, topsErrorNotInitialized, topsErrorInvalidHandle, topsErrorLaunchFailure

TOPS_PUBLIC_API topsError_t topsEventElapsedTime(float *ms, topsEvent_t start, topsEvent_t stop)

Return the elapsed time between two events.

Computes the elapsed time between two events. Time is computed in ms, with a resolution of approximately 1 us.

Events which are recorded in a NULL stream will block until all commands on all other streams complete execution, and then record the timestamp.

Events which are recorded in a non-NULL stream will record their timestamp when they reach the head of the specified stream, after all previous commands in that stream have completed executing. Thus the time that the event recorded may be significantly after the host calls topsEventRecord().

If topsEventRecord() has not been called on either event, then topsErrorInvalidHandle is returned. If topsEventRecord() has been called on both events, but the timestamp has not yet been recorded on one or both events (that is, topsEventQuery() would return topsErrorNotReady on at least one of the events), then topsErrorNotReady is returned.

Note, for TOPS Events used in kernel dispatch using topsExtLaunchKernelGGL/topsExtLaunchKernel, events passed in topsExtLaunchKernelGGL/topsExtLaunchKernel are not explicitly recorded and should only be used to get elapsed time for that specific launch. In case events are used across multiple dispatches, for example, start and stop events from different topsExtLaunchKernelGGL/ topsExtLaunchKernel calls, they will be treated as invalid unrecorded events, TOPS will throw error “topsErrorInvalidHandle” from topsEventElapsedTime.

Parameters
  • ms[out] : Return time between start and stop in ms.

  • start[in] : Start event.

  • stop[in] : Stop event.

Returns

topsSuccess, topsErrorInvalidValue, topsErrorNotReady, topsErrorInvalidHandle, topsErrorNotInitialized, topsErrorLaunchFailure
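
The status rules for a start/stop pair can be summarized as a small decision function. The sketch below models only the rules stated in the description. The three event states and every model_ or MODEL_ name are stand-ins introduced for illustration, not TOPS API, and the sketch reads "either event" as "at least one of the two events".

```c
#include <assert.h>

/* Stand-in error codes for illustration only. */
typedef enum {
    modelSuccess = 0,
    modelErrorNotReady,
    modelErrorInvalidHandle
} modelError_t;

/* Event life cycle as described for topsEventRecord. */
typedef enum {
    MODEL_EVENT_UNRECORDED, /* topsEventRecord was never called */
    MODEL_EVENT_RECORDING,  /* recorded into a stream, timestamp pending */
    MODEL_EVENT_RECORDED    /* timestamp available */
} modelEventState_t;

/* Status documented for topsEventElapsedTime, given both event states. */
modelError_t model_elapsed_time_status(modelEventState_t start,
                                       modelEventState_t stop) {
    if (start == MODEL_EVENT_UNRECORDED || stop == MODEL_EVENT_UNRECORDED) {
        return modelErrorInvalidHandle;
    }
    if (start == MODEL_EVENT_RECORDING || stop == MODEL_EVENT_RECORDING) {
        return modelErrorNotReady;
    }
    return modelSuccess;
}

/* Usage: a pending stop event means the caller should wait and retry. */
void example_elapsed_time_status(void) {
    modelError_t status = model_elapsed_time_status(MODEL_EVENT_RECORDED,
                                                    MODEL_EVENT_RECORDING);
    assert(status == modelErrorNotReady);
}
```

In practice this means synchronizing on the stop event (for example with topsEventSynchronize) before reading the elapsed time.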

TOPS_PUBLIC_API topsError_t topsEventQuery(topsEvent_t event)

Query event status.

Query the status of the specified event. This function returns topsSuccess if all commands in the appropriate stream (specified to topsEventRecord()) have completed, or if topsEventRecord() was not called on the event. If that work has not completed, topsErrorNotReady is returned.

Parameters

event[in] Event to query.

Returns

topsSuccess, topsErrorNotReady, topsErrorInvalidHandle, topsErrorInvalidValue, topsErrorNotInitialized, topsErrorLaunchFailure
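
Read together with the topsEventRecord description (an event that was never recorded represents an empty set of work, so a query returns topsSuccess) and the topsEventDestroy description (an incomplete event reports topsErrorNotReady), the query outcome can be modelled as a function of the event state. A minimal sketch follows. The state enum and the model_ names are stand-ins invented for illustration, and the polling helper works on a hypothetical sequence of observed states rather than on a real device.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in error codes for illustration only. */
typedef enum { modelSuccess = 0, modelErrorNotReady = 1 } modelError_t;

typedef enum {
    MODEL_EVENT_UNRECORDED, /* topsEventRecord was never called */
    MODEL_EVENT_RECORDING,  /* captured work still in flight */
    MODEL_EVENT_RECORDED    /* captured work completed */
} modelEventState_t;

/* Status a query reports for each state. */
modelError_t model_event_query(modelEventState_t state) {
    return (state == MODEL_EVENT_RECORDING) ? modelErrorNotReady
                                            : modelSuccess;
}

/* Counts the leading not-ready polls in a sequence of observed states. */
int model_polls_until_ready(const modelEventState_t *states, int n) {
    int polls = 0;
    assert(states != NULL && n >= 0);
    while (polls < n &&
           model_event_query(states[polls]) == modelErrorNotReady) {
        polls++;
    }
    return polls;
}
```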

2.6. Memory

This section describes the memory management functions of TOPS runtime API.

TOPS_PUBLIC_API topsError_t topsPointerGetAttributes(topsPointerAttribute_t *attributes, const void *ptr)

Return attributes for the specified pointer.

Parameters
  • attributes[out] attributes for the specified pointer

  • ptr[in] pointer to get attributes for

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsPointerGetAttribute(void *data, topsPointer_attribute attribute, topsDeviceptr_t ptr)

Returns information about the specified pointer.

Parameters
  • data[inout] returned pointer attribute value

  • attribute[in] attribute to query for

  • ptr[in] pointer to get attributes for

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsDrvPointerGetAttributes(unsigned int numAttributes, topsPointer_attribute *attributes, void **data, topsDeviceptr_t ptr)

Returns information about the specified pointer.

Parameters
  • numAttributes[in] number of attributes to query for

  • attributes[in] attributes to query for

  • data[inout] a two-dimensional array containing pointers to memory locations where the result of each attribute query will be written

  • ptr[in] pointer to get attributes for

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMalloc(void **ptr, size_t size)

Allocate memory on the default accelerator.

If size is 0, no memory is allocated, *ptr returns non-nullptr, and topsSuccess is returned.

Parameters
  • ptr[out] Pointer to the allocated memory

  • size[in] Requested memory size

Returns

topsSuccess, topsErrorOutOfMemory, topsErrorInvalidValue (bad context, null *ptr)

TOPS_PUBLIC_API topsError_t topsExtCodecMemHandle(void **pointer, uint64_t dev_addr, size_t size)

Convert device memory to an efcodec memory handle.

If size is 0, no memory is allocated, *pointer returns non-nullptr, and topsSuccess is returned.

Parameters
  • pointer[out] Pointer to the allocated memory handle

  • dev_addr[in] Requested memory device address

  • size[in] Requested memory size

Returns

topsSuccess, topsErrorOutOfMemory, topsErrorInvalidValue (bad context, null *pointer)

TOPS_PUBLIC_API topsError_t topsExtMallocWithFlags(void **ptr, size_t sizeBytes, unsigned int flags)

Allocate memory on the default accelerator.

If sizeBytes is 0, no memory is allocated, *ptr returns non-nullptr, and topsSuccess is returned.

Parameters
  • ptr[out] Pointer to the allocated memory

  • sizeBytes[in] Requested memory size

  • flags[in] Type of memory allocation. Only topsDeviceMallocDefault, topsMallocTopDown, topsMallocForbidMergeMove, and topsMallocPreferHighSpeedMem are supported.

Returns

topsSuccess, topsErrorOutOfMemory, topsErrorInvalidValue (bad context, null *ptr)

TOPS_PUBLIC_API topsError_t topsHostMalloc(void **ptr, size_t size, unsigned int flags)

Allocate device accessible page locked host memory.

If size is 0, no memory is allocated, *ptr returns nullptr, and topsSuccess is returned.

Parameters
  • ptr[out] Pointer to the allocated host pinned memory

  • size[in] Requested memory size

  • flags[in] Type of host memory allocation

Returns

topsSuccess, topsErrorOutOfMemory

TOPS_PUBLIC_API topsError_t topsHostGetDevicePointer(void **devPtr, void *hostPtr, unsigned int flags)

Get Device pointer from Host Pointer allocated through topsHostMalloc.

Parameters
  • devPtr[out] Device Pointer mapped to passed host pointer

  • hostPtr[in] Host Pointer allocated through topsHostMalloc

  • flags[in] Flags to be passed for extension

Returns

topsSuccess, topsErrorInvalidValue, topsErrorOutOfMemory

TOPS_PUBLIC_API topsError_t topsHostGetFlags(unsigned int *flagsPtr, void *hostPtr)

Return flags associated with host pointer.

See also

topsHostMalloc

Parameters
  • flagsPtr[out] Memory location to store flags

  • hostPtr[in] Host Pointer allocated through topsHostMalloc

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsHostRegister(void *hostPtr, size_t sizeBytes, unsigned int flags)

Register host memory so it can be accessed from the current device.

Flags:

After registering the memory, use topsHostGetDevicePointer to obtain the mapped device pointer. On many systems, the mapped device pointer will have a different value than the mapped host pointer. Applications must use the device pointer in device code, and the host pointer in host code.

On some systems, registered memory is pinned. On other systems, registered memory may not actually be pinned but uses OS or hardware facilities to allow GCU access to the host memory.

Developers are strongly encouraged to register memory blocks that are aligned to the host cache-line size (typically 64 bytes, but it can be obtained from the CPUID instruction).

If registering non-aligned pointers, the application must take care when registering pointers from the same cache line on different devices. TOPS’s coarse-grained synchronization model does not guarantee correct results if different devices write to different parts of the same cache block; typically one of the writes will “win” and overwrite data from the other registered memory region.

Parameters
  • hostPtr[in] Pointer to host memory to be registered.

  • sizeBytes[in] Size of the host memory in bytes.

  • flags[in] See above.

Returns

topsSuccess, topsErrorOutOfMemory
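The cache-line alignment recommendation for topsHostRegister can be met on the host before the buffer is registered. The following is a minimal sketch and not part of the TOPS API: the helper names (round_up_to_cache_line, alloc_cache_aligned, is_cache_aligned) are hypothetical, the 64-byte line size is an assumption based on the "typically 64 bytes" remark, and C11 aligned_alloc is assumed to be available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte host cache line. Query the real value on the target host. */
#define CACHE_LINE_BYTES ((size_t)64)

/* Round n up to the next multiple of the cache-line size. */
static size_t round_up_to_cache_line(size_t n) {
    return (n + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES * CACHE_LINE_BYTES;
}

/* Allocate a buffer that starts on a cache-line boundary and whose size is a
 * whole number of cache lines, so no other allocation shares its lines.
 * The padded size is written to *padded_bytes. Returns NULL on failure. */
static void *alloc_cache_aligned(size_t bytes, size_t *padded_bytes) {
    assert(padded_bytes != NULL);
    *padded_bytes = round_up_to_cache_line(bytes ? bytes : 1);
    return aligned_alloc(CACHE_LINE_BYTES, *padded_bytes);
}

/* Returns 1 when p starts on a cache-line boundary. */
static int is_cache_aligned(const void *p) {
    return ((uintptr_t)p % CACHE_LINE_BYTES) == 0;
}
```

A buffer obtained this way would be passed to topsHostRegister together with the padded size, and released with free() after topsHostUnregister.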

TOPS_PUBLIC_API topsError_t topsHostUnregister(void *hostPtr)

Un-register host pointer.

See also

topsHostRegister

Parameters

hostPtr[in] Host pointer previously registered with topsHostRegister

Returns

Error code

TOPS_PUBLIC_API topsError_t topsFree(void *ptr)

Free memory allocated by the tops memory allocation API. This API performs an implicit topsDeviceSynchronize() call. If pointer is NULL, the tops runtime is initialized and topsSuccess is returned.

Parameters

ptr[in] Pointer to memory to be freed

Returns

topsSuccess, topsErrorInvalidDevicePointer (if pointer is invalid, including host pointers allocated with topsHostMalloc)

TOPS_PUBLIC_API topsError_t topsHostFree(void *ptr)

Free memory allocated by the tops host memory allocation API. This API performs an implicit topsDeviceSynchronize() call. If pointer is NULL, the tops runtime is initialized and topsSuccess is returned.

Parameters

ptr[in] Pointer to memory to be freed

Returns

topsSuccess, topsErrorInvalidValue (if pointer is invalid, including device pointers allocated with topsMalloc)

TOPS_PUBLIC_API topsError_t topsMemcpy(void *dst, const void *src, size_t sizeBytes, topsMemcpyKind kind)

Copy data from src to dst.

It supports copies from host to device, device to host, device to device, and host to host. The src and dst must not overlap.

For topsMemcpy, the copy is always performed by the current device (set by topsSetDevice). For multi-GCU or peer-to-peer configurations, it is recommended to set the current device to the device where the src data is physically located. For optimal peer-to-peer copies, the copy device must be able to access the src and dst pointers (by calling topsDeviceEnablePeerAccess with the copy agent as the current device and src/dst as the peerDevice argument). If this is not done, topsMemcpy will still work, but will perform the copy using a staging buffer on the host. Calling topsMemcpy with dst and src pointers that do not match the topsMemcpyKind results in undefined behavior.

Parameters
  • dst[out] Data being copied to

  • src[in] Data being copied from

  • sizeBytes[in] Data size in bytes

  • kind[in] Memory copy type

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryFree, topsErrorUnknown
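Because src and dst must not overlap, a caller may want a debug-time guard before issuing a copy. The sketch below is a host-only illustration, not TOPS code: ranges_overlap and copy_checked are hypothetical helpers, memcpy stands in for the real copy call, and the comparison is only meaningful when both pointers belong to the same address space (for example host to host, or device to device on one device).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the byte ranges [a, a+n) and [b, b+n) share at least one byte. */
static int ranges_overlap(const void *a, const void *b, size_t n) {
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    if (n == 0) {
        return 0; /* an empty range overlaps nothing */
    }
    return pa < pb ? (pb - pa) < n : (pa - pb) < n;
}

/* Host-to-host stand-in for a copy call: checks the no-overlap
 * precondition in debug builds, then copies. */
static void copy_checked(void *dst, const void *src, size_t n) {
    assert(!ranges_overlap(dst, src, n));
    memcpy(dst, src, n);
}
```

In real code the same precondition check would precede the topsMemcpy call rather than memcpy.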

TOPS_PUBLIC_API topsError_t topsMemcpyWithStream(void *dst, const void *src, size_t sizeBytes, topsMemcpyKind kind, topsStream_t stream)

Copy data from src to dst.

It supports copies from host to device, device to host, device to device, and host to host. The src and dst must not overlap.

Parameters
  • dst[out] Data being copied to

  • src[in] Data being copied from

  • sizeBytes[in] Data size in bytes

  • kind[in] Memory copy type

  • stream[in] Stream to enqueue this operation.

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryFree, topsErrorUnknown

TOPS_PUBLIC_API topsError_t topsMemcpyHtoD(topsDeviceptr_t dst, void *src, size_t sizeBytes)

Copy data from Host to Device.

Parameters
  • dst[out] Data being copied to

  • src[in] Data being copied from

  • sizeBytes[in] Data size in bytes

Returns

topsSuccess, topsErrorDeInitialized, topsErrorNotInitialized, topsErrorInvalidContext, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemcpyDtoH(void *dst, topsDeviceptr_t src, size_t sizeBytes)

Copy data from Device to Host.

Parameters
  • dst[out] Data being copied to

  • src[in] Data being copied from

  • sizeBytes[in] Data size in bytes

Returns

topsSuccess, topsErrorDeInitialized, topsErrorNotInitialized, topsErrorInvalidContext, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemcpyDtoD(topsDeviceptr_t dst, topsDeviceptr_t src, size_t sizeBytes)

Copy data from Device to Device.

Parameters
  • dst[out] Data being copied to

  • src[in] Data being copied from

  • sizeBytes[in] Data size in bytes

Returns

topsSuccess, topsErrorDeInitialized, topsErrorNotInitialized, topsErrorInvalidContext, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemcpyHtoDAsync(topsDeviceptr_t dst, void *src, size_t sizeBytes, topsStream_t stream)

Copy data from Host to Device asynchronously.

Parameters
  • dst[out] Data being copied to

  • src[in] Data being copied from

  • sizeBytes[in] Data size in bytes

  • stream[in] Stream to enqueue this operation.

Returns

topsSuccess, topsErrorDeInitialized, topsErrorNotInitialized, topsErrorInvalidContext, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemcpyDtoHAsync(void *dst, topsDeviceptr_t src, size_t sizeBytes, topsStream_t stream)

Copy data from Device to Host asynchronously.

Parameters
  • dst[out] Data being copied to

  • src[in] Data being copied from

  • sizeBytes[in] Data size in bytes

  • stream[in] Stream to enqueue this operation.

Returns

topsSuccess, topsErrorDeInitialized, topsErrorNotInitialized, topsErrorInvalidContext, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemcpyDtoDAsync(topsDeviceptr_t dst, topsDeviceptr_t src, size_t sizeBytes, topsStream_t stream)

Copy data from Device to Device asynchronously.

Parameters
  • dst[out] Data being copied to

  • src[in] Data being copied from

  • sizeBytes[in] Data size in bytes

  • stream[in] Stream to enqueue this operation.

Returns

topsSuccess, topsErrorDeInitialized, topsErrorNotInitialized, topsErrorInvalidContext, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsModuleGetGlobal(topsDeviceptr_t *dptr, size_t *bytes, topsModule_t hmod, const char *name)

Returns a global pointer from a module. Returns in *dptr and *bytes the pointer and size of the global symbol located in module hmod. If no variable of that name exists, it returns topsErrorNotFound. Both parameters dptr and bytes are optional. If one of them is NULL, it is ignored and topsSuccess is returned.

Parameters
  • dptr[out] Returns global device pointer

  • bytes[out] Returns global size in bytes

  • hmod[in] Module to retrieve global from

  • name[in] Name of global to retrieve

Returns

topsSuccess, topsErrorInvalidValue, topsErrorNotFound, topsErrorInvalidContext

TOPS_PUBLIC_API topsError_t topsGetSymbolAddress(void **devPtr, const void *symbol)

Gets device pointer associated with symbol on the device.

Parameters
  • devPtr[out] pointer to the device associated the symbol

  • symbol[in] pointer to the symbol of the device

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsGetSymbolSize(size_t *size, const void *symbol)

Gets the size of the given symbol on the device.

Parameters
  • symbol[in] pointer to the device symbol

  • size[out] pointer to the size

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemcpyToSymbol (const void *symbol, const void *src, size_t sizeBytes, size_t offset __dparm(0), topsMemcpyKind kind __dparm(topsMemcpyHostToDevice))

Copies data to the given symbol on the device. Symbol TOPS APIs allow a kernel to define a device-side data symbol which can be accessed on the host side. The symbol can be in __constant or device space. Note that the symbol name needs to be encased in the TOPS_SYMBOL macro. This also applies to topsMemcpyFromSymbol, topsGetSymbolAddress, and topsGetSymbolSize.

Parameters
  • symbol[out] pointer to the device symbol

  • src[in] pointer to the source address

  • sizeBytes[in] size in bytes to copy

  • offset[in] offset in bytes from start of symbol

  • kind[in] type of memory transfer

Returns

topsSuccess, topsErrorInvalidValue
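Since topsMemcpyToSymbol takes both an offset and a sizeBytes, a caller can check on the host that the requested window fits inside the symbol, using a size such as the one reported by topsGetSymbolSize. This is a sketch under assumptions: the helper names are hypothetical, and the source does not state how the runtime reacts to an out-of-range window, so the check is shown purely as a defensive precondition.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when [offset, offset + size_bytes) lies inside a symbol of
 * symbol_size bytes. Written to avoid overflow in offset + size_bytes. */
static int symbol_copy_in_bounds(size_t symbol_size, size_t offset, size_t size_bytes) {
    if (offset > symbol_size) {
        return 0;
    }
    return size_bytes <= symbol_size - offset;
}

/* Largest sizeBytes that still fits when copying at the given offset.
 * Precondition: offset does not exceed the symbol size. */
static size_t symbol_bytes_available(size_t symbol_size, size_t offset) {
    assert(offset <= symbol_size);
    return symbol_size - offset;
}
```

The same check applies unchanged to topsMemcpyFromSymbol and to the asynchronous variants, which take the same offset and sizeBytes parameters.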

TOPS_PUBLIC_API topsError_t topsMemcpyToSymbolAsync (const void *symbol, const void *src, size_t sizeBytes, size_t offset, topsMemcpyKind kind, topsStream_t stream __dparm(0))

Copies data to the given symbol on the device asynchronously.

Parameters
  • symbol[out] pointer to the device symbol

  • src[in] pointer to the source address

  • sizeBytes[in] size in bytes to copy

  • offset[in] offset in bytes from start of symbol

  • kind[in] type of memory transfer

  • stream[in] stream identifier

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemcpyFromSymbol (void *dst, const void *symbol, size_t sizeBytes, size_t offset __dparm(0), topsMemcpyKind kind __dparm(topsMemcpyDeviceToHost))

Copies data from the given symbol on the device.

Parameters
  • dst[out] Pointer to destination memory address

  • symbol[in] pointer to the symbol address on the device

  • sizeBytes[in] size in bytes to copy

  • offset[in] offset in bytes from the start of symbol

  • kind[in] type of memory transfer

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemcpyFromSymbolAsync (void *dst, const void *symbol, size_t sizeBytes, size_t offset, topsMemcpyKind kind, topsStream_t stream __dparm(0))

Copies data from the given symbol on the device asynchronously.

Parameters
  • dst[out] Pointer to destination memory address

  • symbol[in] pointer to the symbol address on the device

  • sizeBytes[in] size in bytes to copy

  • offset[in] offset in bytes from the start of symbol

  • kind[in] type of memory transfer

  • stream[in] stream identifier

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemcpyAsync (void *dst, const void *src, size_t sizeBytes, topsMemcpyKind kind, topsStream_t stream __dparm(0))

Copy data from src to dst asynchronously.

For multi-GCU or peer-to-peer configurations, it is recommended to use a stream which is attached to the device where the src data is physically located. For optimal peer-to-peer copies, the copy device must be able to access the src and dst pointers (by calling topsDeviceEnablePeerAccess with the copy agent as the current device and src/dst as the peerDevice argument). If this is not done, the copy will still work, but will be performed using a staging buffer on the host.

Warning

If the host src or dst memory is not pinned, the memory copy will be performed synchronously. For best performance, use topsHostMalloc to allocate host memory that is transferred asynchronously.

Warning

topsMemcpyAsync does not support overlapped H2D and D2H copies. For topsMemcpyAsync, the copy is always performed by the device associated with the specified stream.

Parameters
  • dst[out] Data being copied to

  • src[in] Data being copied from

  • sizeBytes[in] Data size in bytes

  • kind[in] type of memory transfer

  • stream[in] stream identifier

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryFree, topsErrorUnknown

TOPS_PUBLIC_API topsError_t topsMemset(void *dst, int value, size_t sizeBytes)

Fills the first sizeBytes bytes of the memory area pointed to by dst with the constant byte value value.

Parameters
  • dst[out] Data being filled

  • value[in] Constant value to be set

  • sizeBytes[in] Data size in bytes

Returns

topsSuccess, topsErrorInvalidValue, topsErrorNotInitialized

TOPS_PUBLIC_API topsError_t topsMemsetD8(topsDeviceptr_t dest, unsigned char value, size_t count)

Fills the first count bytes of the memory area pointed to by dest with the constant byte value value.

Parameters
  • dest[out] Data ptr to be filled

  • value[in] Constant value to be set

  • count[in] Number of values to be set

Returns

topsSuccess, topsErrorInvalidValue, topsErrorNotInitialized

TOPS_PUBLIC_API topsError_t topsMemsetD8Async (topsDeviceptr_t dest, unsigned char value, size_t count, topsStream_t stream __dparm(0))

Fills the first count bytes of the memory area pointed to by dest with the constant byte value value.

topsMemsetD8Async() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated with a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.

Parameters
  • dest[out] Data ptr to be filled

  • value[in] Constant value to be set

  • count[in] Number of values to be set

  • stream[in] Stream identifier

Returns

topsSuccess, topsErrorInvalidValue, topsErrorNotInitialized

TOPS_PUBLIC_API topsError_t topsMemsetD16(topsDeviceptr_t dest, unsigned short value, size_t count)

Fills the first count 16-bit values of the memory area pointed to by dest with the constant short value value.

Parameters
  • dest[out] Data ptr to be filled

  • value[in] Constant value to be set

  • count[in] Number of values to be set

Returns

topsSuccess, topsErrorInvalidValue, topsErrorNotInitialized

TOPS_PUBLIC_API topsError_t topsMemsetD16Async (topsDeviceptr_t dest, unsigned short value, size_t count, topsStream_t stream __dparm(0))

Fills the first count 16-bit values of the memory area pointed to by dest with the constant short value value.

topsMemsetD16Async() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated with a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.

Parameters
  • dest[out] Data ptr to be filled

  • value[in] Constant value to be set

  • count[in] Number of values to be set

  • stream[in] Stream identifier

Returns

topsSuccess, topsErrorInvalidValue, topsErrorNotInitialized

TOPS_PUBLIC_API topsError_t topsMemsetD32(topsDeviceptr_t dest, int value, size_t count)

Fills the memory area pointed to by dest with the constant integer value, repeated count times.

Parameters
  • dest[out] Data being filled

  • value[in] Constant value to be set

  • count[in] Number of values to be set

Returns

topsSuccess, topsErrorInvalidValue, topsErrorNotInitialized
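The difference between the byte-oriented fill (topsMemset, sized in bytes) and the element-oriented fills (topsMemsetD16/topsMemsetD32, sized in values) is easy to get wrong. The following host-side model illustrates the semantics as the descriptions read. It is not the device implementation, the function names are hypothetical, and the assumption that the byte fill uses only the low 8 bits of value is borrowed from the C memset it mirrors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side model of a 32-bit fill: writes `count` copies of `value`,
 * touching count * 4 bytes in total. */
static void fill_u32_model(uint32_t *dest, int value, size_t count) {
    assert(dest != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        dest[i] = (uint32_t)value;
    }
}

/* Host-side model of a byte fill: every one of the `size_bytes` bytes
 * receives the low 8 bits of `value`. */
static void fill_u8_model(void *dst, int value, size_t size_bytes) {
    assert(dst != NULL || size_bytes == 0);
    if (size_bytes != 0) {
        memset(dst, value, size_bytes);
    }
}
```

Filling one 32-bit word with the value 1 through the element model yields 0x00000001, while filling the same four bytes through the byte model yields 0x01010101.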

TOPS_PUBLIC_API topsError_t topsMemsetAsync (void *dst, int value, size_t sizeBytes, topsStream_t stream __dparm(0))

Fills the first sizeBytes bytes of the memory area pointed to by dst with the constant byte value value.

topsMemsetAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated with a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.

Parameters
  • dst[out] Pointer to device memory

  • value[in] Value to set for each byte of specified memory

  • sizeBytes[in] Size in bytes to set

  • stream[in] Stream identifier

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryFree

TOPS_PUBLIC_API topsError_t topsMemsetD32Async (topsDeviceptr_t dst, int value, size_t count, topsStream_t stream __dparm(0))

Fills the memory area pointed to by dst with the constant integer value, repeated count times.

topsMemsetD32Async() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated with a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.

Parameters
  • dst[out] Pointer to device memory

  • value[in] Constant value to be set

  • count[in] Number of values to be set

  • stream[in] Stream identifier

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryFree

TOPS_PUBLIC_API topsError_t topsMemGetInfo(size_t *free, size_t *total)

Query memory info.

Return snapshot of free memory, and total allocatable memory on the device.

Returns in *free a snapshot of the current free memory.

Warning

The free memory only accounts for memory allocated by this process and may be optimistic.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemPtrGetInfo(void *ptr, size_t *size)

Query memory pointer info. Return size of the memory pointer.

Parameters
  • size[out] The size of memory pointer.

  • ptr[in] Pointer to memory for query.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemGetAddressRange(topsDeviceptr_t *pbase, size_t *psize, topsDeviceptr_t dptr)

Get information on memory allocations.

Parameters
  • pbase[out] - Base pointer address

  • psize[out] - Size of allocation

  • dptr[in] - Device pointer

Returns

topsSuccess, topsErrorInvalidDevicePointer

2.7. PeerToPeer

This section describes the PeerToPeer device memory access functions of TOPS runtime API.

TOPS_PUBLIC_API topsError_t topsDeviceCanAccessPeer(int *canAccess, int deviceId, int peerDeviceId)

Checks if peer/esl access between two devices is possible.

Parameters
  • deviceId[in] - device id

  • peerDeviceId[in] - peer device id

  • canAccess[out] - access status between the two devices, encoded as a bit field. bit[7~0]: each bit indicates the status of the corresponding port (1 link, 0 no link). bit[15~8]: p2p link type (bit 8 PCIe switch link, bit 9 RCs link, bit 10 die-to-die; all 0 means no p2p link). bit[23~16]: cluster-as-device type (1 means cluster as device).

Returns

topsSuccess, topsErrorInvalidValue
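The canAccess value returned by topsDeviceCanAccessPeer packs three fields into one integer. A minimal decoding sketch follows. The struct and function names are hypothetical, and the field layout is taken directly from the parameter description above, not from a TOPS header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the canAccess bit field. */
typedef struct {
    uint8_t port_mask;         /* bits 7..0: one bit per port, 1 = link */
    int pcie_switch_link;      /* bit 8 */
    int rc_link;               /* bit 9 */
    int die_to_die_link;       /* bit 10 */
    int has_p2p_link;          /* 0 when bits 15..8 are all zero */
    uint8_t cluster_as_device; /* bits 23..16: 1 = cluster as device */
} peer_access_info;

/* Split a canAccess value into its documented fields. */
static void decode_can_access(int can_access, peer_access_info *out) {
    assert(out != NULL);
    uint32_t v = (uint32_t)can_access;
    uint32_t link_type = (v >> 8) & 0xFFu;

    out->port_mask = (uint8_t)(v & 0xFFu);
    out->pcie_switch_link = (int)(link_type & 1u);
    out->rc_link = (int)((link_type >> 1) & 1u);
    out->die_to_die_link = (int)((link_type >> 2) & 1u);
    out->has_p2p_link = link_type != 0u;
    out->cluster_as_device = (uint8_t)((v >> 16) & 0xFFu);
}
```

For example, the value 0x010105 decodes to ports 0 and 2 linked, a PCIe switch link, and cluster-as-device set to 1.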

TOPS_PUBLIC_API topsError_t topsDeviceEnablePeerAccess(int peerDeviceId, unsigned int flags)

Set peer/esl access property.

Parameters
  • peerDeviceId[in] - peer device id

  • flags[in] - access property

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsDeviceEnablePeerAccessRegion(int peerDeviceId, void *peerDevPtr, size_t size, void **devPtr)

Set peer access property for peer device’s region.

Parameters
  • peerDeviceId[in] - peer device id

  • peerDevPtr[in] - the start device ptr of the peer device’s region

  • size[in] - the size of the peer device’s region

  • devPtr[out] - the p2p mapped device address

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsDeviceDisablePeerAccessRegion(int peerDeviceId, void *peerDevPtr, size_t size)

Destroy the peer access property for the peer device’s region.

Parameters
  • peerDeviceId[in] - peer device id

  • peerDevPtr[in] - the start device ptr of the peer device’s region

  • size[in] - the size of the peer device’s region

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsMemcpyPeer(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes)

Copies memory from one device to memory on another device.

For topsMemcpyPeer, the copy is always performed by the current device (set by topsSetDevice). For multi-GCU or peer-to-peer configurations, it is recommended to set the current device to the device where the src data is physically located. For optimal peer-to-peer copies, the copy device must be able to access the src and dst pointers (by calling topsDeviceEnablePeerAccess with the copy agent as the current device and src/dst as the peerDevice argument).

Parameters
  • dst[out] Data being copied to

  • dstDevice[in] Dst device id

  • src[in] Data being copied from

  • srcDevice[in] Src device id

  • sizeBytes[in] Data size in bytes

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryFree, topsErrorUnknown

TOPS_PUBLIC_API topsError_t topsMemcpyPeerAsync(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, topsStream_t stream)

Copies memory from one device to memory on another device asynchronously.

For multi-GCU or peer-to-peer configurations, it is recommended to use a stream which is attached to the device where the src data is physically located. For optimal peer-to-peer copies, the copy device must be able to access the src and dst pointers (by calling topsDeviceEnablePeerAccess with the copy agent as the current device and src/dst as the peerDevice argument).

Parameters
  • dst[out] Data being copied to

  • dstDevice[in] Dst device id

  • src[in] Data being copied from

  • srcDevice[in] Src device id

  • sizeBytes[in] Data size in bytes

  • stream[in] Stream identifier

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryFree, topsErrorUnknown

TOPS_PUBLIC_API topsError_t topsMemcpyPeerExt(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, topsTopologyMapType map, int port)

Copies memory from one device to memory on another device with special priority.

For topsMemcpyPeerExt, the copy is always performed by the current device (set by topsSetDevice). For multi-GCU or peer-to-peer configurations, it is recommended to set the current device to the device where the src data is physically located. For optimal peer-to-peer copies, the copy device must be able to access the src and dst pointers (by calling topsDeviceEnablePeerAccess with the copy agent as the current device and src/dst as the peerDevice argument).

Parameters
  • dst[out] Data being copied to

  • dstDevice[in] Dst device id

  • src[in] Data being copied from

  • srcDevice[in] Src device id

  • sizeBytes[in] Data size in bytes

  • map[in] The link the copy is expected to pass through

  • port[in] ESL port id

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryFree, topsErrorUnknown

TOPS_PUBLIC_API topsError_t topsMemcpyPeerExtAsync(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, topsTopologyMapType map, int port, topsStream_t stream)

Copies memory from one device to memory on another device asynchronously with special priority.

For topsMemcpyPeerExtAsync, the copy is always performed by the current device (set by topsSetDevice). For multi-GCU or peer-to-peer configurations, it is recommended to set the current device to the device where the src data is physically located. For optimal peer-to-peer copies, the copy device must be able to access the src and dst pointers (by calling topsDeviceEnablePeerAccess with the copy agent as the current device and src/dst as the peerDevice argument).

Parameters
  • dst[out] Data being copied to

  • dstDevice[in] Dst device id

  • src[in] Data being copied from

  • srcDevice[in] Src device id

  • sizeBytes[in] Data size in bytes

  • map[in] The link the copy is expected to pass through

  • port[in] ESL port id

  • stream[in] Stream identifier

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryFree, topsErrorUnknown

2.8. Module

This section describes the module management functions of TOPS runtime API.

TOPS_PUBLIC_API topsError_t topsModuleLoad(topsModule_t *module, const char *fname)

Loads code object from file into a topsModule_t.

Parameters
  • fname[in]

  • module[out]

Returns

topsSuccess, topsErrorInvalidValue, topsErrorInvalidContext, topsErrorFileNotFound, topsErrorOutOfMemory, topsErrorSharedObjectInitFailed, topsErrorNotInitialized

TOPS_PUBLIC_API topsError_t topsModuleUnload(topsModule_t module)

Frees the module and destroys the code objects associated with it.

Parameters

module[in]

Returns

topsSuccess, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsModuleGetFunction(topsFunction_t *function, topsModule_t module, const char *kname)

Function with kname will be extracted if present in module.

Parameters
  • module[in]

  • kname[in]

  • function[out]

Returns

topsSuccess, topsErrorInvalidValue, topsErrorInvalidContext, topsErrorNotInitialized, topsErrorNotFound

TOPS_PUBLIC_API topsError_t topsFuncGetAttributes(struct topsFuncAttributes *attr, const void *func)

Find out attributes for a given function.

Parameters
  • attr[out]

  • func[in]

Returns

topsSuccess, topsErrorInvalidValue, topsErrorInvalidDeviceFunction

TOPS_PUBLIC_API topsError_t topsFuncGetAttribute(int *value, topsFunction_attribute attrib, topsFunction_t hfunc)

Find out a specific attribute for a given function.

Parameters
  • value[out]

  • attrib[in]

  • hfunc[in]

Returns

topsSuccess, topsErrorInvalidValue, topsErrorInvalidDeviceFunction

TOPS_PUBLIC_API topsError_t topsModuleLoadData(topsModule_t *module, const void *image)

Builds a module from a code object that resides in host memory. image is a pointer to that location.

Parameters
  • image[in]

  • module[out]

Returns

topsSuccess, topsErrorNotInitialized, topsErrorOutOfMemory

TOPS_PUBLIC_API topsError_t topsModuleLoadDataEx(topsModule_t *module, const void *image, unsigned int numOptions, topsJitOption *options, void **optionValues)

Builds a module from a code object that resides in host memory. image is a pointer to that location. The options are not used; topsModuleLoadData is called.

Parameters
  • image[in]

  • module[out]

  • numOptions[in] Number of options

  • options[in] Options for JIT

  • optionValues[in] Option values for JIT

Returns

topsSuccess, topsErrorNotInitialized, topsErrorOutOfMemory

TOPS_PUBLIC_API topsError_t topsModuleLaunchKernel(topsFunction_t f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, topsStream_t stream, void **kernelParams, void **extra)

Launches kernel f with the given launch parameters and shared memory on stream, with arguments passed in kernelParams or extra.

Note that TOPS does not support kernel launches in which the total number of work items in any dimension (gridDim x blockDim) is 2^32 or more. gridDim.x * blockDim.x, gridDim.y * blockDim.y, and gridDim.z * blockDim.z must therefore each be less than 2^32.

Warning

The kernelParams argument is not yet implemented in TOPS. Please use extra instead. Please refer to tops_porting_driver_api.md for sample usage.

Parameters
  • f[in] Kernel to launch.

  • gridDimX[in] X grid dimension specified as multiple of blockDimX.

  • gridDimY[in] Y grid dimension specified as multiple of blockDimY.

  • gridDimZ[in] Z grid dimension specified as multiple of blockDimZ.

  • blockDimX[in] X block dimensions specified in work-items

  • blockDimY[in] Y block dimension specified in work-items

  • blockDimZ[in] Z block dimension specified in work-items

  • sharedMemBytes[in] Amount of dynamic shared memory to allocate for this kernel. The TOPS-Clang compiler provides support for extern shared declarations.

  • stream[in] Stream where the kernel should be dispatched. May be 0, in which case the default stream is used with associated synchronization rules.

  • kernelParams[in]

  • extra[in] Pointer to kernel arguments. These are passed directly to the kernel and must be in the memory layout and alignment expected by the kernel.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorNotInitialized, topsErrorInvalidValue
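The per-dimension limit on gridDim x blockDim can be checked on the host before a launch. The sketch below uses hypothetical helper names and assumes unsigned int is at most 32 bits wide, so the product always fits in a 64-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when grid * block is strictly less than 2^32 for one dimension. */
static int dim_ok(unsigned int grid, unsigned int block) {
    uint64_t work_items = (uint64_t)grid * (uint64_t)block;
    return work_items < (UINT64_C(1) << 32);
}

/* Returns 1 when all three dimensions (x, y, z) respect the limit. */
static int launch_dims_ok(const unsigned int grid[3], const unsigned int block[3]) {
    assert(grid != NULL && block != NULL);
    for (int i = 0; i < 3; ++i) {
        if (!dim_ok(grid[i], block[i])) {
            return 0;
        }
    }
    return 1;
}
```

Widening to 64 bits before multiplying matters here: a 32-bit product would wrap around and hide exactly the case the check is meant to catch.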

TOPS_PUBLIC_API topsError_t topsLaunchCooperativeKernel(const void *f, dim3 gridDim, dim3 blockDimX, void **kernelParams, size_t sharedMemBytes, topsStream_t stream)

Launches kernel f with the given launch parameters and shared memory on stream, with arguments passed in kernelParams, where thread blocks can cooperate and synchronize as they execute.

Note that TOPS does not support kernel launches in which the total number of work items in any dimension (gridDim x blockDim) is 2^32 or more.

Parameters
  • f[in] Kernel to launch.

  • gridDim[in] Grid dimensions specified as multiple of blockDim.

  • blockDimX[in] Block dimensions specified in work-items

  • kernelParams[in] A list of kernel arguments

  • sharedMemBytes[in] Amount of dynamic shared memory to allocate for this kernel. The TOPS-Clang compiler provides support for extern shared declarations.

  • stream[in] Stream where the kernel should be dispatched. May be 0, in which case the default stream is used with associated synchronization rules.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorNotInitialized, topsErrorInvalidValue, topsErrorCooperativeLaunchTooLarge

2.9. Clang

This section describes the API to support the triple-chevron syntax.

TOPS_PUBLIC_API topsError_t topsConfigureCall (dim3 gridDim, dim3 blockDim, size_t sharedMem __dparm(0), topsStream_t stream __dparm(0))

Configure a kernel launch.

Note: TOPS does not support a kernel launch whose total number of work-items in any dimension (gridDim x blockDim) is 2^32 or more.

Parameters
  • gridDim[in] grid dimension specified as multiple of blockDim.

  • blockDim[in] block dimensions specified in work-items

  • sharedMem[in] Amount of dynamic shared memory to allocate for this kernel. The TOPS-Clang compiler provides support for extern shared declarations.

  • stream[in] Stream where the kernel should be dispatched. May be 0, in which case the default stream is used with associated synchronization rules.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorNotInitialized, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsSetupArgument(const void *arg, size_t size, size_t offset)

Set a kernel argument.

Parameters
  • arg[in] Pointer to the argument in host memory.

  • size[in] Size of the argument.

  • offset[in] Offset of the argument on the argument stack.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorNotInitialized, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsLaunchByPtr(const void *func)

Launch a kernel.

Parameters

func[in] Kernel to launch.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorNotInitialized, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t __topsPushCallConfiguration (dim3 gridDim, dim3 blockDim, size_t sharedMem __dparm(0), topsStream_t stream __dparm(0))

Push configuration of a kernel launch.

Note: TOPS does not support a kernel launch whose total number of work-items in any dimension (gridDim x blockDim) is 2^32 or more.

Parameters
  • gridDim[in] grid dimension specified as multiple of blockDim.

  • blockDim[in] block dimensions specified in work-items

  • sharedMem[in] Amount of dynamic shared memory to allocate for this kernel. The TOPS-Clang compiler provides support for extern shared declarations.

  • stream[in] Stream where the kernel should be dispatched. May be 0, in which case the default stream is used with associated synchronization rules.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorNotInitialized, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t __topsPopCallConfiguration(dim3 *gridDim, dim3 *blockDim, size_t *sharedMem, topsStream_t *stream)

Pop configuration of a kernel launch.

Note: TOPS does not support a kernel launch whose total number of work-items in any dimension (gridDim x blockDim) is 2^32 or more.

Parameters
  • gridDim[out] grid dimension specified as multiple of blockDim.

  • blockDim[out] block dimensions specified in work-items

  • sharedMem[out] Amount of dynamic shared memory to allocate for this kernel. The TOPS-Clang compiler provides support for extern shared declarations.

  • stream[out] Stream where the kernel should be dispatched. May be 0, in which case the default stream is used with associated synchronization rules.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorNotInitialized, topsErrorInvalidValue

TOPS_PUBLIC_API topsError_t topsLaunchKernel (const void *function_address, dim3 numBlocks, dim3 dimBlocks, void **args, size_t sharedMemBytes __dparm(0), topsStream_t stream __dparm(0))

C-compliant kernel launch API.

Parameters
  • function_address[in] - kernel stub function pointer.

  • numBlocks[in] - number of blocks

  • dimBlocks[in] - dimension of a block

  • args[in] - kernel arguments

  • sharedMemBytes[in] - Amount of dynamic shared memory to allocate for this kernel. The TOPS-Clang compiler provides support for extern shared declarations.

  • stream[in] - Stream where the kernel should be dispatched. May be 0, in which case the default stream is used with associated synchronization rules.

Returns

topsSuccess, topsErrorInvalidValue, topsErrorInvalidDevice

TOPS_PUBLIC_API topsError_t topsLaunchKernelExC(const topsLaunchConfig_t *config, const void *func, void **args)

C-compliant kernel launch API.

Parameters
  • config[in] - Launch Configuration.

  • func[in] - Kernel to launch

  • args[in] - Array of pointers to kernel parameters

Returns

topsSuccess, topsErrorInvalidValue, topsErrorInvalidDevice
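The args parameter is described as an array of pointers to kernel parameters, that is, one entry per parameter, each entry holding the address of the argument value. The host-only sketch below illustrates that layout. No TOPS call is made: read_back_args and demo_args are hypothetical helpers, and the parameter list (float *data, int n, float alpha) is an assumed example kernel signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in that reads the arguments back the way a launcher
 * for a kernel taking (float *data, int n, float alpha) would see them. */
static float read_back_args(void **args)
{
    assert(args != NULL);
    float *data  = *(float **)args[0];
    int    n     = *(int *)args[1];
    float  alpha = *(float *)args[2];

    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += alpha * data[i];
    return sum;
}

static float demo_args(void)
{
    float host_data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float *data = host_data; /* would be a device pointer in a real launch */
    int n = 4;
    float alpha = 0.5f;

    /* One entry per kernel parameter, each pointing at the argument value. */
    void *args[] = { &data, &n, &alpha };
    return read_back_args(args);
}
```

Note that the pointer argument is passed as the address of the pointer variable (&data), not as the pointer itself.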

2.10. Runtime

This section describes the runtime compilation functions of the TOPS runtime API.

enum topsrtcResult

Values:

enumerator TOPSRTC_SUCCESS
enumerator TOPSRTC_ERROR_OUT_OF_MEMORY
enumerator TOPSRTC_ERROR_PROGRAM_CREATION_FAILURE
enumerator TOPSRTC_ERROR_INVALID_INPUT
enumerator TOPSRTC_ERROR_INVALID_PROGRAM
enumerator TOPSRTC_ERROR_INVALID_OPTION
enumerator TOPSRTC_ERROR_COMPILATION
enumerator TOPSRTC_ERROR_BUILTIN_OPERATION_FAILURE
enumerator TOPSRTC_ERROR_NAME_EXPRESSION_NOT_VALID
enumerator TOPSRTC_ERROR_INTERNAL_ERROR
typedef enum topsrtcResult topsrtcResult
typedef struct _topsrtcProgram *topsrtcProgram
TOPS_PUBLIC_API const char *topsrtcGetErrorString(topsrtcResult result)

Returns a string message describing the error that occurred.

See also

topsrtcResult

Warning

If the topsrtcResult value is not a defined result code, the function returns “Invalid TOPSRTC error code”.

Parameters

result[in] TOPSRTC API result code.

Returns

const char message string for the given topsrtcResult code.

TOPS_PUBLIC_API topsrtcResult topsrtcVersion(int *major, int *minor)

Sets the output parameters major and minor with the TOPSRTC version.

Parameters
  • major[out] TOPS Runtime Compilation major version number.

  • minor[out] TOPS Runtime Compilation minor version number.

TOPS_PUBLIC_API topsrtcResult topsrtcAddNameExpression(topsrtcProgram program, const char *name_expression)

Adds the given name expression to the runtime compilation program.

If name_expression is NULL, the function returns TOPSRTC_ERROR_INVALID_INPUT.

See also

topsrtcResult

Parameters
  • program[in] runtime compilation program instance.

  • name_expression[in] const char pointer to the name expression.

Returns

TOPSRTC_SUCCESS

TOPS_PUBLIC_API topsrtcResult topsrtcCompileProgram(topsrtcProgram program, int num_options, const char **options)

Compiles the given runtime compilation program.

If the compiler failed to build the runtime compilation program, it will return TOPSRTC_ERROR_COMPILATION.

See also

topsrtcResult

Parameters
  • program[in] runtime compilation program instance.

  • num_options[in] number of compiler options.

  • options[in] compiler options as const array of strings.

Returns

TOPSRTC_SUCCESS

TOPS_PUBLIC_API topsrtcResult topsrtcCreateProgram(topsrtcProgram *program, const char *source, const char *name, int num_headers, const char **headers, const char **include_names)

Creates an instance of topsrtcProgram with the given input parameters, and sets the output topsrtcProgram program with it.

If any input parameter is invalid, the function returns TOPSRTC_ERROR_INVALID_INPUT or TOPSRTC_ERROR_INVALID_PROGRAM.

If the program cannot be created, the function returns TOPSRTC_ERROR_PROGRAM_CREATION_FAILURE.

See also

topsrtcResult

Parameters
  • program[inout] runtime compilation program instance.

  • source[in] const char pointer to the program source.

  • name[in] const char pointer to the program name.

  • num_headers[in] number of headers.

  • headers[in] array of strings pointing to headers.

  • include_names[in] array of strings pointing to names included in program source.

Returns

TOPSRTC_SUCCESS

TOPS_PUBLIC_API topsrtcResult topsrtcDestroyProgram(topsrtcProgram *program)

Destroys the given topsrtcProgram instance.

If program is NULL, the function returns TOPSRTC_ERROR_INVALID_INPUT.

See also

topsrtcResult

Parameters

program[in] runtime compilation program instance.

Returns

TOPSRTC_SUCCESS

TOPS_PUBLIC_API topsrtcResult topsrtcGetLoweredName(topsrtcProgram program, const char *name_expression, const char **lowered_name)

Gets the lowered (mangled) name from an instance of topsrtcProgram with the given input parameters, and sets the output lowered_name with it.

If any input parameter is an invalid nullptr, the function returns TOPSRTC_ERROR_INVALID_INPUT.

If name_expression is not found, the function returns TOPSRTC_ERROR_NAME_EXPRESSION_NOT_VALID.

If lowered_name cannot be obtained from the program, the function returns TOPSRTC_ERROR_COMPILATION.

See also

topsrtcResult

Parameters
  • program[in] runtime compilation program instance.

  • name_expression[in] const char pointer to the name expression.

  • lowered_name[inout] const char array to the lowered (mangled) name.

Returns

TOPSRTC_SUCCESS

TOPS_PUBLIC_API topsrtcResult topsrtcGetProgramLog(topsrtcProgram program, char *log)

Gets the log generated by the runtime compilation program instance.

See also

topsrtcResult

Parameters
  • program[in] runtime compilation program instance.

  • log[out] memory pointer to the generated log.

Returns

TOPSRTC_SUCCESS

TOPS_PUBLIC_API topsrtcResult topsrtcGetProgramLogSize(topsrtcProgram program, size_t *log_size)

Gets the size of log generated by the runtime compilation program instance.

See also

topsrtcResult

Parameters
  • program[in] runtime compilation program instance.

  • log_size[out] size of generated log.

Returns

TOPSRTC_SUCCESS
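Because topsrtcGetProgramLog writes into a caller-provided buffer, the size is normally queried first with topsrtcGetProgramLogSize. The sketch below shows that query-allocate-fetch sequence. To keep it buildable without the TOPS headers it uses hypothetical mocks (mock_get_log_size, mock_get_log) that mimic the parameter shape of the two functions without the program handle. Whether the reported size includes the terminating NUL is not stated in this section, so the sketch allocates one spare byte as a precaution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mocks standing in for topsrtcGetProgramLogSize and
 * topsrtcGetProgramLog; they return 0 on success. */
static const char mock_log[] = "warning: unused variable 'x'";

static int mock_get_log_size(size_t *log_size)
{
    if (log_size == NULL) return 1;
    *log_size = sizeof mock_log; /* ASSUMPTION: size counts the NUL */
    return 0;
}

static int mock_get_log(char *log)
{
    if (log == NULL) return 1;
    memcpy(log, mock_log, sizeof mock_log);
    return 0;
}

/* Queries the size, allocates, then fetches. The caller frees the result.
 * Returns NULL on any failure. */
static char *fetch_log(size_t *out_size)
{
    assert(out_size != NULL);
    *out_size = 0;

    size_t size = 0;
    if (mock_get_log_size(&size) != 0) return NULL;

    /* Zero-filled with one spare byte, so the buffer stays terminated
     * even if the reported size does not include the NUL. */
    char *buf = calloc(size + 1, 1);
    if (buf == NULL) return NULL;

    if (mock_get_log(buf) != 0) {
        free(buf);
        return NULL;
    }
    *out_size = size;
    return buf;
}
```

The same sequence would apply to topsrtcGetCodeSize followed by topsrtcGetCode, with the difference that a binary should not be treated as a NUL-terminated string.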

TOPS_PUBLIC_API topsrtcResult topsrtcGetCode(topsrtcProgram program, char *code)

Gets the compiled binary produced by the runtime compilation program instance.

See also

topsrtcResult

Parameters
  • program[in] runtime compilation program instance.

  • code[out] char pointer to binary.

Returns

TOPSRTC_SUCCESS

TOPS_PUBLIC_API topsrtcResult topsrtcGetCodeSize(topsrtcProgram program, size_t *codeSizeRet)

Gets the size of the compiled binary produced by the runtime compilation program instance.

See also

topsrtcResult

Parameters
  • program[in] runtime compilation program instance.

  • codeSizeRet[out] the size of binary.

Returns

TOPSRTC_SUCCESS

2.11. Extension

This section describes the runtime extension (Ext) functions of the TOPS runtime API.

TOPS_PUBLIC_API topsError_t topsMemorySetDims(const void *devPtr, int64_t *dims, size_t dims_count)

Set dimension for device memory.

Limitation: devPtr must point to memory allocated by topsMalloc*.

Parameters
  • devPtr[in] The device memory to be set.

  • dims[in] The dimension to be set.

  • dims_count[in] The number of dimensions to be set.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsMemoryGetDims(const void *devPtr, int64_t *dims, size_t *dims_count)

Get dimension of device memory.

Limitation: devPtr must point to memory allocated by topsMalloc*.

Parameters
  • devPtr[in] The device memory to query.

  • dims[out] Pointer to the list that receives the dimensions.

  • dims_count[out] Pointer that receives the number of dimensions (rank).

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsCreateExecutable(topsExecutable_t *exe, const void *bin, size_t size)

Create an executable with specified binary data and size.

Note: Ownership of the new executable is transferred to the caller. The caller must call topsDestroyExecutable to destroy it when it is no longer in use.

Parameters
  • bin[in] The pointer to the binary data.

  • size[in] The size of the binary data.

  • exe[out] Pointer to get new executable.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsCreateExecutableFromFile(topsExecutable_t *exe, const char *filepath)

Create an executable with specified file.

Note: Ownership of the new executable is transferred to the caller. The caller must call topsDestroyExecutable to destroy it when it is no longer in use.

Parameters
  • filepath[in] The name of binary file.

  • exe[out] Pointer to get new executable.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsDestroyExecutable(topsExecutable_t exe)

Destroy and clean up an executable.

Parameters

exe[in] Pointer to executable to destroy.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsCreateResource(topsResource_t *res, topsResourceRequest_t req)

Create a resource bundle with the specified request. If the device is being reset, this function retries until the reset finishes.

Note: Ownership of the new resource bundle is transferred to the caller. The caller must call topsDestroyResource to destroy it when it is no longer in use.

Parameters
  • res[out] Pointer to get new resource bundle or nullptr if failed.

  • req[in] Requested resource of allocated resource bundle.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsCreateResourceForExecutable(topsResource_t *res, topsExecutable_t exe)

Create a new resource bundle with the specified target resource. If the device is being reset, this function retries until the reset finishes.

Note: Ownership of the new resource bundle is transferred to the caller. The caller must call topsDestroyResource to destroy it when it is no longer in use.

Parameters
  • res[out] Pointer to get new resource bundle or nullptr if failed.

  • exe[in] Pointer to the executable to get target pointer which contains the requested resource.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsDestroyResource(topsResource_t res)

Destroy and clean up a resource bundle.

Parameters

res[in] Pointer to resource bundle to destroy.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsResourceBundleGetAttribute(topsResource_t res, topsResourceBundleInfoType_t type, uint64_t *data)

Query resource bundle attribute.

Parameters
  • res[in] Pointer to resource bundle to query.

  • type[in] Type to query.

  • data[out] Pointer to query output data.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsMallocForResource(void **ptr, size_t size, topsResource_t res)

Allocate memory on the memory bank that has affinity with the res_bundle.

If size is 0, no memory is allocated, *ptr returns non-nullptr, and topsSuccess is returned.

Use topsFree to release ptr

Parameters
  • ptr[out] Pointer to the allocated memory

  • size[in] Requested memory size

  • res[in] res_bundle

Returns

topsSuccess, topsErrorOutOfMemory, topsErrorInvalidValue (bad context, null *ptr)

TOPS_PUBLIC_API topsError_t topsLaunchExecutableV2(topsExecutable_t exe, topsResource_t res, void **inputs, size_t input_count, int64_t *input_dims, size_t *input_rank, void **outputs, size_t output_count, topsStream_t stream)

Asynchronously run an executable.

Run an executable with given inputs and outputs for training.

Note: by default this uses runExe in Operator mode (dynamic shape is not supported). If res is nullptr, the default res_bundle is used.

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • inputs[in] Inputs of executable.

  • input_count[in] Inputs count of executable.

  • input_dims[in] Inputs dims List of executable.

  • input_rank[in] Inputs dims rank of executable.

  • outputs[in] Outputs of executable.

  • output_count[in] Outputs count of executable.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.
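The launch functions take the input shapes as two arrays, input_dims and input_rank. This section does not spell out their layout. The sketch below shows one plausible packing, under the assumption that input_dims is the concatenation of every input's dimensions and input_rank[i] is the rank of input i. That layout is an assumption and should be confirmed against the TOPS headers. shape_view and pack_shapes are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one tensor shape. */
typedef struct {
    const int64_t *dims;
    size_t rank;
} shape_view;

/* Packs `count` shapes into a flat dims array and a per-input rank array.
 * ASSUMPTION: dims are concatenated in input order, one rank per input.
 * Returns false if `flat_dims` (capacity `dims_cap`) is too small. */
static bool pack_shapes(const shape_view *shapes, size_t count,
                        int64_t *flat_dims, size_t dims_cap,
                        size_t *ranks, size_t *total_dims)
{
    assert(shapes != NULL && flat_dims != NULL);
    assert(ranks != NULL && total_dims != NULL);

    size_t written = 0;
    for (size_t i = 0; i < count; ++i) {
        /* written never exceeds dims_cap, so the subtraction cannot wrap. */
        if (shapes[i].rank > dims_cap - written) return false;
        for (size_t d = 0; d < shapes[i].rank; ++d)
            flat_dims[written++] = shapes[i].dims[d];
        ranks[i] = shapes[i].rank;
    }
    *total_dims = written;
    return true;
}
```

For two inputs of shape [2, 3] and [4, 5, 6], this produces the flat list 2, 3, 4, 5, 6 and the ranks 2, 3.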

TOPS_PUBLIC_API topsError_t topsLaunchExecutableV3(topsExecutable_t exe, topsResource_t res, void **inputs, size_t input_count, int64_t *input_dims, size_t *input_rank, void **outputs, size_t output_count, int64_t *output_dims, size_t *output_rank, topsStream_t stream)

Asynchronously run an executable.

Run an executable with given inputs and outputs for training.

Limitation: for performance, output_dims/output_rank should be set to nullptr. If output_dims/output_rank are non-null, topsLaunchExecutableV3 inserts a blocking host callback operation to get output_dims and output_rank.

Note: by default this uses runExe in Graph mode (dynamic shape is supported). If res is nullptr, the default res_bundle is used.

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • inputs[in] Inputs of executable.

  • input_count[in] Inputs count of executable.

  • input_dims[in] Inputs dims List of executable.

  • input_rank[in] Inputs dims rank of executable.

  • outputs[in] Outputs of executable.

  • output_count[in] Outputs count of executable.

  • output_dims[out] Outputs dims List of executable.

  • output_rank[out] Outputs dims rank of executable.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsLaunchExecutableV4(topsExecutable_t exe, topsResource_t res, void **inputs, size_t input_count, int64_t *input_dims, size_t *input_rank, void **outputs, size_t output_count, int64_t *output_dims, size_t *output_rank, void **stack_datas, size_t stacks_count, topsStream_t stream)

Asynchronously run an executable.

Run an executable with given inputs and outputs for training.

Limitation: for performance, output_dims/output_rank should be set to nullptr. If output_dims/output_rank are non-null, topsLaunchExecutableV4 inserts a blocking host callback operation to get output_dims and output_rank.

Note: by default this uses runExe in Graph mode (dynamic shape is supported). If res is nullptr, the default res_bundle is used.

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • inputs[in] Inputs of executable.

  • input_count[in] Inputs count of executable.

  • input_dims[in] Inputs dims List of executable.

  • input_rank[in] Inputs dims rank of executable.

  • outputs[in] Outputs of executable.

  • output_count[in] Outputs count of executable.

  • output_dims[out] Outputs dims List of executable.

  • output_rank[out] Outputs dims rank of executable.

  • stack_datas[in] stack_datas of executable.

  • stacks_count[in] stack_datas count of executable.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsLaunchExecutable(topsExecutable_t exe, topsResource_t res, void **inputs, size_t input_count, int64_t *input_dims, size_t *input_rank, void **outputs, size_t output_count, int64_t *output_dims, size_t *output_rank, topsStream_t stream)

Asynchronously run an executable.

Run an executable with given inputs and outputs for training.

Limitation: for performance, output_dims/output_rank should be set to nullptr. If output_dims/output_rank are non-null, topsLaunchExecutable may synchronize to get output_dims and output_rank.

Note: by default this uses runExe in Graph mode (dynamic shape is supported). If res is nullptr, the default res_bundle is used.

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • inputs[in] Inputs of executable.

  • input_count[in] Inputs count of executable.

  • input_dims[in] Inputs dims List of executable.

  • input_rank[in] Inputs dims rank of executable.

  • outputs[in] Outputs of executable.

  • output_count[in] Outputs count of executable.

  • output_dims[out] Outputs dims List of executable.

  • output_rank[out] Outputs dims rank of executable.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsLaunchExecutableWithConstData(topsExecutable_t exe, topsResource_t res, void **inputs, size_t input_count, int64_t *input_dims, size_t *input_rank, void **outputs, size_t output_count, int64_t *output_dims, size_t *output_rank, void **const_datas, size_t const_datas_count, topsStream_t stream)

Asynchronously run an executable.

Run an executable with given inputs and outputs for training.

Limitation: for performance, output_dims/output_rank should be set to nullptr. If output_dims/output_rank are non-null, topsLaunchExecutableWithConstData may synchronize to get output_dims and output_rank.

Note: by default this uses runExe in Graph mode (dynamic shape is supported). If res is nullptr, the default res_bundle is used.

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • inputs[in] Inputs of executable.

  • input_count[in] Inputs count of executable.

  • input_dims[in] Inputs dims List of executable.

  • input_rank[in] Inputs dims rank of executable.

  • outputs[in] Outputs of executable.

  • output_count[in] Outputs count of executable.

  • output_dims[out] Outputs dims List of executable.

  • output_rank[out] Outputs dims rank of executable.

  • const_datas[in] Const_datas of executable.

  • const_datas_count[in] Const_datas count of executable.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsLaunchExecutableWithConstDataV2(topsExecutable_t exe, topsResource_t res, void **inputs, size_t input_count, int64_t *input_dims, size_t *input_rank, void **outputs, size_t output_count, void **const_datas, size_t const_datas_count, topsStream_t stream)

Asynchronously run an executable.

Run an executable with given inputs and outputs for training.

Note: by default this uses runExe in Operator mode (dynamic shape is not supported). If res is nullptr, the default res_bundle is used.

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • inputs[in] Inputs of executable.

  • input_count[in] Inputs count of executable.

  • input_dims[in] Inputs dims List of executable.

  • input_rank[in] Inputs dims rank of executable.

  • outputs[in] Outputs of executable.

  • output_count[in] Outputs count of executable.

  • const_datas[in] Const_datas of executable.

  • const_datas_count[in] Const_datas count of executable.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsLaunchExecutableWithConstDataV3(topsExecutable_t exe, topsResource_t res, void **inputs, size_t input_count, int64_t *input_dims, size_t *input_rank, void **outputs, size_t output_count, int64_t *output_dims, size_t *output_rank, void **const_datas, size_t const_datas_count, topsStream_t stream)

Asynchronously run an executable.

Run an executable with given inputs and outputs for training.

Limitation: for performance, output_dims/output_rank should be set to nullptr. If output_dims/output_rank are non-null, topsLaunchExecutableWithConstDataV3 inserts a blocking host callback operation to get output_dims and output_rank.

Note: by default this uses runExe in Graph mode (dynamic shape is supported). If res is nullptr, the default res_bundle is used.

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • inputs[in] Inputs of executable.

  • input_count[in] Inputs count of executable.

  • input_dims[in] Inputs dims List of executable.

  • input_rank[in] Inputs dims rank of executable.

  • outputs[in] Outputs of executable.

  • output_count[in] Outputs count of executable.

  • output_dims[out] Outputs dims List of executable.

  • output_rank[out] Outputs dims rank of executable.

  • const_datas[in] Const_datas of executable.

  • const_datas_count[in] Const_datas count of executable.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsLaunchExecutableWithConstDataV4(topsExecutable_t exe, topsResource_t res, void **inputs, size_t input_count, int64_t *input_dims, size_t *input_rank, void **outputs, size_t output_count, int64_t *output_dims, size_t *output_rank, void **const_datas, size_t const_datas_count, void **stack_datas, size_t stacks_count, topsStream_t stream)

Asynchronously run an executable.

Run an executable with given inputs and outputs for training.

Limitation: for performance, output_dims/output_rank should be set to nullptr. If output_dims/output_rank are non-null, topsLaunchExecutableWithConstDataV4 inserts a blocking host callback operation to get output_dims and output_rank.

Note: by default this uses runExe in Graph mode (dynamic shape is supported). If res is nullptr, the default res_bundle is used.

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • inputs[in] Inputs of executable.

  • input_count[in] Inputs count of executable.

  • input_dims[in] Inputs dims List of executable.

  • input_rank[in] Inputs dims rank of executable.

  • outputs[in] Outputs of executable.

  • output_count[in] Outputs count of executable.

  • output_dims[out] Outputs dims List of executable.

  • output_rank[out] Outputs dims rank of executable.

  • const_datas[in] Const_datas of executable.

  • const_datas_count[in] Const_datas count of executable.

  • stack_datas[in] stack_datas of executable.

  • stacks_count[in] stack_datas count of executable.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableGetConstManagedData(topsExecutable_t exe, unsigned int *numOptions, char **name, void **address, uint64_t *size)

Get Const Managed Data.

Note: call this function twice: the first call gets numOptions, and the second call gets the remaining outputs. Only hp off is supported.

Parameters
  • numOptions[out] Pointer to ConstManagedData array size.

  • exe[in] Pointer to executable object.

  • name[out] ConstManaged pair of name.

  • address[out] ConstManaged pair of address.

  • size[out] ConstManaged address pair of size.

Returns

topsSuccess on success, or other on failure.
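The two-call usage described in the note can be sketched as follows. The mock below is hypothetical: it copies the parameter order of topsExecutableGetConstManagedData, with exe replaced by an opaque pointer, so that the snippet builds without the TOPS headers. It also assumes that passing NULL for the output arrays makes the first call report only the count, which this section does not confirm. const_table and the helper functions are likewise illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mock. ASSUMPTION: NULL arrays mean "report the count only". */
static int mock_get_const_managed(void *exe, unsigned int *numOptions,
                                  char **name, void **address, uint64_t *size)
{
    static char n0[] = "weight", n1[] = "bias";
    static float w[16], b[4];
    (void)exe;
    if (numOptions == NULL) return 1;
    if (name == NULL || address == NULL || size == NULL) {
        *numOptions = 2;               /* first call: count only */
        return 0;
    }
    name[0] = n0; address[0] = w; size[0] = sizeof w;
    name[1] = n1; address[1] = b; size[1] = sizeof b;
    return 0;
}

typedef struct {
    unsigned int count;
    char **names;
    void **addresses;
    uint64_t *sizes;
} const_table;

static void free_const_table(const_table *t)
{
    assert(t != NULL);
    free(t->names);
    free(t->addresses);
    free(t->sizes);
    t->names = NULL; t->addresses = NULL; t->sizes = NULL; t->count = 0;
}

/* First call sizes the arrays, second call fills them. Returns 0 on success. */
static int load_const_table(void *exe, const_table *t)
{
    assert(t != NULL);
    t->count = 0; t->names = NULL; t->addresses = NULL; t->sizes = NULL;

    if (mock_get_const_managed(exe, &t->count, NULL, NULL, NULL) != 0) return 1;
    if (t->count == 0) return 0;       /* nothing to fetch */

    t->names = calloc(t->count, sizeof *t->names);
    t->addresses = calloc(t->count, sizeof *t->addresses);
    t->sizes = calloc(t->count, sizeof *t->sizes);
    if (!t->names || !t->addresses || !t->sizes) {
        free_const_table(t);
        return 1;
    }
    if (mock_get_const_managed(exe, &t->count, t->names, t->addresses,
                               t->sizes) != 0) {
        free_const_table(t);
        return 1;
    }
    return 0;
}
```

Only the three arrays are owned by the table in this sketch. Who owns the name strings and addresses returned by the real function is not stated here, so the sketch does not free them.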

TOPS_PUBLIC_API topsError_t topsExecutableGetConstManagedDataV2(topsExecutable_t exe, unsigned int *numOptions, char **name, void **address, uint64_t *size, int64_t *uid, void **flag)

Get Const Managed Data.

Note: call this function twice: the first call gets numOptions, and the second call gets the remaining outputs. Both hp off and hp on are supported.

Parameters
  • numOptions[out] Pointer to ConstManagedData array size.

  • exe[in] Pointer to executable object.

  • name[out] ConstManaged pair of name.

  • address[out] ConstManaged pair of address.

  • size[out] ConstManaged address pair of size.

  • uid[out] Constant uid

  • flag[out] Flag used by user to indicate constant partial h2d

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableUpdateConstantKey(topsExecutable_t exe)

Update Constant Section Key.

Note: must be called before saving the executable to a file.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableUpdateRuntimeResource(topsExecutable_t exe, topsResource_t res, topsStream_t stream)

Update Executable Runtime Resource.

Note: call after refit. Constants are updated partially by host-to-device (h2d) copy.

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableLoadConstData(topsExecutable_t exe, topsResource_t res, void **dev_ptr, size_t *dev_ptr_count)

Load constant data.

NOTE: deprecated because it requires two calls.

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • dev_ptr[out] Dev_mem of executable.

  • dev_ptr_count[out] Dev_mem count of executable.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableLoadConstDataV2(topsExecutable_t exe, topsResource_t res, void **dev_ptr)

Load constant data.

NOTE: call topsExecutableQueryInfo to get device mem count to initialize dev_ptr

Parameters
  • exe[in] Pointer to executable object.

  • res[in] Pointer to res_bundle object.

  • dev_ptr[out] Dev_mem of executable.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableGetRuntimeOutputShape(topsExecutable_t exe, topsShape_t *inputs_shape, topsShape_t *outputs_shape, bool *infer_success)

Get runtime dynamic output shape.

Note: infer_success indicates the shape inference result: true for success, false for failure. If shape inference fails, or for a legacy executable, the static output shape is returned.

Parameters
  • exe[in] Pointer to executable object.

  • inputs_shape[in] Pointer to input shape.

  • outputs_shape[out] Pointer to output shape.

  • infer_success[out] flag to indicate shape infer result.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableGetSubFuncInfo(topsExecutable_t exe, char **info_raw_data, size_t *info_size, int **param_info, int *param_count)

Get executable sub function information.

Note: use only in refit stage 2. The user should parse info_raw_data to get detailed sub function information. Example: param_info [2, 1, 3, 2, 3, 1] means subfuncA needs 2 inputs and 1 output, subfuncB needs 3 inputs and 2 outputs, and subfuncC needs 3 inputs and 1 output.

Parameters
  • exe[in] Pointer to executable object

  • info_raw_data[out] raw data of sub function information

  • info_size[out] size of info_raw_data

  • param_info[out] count list of input and output for each sub function

  • param_count[out] total count of input and output

Returns

topsSuccess on success, or other on failure.
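The param_info list in the example above alternates input and output counts, one pair per sub function. The following is a minimal sketch of decoding it. subfunc_io and decode_param_info are hypothetical names, and entry_count is assumed to be the number of integers in the list. How that length relates to param_count is not fully specified in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair type: input and output counts of one sub function. */
typedef struct {
    int inputs;
    int outputs;
} subfunc_io;

/* Decodes a flat [in0, out0, in1, out1, ...] list into pairs.
 * Returns the number of sub functions decoded, or 0 if the list has an
 * odd length or `out` (capacity `out_cap`) is too small. */
static size_t decode_param_info(const int *param_info, size_t entry_count,
                                subfunc_io *out, size_t out_cap)
{
    assert(param_info != NULL && out != NULL);
    if (entry_count % 2 != 0 || entry_count / 2 > out_cap) return 0;

    for (size_t i = 0; i < entry_count / 2; ++i) {
        out[i].inputs  = param_info[2 * i];
        out[i].outputs = param_info[2 * i + 1];
    }
    return entry_count / 2;
}
```

With the list 2, 1, 3, 2, 3, 1 this yields three pairs: (2, 1), (3, 2) and (3, 1), matching subfuncA, subfuncB and subfuncC in the note.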

TOPS_PUBLIC_API topsError_t topsExecutableGetRefitFlag(topsExecutable_t exe, int *refit_flag)

Get executable refit flag.

Parameters
  • exe[in] Pointer to executable object.

  • refit_flag[out] Flag indicating whether this executable is refittable.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableCallSubFunc(topsExecutable_t exe, const char *func_name, void **inputs, size_t input_count, int64_t *input_dims, size_t *input_rank, void **outputs, size_t output_count, topsStream_t stream)

Call a sub function in the executable.

Limitation: inputs and outputs must be allocated by topsMalloc/topsHostMalloc.

Note: use only in refit stage 2, for constant preprocessing.

Parameters
  • exe[in] Pointer to executable object.

  • func_name[in] sub function name

  • inputs[in] Inputs of sub function.

  • input_count[in] Inputs count of sub function.

  • input_dims[in] Inputs dims List of sub function.

  • input_rank[in] Inputs dims rank of sub function.

  • outputs[in] Outputs of sub function.

  • output_count[in] Outputs count of sub function.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsDeviceSetResource(topsResource_t res)

Set default resource_bundle.

Limitation: this sets the global resource_bundle of the current thread. Set the resource_bundle before calling any other TOPS API.

Parameters

res[in] Pointer to res_bundle object.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsConstBufferGet(uint64_t hash_key, void *init_data, size_t size, void **ptr, topsStream_t stream)

Get const buffer.

Get or initialize the const buffer identified by the given hash_key.

Parameters
  • hash_key[in] Key that identifies the const buffer.

  • init_data[in] Pointer to const raw buffer.

  • size[in] Size of the const buffer.

  • ptr[out] Pointer to get const buffer.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsConstBufferPut(uint64_t hash_key, topsStream_t stream)

Put const buffer.

Parameters
  • hash_key[in] value to const buffer special key.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableQueryInfo(topsExecutable_t exe, topsExecutableInfoType_t info_type, uint64_t *data)

Query executable info.

Limitation: the user must allocate memory for data.

Parameters
  • exe[in] Pointer to executable object.

  • info_type[in] Type to query.

  • data[out] Pointer to query output data.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableQueryInfoV2(topsExecutable_t exe, topsExecutableInfoType_t info_type, int64_t *data)

Query executable info.

Limitation: the user must allocate memory for data.

NOTE: boolean information is also returned through data (1: true, 0: false).

Parameters
  • exe[in] Pointer to executable object.

  • info_type[in] Type to query.

  • data[out] Pointer to query output data.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableQueryInfoV3(topsExecutable_t exe, topsExecutableInfoType_t info_type, char **data)

Query executable string info.

Limitation: the user must allocate memory for data.

Parameters
  • exe[in] Pointer to executable object.

  • info_type[in] Type to query.

  • data[out] Pointer to query output string info.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableQueryInputName(topsExecutable_t exe, int index, char **name)

Query executable input name.

Parameters
  • exe[in] Pointer to executable object.

  • index[in] Specify which input to query.

  • name[out] The name of the input.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableQueryOutputName(topsExecutable_t exe, int index, char **name)

Query executable output name.

Parameters
  • exe[in] Pointer to executable object.

  • index[in] Specify which output to query.

  • name[out] The name of the output.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableSaveToFile(topsExecutable_t exe, const char *path)

Save executable to a specified file.

Parameters
  • exe[in] Pointer to executable object.

  • path[in] The name of the file to be used to save executable.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExtMallocWithBank(void **ptr, size_t size, uint64_t bank)

Allocate memory on the specified memory bank.

If size is 0, no memory is allocated, *ptr returns non-nullptr, and topsSuccess is returned.

NOTE: default bank is 0

Use topsFree to release ptr

Parameters
  • ptr[out] Pointer to the allocated memory

  • size[in] Requested memory size

  • bank[in] memory bank

Returns

topsSuccess, topsErrorOutOfMemory, topsErrorInvalidValue (bad context, null *ptr)

TOPS_PUBLIC_API topsError_t topsExtMallocWithBankV2(void **ptr, size_t size, uint64_t bank, unsigned int flags)

Allocate memory on the specified memory bank.

If size is 0, no memory is allocated, *ptr returns non-nullptr, and topsSuccess is returned.

NOTE: default bank is 0

Use topsFree to release ptr

Parameters
  • ptr[out] Pointer to the allocated memory

  • size[in] Requested memory size

  • bank[in] memory bank

  • flags[in] Type of memory allocation. Only topsDeviceMallocDefault, topsMallocTopDown, and topsMallocForbidMergeMove are supported.

Returns

topsSuccess, topsErrorOutOfMemory, topsErrorInvalidValue (bad context, null *ptr)

TOPS_PUBLIC_API topsError_t topsExtMallocWithAffinity(void **ptr, size_t size, uint64_t bank, unsigned int flags)

Allocate memory on the logical memory bank.

If size is 0, no memory is allocated, *ptr returns non-nullptr, and topsSuccess is returned.

NOTE: the default bank is 0. The logical bank is mapped to a physical bank.

Use topsFree to release ptr

Parameters
  • ptr[out] Pointer to the allocated memory

  • size[in] Requested memory size

  • bank[in] memory bank

  • flags[in] Type of memory allocation. Only topsDeviceMallocDefault, topsMallocTopDown, and topsMallocForbidMergeMove are supported.

Returns

topsSuccess, topsErrorOutOfMemory, topsErrorInvalidValue (bad context, null *ptr)

TOPS_PUBLIC_API topsError_t topsExtLaunchCooperativeKernelMultiCluster(const topsExtLaunchParams_t *launchParamsList, int numClusters, dim3 gridDim, dim3 blockDim, topsStream_t stream)

Asynchronously launch kernel.

Parameters
  • launchParamsList[in] Pointer to launch params List.

  • numClusters[in] Number of clusters to use.

  • gridDim[in] Grid Dimension to use.

  • blockDim[in] Block Dimension to use.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExtSetProfileMeta(uint8_t *meta, uint32_t size, int64_t compilation_id)

Set profile meta data.

Parameters
  • meta[in] Pointer to profile meta data.

  • size[in] Profile meta data size in bytes.

  • compilation_id[in] Compilation id.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExtSetProfileMetas(uint32_t meta_cnt, uint8_t *metas[], uint32_t size[], int64_t compilation_id[])

Set profile meta data with array.

Parameters
  • meta_cnt[in] Meta data count.

  • metas[in] Pointer to profile meta data array.

  • size[in] Profile meta data size array in bytes.

  • compilation_id[in] Compilation id array.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsScatterMemoryGetInfo(const void *dev_ptr, int index, int64_t *win_pos, size_t *win_size, int64_t *map_ctrl, size_t *map_size)

Get scatter memory config: win_pos and map_ctrl.

This interface only works when dev_ptr is created with topsMallocScatter and initialized with topsScatterSetSubMem.

Limitation: the caller must allocate the win_pos and map_ctrl arrays.

Parameters
  • dev_ptr[in] Pointer to scatter memory.

  • index[in] Index to sub mem.

  • win_pos[out] Pointer to the sub memory window position.

  • win_size[out] Pointer to the size of the sub memory window position.

  • map_ctrl[out] Pointer to the sub memory map ctrl.

  • map_size[out] Pointer to the size of the sub memory map ctrl.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsScatterMemoryGetSubNum(const void *dev_ptr, size_t *size)

Get the sub memory number in a scatter memory.

This interface only works when dev_ptr is created with topsMallocScatter and initialized with topsScatterSetSubMem.

Parameters
  • dev_ptr[in] Pointer to scatter memory.

  • size[out] Pointer that receives the number of sub memories.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsMallocScatter(void **ptr, size_t size)

Creates scatter memory on local device.

Creates a device memory object that holds a list of scattered DeviceMemory objects. Memory is not actually allocated. Call topsScatterPopulateSub to allocate sub device memory and topsScatterGetSubMem to query sub memory objects. Runtime APIs such as topsMemcpy process scatter DeviceMemory handles automatically, so a scatter memory can be viewed as a plain memory buffer.

Use topsFree to release ptr

A scatter DeviceMemory is invalid until it is fully constructed with sub DeviceMemory objects that are correctly split in both size and dimension (all related DeviceMemory objects have invoked SetDims).

Parameters
  • ptr[out] Pointer to the allocated memory

  • size[in] Requested creation size in bytes.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsScatterPopulateSub(const void *dev_ptr, size_t bpe, uint64_t bank, int64_t *dims, int64_t *win_pos, int64_t *map_ctrl, size_t rank_size)

Populate scatter memory with sub-memory.

This interface only works when dev_ptr is created with the topsMallocScatter API. Each invocation populates one sub-memory. The parent scatter memory owns the sub-memory, so all populated sub-memories are freed when the parent is freed. topsScatterGetSubMem does not retain the sub-memory.

Parameters
  • dev_ptr[in] Pointer to scatter memory.

  • bpe[in] memory bpe of single sub-memory.

  • bank[in] memory bank of single sub-memory.

  • dims[in] The dimension to be set.

  • win_pos[in] The anchor of window in the scatter memory holding this submemory.

  • map_ctrl[in] The dimension remap of reshaping. It’s a natural number up to rank.

  • rank_size[in] The size of map ctrl/dims/win_position, the max size is 8.

Returns

topsSuccess on success, or other on failure.
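
To illustrate how the bpe, dims, and rank_size arguments relate, the following host-side sketch computes the byte size of one sub-memory. It assumes that bpe means bytes per element and that a sub-memory holds the product of its dims entries; neither point is stated above. scatter_sub_bytes and SCATTER_MAX_RANK are hypothetical names and are not part of the TOPS API.

```c
#include <stddef.h>
#include <stdint.h>

/* Maximum rank documented for dims / win_pos / map_ctrl. */
#define SCATTER_MAX_RANK 8

/*
 * Hypothetical helper (not part of the TOPS API).
 * Returns the byte size of one sub-memory, ASSUMING bpe is "bytes per
 * element" and the sub-memory holds dims[0] * ... * dims[rank_size-1]
 * elements. Returns 0 for invalid input or on overflow.
 */
size_t scatter_sub_bytes(size_t bpe, const int64_t *dims, size_t rank_size)
{
    if (dims == NULL || bpe == 0 || rank_size == 0 ||
        rank_size > SCATTER_MAX_RANK)
        return 0;

    size_t elems = 1;
    for (size_t i = 0; i < rank_size; ++i) {
        if (dims[i] <= 0)
            return 0;                       /* reject empty or negative extents */
        size_t d = (size_t)dims[i];
        if (elems > SIZE_MAX / d)
            return 0;                       /* element count would overflow */
        elems *= d;
    }
    if (elems > SIZE_MAX / bpe)
        return 0;                           /* byte count would overflow */
    return elems * bpe;
}
```

A caller could run such a check before passing the same dims and rank_size to topsScatterPopulateSub, for example to confirm that the sub-memory sizes add up to the size requested from topsMallocScatter.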

TOPS_PUBLIC_API topsError_t topsScatterInplace(const void *dev_ptr, size_t sub_count, size_t bpe, uint64_t *bank_list, int64_t **dims_list, int64_t **win_pos_list, int64_t **map_ctrl_list, size_t rank_size)

Inplace scatter memory with sub-memory description.

This interface only works when dev_ptr is created with the topsMallocScatter API. The parent scatter memory owns the sub-memory, so all populated sub-memories are freed when the parent is freed.

topsScatterGetSubMem does not retain the sub-memory.

Parameters
  • dev_ptr[in] Pointer to scatter memory, if dev_ptr has sub_memory, sub memory must be allocated by topsScatterPopulateSub

  • sub_count[in] Number of sub memory

  • bpe[in] memory bpe of single sub-memory.

  • bank_list[in] memory bank of sub-memory list.

  • dims_list[in] The dimension list to be set.

  • win_pos_list[in] The anchor of window list in the scatter memory holding this submemory.

  • map_ctrl_list[in] The dimension remap of reshaping list. It’s a natural number up to rank.

  • rank_size[in] The size of map ctrl/dims/win_position, the max size is 8.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsScatterGetSubMem(const void *dev_ptr, int index, void **sub_dptr)

Get a submemory from scatter memory.

This interface only works when DeviceMemory is created with the topsMallocScatter API. The user can also hold the sub DeviceMemory directly instead of getting it from the scatter memory.

Parameters
  • dev_ptr[in] Pointer to scatter memory.

  • index[in] The index returned by ScatterSetSubMemory.

  • sub_dptr[out] The submemory object that the user set with topsScatterSetSubMem.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsMemoryReduceAsync(void *lhs, void *rhs, void *result, enum topsReduceOpType op, enum topsReduceDataType dtype, uint32_t element_cnt, topsStream_t stream)

Asynchronously request reduce service.

Requests the reduce service on the device. lhs, rhs, and result must be on the same device.

Parameters
  • lhs[in] Left Hand Side device memory address object.

  • rhs[in] Right Hand Side device memory object.

  • result[in] Result device memory object.

  • op[in] Reduce op type.

  • dtype[in] Reduce data type.

  • element_cnt[in] Number of elements to reduce.

  • stream[in] stream identifier.

Returns

topsSuccess on success, or other on failure.
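
When validating results from the device, a host-side reference can be useful. The sketch below assumes that the reduce service combines lhs and rhs element by element into result, which the description above does not state explicitly. ref_reduce_op_t, its enumerators, and ref_reduce_f32 are hypothetical names for this sketch; they are not the real topsReduceOpType or topsReduceDataType values.

```c
#include <stdint.h>

/* Hypothetical op tags for this sketch; NOT the real topsReduceOpType values. */
typedef enum {
    REF_REDUCE_SUM,
    REF_REDUCE_MAX,
    REF_REDUCE_MIN
} ref_reduce_op_t;

/*
 * Host-side reference for float data, ASSUMING element-wise semantics:
 *   result[i] = lhs[i] (op) rhs[i]   for i in [0, element_cnt)
 * Returns 0 on success, -1 on invalid arguments.
 */
int ref_reduce_f32(const float *lhs, const float *rhs, float *result,
                   ref_reduce_op_t op, uint32_t element_cnt)
{
    if (!lhs || !rhs || !result)
        return -1;

    for (uint32_t i = 0; i < element_cnt; ++i) {
        switch (op) {
        case REF_REDUCE_SUM:
            result[i] = lhs[i] + rhs[i];
            break;
        case REF_REDUCE_MAX:
            result[i] = lhs[i] > rhs[i] ? lhs[i] : rhs[i];
            break;
        case REF_REDUCE_MIN:
            result[i] = lhs[i] < rhs[i] ? lhs[i] : rhs[i];
            break;
        default:
            return -1;                      /* unknown op tag */
        }
    }
    return 0;
}
```

The device output would be copied back to the host after the stream has completed and then compared against this reference.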

TOPS_PUBLIC_API topsError_t topsMemCachePrefetch(topsDeviceptr_t dptr, size_t size)

Prefetch the contents of a device memory range to the L2 buffer. The API is available for GCU 3.0 products only and has no effect on other products. A device memory range is defined by a device memory pointer and a size. The range can cover the original device memory or a sub range of it. Upper-level software must guarantee that sub ranges do not overlap. Before prefetching, make sure the contents of the device memory are ready.

Parameters
  • dptr[in] Global device pointer

  • size[in] Global size in bytes

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryAllocation
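
Because the caller is responsible for keeping prefetched sub ranges disjoint, a small host-side check may help. The sketch below treats each range as a half-open interval of bytes. mem_range_t, ranges_overlap, and ranges_disjoint are hypothetical helpers, and the sketch assumes a device pointer can be represented as a uintptr_t, which is not confirmed by this document.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One candidate prefetch range: start address and length in bytes.
 * ASSUMPTION: the device pointer fits in a uintptr_t. */
typedef struct {
    uintptr_t base;
    size_t    size;
} mem_range_t;

/* True if [a.base, a.base + a.size) and [b.base, b.base + b.size)
 * share at least one byte. Empty ranges never overlap anything. */
bool ranges_overlap(mem_range_t a, mem_range_t b)
{
    if (a.size == 0 || b.size == 0)
        return false;
    return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* True if no pair among the n ranges overlaps.
 * Quadratic in n, which is acceptable for short lists. */
bool ranges_disjoint(const mem_range_t *r, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (ranges_overlap(r[i], r[j]))
                return false;
    return true;
}
```

Running ranges_disjoint over the planned (dptr, size) pairs before issuing the prefetch calls catches accidental overlap early.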

TOPS_PUBLIC_API topsError_t topsMemCacheInvalidate(topsDeviceptr_t dptr, size_t size)

Invalidate L2 buffer cache data. The API is available for GCU 3.0 products only and has no effect on other products. The API deletes L2 buffer cache data that was prefetched through topsMemCachePrefetch(). The original device memory is left unchanged. A device memory range is defined by a device memory pointer and a size.

Parameters
  • dptr[in] Global device pointer

  • size[in] Global size in bytes

Returns

topsSuccess, topsErrorInvalidValue, topsErrorMemoryAllocation

TOPS_PUBLIC_API topsError_t topsMemCacheInvalidateAll(topsStream_t stream)

Invalidate all caches of device memory.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsMemCacheFlushAll(topsStream_t stream)

Flush all caches of device memory.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExtGetMcAvailableMemSize(uint64_t mc, uint64_t *size)

Get available memory size for mc

Parameters
  • mc[in] mc index.

  • size[out] Pointer to available size for this mc.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableGetSectionCount(topsExecutable_t exe, topsExtExecutableSectionHeaderType_t sh_type, uint64_t *count)

Get section count of the executable

Note: calling with sh_type = topsExtExecutableSectionHeaderType::SHT_LAST_TYPE returns the total section count of the executable.

Parameters
  • exe[in] Pointer to executable object.

  • sh_type[in] Section header type.

  • count[out] Pointer to count of the section.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableGetStackSize(topsExecutable_t exe, uint64_t *stack_count, uint64_t *stack_size)

Get stack size of the executable

Parameters
  • exe[in] Pointer to executable object.

  • stack_count[out] Pointer that receives the stack count.

  • stack_size[out] Array that receives the stack size for every bank.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsExecutableGetSectionInfo(topsExecutable_t exe, topsExtExecutableSectionHeaderType_t sh_type, int mc, topsExtExecutableSectionInfo_t *info)

Get specific section information of the executable

Parameters
  • exe[in] Pointer to executable object.

  • sh_type[in] Section header type.

  • mc[in] mc index.

  • info[out] section information.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsGetMemUsageInfo(size_t *current_used_bytes, size_t *peek_used_bytes, int bank_total)

Get memory usage information.

Parameters
  • current_used_bytes[out] Current size of used memory in bytes.

  • peak_used_bytes[out] Peak size of used memory in bytes.

  • bank_total[in] The total number of memory bank.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsGetAffinityBankList(int *affinity_bank_list, int bank_total)

Get memory bank list of affinity

Parameters
  • affinity_bank_list[out] Affinity bank list of memory.

  • bank_total[in] The total number of memory bank.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsGetAffinityBankListV2(topsResource_t res, int *affinity_bank_list, int bank_total)

Get memory bank list of affinity

Parameters
  • res[in] Pointer to res_bundle object.

  • affinity_bank_list[out] Affinity bank list of memory.

  • bank_total[in] The total number of memory bank.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsMemGetInfoExt(size_t *used_list, size_t *free_list, size_t *max_available_list, size_t *total_list, int bank_total)

Query memory usage info list.

Returns lists of used, free, max available, and total memory for each memory bank.

Parameters
  • used_list[out] Pointer to used memory list.

  • free_list[out] Pointer to free memory list.

  • max_available_list[out] Pointer to max_available memory list.

  • total_list[out] Pointer to total memory list.

  • bank_total[in] The total number of memory bank.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorInvalidValue
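
One way to consume the per-bank lists is to pick the bank with the most free memory. The sketch below assumes that each list holds bank_total entries and that entry i describes bank i; the description above does not spell out this layout. bank_with_most_free is a hypothetical helper and is not part of the TOPS API.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the TOPS API).
 * Returns the index of the bank with the most free memory, ASSUMING
 * free_list has bank_total entries and entry i belongs to bank i.
 * Ties resolve to the lowest index. Returns -1 on invalid input.
 */
int bank_with_most_free(const size_t *free_list, int bank_total)
{
    if (free_list == NULL || bank_total <= 0)
        return -1;

    int best = 0;
    for (int i = 1; i < bank_total; ++i)
        if (free_list[i] > free_list[best])
            best = i;
    return best;
}
```

The caller would allocate four arrays of bank_total elements, pass them to topsMemGetInfoExt, and then hand free_list to this helper, for instance to choose the bank argument of topsExtMallocWithBank.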

TOPS_PUBLIC_API topsError_t topsKernelSignalMaxNumGet(int *num)

Get the maximum kernel signal number on the current device.

Parameters

num[out] Pointer to a variable allocated by the caller.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsKernelSignalAvailableNumGet(int *num)

Get the available kernel signal number on the current device.

Parameters

num[out] Pointer to a variable allocated by the caller.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsKernelSignalAlloc(int *handle)

Get a kernel signal handle allocated by the kmd on the current device.

Parameters

handle[out] Receives the allocated handle.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsKernelSignalFree(int handle)

Free a kernel signal handle allocated by the kmd on the current device.

Parameters

handle[in] the kernel signal handle

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsKernelSignalRead(int handle, uint32_t *value)

Get the kernel signal value indicated by handle.

Parameters
  • handle[in] The kernel signal handle.

  • value[out] Receives the kernel signal value.

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsKernelSignalWrite(int handle, uint32_t value)

Set the kernel signal value indicated by handle.

Parameters
  • handle[in] the kernel signal handle

  • value[in] the value of kernel signal

Returns

topsSuccess on success, or other on failure.
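
The kernel signal calls above suggest an allocate, write, read, free lifecycle, sketched below. So that the sketch builds without the SDK, it ships stand-in definitions of topsError_t, topsSuccess, and the four topsKernelSignal functions. These stand-ins are assumptions that only mimic the documented signatures, and their numeric values and table size are placeholders. With the real runtime, that section would be removed and the TOPS header included instead. signal_round_trip is a hypothetical helper.

```c
#include <stdint.h>

/* ---- Stand-ins so the sketch builds without the SDK (ASSUMPTIONS). ----
 * They only mimic the documented signatures; values are placeholders. */
typedef int topsError_t;
enum { topsSuccess = 0, MOCK_FAILURE = 1 };
#define MOCK_SIGNALS 4
static uint32_t mock_value[MOCK_SIGNALS];
static int      mock_used[MOCK_SIGNALS];

static int mock_valid(int h)
{
    return h >= 0 && h < MOCK_SIGNALS && mock_used[h];
}

topsError_t topsKernelSignalAlloc(int *handle)
{
    if (!handle)
        return MOCK_FAILURE;
    for (int i = 0; i < MOCK_SIGNALS; ++i) {
        if (!mock_used[i]) {
            mock_used[i] = 1;
            mock_value[i] = 0;
            *handle = i;
            return topsSuccess;
        }
    }
    return MOCK_FAILURE;                    /* no free signal left */
}

topsError_t topsKernelSignalFree(int handle)
{
    if (!mock_valid(handle))
        return MOCK_FAILURE;
    mock_used[handle] = 0;
    return topsSuccess;
}

topsError_t topsKernelSignalRead(int handle, uint32_t *value)
{
    if (!mock_valid(handle) || !value)
        return MOCK_FAILURE;
    *value = mock_value[handle];
    return topsSuccess;
}

topsError_t topsKernelSignalWrite(int handle, uint32_t value)
{
    if (!mock_valid(handle))
        return MOCK_FAILURE;
    mock_value[handle] = value;
    return topsSuccess;
}
/* ---- End of stand-ins. ---- */

/* Allocates a signal, writes value, reads it back, then frees the signal. */
topsError_t signal_round_trip(uint32_t value, uint32_t *read_back)
{
    int handle = -1;
    topsError_t err = topsKernelSignalAlloc(&handle);
    if (err != topsSuccess)
        return err;

    err = topsKernelSignalWrite(handle, value);
    if (err == topsSuccess)
        err = topsKernelSignalRead(handle, read_back);

    /* Always release the handle, but report the first error seen. */
    topsError_t free_err = topsKernelSignalFree(handle);
    return err != topsSuccess ? err : free_err;
}
```

Freeing the handle on every path keeps the limited pool of signals, reported by topsKernelSignalMaxNumGet, from being exhausted by error cases.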

TOPS_PUBLIC_API topsError_t topsRoceCreateQueue(uint32_t port, topsDeviceptr_t sq_base_ptr, size_t sq_size, uint8_t sq_user, uint32_t *q_id)

Create a RoCE sq.

Parameters
  • port[in] the ROCE port id

  • sq_base_ptr[in] the base address of the sq’s ring buffer

  • sq_size[in] the size of the sq’s ring buffer

  • sq_user[in] the user of the sq

  • q_id[out] the allocated sq id

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsRoceQueryQueue(uint32_t port, uint32_t q_id, uint64_t *mac, uint32_t *ip)

Query the specified sq's info.

Parameters
  • port[in] the ROCE port id

  • q_id[in] the sq id

  • mac[out] MAC address bound to the sq

  • ip[out] IP address bound to the sq

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsRoceBindQueuePair(uint32_t port, uint32_t q_id, uint32_t remote_q_id, uint64_t remote_mac, uint32_t remote_ip)

Bind a queue pair.

Parameters
  • port[in] the ROCE port id

  • q_id[in] the local sq id

  • remote_q_id[in] the remote sq id

  • remote_mac[in] the remote MAC address bound to the remote sq

  • remote_ip[in] the remote IP address bound to the remote sq

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsRoceDeleteQueue(uint32_t port, uint32_t q_id)

Delete a sq.

Parameters
  • port[in] the ROCE port id

  • q_id[in] the local sq id

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsRoceWriteQueue(uint32_t port, uint32_t q_id, void *dst, void *src, size_t size)

Emit a write request to the sq for master mode.

Parameters
  • port[in] the ROCE port id

  • q_id[in] the local sq id

  • dst[in] destination device ptr

  • src[in] source device ptr

  • size[in] the write request size

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsGetGlobalRandomSeed(uint64_t *seed)

Get global random seed.

Parameters

seed[out] global random seed

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsSetGlobalRandomSeed(uint64_t seed)

Set global random seed.

Parameters

seed[in] global random seed

Returns

topsSuccess on success, or other on failure.

TOPS_PUBLIC_API topsError_t topsSetDeviceAndResourceReservation(int deviceId, topsResourceReservation_t *rsvd_res)

Set default device to be used for subsequent tops API calls from this thread.

Sets device as the default device for the calling host thread. Valid device id’s are 0… (topsGetDeviceCount()-1).

Many TOPS APIs implicitly use the “default device” :

  • Any device memory subsequently allocated from this host thread (using topsMalloc) will be allocated on device.

  • Any streams or events created from this host thread will be associated with device.

  • Any kernels launched from this host thread (using topsLaunchKernel) will be executed on device (unless a specific stream is specified, in which case the device associated with that stream will be used).

This function may be called from any host thread. Multiple host threads may use the same device. This function does no synchronization with the previous or new device, and has very little runtime overhead. Applications can use topsSetDevice to quickly switch the default device before making a TOPS runtime call which uses the default device.

The default device is stored in thread-local storage for each thread. Thread-pool implementations may inherit the default device of the previous thread. A good practice is to always call topsSetDevice at the start of a TOPS code sequence to establish a known standard device.

Parameters
  • deviceId[in] Valid device in range 0…(topsGetDeviceCount()-1).

  • rsvd_res[in] Create a device by applying for a specified number of threads.

Returns

topsSuccess, topsErrorInvalidDevice, topsErrorDeviceAlreadyInUse