2. API Function¶
2.1. Builtins¶
This section describes builtins of TopsCC API.
- __TOPS_BUILTIN_VAR __tops_builtin_subThreadIdx_t subThreadIdx
- __TOPS_BUILTIN_VAR __tops_builtin_threadDim_t threadDim
- __TOPS_BUILTIN_VAR __tops_builtin_threadIdx_t threadIdx
- __TOPS_BUILTIN_VAR __tops_builtin_blockIdx_t blockIdx
- __TOPS_BUILTIN_VAR __tops_builtin_blockDim_t blockDim
- __TOPS_BUILTIN_VAR __tops_builtin_gridDim_t gridDim
-
__device__ __forceinline__ tops_implicit_params_t *__tops_get_implicit_params_ptr()¶
Get the pointer to the
implicit parameter
.- Returns
A pointer to
tops_implicit_params_t
.
-
__device__ __forceinline__ tops_amos_sip_interface_t *__tops_get_amos_sip_params_ptr()¶
Get the pointer to the
amos_sip_interface
.- Returns
A pointer to
amos_sip_interface
.
-
__device__ __forceinline__ void *__tops_private_heap_pointer()¶
Get the pointer to the private heap.
- Returns
A pointer of
void*
to the private heap.
Get the pointer to the block shared memory.
- Returns
A pointer of
void*
to the block shared memory.
-
__device__ __forceinline__ void tops_halt()¶
Put SIP run state to be in HALT state.
- __dtu_movs_md31 () &0xFF000000) >> 24)) struct __tops_builtin_subThreadIdx_t
-
__constant__¶
The memory space specifier
__constant__
is used to declare a variable that is stored in the constant memory space, making it accessible to all threads within the grid.
-
__private__¶
The memory space specifier
__private__
is used to declare a variable that is stored in the private memory space of a thread and can only be accessed by the thread that owns it.
The memory space specifier
__shared__
is used to declare a variable that is stored in the the shared memory space of a block and can only be accessed by the threads within the block.
-
__local__¶
The memory space specifier
__local__
declares a variable that is allocated in the memory space of a single thread.
-
__device__¶
As a memory space specifier, __device__ declares a variable that is stored in global memory space and can be accessed by all the threads within the grid.
As an execution space specifier, __device__ declares specifies a function that can be executed on a device and is callable exclusively from device.
-
__global__¶
The execution space specifier
__global__
declares a function as a kernel, indicating that it will be executed on the device and can be called from the host. This function can be thought of as the entry point for the kernel, serving as the starting point for the computation on the device.
-
__cooperative__¶
The execution space specifier
__cooperative__
is used to declare a__global__
function to be executed in the cooperative mode. In cooperative mode, all blocks within the grid need to be executed simultaneously.
-
__thread_dims__(...)¶
Specify the thread dimensions.
-
__host__¶
The execution space specifier
__host__
declares a function that is executed on the host and can only be called from the host.
-
__sp__¶
The execution space specifier
__sp__
declares a function to be compiled by sp(riscv) backend. If not compiled with--tops-sp
flag,__sp__
just means__host__
.
__shared_dte__
is one of the DTE context types and is used to declare a block-level DTE context that is shared by the threads within the block.
-
__private_dte__¶
__private_dte__
is one of the DTE context types and is used to declare a block-level DTE context that is owned by the thread privately.
-
__scalar_only__¶
__scalar_only__
declares a function to be executed in a single scalar thread which can not execute vector machine instructions.
-
__mmu_pointer__¶
-
__noinline__¶
The function qualifier
__noinline__
can be used as a hint for the compiler not to inline the function if possible.
-
__forceinline__¶
The function qualifier
__forceinline__
can be used to force the compiler to inline the function.
-
__valigned__¶
The
__valigned__
indicates that the memory is aligned by 128 Bytes.
-
__block_tile__¶
The
__block_tile__
indicates that the memory auto tiled.
-
__KRT_ARCH__¶
The
__KRT_ARCH__
macro can be used to differentiate various code paths based on__GCU_ARCH__
.
-
AS5_TASK_NODE_T_PTR¶
-
AS5_DTE_CTX_T_PTR¶
-
threadIdx¶
The thread index is represented by three dimensions (x, y, and z) within the block, as denoted by
threadIdx.x
,threadIdx.y
, andthreadIdx.z
.
-
blockIdx¶
The block index is represented by three dimensions (x, y, and z) within the grid, as denoted by
blockIdx.x
,blockIdx.y
, andblockIdx.z
.
-
blockDim¶
The block dimensions determine the number of threads within a block, and are specified by three dimensions (x, y, and z), as indicated by
blockDim.x
,blockDim.y
, andblockDim.z
.
-
gridDim¶
The grid dimensions determine the number of blocks within a grid, and are specified by three dimensions (x, y, and z), as indicated by
gridDim.x
,gridDim.y
, andgridDim.z
.
-
struct __tops_builtin_threadDim_t¶
2.2. Event¶
This section describes the struct of event. An event consists of the status which can be waited and it is usually generated from a dte ctx trigger.
-
__device__ __forceinline__ void wait(event &e)¶
Wait for event to end. This function takes an lvalue reference.
- Parameters
e – a lvalue reference of tops::event
-
__device__ __forceinline__ void wait(event &&e)¶
Wait for event to end. This function takes an rvalue reference.
- Parameters
e – a rvalue reference of tops::event
-
struct tops::event¶
- #include <__tops_event.h>
An
event
consists of the status which can be waited and it is usually generated from the trigger of a DTE context.Public Functions
-
__device__ __forceinline__ void wait()¶
Blocks current thread and waits for the SIP mailbox bound to it.
- Parameters
None –
-
__device__ __forceinline__ void wait()¶
2.3. DTE¶
This section describes the data structures and functions related to DTE.
-
enum AddrSpace¶
The supported address space for
mdspan
.Values:
-
enumerator Global¶
-
enumerator Private¶
-
enumerator Local¶
-
enumerator Global¶
-
enum tops_directions¶
Data transmission direction on device.
Values:
-
enumerator TOPS_DTE_H2H¶
On CDTE,
TOPS_DTE_H2H
meansShared
toShared
. On SDTE, it meansPrivate
toPrivate
.
-
enumerator TOPS_DTE_H2L¶
On CDTE,
TOPS_DTE_H2L
meansShared
toGlobal
. On SDTE, it meansPrivate
toShared/Global
.
-
enumerator TOPS_DTE_L2H¶
On CDTE,
TOPS_DTE_L2H
meansGlobal
toShared
. On SDTE, it meansShared/Global
toPrivate
-
enumerator TOPS_DTE_L2L¶
On CDTE,
TOPS_DTE_L2L
meansGlobal
toGlobal
. On SDTE, it meansShared/Global
toShared/Global
.
-
enumerator TOPS_DTE_H2H¶
-
inline __device__ __forceinline__ dte_scope(tops_dte_ctx_t &dte)¶
To initialize a DTE context.
- Parameters
dte – a DTE context
-
inline __device__ __forceinline__ ~dte_scope()¶
To destroy a DTE context.
-
__device__ __forceinline__ void memcpy(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src)¶
Configures a DTE context to a linear copy operation context and triggers linear copy operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
-
__device__ __forceinline__ void memcpy(tops_dte_ctx_t &ctx, const mdspan_base &dst, int dst_offset, const mdspan_base &src, int src_offset, int size)¶
Configures a DTE context to a linear copy operation context and triggers linear copy operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
dst_offset – The offset of the destination buffer.
src – The mdspan of the address of the source buffer.
src_offset – The offset of the source buffer.
size – The byte size of the transferred data.
-
__device__ __forceinline__ void memcpy(tops_dte_ctx_t &ctx, void *dst, void *src, int size, int direction)¶
Configures a DTE context to a linear copy operation context and triggers linear copy operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The address of the destination buffer.
src – The address of the source buffer.
size – The byte size of the transferred data.
direction – The direction of the linear copy operation. 3: High level to high level(H2H); 2: High level to low level(H2L); 1: Low level to high
-
__device__ __forceinline__ tops::event memcpy_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src)¶
Configures a DTE context to a linear copy operation context and triggers linear copy operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
- Returns
-
__device__ __forceinline__ tops::event memcpy_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, int dst_offset, const mdspan_base &src, int src_offset, int size)¶
Configures a DTE context to a linear copy operation context and triggers linear copy operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
dst_offset – The offset of the destination buffer.
src – The mdspan of the address of the source buffer.
src_offset – The offset of the source buffer.
size – The byte size of the transferred data.
- Returns
-
__device__ __forceinline__ tops::event memcpy_async(tops_dte_ctx_t &ctx, void *dst, void *src, int size, int direction)¶
Configures a DTE context to a linear copy operation context and triggers linear copy operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The address of the destination buffer.
src – The address of the source buffer.
size – The byte size of the transferred data.
direction – The direction of the linear copy operation. 3: High level to high level(H2H); 2: High level to low level(H2L); 1: Low level to high
- Returns
-
template<typename T = int>
__device__ __forceinline__ void slice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *offsets, const T value = 0)¶ Configures a DTE context to a slice operation context and triggers slice operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets – An int array. The source offsets of slice, could be both postive and negative.
-
template<typename T = int>
__device__ __forceinline__ void slice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, std::initializer_list<int> offsets_list, const T value = 0)¶ Configures a DTE context to a slice operation context and triggers slice operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets_list – An int array. The source offsets of slice, could be both postive and negative.
-
template<typename T = int>
__device__ __forceinline__ tops::event slice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *offsets, const T value = 0)¶ Configures a DTE context to a slice operation context and triggers slice operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets – An int array. The source offsets of slice, could be both postive and negative.
- Returns
-
template<typename T = int>
__device__ __forceinline__ tops::event slice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, std::initializer_list<int> offsets_list, const T value = 0)¶ Configures a DTE context to a slice operation context and triggers slice operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets_list – An int array. The source offsets of slice, could be both postive and negative.
- Returns
-
__device__ __forceinline__ void deslice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *offsets)¶
Configures a DTE context to a deslice operation context and triggers deslice operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets – An int array. The destination offsets of deslice, could only be postive.
-
__device__ __forceinline__ void deslice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, std::initializer_list<int> offsets_list)¶
Configures a DTE context to a deslice operation context and triggers deslice operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets_list – An int array. The destination offsets of deslice, could only be postive.
-
__device__ __forceinline__ tops::event deslice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *offsets)¶
Configures a DTE context to a deslice operation context and triggers deslice operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets – An int array. The destination offsets of deslice, could only be postive.
- Returns
-
__device__ __forceinline__ tops::event deslice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, std::initializer_list<int> offsets_list)¶
Configures a DTE context to a deslice operation context and triggers deslice operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets_list – An int array. The destination offsets of deslice, could only be postive.
- Returns
-
__device__ __forceinline__ void transpose(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *layout)¶
Configures a DTE context to a transpose operation context and triggers transpose operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
layout – An int array. The dim-order transformation on each dim. Transpose_layout should be exclusive.
-
__device__ __forceinline__ void transpose(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, std::initializer_list<int> layout_list)¶
Configures a DTE context to a transpose operation context and triggers transpose operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
layout_list – An int array. The dim-order transformation on each dim. Transpose_layout should be exclusive.
-
__device__ __forceinline__ tops::event transpose_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *layout)¶
Configures a DTE context to a transpose operation context and triggers transpose operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
layout – An int array. The dim-order transformation on each dim. Transpose_layout should be exclusive.
- Returns
-
__device__ __forceinline__ tops::event transpose_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, std::initializer_list<int> layout_list)¶
Configures a DTE context to a transpose operation context and triggers transpose operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
layout_list – An int array. The dim-order transformation on each dim. Transpose_layout should be exclusive.
- Returns
-
template<typename T>
__device__ __forceinline__ void pad(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const unsigned int *pad_low, const unsigned int *pad_high, const unsigned int *pad_mid, const T pad_value)¶ Configures a DTE context to a pad operation context and triggers pad operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
pad_low – Number of elements to be added on the lowest edge of each dim.
pad_high – Number of elements to be added on the highest edge of each dim.
pad_mid – Number of elements to be added between each two elements of each dim.
pad_value – The value to be padded.
-
template<typename T>
__device__ __forceinline__ tops::event pad_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const unsigned int *pad_low, const unsigned int *pad_high, const unsigned int *pad_mid, const T pad_value)¶ Configures a DTE context to a pad operation context and triggers pad operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
pad_low – Number of elements to be added on the lowest edge of each dim.
pad_high – Number of elements to be added on the highest edge of each dim.
pad_mid – Number of elements to be added between each two elements of each dim.
pad_value – The value to be padded.
- Returns
-
template<typename T>
__device__ __forceinline__ void memset(tops_dte_ctx_t &ctx, const mdspan_base &dst, const T value)¶ Configures a DTE context to a memset operation context and triggers memset operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
value – The memset value.
-
template<typename T>
__device__ __forceinline__ tops::event memset_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const T value)¶ Configures a DTE context to a memset operation context and triggers memset operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
value – The memset value.
- Returns
-
__device__ __forceinline__ void mirror_lr(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src)¶
Configures a DTE context to a left/right mirror operation context and triggers left/right mirror operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
-
__device__ __forceinline__ tops::event mirror_lr_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src)¶
Configures a DTE context to a left/right mirror operation context and triggers left/right mirror operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
- Returns
-
__device__ __forceinline__ void mirror_tb(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src)¶
Configures a DTE context to a top/bottom mirror operation context and triggers top/bottom mirror operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
-
__device__ __forceinline__ tops::event mirror_tb_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src)¶
Configures a DTE context to a top/bottom mirror operation context and triggers top/bottom mirror operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
- Returns
-
__device__ __forceinline__ void broadcast(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src)¶
Configures a DTE context to a broadcast operation context and triggers broadcast operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
-
__device__ __forceinline__ tops::event broadcast_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src)¶
Configures a DTE context to a broadcast operation context and triggers broadcast operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
- Returns
-
__device__ __forceinline__ void slice_transpose(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *offsets, const int *layout)¶
Configures a DTE context to a slice transpose operation context and triggers slice transpose operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets – An int array. The source offsets of slice, could be both postive and negative.
layout – An int array. The dim-order transformation on each dim. Transpose_layout should be exclusive.
-
__device__ __forceinline__ void slice_transpose(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, std::initializer_list<int> offsets_list, std::initializer_list<int> layout_list)¶
Configures a DTE context to a slice transpose operation context and triggers slice transpose operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets_list – An int array. The source offsets of slice, could be both postive and negative.
layout_list – An int array. The dim-order transformation on each dim. Transpose_layout should be exclusive.
-
__device__ __forceinline__ tops::event slice_transpose_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *offsets, const int *layout)¶
Configures a DTE context to a slice transpose operation context and triggers slice transpose operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets – The source offsets of slice, could be both postive and negative. An int array.
layout – The dim-order transformation on each dim. Transpose_layout should be exclusive. An int array.
- Returns
-
__device__ __forceinline__ tops::event slice_transpose_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, std::initializer_list<int> offsets_list, std::initializer_list<int> layout_list)¶
Configures a DTE context to a slice transpose operation context and triggers slice transpose operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
offsets_list – The source offsets of slice, could be both postive and negative. An int array.
layout_list – The dim-order transformation on each dim. Transpose_layout should be exclusive. An int array.
- Returns
-
__device__ __forceinline__ void transpose_deslice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *layout, const int *offsets)¶
Configures a DTE context to a transpose deslice operation context and triggers transpose deslice operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
layout – The dim-order transformation on each dim. Transpose_layout should be exclusive. An int array.
offsets – The destination offsets of deslice, could only be postive. An int array.
-
__device__ __forceinline__ void transpose_deslice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, std::initializer_list<int> layout_list, std::initializer_list<int> offsets_list)¶
Configures a DTE context to a transpose deslice operation context and triggers transpose deslice operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
layout_list – The dim-order transformation on each dim. Transpose_layout should be exclusive. An int array.
offsets_list – The destination offsets of deslice, could only be postive. An int array.
-
__device__ __forceinline__ tops::event transpose_deslice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *layout, const int *offsets)¶
Configures a DTE context to a transpose deslice operation context and triggers transpose deslice operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
layout – The dim-order transformation on each dim. Transpose_layout should be exclusive. An int array.
offsets – The destination offsets of deslice, could only be postive. An int array.
- Returns
-
__device__ __forceinline__ tops::event transpose_deslice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, std::initializer_list<int> layout_list, std::initializer_list<int> offsets_list)¶
Configures a DTE context to a transpose deslice operation context and triggers transpose deslice operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
layout_list – The dim-order transformation on each dim. Transpose_layout should be exclusive. An int array.
offsets_list – The destination offsets of deslice, could only be postive. An int array.
- Returns
-
template<typename T>
__device__ __forceinline__ void slice_pad(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *src_offsets, const unsigned int *slice_shape, const unsigned int *pad_low, const unsigned int *pad_high, const unsigned int *pad_mid, const T pad_value)¶ Configures a DTE context to a slice padding operation context and triggers slice padding operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
src_offsets – The source offsets of slice, could be both postive and negative.
slice_shape – The slice buffer dims.
pad_low – Number of elements to be added on the lowest edge of each dim.
pad_high – Number of elements to be added on the highest edge of each dim.
pad_mid – Number of elements to be added between each two elements of each dim.
pad_value – The value to be padded.
- Returns
void
-
template<typename T>
__device__ __forceinline__ tops::event slice_pad_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *src_offsets, const unsigned int *slice_shape, const unsigned int *pad_low, const unsigned int *pad_high, const unsigned int *pad_mid, const T pad_value)¶ Configures a DTE context to a slice padding operation context and triggers slice padding operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
src_offsets – The source offsets of slice, could be both postive and negative.
slice_shape – The slice buffer dims.
pad_low – Number of elements to be added on the lowest edge of each dim.
pad_high – Number of elements to be added on the highest edge of each dim.
pad_mid – Number of elements to be added between each two elements of each dim.
pad_value – The value to be padded.
- Returns
-
__device__ __forceinline__ void slice_deslice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *src_offsets, const unsigned int *slice_shape, const int *dst_offsets)¶
Configures a DTE context to a slice deslice operation context and triggers slice deslice operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
src_offsets – The source offsets of slice, could be both postive and negative.
slice_shape – The slice buffer dims.
dst_offsets – The destination offsets of deslice, could only be postive.
- Returns
void
-
__device__ __forceinline__ tops::event slice_deslice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *src_offsets, const unsigned int *slice_shape, const int *dst_offsets)¶
Configures a DTE context to a slice deslice operation context and triggers slice deslice operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
src_offsets – The source offsets of slice, could be both postive and negative.
slice_shape – The slice buffer dims.
dst_offsets – The destination offsets of deslice, could only be postive.
- Returns
-
__device__ __forceinline__ void slice_broadcast(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *src_offsets, const unsigned int *slice_shape)¶
Configures a DTE context to a slice broadcast operation context and triggers slice broadcast operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
src_offsets – The source offsets of slice, could be both postive and negative.
slice_shape – The slice buffer dims.
- Returns
void
-
__device__ __forceinline__ tops::event slice_broadcast_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *src_offsets, const unsigned int *slice_shape)¶
Configures a DTE context to a slice broadcast operation context and triggers slice broadcast operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
src_offsets – The source offsets of slice, could be both postive and negative.
slice_shape – The slice buffer dims.
- Returns
-
__device__ __forceinline__ void slice_expand(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *src_offsets, const unsigned int *slice_shape, int phase, int ratio)¶
Configures a DTE context to a slice expand operation context and triggers slice expand operation bound to a DTE context synchronously.
Warning
slice_expand from BPE1 to BPE2/BPE4 are not supported for SDTE.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
src_offsets – The source offsets of slice, could be both postive and negative.
slice_shape – The slice buffer dims.
phase – The certain bytes selected from source element as destination element. The valid phase could be 0~3.
ratio – The ratio of data shrink. The valid ratio could be 0 or 1.
- Returns
void
-
__device__ __forceinline__ tops::event slice_expand_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *src_offsets, const unsigned int *slice_shape, int phase, int ratio)¶
Configures a DTE context to a slice expand operation context and triggers slice expand operation bound to a DTE context asynchronously.
Warning
slice_expand from BPE1 to BPE2/BPE4 are not supported for SDTE.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
src_offsets – The source offsets of slice, could be both postive and negative.
slice_shape – The slice buffer dims.
phase – The certain bytes selected from source element as destination element. The valid phase could be 0~3.
ratio – The ratio of data shrink. The valid ratio could be 0 or 1.
- Returns
-
__device__ __forceinline__ void shrink_deslice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *dst_offsets, int phase, int ratio)¶
Configures a DTE context to a shrink deslice operation context and triggers shrink deslice operation bound to a DTE context synchronously.
Warning
shrink_deslice from BPE2/BPE4 to BPE1 are not supported for SDTE.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
dst_offsets – The destination offsets of deslice, could only be postive.
phase – The certain bytes selected from source element as destination element. The valid phase could be 0~3.
ratio – The ratio of data shrink. The valid ratio could be 0 or 1.
- Returns
void
-
__device__ __forceinline__ tops::event shrink_deslice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *dst_offsets, int phase, int ratio)¶
Configures a DTE context to a shrink deslice operation context and triggers shrink deslice operation bound to a DTE context asynchronously.
Warning
shrink_deslice from BPE2/BPE4 to BPE1 are not supported for SDTE.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
dst_offsets – The destination offsets of deslice, could only be postive.
phase – The certain bytes selected from source element as destination element. The valid phase could be 0~3.
ratio – The ratio of data shrink. The valid ratio could be 0 or 1.
- Returns
-
template<typename T>
__device__ __forceinline__ void memset_deslice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *dst_offsets, const T value)¶ Configures a DTE context to a filling deslice operation context and triggers filling deslice operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
dst_offsets – The destination offsets of deslice, could only be postive.
value – The memset value.
- Returns
void
-
template<typename T>
__device__ __forceinline__ tops::event memset_deslice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *dst_offsets, const T value)¶ Configures a DTE context to a filling deslice operation context and triggers filling deslice operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
dst_offsets – The destination offsets of deslice, could only be postive.
value – The memset value.
- Returns
-
template<typename T>
__device__ __forceinline__ void mirror_lr_pad(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const unsigned int *pad_low, const unsigned int *pad_high, const unsigned int *pad_mid, const T value)¶ Configures a DTE context to a mirror_lr pad operation context and triggers mirror_lr pad operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
pad_low – Number of elements to be added on the lowest edge of each dim.
pad_high – Number of elements to be added on the highest edge of each dim.
pad_mid – Number of elements to be added between each two elements of each dim.
pad_value – The value to be padded.
- Returns
void
-
template<typename T>
__device__ __forceinline__ tops::event mirror_lr_pad_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const unsigned int *pad_low, const unsigned int *pad_high, const unsigned int *pad_mid, const T value)¶ Configures a DTE context to a mirror_lr pad operation context and triggers mirror_lr pad operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
pad_low – Number of elements to be added on the lowest edge of each dim.
pad_high – Number of elements to be added on the highest edge of each dim.
pad_mid – Number of elements to be added between each two elements of each dim.
pad_value – The value to be padded.
- Returns
-
template<typename T>
__device__ __forceinline__ void mirror_tb_pad(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const unsigned int *pad_low, const unsigned int *pad_high, const unsigned int *pad_mid, const T value)¶ Configures a DTE context to a mirror_tb pad operation context and triggers mirror_tb pad operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
pad_low – Number of elements to be added on the lowest edge of each dim.
pad_high – Number of elements to be added on the highest edge of each dim.
pad_mid – Number of elements to be added between each two elements of each dim.
pad_value – The value to be padded.
- Returns
void
-
template<typename T>
__device__ __forceinline__ tops::event mirror_tb_pad_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const unsigned int *pad_low, const unsigned int *pad_high, const unsigned int *pad_mid, const T value)¶ Configures a DTE context to a mirror_tb pad operation context and triggers mirror_tb pad operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
pad_low – Number of elements to be added on the lowest edge of each dim.
pad_high – Number of elements to be added on the highest edge of each dim.
pad_mid – Number of elements to be added between each two elements of each dim.
pad_value – The value to be padded.
- Returns
-
__device__ __forceinline__ void mirror_lr_deslice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *dst_offsets)¶
Configures a DTE context to a mirror_lr deslice operation context and triggers mirror_lr deslice operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
dst_offsets – The destination offsets of deslice, could only be postive.
- Returns
void
-
__device__ __forceinline__ tops::event mirror_lr_deslice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *dst_offsets)¶
Configures a DTE context to a mirror_lr deslice operation context and triggers mirror_lr deslice operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
dst_offsets – The destination offsets of deslice, could only be postive.
- Returns
-
__device__ __forceinline__ void mirror_tb_deslice(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *dst_offsets)¶
Configures a DTE context to a mirror_tb deslice operation context and triggers mirror_tb deslice operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
dst_offsets – The destination offsets of deslice, could only be postive.
- Returns
void
-
__device__ __forceinline__ tops::event mirror_tb_deslice_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const int *dst_offsets)¶
Configures a DTE context to a mirror_tb deslice operation context and triggers mirror_tb deslice operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
dst_offsets – The destination offsets of deslice, could only be postive.
- Returns
-
template<typename T>
__device__ __forceinline__ void sub_sample(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const T dim_stride)¶ Configures a DTE context to a sub-sample operation context and triggers sub-sample operation bound to a DTE context synchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
dim_stride – The stride of sub-sample operation.
- Returns
void
-
template<typename T>
__device__ __forceinline__ tops::event sub_sample_async(tops_dte_ctx_t &ctx, const mdspan_base &dst, const mdspan_base &src, const T dim_stride)¶ Configures a DTE context to a sub-sample operation context and triggers sub-sample operation bound to a DTE context asynchronously.
- Parameters
ctx – A DTE context.
dst – The mdspan of the address of the destination buffer.
src – The mdspan of the address of the source buffer.
dim_stride – The stride of sub-sample operation.
- Returns
-
__device__ __forceinline__ void wait()¶
Blocks current thread and waits for the SIP mailbox bound to it.
- Parameters
None –
-
__TOPS_ADJUST_ADDR()¶
-
__TOPS_SHARED_CDTE_COUNT¶
-
__TOPS_BLOCK_SHARED_EDTE_COUNT¶
-
__TOPS_LOCAL_EDTE_COUNT¶
-
__TOPS_PRIVATE_EDTE_COUNT¶
-
__TOPS_LOCALSHARED_SDTE_COUNT¶
-
__TOPS_EOF_BLOCK_SHARED_CDTE¶
-
__TOPS_EOF_SHARED_CDTE¶
-
__TOPS_EOF_PRIVATE_CDTE¶
-
__TOPS_EOF_LOCALSHARED_SDTE¶
-
__TOPS_EOF_BLOCK_SHARED_EDTE¶
-
__TOPS_EOF_SHARED_EDTE¶
-
__TOPS_EOF_PRIVATE_EDTE¶
-
__TOPS_GET_CDTE_VC_START(ENGID)¶
-
struct tops::mdspan_base¶
- #include <__tops_dte.h>
The struct
mdspan_base
defines the basic properties of objectmdspan
.Subclassed by tops::mdspan
Public Members
-
void *opaque¶
pointer to the array.
-
int rank¶
the number of dimensions of the array.
-
unsigned int shape[5]¶
the shape of the array.
-
unsigned long long total_size¶
the total size of the array.
-
int elem_bytes¶
the bytes of an element in the array.
-
int mc_id¶
the memory control id of the array.
-
void *opaque¶
-
struct tops::mdspan : public tops::mdspan_base¶
- #include <__tops_dte.h>
The struct
mdspan
is a data structure that embodies an array and incorporates additional properties, such as address space information, shape, and other attributes.Public Functions
-
__device__ __forceinline__ mdspan() = default¶
To create an mdspan without initilization User must call init to do initialization later.
-
template<typename T, typename ...Args>
inline __device__ __forceinline__ mdspan(AddrSpace as, T *p, Args... dims)¶ To create a mdspan.
-
template<typename T, int n>
inline __device__ __forceinline__ mdspan(AddrSpace as, T *p, int (&dims)[n])¶ To create a mdspan.
-
template<typename T, int n>
inline __device__ __forceinline__ mdspan(AddrSpace as, T *p, unsigned int (&dims)[n])¶ To create a mdspan.
-
template<typename T, typename ...Args>
inline __device__ __forceinline__ mdspan(AddrSpace as, int mid, T *p, Args... dims)¶ To create a mdspan.
-
template<typename T, int n>
inline __device__ __forceinline__ mdspan(AddrSpace as, int mid, T *p, int (&dims)[n])¶ To create a mdspan.
-
template<typename T, int n>
inline __device__ __forceinline__ mdspan(AddrSpace as, int mid, T *p, unsigned int (&dims)[n])¶ To create a mdspan.
-
template<typename T, typename ...Args>
inline __device__ __forceinline__ mdspan(T *p, Args... dims)¶ Create a mdspan with a device pointer and its shape.
- Parameters
p – A device pointer.
dims – The dimension of p.
-
template<typename T, int n>
inline __device__ __forceinline__ mdspan(T *p, int (&dims)[n])¶ Create a mdspan with a device pointer and an array representing the shape.
- Parameters
p – A device pointer.
dims – An array representing the dimension of p.
-
template<typename T, int n>
inline __device__ __forceinline__ mdspan(T *p, unsigned int (&dims)[n])¶ Create a mdspan with a device pointer and an array representing the shape.
- Parameters
p – A device pointer.
dims – An array representing the dimension of p.
-
template<typename T, typename ...Args>
inline __device__ __forceinline__ mdspan(int mid, T *p, Args... dims)¶ Create a mdspan with a device pointer, shape infomation and the id of mc.
- Parameters
mid – the id of machine controller.
p – A device pointer.
dims – The dimension of p.
-
template<typename T, int n>
inline __device__ __forceinline__ mdspan(int mid, T *p, int (&dims)[n])¶ Create a mdspan with a device pointer, shape infomation and the id of mc.
- Parameters
mid – the id of machine controller.
p – A device pointer.
dims – An array representing the dimension of p.
-
template<typename T, int n>
inline __device__ __forceinline__ mdspan(int mid, T *p, unsigned int (&dims)[n])¶ Create a mdspan with a device pointer, shape infomation and the id of mc.
- Parameters
mid – the id of machine controller.
p – A device pointer.
dims – An array representing the dimension of p.
-
template<typename T, typename ...Args>
inline __device__ __forceinline__ T &get(Args... indices) const¶ Get an element from mdspan.
-
template<typename T, typename ...Args>
inline __device__ __forceinline__ void set(const T val, Args... indices)¶ Set value for element in mdspan.
-
__device__ __forceinline__ mdspan() = default¶
-
struct tops_dte_ctx_s¶
- #include <__tops_dte.h>
The struct for DTE context. It keeps the state of a virtual DMA engine.
First, users should initialize it by invoking
init
or usingdte_scope
.config_*
will configure what operation the current operation is.trigger
will run the current operation and return an event which can be waited until the current operation finished.
A context can only run one operation once. When compiled with
-DTOPS_ENABLE_DTE_CHECK
, the illegal behaviour of DTE will be asserted.Note
tops_dte_ctx_t
is an aligned version oftops_dte_ctx_s
.Public Functions
-
__attribute__((section(".text.__tops_dte_ctx_s_resident_"))) inline __device__ tops_dte_ctx_s()¶
DTE context constructor.
-
inline __device__ ~tops_dte_ctx_s()¶
DTE context destructor.
- __device__ __attribute__ ((section(".text.__tops_dte_ctx_init_resident_"))) void init(bool force
Initialize a DTE context.
2.4. Math¶
This section describes the API of math functions, including the API for both scalar and vector.
Scalar¶
This section describes the API for math functions of scalar.
-
__device__ __forceinline__ float sinf(float s)¶
Calculates the sine of a floating-point value.
- Parameters
s – The value in radians for which sine will be calculated.
- Returns
The sine of the input value.
-
__device__ __forceinline__ float cosf(float s)¶
Calculates the cosine of a floating-point value.
- Parameters
s – The value in radians for which cosine will be calculated.
- Returns
The cosine of the input value.
-
__device__ __forceinline__ float tanf(float s)¶
Calculates the tangent of a floating-point value.
- Parameters
s – The value in radians for which tangent will be calculated.
- Returns
The tangent of the input value.
-
__device__ __forceinline__ float asinf(float s)¶
Calculates the arcsine of a floating-point value.
- Parameters
s – The value for which the arcsine will be calculated.
- Returns
The arcsine of the input value in radians.
-
__device__ __forceinline__ float acosf(float s)¶
Calculates the arccosine of a floating-point value.
- Parameters
s – The value for which the arccosine will be calculated.
- Returns
The arccosine of the input value in radians.
-
__device__ __forceinline__ float atanf(float s)¶
Calculates the arctangent of a floating-point value.
- Parameters
s – The value for which the arctangent will be calculated.
- Returns
The arctangent of the input value in radians.
-
__device__ __forceinline__ float atan2f(float lhs, float rhs)¶
Calculates the arctangent of the quotient of two floating-point values.
- Parameters
lhs – The dividend value.
rhs – The divisor value.
- Returns
The arctangent of the quotient lhs/rhs in radians.
-
__device__ __forceinline__ float sinhf(float s)¶
Calculates the hyperbolic sine(sinh) of a floating-point value.
- Parameters
s – The value for which the hyperbolic sine will be calculated.
- Returns
The hyperbolic sine of the input value.
-
__device__ __forceinline__ float coshf(float s)¶
Calculates the hyperbolic cosine(cosh) of a floating-point value.
- Parameters
s – The value for which the hyperbolic cosine will be calculated.
- Returns
The hyperbolic cosine of the input value.
-
__device__ __forceinline__ float tanhf(float s)¶
Calculates the hyperbolic tangent(tanh) of a floating-point value.
- Parameters
s – The value for which the hyperbolic tangent will be calculated.
- Returns
The hyperbolic tangent of the input value.
-
__device__ __forceinline__ float asinhf(float s)¶
Calculates the inverse hyperbolic sine(asinh) of a floating-point value.
asinhf( \(\pm 0\)) returns \(\pm 0\).
asinhf( \(\pm \infty\)) return \(\pm \infty\).
- Parameters
s – The value for which the inverse hyperbolic sine will be calculated.
- Returns
The inverse hyperbolic sine of the input value.
-
__device__ __forceinline__ float acoshf(float s)¶
Calculates the inverse hyperbolic cosine(acosh) of a floating-point value.
acoshf( \( 1 \)) returns \( 0 \).
acoshf(
s
) returns NaN fors
outside interval [ \( - \infty\) ,1].acoshf( \( + \infty\)) return \( + \infty\).
- Parameters
s – The value for which the inverse hyperbolic cosine will be calculated.
- Returns
The inverse hyperbolic cosine of the input value.
-
__device__ __forceinline__ float atanhf(float s)¶
Calculates the inverse hyperbolic tangent(atanh) of a floating-point value.
atanhf( \(\pm 0\)) returns \(\pm 0\).
atanhf( \(\pm 1\)) return \(\pm \infty\).
atanhf(
s
) returns NaN fors
outside interval [-1,1].
- Parameters
s – The value for which the inverse hyperbolic tangent will be calculated.
- Returns
The inverse hyperbolic tangent of the input value.
-
__device__ __forceinline__ float expf(float s)¶
Calculates the exponential function (e^x) of a floating-point value.
- Parameters
s – The value for which the exponential function will be calculated.
- Returns
The exponential function of the input value.
-
__device__ __forceinline__ float expm1f(float s)¶
Calculates the exponential function minus one(e^x-1) of a floating-point value.
- Parameters
s – The value for which the exponential function minus one will be calculated.
- Returns
the exponential function minus one of the input value.
-
__device__ __forceinline__ float exp2f(float s)¶
Calculates 2 raised to the power of a floating-point value.
- Parameters
s – The exponent value.
- Returns
2 raised to the power of the input value.
-
__device__ __forceinline__ float logf(float s)¶
Calculates the natural logarithm (base e) of a floating-point value.
- Parameters
s – The value for which the natural logarithm will be calculated.
- Returns
The natural logarithm of the input value.
-
__device__ __forceinline__ float log1pf(float s)¶
Calculates the natural logarithm (base e) of a floating-point value plus one.
- Parameters
s – The value for which the natural logarithm puls one will be calculated.
- Returns
The natural logarithm of the input value plus one.
-
__device__ __forceinline__ float log2f(float s)¶
Calculates the base-2 logarithm of a floating-point value.
- Parameters
s – The value for which the base-2 logarithm will be calculated.
- Returns
The base-2 logarithm of the input value.
-
__device__ __forceinline__ float log10f(float s)¶
Calculates the base-10 logarithm of a floating-point value.
- Parameters
s – The value for which the base-10 logarithm will be calculated.
- Returns
The base-10 logarithm of the input value.
-
__device__ __forceinline__ float logbf(float s)¶
Calculate the integer part of the base-2 logarithm of the absolute value of a floating-point value.
logbf( \(\pm 0\)) returns \( - \infty\).
logbf( \(\pm \infty\)) return \( + \infty\).
- Parameters
s – The input floating-point value.
- Returns
The value of the integral part of log2(|s|).
-
__device__ __forceinline__ float floorf(float s)¶
Rounds the input value downward to the largest integral value that is not greater than the original value.
- Parameters
s – The input value.
- Returns
The largest integral value that is not greater than the original value.
-
__device__ __forceinline__ float ceilf(float s)¶
Rounds the input value upward to the smallest integral value that is not less than the original value.
- Parameters
s – Value whose ceil is returned.
- Returns
The smallest integral value that is not less than the original value.
-
__device__ __forceinline__ float roundf(float s)¶
Return the integral value nearest to x rounding halfway cases away from zero, regardless of the current rounding direction.
- Parameters
s – Value rounded to nearest.
- Returns
The value of a rounded to the nearest integral.
-
__device__ __forceinline__ float sigmoidf(float s)¶
Compute the sigmoid activation function.
\(\text{result} = \frac{1}{{e^{-s} + 1}}\)
- Parameters
s – The input value.
- Returns
The result of sigmoid function.
-
__device__ __forceinline__ float geluf(float s)¶
Compute the gelu activation function.
\(\text{result} = s \cdot 0.5 \cdot \left(1 + \text{erf}\left(\frac{s} {\sqrt{2}}\right)\right)\)
- Parameters
s – The input value.
- Returns
The result of gelu function.
-
__device__ __forceinline__ float softplusf(float s)¶
Compute the softplus activation function.
\(\text{result} = \log(1 + \exp(s))\)
- Parameters
s – The input value.
- Returns
The result of softplus function.
-
__device__ __forceinline__ float sqrtf(float s)¶
Computes the square root of a given floating-point number.
- Parameters
s – The floating-point number for which the square root is to be calculated.
- Returns
The square root of the input number.
-
__device__ __forceinline__ float rsqrtf(float s)¶
Computes the reciprocal square root of a given floating-point number.
This function calculates the reciprocal square root of the input floating-point number (1 / sqrt(x)) and returns the result.
- Parameters
s – The floating-point number for which the reciprocal square root is to be calculated.
- Returns
The reciprocal square root of the input number.
-
__device__ __forceinline__ float cbrtf(float s)¶
Computes the cube root of a given floating-point number.
- Parameters
s – The floating-point number for which the cube root is to be calculated.
- Returns
The cube root of the input number.
-
__device__ __forceinline__ float powf(float lhs, float rhs)¶
Computes the power of a given base raised to a given exponent (lhs ^ rhs).
- Parameters
lhs – The base value.
rhs – The exponent value.
- Returns
The result of base raised to the power of the exponent.
-
__device__ __forceinline__ float signf(float s)¶
Returns the sign of a given number.
This function determines the sign of the input number and returns: -1 if the number is negative, 0 if the number is zero, 1 if the number is positive.
- Parameters
s – The number for which the sign is to be determined.
- Returns
1 if the number is negative, 0 if the number is zero, 1 if the number is positive.
-
__device__ __forceinline__ float fmaxf(float lhs, float rhs)¶
Returns the maximum of two given floating-point numbers.
- Parameters
lhs – The first floating-point number.
rhs – The second floating-point number.
- Returns
The maximum value between the two input numbers.
-
__device__ __forceinline__ float fminf(float lhs, float rhs)¶
Returns the minimum of two given floating-point numbers.
- Parameters
lhs – The first floating-point number.
rhs – The second floating-point number.
- Returns
The minimum value between the two input numbers.
-
__device__ __forceinline__ float fabsf(float s)¶
Returns the absolute value of a given floating-point number.
- Parameters
s – The floating-point number for which the absolute value is to be calculated.
- Returns
The absolute value of the input number.
-
__device__ __forceinline__ float isfinitef(float s)¶
Checks if a given floating-point number is finite.
This function checks whether the input floating-point number is finite, meaning it is neither infinite nor a NaN (Not-a-Number) value.
- Parameters
s – The floating-point number to be checked.
- Returns
return 1 if the number is finite, 0 otherwise.
-
__device__ __forceinline__ float isnanf(float s)¶
Checks if a given floating-point number is NaN (Not-a-Number).
This function checks whether the input floating-point number is NaN, which represents an undefined or unrepresentable value.
- Parameters
s – The floating-point number to be checked.
- Returns
return 1 if the number is NaN, 0 otherwise.
-
__device__ __forceinline__ float copysignf(float lhs, float rhs)¶
Copies the sign of one floating-point number onto another.
This function returns a value with the magnitude of the first input number (magnitude of
lhs
) and the sign of the second input number (sign ofrhs
).- Parameters
lhs – The magnitude value.
rhs – The sign value.
- Returns
A value with the magnitude of
lhs
and the sign ofrhs
.
-
__device__ __forceinline__ float remf(float lhs, float rhs)¶
Compute single-precision floating-point remainder.
Compute single-precision floating-point remainder
r
of dividinglhs
byrhs
for nonzerorhs
. Thus \(r = lhs - n \cdot rhs\). The valuen
is the integer value nearest \(\frac{lhs}{rhs}\). In the case when \(\left| n - \frac{lhs}{rhs} \right| = \frac{1}{2}\), then evenn
value is chosen.remf(lhs, \(\pm 0\)) returns NaN.
remf( \(\pm \infty\) , rhs) return NaN.
remf(lhs, \(\pm \infty\)) return lhs for finite lhs.
Vector¶
This section describes the API for math functions of vector.
-
enum RoundingMode¶
Rounding mode for type conversion.
Note
RM_RZ: Rounds toward zero. RM_RN: Rounds to nearest even. RM_RZ_CLAMP: Rounds toward zero. For int8 result, clamp to [-127, 127] when overflows. RM_RN_CLAMP: Rounds to nearest even. For int8 result, clamp to [-127, 127] when overflows.
Values:
-
enumerator RM_DEFAULT¶
-
enumerator RM_RZ¶
-
enumerator RM_RN¶
-
enumerator RM_RZ_CLAMP¶
-
enumerator RM_RN_CLAMP¶
-
enumerator RM_DEFAULT¶
-
template<typename VT>
KRT_API int vlength()¶ Returns element number of the specified vector type.
Note
Supports all vector types. Be aware that the return value is the number of unified_scalar.
- Template Parameters
VT – vector type.
- Returns
Element number.
-
template<typename VT>
KRT_API int vlength(const VT &v)¶ Returns element number of the specified vector type.
Note
Supports all vector types. Be aware that the return value is the number of unified_scalar.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Element number.
-
template<typename VT>
KRT_API VT vzero()¶ Returns a vector with all bits set to zero.
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Returns
A vector with all bits set to zero.
-
template<typename VT>
KRT_API VT vload(generic_ptr addr)¶ Loads a vector from the address.
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
addr – Memory address.
- Returns
Vector loaded from the address.
-
template<typename VT>
KRT_API void vstore(const VT &v, generic_ptr addr)¶ Stores the vector to the address.
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
addr – Memory address.
- Returns
None.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vbitcast(const FROM_VT &v)¶ Reinterprets the vector to another type.
Note
Two vector types can be bitcast to each other as long as their sizes are the same.
- Template Parameters
TO_VT – Reinterpreted type.
FROM_VT – Original type.
- Parameters
v – Vector.
- Returns
Reinterpreted vector.
-
template<typename TO_VT, typename FROM_VT, RoundingMode mode = RM_DEFAULT>
KRT_API TO_VT vcast(const FROM_VT &v)¶ Converts the vector to another type.
Note
Supported vector type conversions: vchar -> vshort, vchar -> vint, vchar -> vhalf, vchar -> vbfloat, vchar -> vfloat, vuchar -> vushort, vuchar -> vuint, vuchar -> vhalf, vuchar -> vbfloat, vuchar -> vfloat, vshort -> vchar, vshort -> vint, vshort -> vhalf, vshort -> vbfloat, vshort -> vfloat, vushort -> vuchar, vushort -> vuint, vushort -> vhalf, vushort -> vbfloat, vushort -> vfloat, vint -> vchar, vint -> vshort, vint -> vhalf, vint -> vbfloat, vint -> vfloat, vuint -> vuchar, vuint -> vushort, vuint -> vhalf, vuint -> vbfloat, vuint -> vfloat, vhalf -> vchar, vhalf -> vshort, vhalf -> vint, vhalf -> vbfloat, vhalf -> vfloat, vbfloat -> vchar, vbfloat -> vshort, vbfloat -> vint, vbfloat -> vhalf, vbfloat -> vfloat, vfloat -> vchar, vfloat -> vshort, vfloat -> vint, vfloat -> vhalf, vfloat -> vbfloat.
- Template Parameters
TO_VT – Cast vector type.
FROM_VT – Original vector Type.
mode – Rounding Mode.
- Parameters
v – Original vector
- Returns
Cast vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vcastrn(const FROM_VT &v)¶ Casts floating vector to integral vector, floating vector to floating vector, integral vector to floating vector by rounding to nearest even.
Note
Supported vector type conversions: vhalf -> vchar, vhalf -> vshort, vhalf -> vint, vbfloat -> vchar, vbfloat -> vshort, vbfloat -> vint, vfloat -> vchar, vfloat -> vshort, vfloat -> vint. vhalf -> vbfloat vhalf -> vfloat vbfloat -> vhalf vbfloat -> vfloat vfloat -> vhalf vfloat -> vbfloat vchar -> vhalf vchar -> vbfloat vchar -> vfloat vuchar -> vhalf vuchar -> vbfloat vuchar -> vfloat vshort -> vhalf vshort -> vbfloat vshort -> vfloat vushort -> vhalf vushort -> vbfloat vushort -> vfloat vint -> vhalf vint -> vbfloat vint -> vfloat vuint -> vhalf vuint -> vbfloat vuint -> vfloat
- Template Parameters
TO_VT – Cast vector type.
FROM_VT – Original vector type.
- Parameters
v – Original vector.
- Returns
Cast vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vcastrz(const FROM_VT &v)¶ Casts floating vector to integral vector, floating vector to floating vector, integral vector to floating vector by rounding toward zero.
Note
Supported vector type conversions: vhalf -> vchar, vhalf -> vshort, vhalf -> vint, vbfloat -> vchar, vbfloat -> vshort, vbfloat -> vint, vfloat -> vchar, vfloat -> vshort, vfloat -> vint. vhalf -> vbfloat vhalf -> vfloat vbfloat -> vhalf vbfloat -> vfloat vfloat -> vhalf vfloat -> vbfloat vchar -> vhalf vchar -> vbfloat vchar -> vfloat vuchar -> vhalf vuchar -> vbfloat vuchar -> vfloat vshort -> vhalf vshort -> vbfloat vshort -> vfloat vushort -> vhalf vushort -> vbfloat vushort -> vfloat vint -> vhalf vint -> vbfloat vint -> vfloat vuint -> vhalf vuint -> vbfloat vuint -> vfloat
- Template Parameters
TO_VT – Cast vector type.
FROM_VT – Original vector type.
- Parameters
v – Original vector.
- Returns
Cast vector.
-
template<typename VT>
KRT_API VT vadd(const VT &lhs, const VT &rhs)¶ Adds two vectors per element: res[i] = lhs[i] + rhs[i].
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Augend vector.
rhs – Addend vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vsub(const VT &lhs, const VT &rhs)¶ Subtracts two vectors per element: res[i] = lhs[i] - rhs[i].
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Minuend vector.
rhs – Subtrahend vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vmul(const VT &lhs, const VT &rhs)¶ Multiplies two vectors per element: res[i] = lhs[i] * rhs[i].
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Multiplicand vector.
rhs – Multiplier vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vdiv(const VT &lhs, const VT &rhs)¶ Divides two vectors per element: res[i] = lhs[i] / rhs[i].
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Dividend vector.
rhs – Divisor vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vrem(const VT &lhs, const VT &rhs)¶ Computes remainder of two integral vectors divided by each other per element: res[i] = rem(lhs[i], rhs[i]).
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Dividend vector.
rhs – Divisor vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vmod(const VT &lhs, const VT &rhs)¶ Computes modulus of two integral vectors divided by each other per element: res[i] = mod(lhs[i], rhs[i]).
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Dividend vector.
rhs – Divisor vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vsign(const VT &v)¶ Returns sign of the vector per element: res[i] = sign(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vmac(const VT &lhs, const VT &rhs, const VT &acc)¶ Computes lhs[i] * rhs[i] + acc[i].
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Multiplicand vector.
rhs – Multiplier vector.
acc – Addend vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vmas(const VT &lhs, const VT &rhs, const VT &acc)¶ Computes -lhs[i] * rhs[i] + acc[i].
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Multiplicand vector.
rhs – Multiplier vector.
acc – Addend vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vimas(const VT &lhs, const VT &rhs, const VT &acc)¶ Computes lhs[i] * rhs[i] - acc[i].
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Multiplicand vector.
rhs – Multiplier vector.
acc – Subtrahend vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vcbrt(const VT &v)¶ Computes cubic root per element: res[i] = cbrt(v[i]).
Note
Implemented by formula: cbrt(x) = exp(log(x) / 3). Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vtan(const VT &v)¶ Computes tangent per element: res[i] = tan(v[i]).
Note
Implemented by formula on gcu200/gcu210: tan(x) = sin(x) / cos(x). Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vatan(const VT &v)¶ Computes arc tangent per element: res[i] = atan(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vatan2(const VT &lhs, const VT &rhs)¶ Computes arc tangent of lhs / rhs per element: res[i] = atan2(lhs[i] / rhs[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Dividend vector.
rhs – Divisor vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vtanh(const VT &v)¶ Computes hyperbolic tangent per element: res[i] = tanh(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vatanh(const VT &v)¶ Computes inverse hyperbolic tangent per element: res[i] = atanh(v[i]).
Note
Implemented by formula: atanh(x) = (log(1 + x) - log(1 - x)) / 2. Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vsin(const VT &v)¶ Computes sine per element: res[i] = sin(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vasin(const VT &v)¶ Computes arc sine per element: res[i] = asin(v[i]).
Note
Implemented by formula on gcu200/gcu210: PI / 2 - acos(x) Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vsinh(const VT &v)¶ Computes hyperbolic sine per element: res[i] = sinh(v[i]).
Note
Implemented by formula on gcu200/gcu210: sinh(x) = (exp(x) - exp(-x)) / 2. Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vasinh(const VT &v)¶ Computes inverse hyperbolic sine per element: res[i] = asinh(v[i]).
Note
Implemented by formula: asinh(x) = log(x + sqrt(1 + x * x)). Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vcos(const VT &v)¶ Computes cosine per element: res[i] = cos(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vacos(const VT &v)¶ Computes arc cosine per element: res[i] = acos(v[i]).
Note
Implemented by formula on gcu200/gcu210: acos(x) = x >= 0 ? atan(sqrt((1 - x * x) / (x * x))) : PI - atan(sqrt((1 - x * x) / (x * x))). Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vcosh(const VT &v)¶ Computes hyperbolic cosine per element: res[i] = cosh(v[i]).
Note
Implemented by formula on gcu200/gcu210: cosh(x) = (exp(x) + exp(-x)) / 2. Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vacosh(const VT &v)¶ Computes inverse hyperbolic cosine per element: res[i] = acosh(v[i]).
Note
Implemented by formula: acosh(x) = 2 * log(sqrt((x + 1) / 2)
sqrt((x - 1) / 2)). Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vabs(const VT &v)¶ Computes absolute value per element: res[i] = abs(v[i]).
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vneg(const VT &v)¶ Changes sign of the vector per element: res[i] = -v[i].
Note
Supported vector types: vchar/vshort/vint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vsqrt(const VT &v)¶ Computes square root per element: res[i] = sqrt(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vrsqrt(const VT &v)¶ Computes reciprocal square root per element: res[i] = rsqrt(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vfloor(const VT &v)¶ Rounds the input vector downward to the largest integral value that is not greater than the original value per element: res[i] = floor(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vceil(const VT &v)¶ Rounds the input vector upward to the smallest integral value that is not less than the input vector per element: res[i] = ceil(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vround(const VT &v)¶ Rounds to nearest integral value, with halfway cases rounded away from zero per element: res[i] = round(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vtrunc(const VT &v)¶ Rounds toward zero per element: res[i] = trunc(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vrint(const VT &v)¶ Rounds to nearest even integral value per element: res[i] = rint(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vexp(const VT &v)¶ Computes base-e exponential function per element: res[i] = e ^ v[i].
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vexpm1(const VT &v)¶ Computes base-e exponential minus one per element: res[i] = (e ^ v[i]) - 1.
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vexp2(const VT &v)¶ Computes binary (base-2) exponential function per element: res[i] = 2 ^ v[i].
Note
Implemented by formula: exp2(x) = exp(x * log(2)). Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vlog(const VT &v)¶ Computes natural logarithm per element: res[i] = log(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vlog1p(const VT &v)¶ Computes natural logarithm of one plus the input value per element: res[i] = log(1 + v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vlog2(const VT &v)¶ Computes binary (base-2) logarithm per element: res[i] = log2(v[i]).
Note
Implemented by formula: log2(x) = log(x) / log(2). Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vlog10(const VT &v)¶ Computes common (base-10) logarithm per element: res[i] = log10(v[i]).
Note
Implemented by formula: log10(x) = log(x) / log(10). Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vlogb(const VT &v)¶ Computes the exponent of v per element, which is the integral part of vlog2 |v|: res[i] = logb(v[i]).
Note
Implemented by formula: logb(x) = floor(log(abs(x)) / log(2)). Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename INT_VT, typename VT>
KRT_API INT_VT vilogb(const VT &v)¶ Computes the exponent of v per element, which is the integral part of vlog2 |v|, and cast the result to integral type: res[i] = ilogb(v[i]).
Note
Implemented by formula: ilogb(x) = integer(logb(x)). Supported vector types: vhalf/vbfloat -> vshort, vfloat -> vint.
- Template Parameters
INT_VT – Integral vector type.
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vpower(const VT &lhs, const VT &rhs)¶ Computes lhs raised to the power rhs per element: res[i] = lhs[i] ^ rhs[i].
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Base vector.
rhs – Exponent vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vgelu(const VT &v)¶ Computes Gaussian Error Linear Unit (GELU) activation function per element: res[i] = gelu(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vsoftplus(const VT &v)¶ Computes vlog(vexp(v) + 1) per element: res[i] = log((e ^ v[i]) + 1).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vsigmoid(const VT &v)¶ Computes sigmoid activation function per element: res[i] = sigmoid(v[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vdim(const VT &lhs, const VT &rhs)¶ Computes the positive difference between lhs and rhs per element: res[i] = lhs[i] > rhs[i] ? lhs[i] - rhs[i] : 0.
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vhypot(const VT &lhs, const VT &rhs)¶ Computes hypotenuse of a right-angled triangle whose legs are lhs and rhs per element: res[i] = sqrt((lhs[i] ^ 2) + (rhs[i] ^ 2)).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vcopysign(const VT &lhs, const VT &rhs)¶ Returns a vector with the magnitude of lhs and the sign of rhs per element: res[i] = copysign(lhs[i], rhs[i]).
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Magnitude vector
rhs – Sign vector.
- Returns
Result vector.
-
template<typename BOOL_VT, typename VT>
KRT_API BOOL_VT visnan(const VT &v)¶ Returns if the vector is NaN (Not-A-Number) per element: res[i] = isnan(v[i]).
Note
Supported vector types: vhalf/vbfloat -> vushort, vfloat -> vuint.
- Template Parameters
BOOL_VT – Bool vector type. Boolean vector is unsigned integral vector, whose element bit width is the same as the input vector. For TRUE value, all bits are set to 1, and for FALSE value, all bits are set to 0.
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename BOOL_VT, typename VT>
KRT_API BOOL_VT visfinite(const VT &v)¶ Returns if the vector is finite per element: res[i] = isfinite(v[i]).
Note
Supported vector types: vhalf/vbfloat -> vushort, vfloat -> vuint.
- Template Parameters
BOOL_VT – Bool vector type. Boolean vector is unsigned integral vector, whose element bit width is the same as the input vector. For TRUE value, all bits are set to 1, and for FALSE value, all bits are set to 0.
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vmax(const VT &lhs, const VT &rhs)¶ Returns the largest of two vectors per element: res[i] = max(lhs[i], rhs[i]).
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vmin(const VT &lhs, const VT &rhs)¶ Returns the smallest of two vectors per element: res[i] = min(lhs[i], rhs[i]).
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint/vhalf/vbfloat/vfloat.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vand(const VT &lhs, const VT &rhs)¶ Computes bitwise AND per element: res[i] = lhs[i] & rhs[i].
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vor(const VT &lhs, const VT &rhs)¶ Computes bitwise OR per element: res[i] = lhs[i] | rhs[i].
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vxor(const VT &lhs, const VT &rhs)¶ Computes bitwise XOR per element: res[i] = lhs[i] ^ rhs[i].
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vnot(const VT &v)¶ Computes bitwise NOT per element: res[i] = ~v[i].
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vshl(const VT &lhs, const VT &rhs)¶ Shifts lhs left per element, by rhs bits per element: res[i] = lhs[i] << rhs[i].
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Vector to be shifted.
rhs – Vector of shift bit num.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vshr(const VT &lhs, const VT &rhs)¶ Shifts lhs right per element, by rhs bits per element: res[i] = lhs[i] >> rhs[i]. For signed types, uses arithmetic shift. For unsigned types, uses logical shift.
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint.
- Template Parameters
VT – Vector type.
- Parameters
lhs – Vector to be shifted.
rhs – Vector of shift bits.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vshli(const VT &v, int bits)¶ Shifts vector left per element, by rhs bits: res[i] = v[i] << bits.
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
bits – Shift bit num.
- Returns
Result vector.
-
template<typename VT>
KRT_API VT vshri(const VT &v, int bits)¶ Shifts vector right per element, by rhs bits. res[i] = v[i] >> bits. For signed types, uses arithmetic shift. For unsigned types, uses logical shift.
Note
Supported vector types: vchar/vuchar/vshort/vushort/vint/vuint.
- Template Parameters
VT – Vector type.
- Parameters
v – Vector.
bits – Shift bit num.
- Returns
Result vector.
-
template<typename BOOL_VT, typename VT>
KRT_API BOOL_VT veq(const VT &lhs, const VT &rhs)¶ Compares if two vectors are equal per element: res[i] = lhs[i] == rhs[i].
Note
Supported vector types: vchar/vuchar -> vuchar, vshort/vushort/vhalf/vbfloat -> vushort, vint/vuint/vfloat -> vuint.
- Template Parameters
BOOL_VT – Bool vector type. Boolean vector is unsigned integral vector, whose element bit width is the same as the input vector. For TRUE value, all bits are set to 1, and for FALSE value, all bits are set to 0.
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename BOOL_VT, typename VT>
KRT_API BOOL_VT vne(const VT &lhs, const VT &rhs)¶ Compares if two vectors are not equal per element: res[i] = lhs[i] != rhs[i].
Note
Supported vector types: vchar/vuchar -> vuchar, vshort/vushort/vhalf/vbfloat -> vushort, vint/vuint/vfloat -> vuint.
- Template Parameters
BOOL_VT – Bool vector type. Boolean vector is unsigned integral vector, whose element bit width is the same as the input vector. For TRUE value, all bits are set to 1, and for FALSE value, all bits are set to 0.
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename BOOL_VT, typename VT>
KRT_API BOOL_VT vlt(const VT &lhs, const VT &rhs)¶ Compares if lhs is less than rhs per element: res[i] = lhs[i] < rhs[i].
Note
Supported vector types: vchar/vuchar -> vuchar, vshort/vushort/vhalf/vbfloat -> vushort, vint/vuint/vfloat -> vuint.
- Template Parameters
BOOL_VT – Bool vector type. Boolean vector is unsigned integral vector, whose element bit width is the same as the input vector. For TRUE value, all bits are set to 1, and for FALSE value, all bits are set to 0.
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename BOOL_VT, typename VT>
KRT_API BOOL_VT vle(const VT &lhs, const VT &rhs)¶ Compares if lhs is less than or equal to rhs per element: res[i] = lhs[i] <= rhs[i].
Note
Supported vector types: vchar/vuchar -> vuchar, vshort/vushort/vhalf/vbfloat -> vushort, vint/vuint/vfloat -> vuint.
- Template Parameters
BOOL_VT – Bool vector type. Boolean vector is unsigned integral vector, whose element bit width is the same as the input vector. For TRUE value, all bits are set to 1, and for FALSE value, all bits are set to 0.
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename BOOL_VT, typename VT>
KRT_API BOOL_VT vgt(const VT &lhs, const VT &rhs)¶ Compares if lhs is greater than rhs per element: res[i] = lhs[i] > rhs[i].
Note
Supported vector types: vchar/vuchar -> vuchar, vshort/vushort/vhalf/vbfloat -> vushort, vint/vuint/vfloat -> vuint.
- Template Parameters
BOOL_VT – Bool vector type. Boolean vector is unsigned integral vector, whose element bit width is the same as the input vector. For TRUE value, all bits are set to 1, and for FALSE value, all bits are set to 0.
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename BOOL_VT, typename VT>
KRT_API BOOL_VT vge(const VT &lhs, const VT &rhs)¶ Compares if lhs is greater than or equal to rhs per element: res[i] = lhs[i] >= rhs[i].
Note
Supported vector types: vchar/vuchar -> vuchar, vshort/vushort/vhalf/vbfloat -> vushort, vint/vuint/vfloat -> vuint.
- Template Parameters
BOOL_VT – Bool vector type. Boolean vector is unsigned integral vector, whose element bit width is the same as the input vector. For TRUE value, all bits are set to 1, and for FALSE value, all bits are set to 0.
VT – Vector type.
- Parameters
lhs – Vector.
rhs – Another vector.
- Returns
Result vector.
-
template<typename VT, typename BOOL_VT>
KRT_API VT vselect(const BOOL_VT &cond, const VT &lhs, const VT &rhs)¶ Select from two vectors by condition per element: res[i] = cond[i] ? lhs[i] : rhs[i].
Note
Supported vector types: <vchar/vuchar, vuchar>, <vshort/vushort/vhalf/vbfloat, vushort>, <vint/vuint/vfloat, vuint>.
- Template Parameters
VT – Vector type.
BOOL_VT – Bool vector type. Boolean vector is unsigned integral vector, whose element bit width is the same as the input vector. For TRUE value, all bits are set to 1, and for FALSE value, all bits are set to 0.
- Parameters
cond – Condition vector.
lhs – Candidate vector.
rhs – Another candidate vector.
- Returns
Result vector.
-
template<typename VT, typename T>
KRT_API VT vbroadcast(const T &val)¶ Returns a vector with all elements set to the specified value: res[i] = val. The scalar type should match the vector element type.
Note
Supported vector types: <vchar, char>, <vuchar, unsigned char>, <vshort, short>, <vushort, unsigned short>, <vint, int>, <vuint, unsigned int>, <vhalf, half>, <vbfloat, bfloat>, <vfloat, float>.
- Template Parameters
VT – Vector type.
T – Scalar type.
- Parameters
val – Scalar value.
- Returns
Result vector.
-
template<typename TO_VT, typename FROM_VT, RoundingMode mode = RM_DEFAULT>
KRT_API TO_VT vpack2(const FROM_VT &v1, const FROM_VT &v2)¶ Packs two vectors into one vector. The values will be cast. And the element bit width of the source vector should be 2 times of the element bit width of the target vector.
Note
Supported vector types: vshort -> vchar, vushort -> vuchar, vint -> vshort, vint -> vhalf, vint -> vbfloat, vuint -> vushort, vuint -> vhalf, vuint -> vbfloat, vfloat -> vhalf, vfloat -> vbfloat, vfloat -> vshort, vhalf -> vchar, vbfloat -> vchar.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
mode – Rounding mode.
- Parameters
v1 – Source vector.
v2 – Another source vector.
- Returns
Packed vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vpack2rn(const FROM_VT &v1, const FROM_VT &v2)¶ Packs two floating vectors into one integral vector. The values will be cast by rounding to nearest even. And the element bit width of the source vector should be 2 times of the element bit width of the target vector.
Note
Supported vector types: vfloat -> vshort, vhalf -> vchar, vbfloat -> vchar.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
- Parameters
v1 – Source vector.
v2 – Another source vector.
- Returns
Packed vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vpack2rz(const FROM_VT &v1, const FROM_VT &v2)¶ Packs two floating vectors into one integral vector. The values will be cast by rounding toward zero. And the element bit width of the source vector should be 2 times of the element bit width of the target vector.
Note
Supported vector types: vfloat -> vshort, vhalf -> vchar, vbfloat -> vchar.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
- Parameters
v1 – Source vector.
v2 – Another source vector.
- Returns
Packed vector.
-
template<typename TO_VT, typename FROM_VT, RoundingMode mode = RM_DEFAULT>
KRT_API TO_VT vpack4(const FROM_VT &v1, const FROM_VT &v2, const FROM_VT &v3, const FROM_VT &v4)¶ Packs four vectors into one vector. The values will be cast. And the element bit width of the source vector should be 4 times of the element bit width of the target vector.
Note
Supported vector types: vint -> vchar, vuint -> vuchar, vfloat -> vchar.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
mode – Rounding mode.
- Parameters
v1 – Source vector 1.
v2 – Source vector 2.
v3 – Source vector 3.
v4 – Source vector 4.
- Returns
Packed vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vpack4rn(const FROM_VT &v1, const FROM_VT &v2, const FROM_VT &v3, const FROM_VT &v4)¶ Packs four floating vectors into one integral vector. The values will be cast by rounding to nearest even. And the element bit width of the source vector should be 4 times of the element bit width of the target vector.
Note
Supported vector type: vfloat -> vchar.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
- Parameters
v1 – Source vector 1.
v2 – Source vector 2.
v3 – Source vector 3.
v4 – Source vector 4.
- Returns
Packed vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vpack4rz(const FROM_VT &v1, const FROM_VT &v2, const FROM_VT &v3, const FROM_VT &v4)¶ Packs four floating vectors into one integral vector. The values will be cast by rounding toward zero. And the element bit width of the source vector should be 4 times of the element bit width of the target vector.
Note
Supported vector type: vfloat -> vchar.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
- Parameters
v1 – Source vector 1.
v2 – Source vector 2.
v3 – Source vector 3.
v4 – Source vector 4.
- Returns
Packed vector.
-
template<typename TO_VT, typename FROM_VT, RoundingMode mode = RM_DEFAULT>
KRT_API TO_VT vunpack0(const FROM_VT &v)¶ Unpacks a vector from another vector. The values will be cast. If the element bit width of the target vector is 2 times of the element bit width of the source vector, the first half of the elements will be selected. If the element bit width of the target vector is 4 times of the element bit width of the source vector, the first quarter of the elements will be selected. Other cases are not supported.
Note
Supported vector types: vchar -> vshort, vchar -> vint, vchar -> vhalf, vchar -> vbfloat, vchar -> vfloat, vuchar -> vushort, vuchar -> vuint, vuchar -> vhalf, vuchar -> vbfloat, vuchar -> vfloat, vshort -> vint, vshort -> vfloat, vushort -> vuint, vushort -> vfloat, vhalf -> vint, vhalf -> vfloat, vbfloat -> vint, vbfloat -> vfloat.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
mode – Rounding mode.
- Parameters
v – Source vector.
- Returns
Unpacked vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vunpack0rn(const FROM_VT &v)¶ Unpacks an integral vector from a floating vector. The values will be cast by rounding to nearest even. If the element bit width of the target vector is 2 times of the element bit width of the source vector, the first half of the elements will be selected. If the element bit width of the target vector is 4 times of the element bit width of the source vector, the first quarter of the elements will be selected. Other cases are not supported.
Note
Supported vector types: vhalf -> vint, vbfloat -> vint.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
- Parameters
v – Source vector.
- Returns
Unpacked vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vunpack0rz(const FROM_VT &v)¶ Unpacks an integral vector from a floating vector. The values will be cast by rounding toward zero. If the element bit width of the target vector is 2 times of the element bit width of the source vector, the first half of the elements will be selected. If the element bit width of the target vector is 4 times of the element bit width of the source vector, the first quarter of the elements will be selected. Other cases are not supported.
Note
Supported vector types: vhalf -> vint, vbfloat -> vint.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
- Parameters
v – Source vector.
- Returns
Unpacked vector.
-
template<typename TO_VT, typename FROM_VT, RoundingMode mode = RM_DEFAULT>
KRT_API TO_VT vunpack1(const FROM_VT &v)¶ Unpacks a vector from another vector. The values will be cast. If the element bit width of the target vector is 2 times of the element bit width of the source vector, the second half of the elements will be selected. If the element bit width of the target vector is 4 times of the element bit width of the source vector, the second quarter of the elements will be selected. Other cases are not supported.
Note
Supported vector types: vchar -> vshort, vchar -> vint, vchar -> vhalf, vchar -> vbfloat, vchar -> vfloat, vuchar -> vushort, vuchar -> vuint, vuchar -> vhalf, vuchar -> vbfloat, vuchar -> vfloat, vshort -> vint, vshort -> vfloat, vushort -> vuint, vushort -> vfloat, vhalf -> vint, vhalf -> vfloat, vbfloat -> vint, vbfloat -> vfloat.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
mode – Rounding mode.
- Parameters
v – Source vector.
- Returns
Unpacked vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vunpack1rn(const FROM_VT &v)¶ Unpacks an integral vector from a floating vector. The values will be cast by rounding to nearest even. If the element bit width of the target vector is 2 times of the element bit width of the source vector, the second half of the elements will be selected. If the element bit width of the target vector is 4 times of the element bit width of the source vector, the second quarter of the elements will be selected. Other cases are not supported.
Note
Supported vector types: vhalf -> vint, vbfloat -> vint.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
- Parameters
v – Source vector.
- Returns
Unpacked vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vunpack1rz(const FROM_VT &v)¶ Unpacks an integral vector from a floating vector. The values will be cast by rounding toward zero. If the element bit width of the target vector is 2 times of the element bit width of the source vector, the second half of the elements will be selected. If the element bit width of the target vector is 4 times of the element bit width of the source vector, the second quarter of the elements will be selected. Other cases are not supported.
Note
Supported vector types: vhalf -> vint, vbfloat -> vint.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
- Parameters
v – Source vector.
- Returns
Unpacked vector.
-
template<typename TO_VT, typename FROM_VT, RoundingMode mode = RM_DEFAULT>
KRT_API TO_VT vunpack2(const FROM_VT &v)¶ Unpacks a vector from another vector. The values will be cast. If the element bit width of the target vector is 4 times of the element bit width of the source vector, the third quarter of the elements will be selected. Other cases are not supported.
Note
Supported vector types: vchar -> vint, vchar -> vfloat, vuchar -> vuint, vuchar -> vfloat.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
mode – Rounding mode.
- Parameters
v – Source vector.
- Returns
Unpacked vector.
-
template<typename TO_VT, typename FROM_VT, RoundingMode mode = RM_DEFAULT>
KRT_API TO_VT vunpack3(const FROM_VT &v)¶ Unpacks a vector from another vector. The values will be cast. If the element bit width of the target vector is 4 times of the element bit width of the source vector, the fourth quarter of the elements will be selected. Other cases are not supported.
Note
Supported vector types: vchar -> vint, vchar -> vfloat, vuchar -> vuint, vuchar -> vfloat.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
mode – Rounding mode.
- Parameters
v – Source vector.
- Returns
Unpacked vector.
-
template<int part, typename TO_VT, typename FROM_VT, RoundingMode mode = RM_DEFAULT>
KRT_API TO_VT vunpack(const FROM_VT &v)¶ Unpacks a vector from another vector. The values will be cast. Refers to vunpack0/vunpack1/vunpack2/vunpack3 for supported scenarios.
- Template Parameters
part – Which part to unpack.
TO_VT – Target vector type.
FROM_VT – Source vector type.
mode – Rounding mode.
- Parameters
v – Source vector.
- Returns
Unpacked vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vget_exponent(const FROM_VT &v)¶ Returns exponent part of the vector per element.
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
- Parameters
v – Source vector.
- Returns
Result vector.
-
template<typename TO_VT, typename FROM_VT>
KRT_API TO_VT vget_mantissa(const FROM_VT &v)¶ Returns mantissa part of the vector per element.
Note
Supported vector types: vhalf/vbfloat/vfloat.
- Template Parameters
TO_VT – Target vector type.
FROM_VT – Source vector type.
- Parameters
v – Source vector.
- Returns
Result vector.
-
template<typename T, typename VT>
KRT_API T vreduce_all(const VT &v)¶ Computes bitwise AND among all vector elements to a scalar.
Note
Supported types: vchar -> char, vuchar -> unsigned char, vshort -> short, vushort -> unsgined short, vint -> int, vuint -> unsigned int.
- Template Parameters
T – Scalar type.
VT – Vector type.
- Parameters
v – Vector.
- Returns
Reduced scalar.
-
template<typename T, typename VT>
KRT_API T vreduce_any(const VT &v)¶ Computes bitwise OR among all vector elements to a scalar.
Note
Supported types: vchar -> char, vuchar -> unsigned char, vshort -> short, vushort -> unsgined short, vint -> int, vuint -> unsigned int.
- Template Parameters
T – Scalar type.
VT – Vector type.
- Parameters
v – Vector.
- Returns
Reduced scalar.
-
template<typename T, typename VT>
KRT_API T vreduce_sum(const VT &v)¶ Sums all elements of the vector to a scalar.
Note
Supported types: vchar -> int, vuchar -> unsigned int, vshort -> int, vushort -> unsigned int, vint -> int, vuint -> unsigned int, vhalf -> float, vbfloat -> float, vfloat -> float.
- Template Parameters
T – Scalar type.
VT – Vector type.
- Parameters
v – Vector.
- Returns
Reduced scalar.
-
template<typename T, typename VT>
KRT_API T vreduce_min(const VT &v)¶ Returns the minimum element of the vector.
Note
Supported types: vchar -> char, vuchar -> unsigned char, vshort -> short, vushort -> unsigned short, vint -> int, vuint -> unsigned int, vhalf -> __fp16, vbfloat -> __bf16, vfloat -> float.
- Template Parameters
T – Scalar type.
VT – Vector type.
- Parameters
v – Vector.
- Returns
Reduced scalar.
-
template<typename T, typename VT>
KRT_API T vreduce_max(const VT &v)¶ Returns the maximum element of the vector.
Note
Supported types: vchar -> char, vuchar -> unsigned char, vshort -> short, vushort -> unsigned short, vint -> int, vuint -> unsigned int, vhalf -> __fp16, vbfloat -> __bf16, vfloat -> float.
- Template Parameters
T – Scalar type.
VT – Vector type.
- Parameters
v – Vector.
- Returns
Reduced scalar.
-
template<typename T, typename VT>
KRT_API T vreduce_mean(const VT &v)¶ Returns the mean value of all elements of the vector.
Note
Supported types: vchar -> int, vuchar -> unsigned int, vshort -> int, vushort -> unsigned int, vint -> int, vuint -> unsigned int, vhalf -> float, vbfloat -> float, vfloat -> float.
- Template Parameters
T – Scalar type.
VT – Vector type.
- Parameters
v – Vector.
- Returns
Reduced scalar.
2.5. Synchronization¶
This section describes functions related to synchronization.
-
__device__ __forceinline__ void __syncsubthreads()¶
Synchronizes all subthreads within a thread.
-
__device__ __forceinline__ void __syncthreads()¶
Synchronizes the threads in the current block.
-
__device__ __forceinline__ void __syncblocks()¶
Synchronizes all threads in the current grid.