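1. Introduction
1.1. Overview
The Enflame GCU operator support list mainly covers the XLA operators used by TF/PyTorch.
For the semantics of the operators supported by Enflame GCU, refer to TensorFlow XLA operation semantics.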
2. XLA Operator Support List
2.1. AfterAll
- Semantics
AfterAll takes a variadic number of tokens and produces a single token. Tokens are primitive types that can be threaded between side-effecting operations to enforce ordering. AfterAll can be used as a join of tokens to order an operation after a set of operations.
- Boundary
2.2. AllReduce
- Semantics
Performs a custom computation across replicas.
- Boundary
2.3. AllToAll
- Semantics
AllToAll is a collective operation that sends data from all cores to all cores.
- Boundary
2.4. BatchNormGrad
- Semantics
Calculates gradients of batch norm.
- Boundary
Supported DataType: FP32/BF16/FP16.
Supported Format: NHWC/CHNW/NCHW
2.5. BatchNormInference
- Semantics
Normalizes an array across batch and spatial dimensions.
- Boundary
Supported DataType: FP32/BF16/FP16.
Supported Format: NHWC/CHNW/NCHW
2.6. BatchNormTraining
- Semantics
Normalizes an array across batch and spatial dimensions.
- Boundary
Supported DataType: FP32/BF16/FP16.
Supported Format: NHWC/CHNW/NCHW
2.7. BitcastConvertType
- Semantics
Similar to a tf.bitcast in TensorFlow, performs an element-wise bitcast operation from a data shape to a target shape. The dimensions must match, and the conversion is an element-wise one; e.g. s32 elements become f32 elements via bitcast routine. Bitcast is implemented as a low-level cast, so machines with different floating-point representations will give different results.
- Boundary
Supported DataType: U64/S64/FP32/U32/S32/BF16/FP16/U16/S16/U8/S8/PRED.
U64/S64 can be converted to U32/S32 only.
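The difference between a bitcast and a value conversion is easiest to see on a single element. The sketch below reinterprets raw bits on the host with Python's `struct` module, assuming IEEE-754 float32 and two's-complement int32; the helper names are hypothetical and this is not the GCU implementation.

```python
import struct

def bitcast_f32_to_s32(x: float) -> int:
    """Reinterpret the 4 bytes of a float32 as a signed 32-bit integer."""
    # Pack as little-endian float32, then read the same bytes back as int32.
    return struct.unpack("<i", struct.pack("<f", x))[0]

def bitcast_s32_to_f32(n: int) -> float:
    """Reinterpret the 4 bytes of a signed 32-bit integer as a float32."""
    return struct.unpack("<f", struct.pack("<i", n))[0]

print(hex(bitcast_f32_to_s32(1.0)))    # 0x3f800000, not 1
print(bitcast_s32_to_f32(0x40490FDB))  # about 3.1415927 (float32 pi)
```

A value conversion (ConvertElementType) of 1.0 to S32 would give 1; the bitcast gives the IEEE-754 bit pattern instead.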
2.8. Broadcast
- Semantics
Adds dimensions to an array by duplicating the data in the array.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
2.9. BroadcastInDim
- Semantics
Expands the size and rank of an array by duplicating the data in the array.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
2.10. Call
- Semantics
Invokes a computation with the given arguments.
- Boundary
Constraints: Same as the XLA_CPU implementation.
2.11. Cholesky
- Semantics
Computes the Cholesky decomposition of a batch of symmetric (Hermitian) positive definite matrices.
- Boundary
2.12. Clamp
- Semantics
Clamps an operand to within the range between a minimum and maximum value.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.13. Collapse
- Semantics
Collapses dimensions of an array into one dimension.
- Boundary
2.14. CollectivePermute
- Semantics
CollectivePermute is a collective operation that sends and receives data across replicas.
- Boundary
2.15. Concatenate
- Semantics
Concatenate composes an array from multiple array operands. The array is of the same rank as each of the input array operands (which must be of the same rank as each other) and contains the arguments in the order that they were specified.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
Only source dimension sizes less than 65536 are supported.
2.16. Conditional
- Semantics
Executes true_computation if pred is true, false_computation if pred is false, and returns the result.
- Boundary
Constraints: Same as the XLA_CPU implementation.
2.17. Conv
- Semantics
As ConvWithGeneralPadding, but the padding is specified in a short-hand way as either SAME or VALID. SAME padding pads the input (lhs) with zeroes so that the output has the same shape as the input when not taking striding into account. VALID padding simply means no padding.
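The SAME/VALID distinction can be made concrete with a per-dimension shape calculation. This sketch follows the common TensorFlow/XLA convention (SAME output size is ceil(input / stride), with any odd padding element placed on the high side) and assumes no dilation; `same_padding` and `valid_output_size` are hypothetical helpers.

```python
from typing import Tuple

def same_padding(in_size: int, window: int, stride: int) -> Tuple[int, int, int]:
    """Return (pad_low, pad_high, out_size) for SAME padding on one spatial dim."""
    out_size = -(-in_size // stride)          # integer ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + window - in_size, 0)
    pad_low = pad_total // 2                  # the extra element, if any, goes high
    return pad_low, pad_total - pad_low, out_size

def valid_output_size(in_size: int, window: int, stride: int) -> int:
    """VALID padding: no padding, only windows that fit entirely are kept."""
    return (in_size - window) // stride + 1

print(same_padding(5, 3, 1))        # (1, 1, 5): output keeps the input size
print(same_padding(4, 3, 2))        # (0, 1, 2)
print(valid_output_size(5, 3, 2))   # 2
```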
- Boundary
Supported DataType: FP32/BF16/FP16.
General constraints:
Window size: R <= 13, S <= 13
Stride H/W in [1, 2, 4]
LHS dilation H/W in [1, 2, 4]
RHS dilation H/W in [1, 2, 4]
2.18. ConvWithGeneralPadding
- Semantics
Computes a convolution of the kind used in neural networks. Here, a convolution can be thought of as a n-dimensional window moving across a n-dimensional base area and a computation is performed for each possible position of the window.
- Boundary
Same as Conv.
2.19. ConvertElementType
- Semantics
Similar to an element-wise static_cast in C++, performs an element-wise conversion operation from a data shape to a target shape. The dimensions must match, and the conversion is an element-wise one; e.g. s32 elements become f32 elements via an s32-to-f32 conversion routine.
- Boundary
Supported conversions (source DataType -> destination DataTypes):
PRED -> U8, S32, F32
S8 -> S8, S16, F16, BF16
U8 -> U8, U16, F16, BF16, U32, S32, F32
U16 -> U8, U16, U32, F32
S16 -> S8, S16, S32, F32
F16 -> S16, F16, S32, F32
BF16 -> S16, BF16, S32, F32
U32 -> U16, BF16, U32, F32, U64, S64
S32 -> S16, BF16, S32, F32, U64, S64
F32 -> S16, F16, BF16, U32, S32, F32
U64 -> U32, S32
S64 -> U32, S32
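When preparing a model for the GCU, it can help to check the conversions a graph needs against this list ahead of time. A minimal host-side lookup, with the table transcribed from the list above; `is_conversion_supported` and `unsupported_conversions` are hypothetical helpers, not part of any Enflame API.

```python
# source DataType -> set of supported destination DataTypes (from the list above)
SUPPORTED_CONVERSIONS = {
    "PRED": {"U8", "S32", "F32"},
    "S8": {"S8", "S16", "F16", "BF16"},
    "U8": {"U8", "U16", "F16", "BF16", "U32", "S32", "F32"},
    "U16": {"U8", "U16", "U32", "F32"},
    "S16": {"S8", "S16", "S32", "F32"},
    "F16": {"S16", "F16", "S32", "F32"},
    "BF16": {"S16", "BF16", "S32", "F32"},
    "U32": {"U16", "BF16", "U32", "F32", "U64", "S64"},
    "S32": {"S16", "BF16", "S32", "F32", "U64", "S64"},
    "F32": {"S16", "F16", "BF16", "U32", "S32", "F32"},
    "U64": {"U32", "S32"},
    "S64": {"U32", "S32"},
}

def is_conversion_supported(src: str, dst: str) -> bool:
    """True if ConvertElementType(src -> dst) appears in the support list."""
    return dst in SUPPORTED_CONVERSIONS.get(src, set())

def unsupported_conversions(pairs):
    """Keep only the (src, dst) pairs that are not in the support list."""
    return [(s, d) for s, d in pairs if not is_conversion_supported(s, d)]

# F16 -> BF16 is not listed, so it would need a detour (for example via F32).
print(unsupported_conversions([("U8", "F16"), ("F16", "BF16")]))
```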
2.20. CrossReplicaSum
- Semantics
Performs AllReduce with a summation computation.
- Boundary
2.21. CustomCall
- Semantics
Calls a user-provided function within a computation.
- Boundary
2.22. Dot
- Semantics
The operation performs sum of products over the second dimension of lhs (or the first if it has rank 1) and the first dimension of rhs. These are the “contracted” dimensions. The contracted dimensions of lhs and rhs must be of the same size. In practice, it can be used to perform dot products between vectors, vector/matrix multiplications or matrix/matrix multiplications.
- Boundary
Supported DataType: F32/BF16/FP16
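A plain-Python sketch of the three Dot cases described above (vector/vector, matrix/vector, matrix/matrix), written only to show which dimensions are contracted. It handles rank-1 and rank-2 lists and ignores dtypes, precision settings, and performance; `dot` here is a hypothetical helper, not the XLA builder function.

```python
def dot(lhs, rhs):
    """Contract lhs dim 1 (dim 0 if lhs is rank 1) with rhs dim 0."""
    lhs_is_vec = not isinstance(lhs[0], list)
    rhs_is_vec = not isinstance(rhs[0], list)
    lhs_m = [lhs] if lhs_is_vec else lhs                # vector as a 1 x K matrix
    rhs_m = [[x] for x in rhs] if rhs_is_vec else rhs   # vector as a K x 1 matrix
    k = len(lhs_m[0])
    if k != len(rhs_m):
        raise ValueError("contracted dimensions must have the same size")
    out = [[sum(row[i] * rhs_m[i][j] for i in range(k)) for j in range(len(rhs_m[0]))]
           for row in lhs_m]
    # Drop the helper dimensions that were added for vector operands.
    if lhs_is_vec and rhs_is_vec:
        return out[0][0]               # vector . vector -> scalar
    if lhs_is_vec:
        return out[0]                  # vector . matrix -> vector
    if rhs_is_vec:
        return [r[0] for r in out]     # matrix . vector -> vector
    return out                         # matrix . matrix -> matrix

print(dot([1, 2, 3], [4, 5, 6]))                   # 32
print(dot([[1, 2], [3, 4]], [[5, 6], [7, 8]]))     # [[19, 22], [43, 50]]
```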
2.23. DotGeneral
- Semantics
DotGeneral performs the sum of products over contracting dimensions specified in ‘dimension_numbers’.
- Boundary
Supported DataType: F32/BF16/FP16
2.24. DynamicSlice
- Semantics
DynamicSlice extracts a sub-array from the input array at dynamic start_indices. The size of the slice in each dimension is passed in size_indices, which specify the end point of exclusive slice intervals in each dimension: [start, start + size). The shape of start_indices must be rank == 1, with dimension size equal to the rank of operand.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
Supported number of slice dimensions: fewer than 10.
UpdateSlice dimension sizes below 65536 bytes are preferred; larger sizes degrade performance.
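The [start, start + size) rule can be sketched on a rank-2 list. Upstream XLA documents that each start index is clamped to [0, dim - size] so the slice always stays in bounds; this sketch assumes the GCU follows the same rule, and `dynamic_slice_2d` is a hypothetical helper.

```python
def dynamic_slice_2d(operand, start_indices, sizes):
    """Extract a sizes[0] x sizes[1] window from a rank-2 list of lists."""
    rows, cols = len(operand), len(operand[0])
    # Clamp each start index so that start + size never exceeds the dimension.
    r0 = min(max(start_indices[0], 0), rows - sizes[0])
    c0 = min(max(start_indices[1], 0), cols - sizes[1])
    return [row[c0:c0 + sizes[1]] for row in operand[r0:r0 + sizes[0]]]

grid = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(dynamic_slice_2d(grid, [1, 1], [2, 2]))   # [[4, 5], [7, 8]]
print(dynamic_slice_2d(grid, [2, 5], [2, 2]))   # clamped to start (1, 1)
```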
2.25. DynamicUpdateSlice
- Semantics
DynamicUpdateSlice generates a result which is the value of the input array operand, with a slice update overwritten at start_indices. The shape of update determines the shape of the sub-array of the result which is updated. The shape of start_indices must be rank == 1, with dimension size equal to the rank of operand.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
Supported number of slice dimensions: fewer than 10.
UpdateSlice dimension sizes below 65536 bytes are preferred; larger sizes degrade performance.
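A matching sketch for DynamicUpdateSlice on a rank-2 list: the shape of `update` decides which region is overwritten, and, assuming the same start-index clamping as upstream XLA, the update always lands fully inside the operand. `dynamic_update_slice_2d` is a hypothetical helper.

```python
def dynamic_update_slice_2d(operand, update, start_indices):
    """Return a copy of operand with update written at (clamped) start_indices."""
    rows, cols = len(operand), len(operand[0])
    u_rows, u_cols = len(update), len(update[0])
    r0 = min(max(start_indices[0], 0), rows - u_rows)
    c0 = min(max(start_indices[1], 0), cols - u_cols)
    result = [row[:] for row in operand]     # the input operand is not modified
    for i in range(u_rows):
        for j in range(u_cols):
            result[r0 + i][c0 + j] = update[i][j]
    return result

zeros = [[0] * 3 for _ in range(3)]
# Start (2, 2) is clamped to (2, 1) so the 1 x 2 update fits in the last row.
print(dynamic_update_slice_2d(zeros, [[1, 1]], [2, 2]))
```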
2.26. Element-wise binary arithmetic operations: Add
- Semantics
Element-wise addition x, y -> x + y.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.27. Element-wise binary arithmetic operations: Sub
- Semantics
Element-wise subtraction x, y -> x - y.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.28. Element-wise binary arithmetic operations: Mul
- Semantics
Element-wise multiplication x, y -> x * y.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.29. Element-wise binary arithmetic operations: Div
- Semantics
Element-wise division x, y -> x / y.
- Boundary
Supported DataType: FP32/BF16/FP16.
2.30. Element-wise binary arithmetic operations: Rem
- Semantics
Element-wise remainder x, y -> x % y.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.31. Element-wise binary arithmetic operations: Max
- Semantics
Element-wise maximum x, y -> max(x, y).
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.32. Element-wise binary arithmetic operations: Min
- Semantics
Element-wise minimum x, y -> min(x, y).
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.33. Element-wise binary arithmetic operations: LogicalAnd
- Semantics
Logical AND
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.34. Element-wise binary arithmetic operations: LogicalOR
- Semantics
Logical OR
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.35. Element-wise comparison operations: Eq
- Semantics
Element-wise equal-to x, y -> x == y.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.36. Element-wise comparison operations: Ne
- Semantics
Not equal-to
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.37. Element-wise comparison operations: Ge
- Semantics
Element-wise greater-than-or-equal-to x, y -> x >= y.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.38. Element-wise comparison operations: Gt
- Semantics
Element-wise greater-than x, y -> x > y.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.39. Element-wise comparison operations: Le
- Semantics
Element-wise less-than-or-equal-to x, y -> x <= y.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.40. Element-wise comparison operations: Lt
- Semantics
Element-wise less-than x, y -> x < y.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.41. Element-wise unary functions: Abs
- Semantics
Element-wise abs x -> |x|.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.42. Element-wise unary functions: Ceil
- Semantics
Element-wise ceil x -> ⌈x⌉.
- Boundary
Supported DataType: FP32/FP16/BF16.
2.43. Element-wise unary functions: Cos
- Semantics
Element-wise cosine x -> cos(x).
- Boundary
Supported DataType: FP32/FP16/BF16.
For better accuracy, pass an equivalent x in the range [-pi, pi].
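The accuracy hint amounts to a range reduction done before the data reaches the device; the same idea applies to Sin. A host-side float64 sketch using `math.remainder`, which subtracts the nearest integer multiple of 2*pi; `wrap_to_pi` is a hypothetical helper. For very large |x| this simple reduction loses precision itself, because 2*pi is not exactly representable.

```python
import math

def wrap_to_pi(x: float) -> float:
    """Map x to an equivalent angle in [-pi, pi].

    math.remainder(x, 2*pi) returns x - n*2*pi, where n is the integer
    nearest to x / (2*pi), so the result lies in [-pi, pi].
    """
    return math.remainder(x, 2 * math.pi)

x = 1000.0
print(wrap_to_pi(x))                          # a value in [-pi, pi]
print(math.cos(x), math.cos(wrap_to_pi(x)))   # nearly identical values
```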
2.44. Element-wise unary functions: Exp
- Semantics
Element-wise natural exponential x -> e^x.
- Boundary
Supported DataType: FP32/FP16/BF16.
2.45. Element-wise unary functions: Floor
- Semantics
Element-wise floor x -> ⌊x⌋.
- Boundary
Supported DataType: FP32/FP16.
2.46. Element-wise unary functions: IsFinite
- Semantics
Tests whether each element of operand is finite, i.e., is not positive or negative infinity, and is not NaN. Returns an array of PRED values with the same shape as the input, where each element is true if and only if the corresponding input element is finite.
- Boundary
Supported DataType: FP32/FP16/BF16.
2.47. Element-wise unary functions: Log
- Semantics
Element-wise natural logarithm x -> ln(x).
- Boundary
Supported DataType: FP32/FP16/BF16.
2.48. Element-wise unary functions: LogicalNot
- Semantics
Element-wise logical not x -> !(x).
- Boundary
Supported DataType: FP32/FP16/BF16.
2.49. Element-wise unary functions: PopulationCount
- Semantics
Computes the number of bits set in each element of operand.
- Boundary
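A one-element sketch of PopulationCount. Negative inputs are first masked to their unsigned bit pattern of the given width (two's complement assumed), which is what "bits set in each element" means for signed types; `population_count` is a hypothetical helper.

```python
def population_count(x: int, bits: int = 32) -> int:
    """Count the set bits of x viewed as a `bits`-wide two's-complement value."""
    # Masking maps a negative value to its unsigned bit pattern first.
    return bin(x & ((1 << bits) - 1)).count("1")

print([population_count(v, bits=8) for v in [0, 1, 255, -128]])   # [0, 1, 8, 1]
```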
2.50. Element-wise unary functions: Neg
- Semantics
Element-wise negation x -> -x.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.51. Element-wise unary functions: Sign
- Semantics
Element-wise sign operation x -> sgn(x)
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.52. Element-wise unary functions: Tanh
- Semantics
Element-wise hyperbolic tangent x -> tanh(x).
- Boundary
Supported DataType: FP32/FP16/BF16.
2.53. Element-wise unary functions: sin
- Semantics
Element-wise sine x -> sin(x).
- Boundary
Supported DataType: FP32/FP16/BF16.
For better accuracy, pass an equivalent x in the range [-pi, pi].
2.54. Element-wise unary functions: atan2
- Semantics
Element-wise two-argument arctangent x, y -> atan2(x, y).
- Boundary
Supported DataType: FP32/FP16/BF16.
2.55. Element-wise unary functions: power
- Semantics
Element-wise power x, y -> x^y.
- Boundary
Supported DataType: FP32/FP16/BF16.
2.56. Element-wise unary functions: rsqrt
- Semantics
Element-wise reciprocal square root x -> 1/sqrt(x).
- Boundary
Supported DataType: FP32/FP16/BF16.
2.57. Element-wise unary functions: sqrt
- Semantics
Element-wise square root x -> sqrt(x).
- Boundary
Supported DataType: FP32/FP16/BF16.
2.58. Element-wise unary functions: round
- Semantics
Element-wise rounding to the nearest integer x -> round(x).
- Boundary
Supported DataType: FP32/FP16/BF16.
2.59. Element-wise unary functions: shift_left
- Semantics
Element-wise left shift x, y -> x << y.
- Boundary
Supported DataType: U32/S32/U16/S16/U8/S8.
2.60. Element-wise unary functions: shift_right_arithmetic
- Semantics
Element-wise arithmetic right shift x, y -> x >> y (the sign bit is replicated).
- Boundary
Supported DataType: U32/S32/U16/S16/U8/S8.
2.61. Element-wise unary functions: shift_right_logical
- Semantics
Element-wise logical right shift x, y -> x >> y (zeros are shifted in).
- Boundary
Supported DataType: U32/S32/U16/S16/U8/S8.
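The difference between the two right shifts only shows up for negative values. A fixed-width sketch, assuming two's-complement values, shift amounts in [0, bits), and results reported as signed (S32/S16/S8) interpretations; all helper names are hypothetical.

```python
def _to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def shift_left(x: int, y: int, bits: int = 32) -> int:
    # Bits shifted past the top are discarded by the mask in _to_signed.
    return _to_signed(x << y, bits)

def shift_right_arithmetic(x: int, y: int, bits: int = 32) -> int:
    # Python's >> on a signed int already replicates the sign bit.
    return _to_signed(_to_signed(x, bits) >> y, bits)

def shift_right_logical(x: int, y: int, bits: int = 32) -> int:
    # Shift the unsigned bit pattern so zeros come in from the left.
    return _to_signed((x & ((1 << bits) - 1)) >> y, bits)

print(shift_right_arithmetic(-8, 1))        # -4
print(hex(shift_right_logical(-8, 1)))      # 0x7ffffffc
```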
2.62. Fft
- Semantics
The XLA FFT operation implements the forward and inverse Fourier Transforms for real and complex inputs/outputs. In upstream XLA, multidimensional FFTs on up to 3 axes are supported, except on TPU, where only a single axis is supported.
- Boundary
2.63. Gather
- Semantics
The XLA gather operation stitches together several slices (each slice at a potentially different runtime offset) of an input array.
- Boundary
Supported DataType: C128/C64/F64/U64/S64/FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.64. GetDimensionSize
- Semantics
Returns the size of the given dimension of the operand. The operand must be array shaped.
- Boundary
2.65. GetTupleElement
- Semantics
Indexes into a tuple with a compile-time-constant index and returns the selected element.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.66. Infeed
- Semantics
Reads a single data item from the implicit Infeed streaming interface of the device, interpreting the data as the given shape and its layout, and returns an XlaOp of the data. Multiple Infeed operations are allowed in a computation, but there must be a total order among them. For example, two Infeed operations placed in two while loops have a total order if there is a dependency between the loops.
- Boundary
Constraints: Same as the XLA_CPU implementation.
2.67. Outfeed
- Semantics
Enqueues an outfeed instruction onto the computation. This instruction generates outgoing data transfers for the given data.
- Boundary
Constraints: Same as the XLA_CPU implementation.
2.68. Iota
- Semantics
Builds a constant literal on device rather than a potentially large host transfer. Creates a rank 1 array of values starting at zero and incrementing by one. For floating-point types, the produced array is equivalent to ConvertElementType(Iota(…)) where the Iota is of integral type and the conversion is to the floating-point type.
- Boundary
Supported DataType: U32.
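Since only U32 is listed, a floating-point iota would presumably be expressed the way the semantics describe it: a U32 Iota followed by ConvertElementType (U32 -> F32 appears in the ConvertElementType list). A host-side sketch of that composition; `iota_u32` and `convert_to_f32` are hypothetical stand-ins, and Python floats are float64 rather than float32.

```python
def iota_u32(n: int) -> list:
    """Rank-1 U32 Iota: 0, 1, ..., n-1."""
    if not 0 <= n <= 2**32:
        raise ValueError("n must fit the U32 value range")
    return list(range(n))

def convert_to_f32(values) -> list:
    """Stand-in for ConvertElementType(U32 -> F32), element by element."""
    return [float(v) for v in values]

f32_iota = convert_to_f32(iota_u32(5))
print(f32_iota)   # [0.0, 1.0, 2.0, 3.0, 4.0]
```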
2.69. Map
- Semantics
Applies a scalar function over the given operands arrays, producing an array of the same dimensions where each element is the result of the mapped function applied to the corresponding elements in the input arrays.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.70. Pad
- Semantics
Expands the given operand array by padding around the array as well as between the elements of the array with the given padding_value. padding_config specifies the amount of edge padding and the interior padding for each dimension.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
Only source dimension sizes less than 65536 are supported.
Only padded dimension sizes less than 2048 are supported.
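Edge and interior padding are easiest to see in one dimension. This sketch assumes non-negative padding amounts (upstream XLA also allows negative edge padding, which is not modeled); `pad_1d` and `padded_size` are hypothetical helpers, the latter useful for checking the padded-size limit above.

```python
def pad_1d(operand, padding_value, edge_low, edge_high, interior):
    """Pad a rank-1 list: edge padding at both ends, interior padding between elements."""
    out = [padding_value] * edge_low
    for i, v in enumerate(operand):
        if i > 0:                                   # no interior padding after the last element
            out.extend([padding_value] * interior)
        out.append(v)
    out.extend([padding_value] * edge_high)
    return out

def padded_size(n, edge_low, edge_high, interior):
    """Size of the padded dimension for an input dimension of size n."""
    return edge_low + n + max(n - 1, 0) * interior + edge_high

print(pad_1d([1, 2, 3], 0, 1, 2, 1))   # [0, 1, 0, 2, 0, 3, 0, 0]
print(padded_size(3, 1, 2, 1))         # 8
```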
2.71. Recv
- Semantics
Receives data of the given shape from a Send instruction in another computation that shares the same channel handle. Returns an XlaOp for the received data.
- Boundary
2.72. Reduce
- Semantics
Applies a reduction function to one or more arrays in parallel.
- Boundary
Supported DataType: FP32/BF16/FP16/PRED
Only the (3,2,1,0) layout is supported.
2.73. ReducePrecision
- Semantics
Models the effect of converting floating-point values to a lower-precision format (such as IEEE-FP16) and back to the original format. The number of exponent and mantissa bits in the lower-precision format can be specified arbitrarily, although not all bit sizes may be supported on all hardware implementations.
- Boundary
Supported DataType: FP32/BF16/FP16
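To make "convert to a lower precision and back" concrete, here is a host-side sketch that reduces only the mantissa of a float32, keeping all 8 exponent bits and rounding to nearest even. With mantissa_bits=7 it behaves roughly like a round trip through BF16. It does not model exponent-bit reduction, and `reduce_mantissa_f32` is a hypothetical helper, not the GCU kernel.

```python
import math
import struct

def reduce_mantissa_f32(x: float, mantissa_bits: int) -> float:
    """Round a float32 value to `mantissa_bits` explicit mantissa bits (ties to even)."""
    if math.isnan(x) or mantissa_bits >= 23:
        return x                                   # NaN and no-op cases pass through
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - mantissa_bits                      # low mantissa bits to remove
    lsb = (bits >> drop) & 1                       # lowest kept bit decides ties
    bias = (1 << (drop - 1)) - 1 + lsb
    bits = ((bits + bias) >> drop) << drop         # round, then clear dropped bits
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

print(reduce_mantissa_f32(1.0 + 2**-8, 7))       # 1.0 (tie, rounds to even)
print(reduce_mantissa_f32(1.0 + 3 * 2**-8, 7))   # 1.015625 (tie, rounds up to even)
```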
2.74. ReduceWindow
- Semantics
Applies a reduction function to all elements in each window of the input multi-dimensional array, producing an output multi-dimensional array with the same number of elements as the number of valid positions of the window. A pooling layer can be expressed as a ReduceWindow. Similar to Reduce, the applied computation is always passed the init_value on the left-hand side.
- Boundary
Supported DataType: F32/S32/BF16/F16
Supported rank: 4
Supported computation: kMaximum
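Since kMaximum is the supported computation, the typical use is max pooling. A 2-D sketch with VALID padding (only windows that fit are kept); the rank-4 case corresponds to a window of size 1 on the batch and feature dimensions. `max_pool_2d` is a hypothetical helper.

```python
def max_pool_2d(x, window, stride):
    """ReduceWindow with a max computation over a 2-D list, VALID padding."""
    wh, ww = window
    sh, sw = stride
    out_h = (len(x) - wh) // sh + 1
    out_w = (len(x[0]) - ww) // sw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = float("-inf")   # init_value for max, always on the left-hand side
            for di in range(wh):
                for dj in range(ww):
                    acc = max(acc, x[i * sh + di][j * sw + dj])
            row.append(acc)
        out.append(row)
    return out

x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool_2d(x, (2, 2), (2, 2)))   # [[6, 8], [14, 16]]
```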
2.75. ReplicaId
- Semantics
Returns the unique ID (U32 scalar) of the replica.
- Boundary
2.76. Reshape
- Semantics
Reshapes the dimensions of an array into a new configuration.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
Reshape dimension sizes below 65536 bytes are preferred; larger sizes degrade performance.
2.77. Rev
- Semantics
Reverses the order of elements in the operand array along the specified dimensions, generating an output array of the same shape. Each element of the operand array at a multidimensional index is stored into the output array at a transformed index. The multidimensional index is transformed by reversing the index in each dimension to be reversed (i.e., if a dimension of size N is one of the reversing dimensions, its index i is transformed into N - 1 - i).
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
Only source dimension sizes less than 65536 are supported.
2.78. RngNormal
- Semantics
Constructs an output of a given shape with random numbers generated following the N(μ,σ) normal distribution.
- Boundary
Supported DataType: F32/INT32/BF16/FP16.
Truncated normal is not supported
If seed is 0, the current timestamp is used as the random seed.
2.79. RngUniform
- Semantics
Constructs an output of a given shape with random numbers generated following the uniform distribution over the interval [a,b).
- Boundary
Supported DataType: F32/INT32/BF16/FP16.
Truncated normal is not supported
If seed is 0, the current timestamp is used as the random seed.
2.80. Scatter
- Semantics
The XLA scatter operation generates a result which is the value of the input array operand, with several slices (at indices specified by scatter_indices) updated with the values in updates using update_computation.
- Boundary
Supported DataType: C128/C64/F64/U64/S64/FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
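A rank-1 sketch of Scatter: each `updates[k]` is combined with the current value at `scatter_indices[k]` through `update_computation` (add by default). Upstream XLA does not promise an order for repeated indices, so this sketch is only order-independent for computations like add; `scatter_1d` is a hypothetical helper.

```python
import operator

def scatter_1d(operand, scatter_indices, updates, update_computation=operator.add):
    """Return a copy of operand with updates combined in at the given indices."""
    result = list(operand)                  # the input operand is left unchanged
    for idx, upd in zip(scatter_indices, updates):
        result[idx] = update_computation(result[idx], upd)   # combine old value and update
    return result

print(scatter_1d([0, 0, 0, 0], [1, 3, 1], [5, 7, 2]))          # [0, 7, 0, 7]
print(scatter_1d([1, 1, 1], [0, 2], [9, 8], lambda old, new: new))   # overwrite: [9, 1, 8]
```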
2.81. Select
- Semantics
Constructs an output array from elements of two input arrays, based on the values of a predicate array.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
2.82. SelectAndScatter
- Semantics
This operation can be considered as a composite operation that first computes ReduceWindow on the operand array to select an element from each window, and then scatters the source array to the indices of the selected elements to construct an output array with the same shape as the operand array.
- Boundary
Supported data rank: 4
Supported select computation: greater than or equal to (Ge)
Supported scatter computation: add
For data formats NHWC/NCHW/HWNC/NHCW/CHNW:
Supported DataType: F32/S32/BF16/F16
Padding: kValid or kSame
Window size: R <= 13, S <= 13
Stride H/W in [1, 2, 4]
LHS dilation H/W in [1, 2, 4]
RHS dilation H/W in [1, 2, 4]
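With the supported combination (select = Ge, scatter = add), SelectAndScatter is the pattern behind the gradient of max pooling. A 1-D sketch with VALID padding: for each window, Ge keeps the currently selected element when it is >= the candidate, so ties go to the earlier element, and the matching source value is added at that position. `select_and_scatter_1d` is a hypothetical helper.

```python
def select_and_scatter_1d(operand, source, window, stride, init_value=0):
    """1-D SelectAndScatter with select = Ge and scatter = add (VALID padding)."""
    n_windows = (len(operand) - window) // stride + 1
    if len(source) != n_windows:
        raise ValueError("source must have one element per window position")
    out = [init_value] * len(operand)
    for w in range(n_windows):
        start = w * stride
        selected = start
        for i in range(start + 1, start + window):
            # select(current, candidate) = current >= candidate keeps the current one
            if not operand[selected] >= operand[i]:
                selected = i
        out[selected] += source[w]          # scatter computation: add
    return out

print(select_and_scatter_1d([1, 3, 2, 5, 4], [10, 20, 30, 40], window=2, stride=1))
# [0, 30, 0, 70, 0]
```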
2.83. Send
- Semantics
Sends the given operand data to a Recv instruction in another computation that shares the same channel handle. Does not return any data.
- Boundary
2.84. Slice
- Semantics
Slicing extracts a sub-array from the input array. The sub-array is of the same rank as the input and contains the values inside a bounding box within the input array where the dimensions and indices of the bounding box are given as arguments to the slice operation.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
Supported number of slice dimensions: fewer than 10.
UpdateSlice dimension sizes below 65536 bytes are preferred; larger sizes degrade performance.
2.85. Sort
- Semantics
Sorts the operands.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.86. Transpose
- Semantics
Permutes the operand dimensions with the given permutation, so ∀ i . 0 ≤ i < rank ⇒ input_dimensions[permutation[i]] = output_dimensions[i].
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
Supported number of dimensions: fewer than 10.
Copy dimension sizes below 65536 bytes are preferred; larger sizes degrade performance.
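The permutation rule input_dimensions[permutation[i]] = output_dimensions[i] can be checked on a flat, row-major buffer. A sketch, assuming row-major layout; `transpose` is a hypothetical helper that returns the permuted data together with the output shape.

```python
from itertools import product

def transpose(data, shape, permutation):
    """Transpose a row-major flat list `data` with the given `shape`."""
    rank = len(shape)
    out_shape = [shape[p] for p in permutation]   # output dim i = input dim permutation[i]
    strides = [1] * rank                          # row-major strides of the input
    for d in range(rank - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    out = []
    for out_index in product(*(range(n) for n in out_shape)):   # row-major output order
        in_index = [0] * rank
        for i, p in enumerate(permutation):
            in_index[p] = out_index[i]
        out.append(data[sum(k * s for k, s in zip(in_index, strides))])
    return out, out_shape

print(transpose([0, 1, 2, 3, 4, 5], [2, 3], [1, 0]))   # ([0, 3, 1, 4, 2, 5], [3, 2])
```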
2.87. TriangularSolve
- Semantics
Solves systems of linear equations with lower or upper triangular coefficient matrices by forward- or back-substitution. Broadcasting along leading dimensions, this routine solves one of the matrix systems op(a) * x = b, or x * op(a) = b, for the variable x, given a and b, where op(a) is either op(a) = a, or op(a) = Transpose(a), or op(a) = Conj(Transpose(a)).
- Boundary
2.88. Tuple
- Semantics
A tuple containing a variable number of data handles, each of which has its own shape.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8.
2.89. While
- Semantics
Sequentially executes the body until the condition fails.
- Boundary
Constraints: Stream cache can’t be enabled.
2.90. copy
- Semantics
Copies the operands.
- Boundary
Supported DataType: FP32/U32/S32/BF16/FP16/U16/S16/U8/S8
Supported number of dimensions: fewer than 10.
Copy dimension sizes below 65536 bytes are preferred; larger sizes degrade performance.
2.91. Clz
- Semantics
Counts the leading zero bits in each element.
- Boundary
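A one-element sketch of Clz for a fixed bit width, assuming two's-complement inputs (a negative value has its top bit set, so the count is 0). `clz` is a hypothetical helper; it relies on Python's built-in `int.bit_length`.

```python
def clz(x: int, bits: int = 32) -> int:
    """Count leading zero bits of x viewed as a `bits`-wide unsigned pattern."""
    x &= (1 << bits) - 1          # map negative values to their bit pattern
    return bits - x.bit_length()  # an all-zero value has `bits` leading zeros

print([clz(v) for v in [0, 1, 0x80000000, -1]])   # [32, 31, 0, 0]
```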
3. Appendix
Version | Description | Date
V2.0 | Initial release | January 2022