1. User Guide

1.1. Core Concepts

TopsInference Workflow

When using TopsInference, only a few steps are needed (see the sketch after this list):

  • Create a Parser and read your ONNX model to obtain a Network.

  • Create an Optimizer and build the Network into an Engine.

  • Use the created Engine to run inference.
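
The following minimal sketch walks through these three steps. The model path, input name, and input shape are placeholders for illustration, not values defined by this document.

demo code:

import numpy as np
import TopsInference

with TopsInference.device(0, 0):
    # 1. Parse the ONNX model into a Network.
    parser = TopsInference.create_parser(TopsInference.ONNX_MODEL)
    parser.set_input_names(["input"])             # placeholder input name
    parser.set_input_shapes([[1, 3, 224, 224]])   # placeholder input shape
    network = parser.read("model.onnx")           # placeholder model path

    # 2. Build the Network into an Engine.
    optimizer = TopsInference.create_optimizer()
    engine = optimizer.build(network)
    engine.save_executable("model.exec")          # optional: reload later with TopsInference.load

    # 3. Run inference. runV2 returns a Future; get() waits and returns numpy outputs.
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = engine.runV2([data]).get()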

2. TopsInference API Reference

2.1. Attribute

TopsInference.__version__

The version of TopsInference.

TopsInference.__version__

TopsInference.KDEFAULT

For default precision inference. Please refer to set_build_flag.

TopsInference.KFP16

For fp16 precision inference. Please refer to set_build_flag.

TopsInference.KFP16_MIX

For fp32 and fp16 mixed precision inference. Please refer to set_build_flag.

TopsInference.KINT8_FP32_MIX

For fp32 and int8 mixed precision inference. Please refer to set_build_flag.

TopsInference.KREFIT

For refitting an engine. Please refer to set_build_flag.

TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST

When doing inference, input and output data is host memory.

TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE

When doing inference, input and output data is GCU device memory.

TopsInference.TIF_BOOL

TopsInference DataType: bool DataType.

TopsInference.TIF_INDEX

TopsInference DataType: index DataType.

TopsInference.TIF_INT8

TopsInference DataType: int8 DataType.

TopsInference.TIF_INT16

TopsInference DataType: int16 DataType.

TopsInference.TIF_INT32

TopsInference DataType: int32 DataType.

TopsInference.TIF_INT64

TopsInference DataType: int64 DataType.

TopsInference.TIF_UINT8

TopsInference DataType: uint8 DataType.

TopsInference.TIF_UINT16

TopsInference DataType: uint16 DataType.

TopsInference.TIF_UINT32

TopsInference DataType: uint32 DataType.

TopsInference.TIF_UINT64

TopsInference DataType: uint64 DataType.

TopsInference.TIF_FP16

TopsInference DataType: fp16 DataType.

TopsInference.TIF_FP32

TopsInference DataType: fp32 DataType.

TopsInference.TIF_BF16

TopsInference DataType: bf16 DataType.

TopsInference.TIF_INVALID

TopsInference DataType: invalid DataType.

TopsInference.CalibrationAlgoType

TopsInference Calibration Algorithm Type.

TopsInference.CalibrationAlgoType.KL_ENTROPY

TopsInference Calibration Algorithm Type: ENTROPY.

TopsInference.CalibrationAlgoType.MAX_MIN

TopsInference Calibration Algorithm Type: MAX_MIN.

TopsInference.CalibrationAlgoType.MAX_MIN_EMA

TopsInference Calibration Algorithm Type: MAX_MIN_EMA.

TopsInference.CalibrationAlgoType.PERCENTILE

TopsInference Calibration Algorithm Type: PERCENTILE.

TopsInference.CalibrationAlgoType.INVALID

Invalid TopsInference Calibration Algorithm Type.

2.2. Function

TopsInference.create_parser

This function creates a parser object according to the model_type.

TopsInference.create_parser(model_type:TopsInference.ONNX_MODEL) -> Optional[parser handle, None]

Parameters

model_type

input. The model_type can only be TopsInference.ONNX_MODEL.

Returns

A parser handle, or None.

Please refer to Parser.

TopsInference.create_optimizer

An optimizer object will be created by calling this function. The optimizer object is then used to build the parsed Network.

TopsInference.create_optimizer() -> Optional[optimizer handle]

Returns

An optimizer handle, or None.

Please refer to Optimizer.

TopsInference.load

An engine object will be created by calling this function. Then the engine object will be used to do inference.

TopsInference.load(engine_file:str) -> Optional[engine handle, None]

Parameters

engine_file

input. The engine file saved previously when building; please see Optimizer.

Returns

An engine handle, or None. When an error occurs, it will raise an exception.

Please refer to Engine.

TopsInference.device

Use "with TopsInference.device(card_id, cluster_id):" to set the device context. After the with-block finishes executing, the claimed devices are released. It has the same effect as TopsInference.set_device combined with TopsInference.release_device.

  1. In a multi-threaded program, a sub-thread exclusively uses the resources it claims if device() is called within that sub-thread.

  2. If device() is called in the main thread but not in a sub-thread, the sub-thread shares the cluster resources claimed by the main thread.

  3. If both the main thread and a sub-thread claim resources with device(), the sub-thread uses the resources it claimed itself.

  4. If some sub-threads claim resources with device() and others do not, each sub-thread individually follows rule 2 or 3 above according to whether it claimed resources.

TopsInference.device(card_id:int, cluster_id:Optional[int,list]) -> device_handle

Parameters

card_id

input. The card id.

cluster_id

input. The cluster id. Currently the maximum number of clusters is 6.

For i10, cluster_id is a list with a maximum length of 4; it can be [0], [0, 1], [0, 1, 2, 3], or any other combination of cluster ids from 0 to 3.

For i20, cluster_id is a list with a maximum length of 6; it can be [0], [0, 1], [0, 1, 2, 3, 4, 5], or any other combination of cluster ids from 0 to 5.

cluster_id can also be set to -1 as a shorthand for [0, 1, 2, 3, 4, 5] when you need to run inference on all 6 clusters.

Returns

A Device handle.

Attention

This handle is only effective within the current scope and is released automatically when leaving the scope, so everything must be finished inside that scope. The same device can only be set once, and you must not set any other device in a nested manner within the current scope.
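
A short sketch of the with-block usage; the engine path is a placeholder.

demo code:

with TopsInference.device(0, [0, 1]):
    # The claimed card/clusters are released automatically when the block exits.
    engine = TopsInference.load("/path/to/engine")
    # ... run inference here, inside the scope ...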

TopsInference.set_device

Specify the running device until it is released. Calls to set_device are isolated from each other across processes. For scoping behavior under multi-threading, please refer to TopsInference.device.

TopsInference.set_device(card_id:int, cluster_id:[int,list]) -> device_handle

Parameters

card_id

input. The card id.

cluster_id

input. The cluster id. Currently the maximum number of clusters is 6.

For i10, cluster_id is a list with a maximum length of 4; it can be [0], [0, 1], [0, 1, 2, 3], or any other combination of cluster ids from 0 to 3.

For i20, cluster_id is a list with a maximum length of 6; it can be [0], [0, 1], [0, 1, 2, 3, 4, 5], or any other combination of cluster ids from 0 to 5.

cluster_id can also be set to -1 as a shorthand for [0, 1, 2, 3, 4, 5] when you need to run inference on all 6 clusters.

Returns

A Device handle.

Attention

This handle remains effective until you call release_device to destroy it, and it must be released when it is no longer in use. The same device can only be set once, and you must not set any other device between set_device and release_device.

demo code:

handle = TopsInference.set_device(0, 0)
# TopsInference infer code
...
TopsInference.release_device(handle)

TopsInference.release_device

TopsInference.release_device(handle:device_handle)

Parameters

handle

input. The device handle to destroy.

Attention

If the device handle is created by calling set_device, then it must be released by calling release_device.

TopsInference.create_stream

Create a Stream to support running inference in async mode. By placing operations on the same stream, you can run them in non-blocking mode and then call synchronize to wait for all of them to finish.

TopsInference.create_stream() -> stream

TopsInference.mem_alloc

Allocate buffer on device.

TopsInference.mem_alloc(size:int) -> DeviceMemory

Parameters

size

input. The allocated buffer size.

Returns

A DeviceMemory buffer object. If the requested buffer size exceeds the maximum device memory, an exception is raised.

TopsInference.mem_free

Free buffer on device.

TopsInference.mem_free(ptr:buffer_ptr)

Parameters

ptr

input. The allocated buffer object.

TopsInference.mem_h2d_copy

Copy buffer from host to device with sync mode.

TopsInference.mem_h2d_copy(src:numpy.ndarray, dst:DeviceMemory, size:int)

Parameters

src

input. The source buffer object, which is a numpy.ndarray.

dst

input. The destination buffer object, which is allocated by calling mem_alloc.

size

input. The copied buffer size.

TopsInference.mem_d2h_copy

Copy buffer from device to host with sync mode.

TopsInference.mem_d2h_copy(src:DeviceMemory_buffer, dst:numpy.ndarray, size:int)

Parameters

src

input. The source buffer object, which is allocated by calling mem_alloc.

dst

input. The destination buffer object, which is a numpy.ndarray.

size

input. The copied buffer size.
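
A small sketch of a synchronous host-to-device round trip; treating the size arguments as byte counts is an assumption here.

demo code:

import numpy as np

host_in = np.ones((1, 3, 224, 224), dtype=np.float32)
dev_buf = TopsInference.mem_alloc(host_in.nbytes)               # allocate device memory
TopsInference.mem_h2d_copy(host_in, dev_buf, host_in.nbytes)    # host -> device
host_out = np.empty_like(host_in)
TopsInference.mem_d2h_copy(dev_buf, host_out, host_out.nbytes)  # device -> host
TopsInference.mem_free(dev_buf)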

TopsInference.mem_h2d_copy_async

Copy buffer from host to device with async mode.

TopsInference.mem_h2d_copy_async(src:numpy.ndarray, dst:DeviceMemory, size:int, stream:stream)

Parameters

src

input. The source buffer object, which is a numpy.ndarray.

dst

input. The destination buffer object, which is allocated by calling mem_alloc.

size

input. The copied buffer size.

stream

input. The stream handle on which the copy runs. When calling this function, the stream must be specified. Please refer to Stream.

TopsInference.mem_d2h_copy_async

Copy buffer from device to host with async mode. Please refer to Stream.

TopsInference.mem_d2h_copy_async(src:DeviceMemory, dst:numpy.ndarray, size:int, stream:stream)

Parameters

src

input. The source buffer object, which is allocated by calling mem_alloc.

dst

input. The destination buffer object, which is a numpy.ndarray.

size

input. The copied buffer size.

stream

input. The stream handle on which the copy runs. When calling this function, the stream must be specified. Please refer to Stream.
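
A sketch of the asynchronous variants on a single stream, reusing the buffers from the synchronous sketch above; synchronize waits for both queued copies to finish.

demo code:

stream = TopsInference.create_stream()
TopsInference.mem_h2d_copy_async(host_in, dev_buf, host_in.nbytes, stream)
TopsInference.mem_d2h_copy_async(dev_buf, host_out, host_out.nbytes, stream)
stream.synchronize()   # block until the queued copies complete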

TopsInference.create_refitter

A refitter object will be created by calling this function. Then the refitter object will be used to do refit.

TopsInference.create_refitter(engine:Engine) -> Optional[refitter handle, None]

Parameters

engine

input. The engine object to be refitted.

Returns

A Refitter handle, or None. When an error occurs, it will raise an exception.

Please refer to Refitter.

2.3. Class

Parser

Please refer to TopsInference.create_parser for how to create a Parser object.

read

This function reads an ONNX model and returns a Network handle, or raises an exception when the input model contains invalid data or unsupported operators.

Before reading the model, set_input_names, set_input_dtypes, set_input_shapes, set_output_names, and set_output_dtypes should be called to set the relevant attributes for building the current network.

read(self, model:str) -> Optional[network handle, None]

Parameters

model

input. The model file to parse.

Returns

A network handle, or None.

read_from_str

This function reads an ONNX model from a string and returns a Network handle, or raises an exception when the input model contains invalid data or unsupported operators.

Before reading the model, set_input_names, set_input_dtypes, set_input_shapes, set_output_names, and set_output_dtypes should be called to set the relevant attributes for building the current network.

read_from_str(model_data:str, model_size:int) -> Optional[network handle, None]

Parameters

model_data

input. The model strings to parse.

model_size

input. The model strings length to parse.

Returns

A network handle, or None.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_input_names

Set the input names before reading the model. If the model has multiple input nodes, all the names should be given in a list, such as ["a", "b"].

set_input_names(node_name:Optional[list,str])

Parameters

node_name

input. The input names list.

set_input_dtypes

Set the input data types before reading the model. If the model has multiple input nodes, all the data types should be given in a list, such as [TopsInference.TIF_FP32, TopsInference.TIF_FP32].

set_input_dtypes(node_dtype:list)

Parameters

node_dtype

input. The list of input TopsInference DataTypes, such as TopsInference.TIF_FP32, TopsInference.TIF_FP16, etc.

set_input_shapes

Set the input shapes before reading the model. When there are multiple inputs, the shapes should be given in a list, such as [[2, 3, 4], [6, 7, 8]].

set_input_shapes(node_shape:list)

Parameters

node_shape

input. The input shape list.

set_output_names

Set the output names before reading the model. If the model has multiple output nodes, all the names should be given in a list, such as ["a", "b"].

set_output_names(node_name:Optional[list,str])

Parameters

node_name

input. The output names list.

set_output_dtypes

Set the output data types before reading the model. If the model has multiple output nodes, all the data types should be given in a list, such as [TopsInference.TIF_FP32, TopsInference.TIF_FP32].

set_output_dtypes(node_dtype:list)

Parameters

node_dtype

input. The list of output TopsInference DataTypes, such as TopsInference.TIF_FP32, TopsInference.TIF_FP16, etc.

Attention

set_input_names, set_output_names, set_input_shapes, set_input_dtypes, and set_output_dtypes set attributes for the current network. They must be called before reading the model when parsing, as shown in the sketch below.
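
A configuration sketch for a hypothetical single-input, single-output model; the names, dtypes, shapes, and model path are placeholders.

demo code:

parser = TopsInference.create_parser(TopsInference.ONNX_MODEL)
parser.set_input_names(["input"])
parser.set_input_dtypes([TopsInference.TIF_FP32])
parser.set_input_shapes([[1, 3, 224, 224]])
parser.set_output_names(["output"])
parser.set_output_dtypes([TopsInference.TIF_FP32])
network = parser.read("model.onnx")   # attributes above must be set before read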

Layer

The layer definition, which constitutes the Network.

get_type

Get the layer type.

get_type() -> layer_type

Returns

The layer type, which can be:

  • TopsInference.TIF_DECONVOLUTION, which means deconvolution layer.

  • TopsInference.TIF_CONVOLUTION, which means convolution layer.

  • TopsInference.TIF_UNARY, which means unaryop operation layer.

  • TopsInference.TIF_TRANSCENDENTAL, which means transcendental layer.

  • TopsInference.TIF_ELEMENTWISE, which means elementwise operation layer.

  • TopsInference.TIF_SELECT, which means select layer.

  • TopsInference.TIF_POOLING, which means pooling layer.

  • TopsInference.TIF_BATCHNORM, which means batch normalization layer.

  • TopsInference.TIF_CONVERT, which means convert layer for converting between different data precision.

  • TopsInference.TIF_CONCAT, which means concat layer.

  • TopsInference.TIF_CONSTANT, which means constant layer.

  • TopsInference.TIF_SHUFFLE, which means shuffle layer.

  • TopsInference.TIF_ACTIVATION, which means activation layer.

  • TopsInference.TIF_ORDER, which means layer for sorting by a certain rule.

  • TopsInference.TIF_RNN, which means rnn layer.

  • TopsInference.TIF_GATHER, which means gather layer.

  • TopsInference.TIF_MATMUL, which means matmul layer.

  • TopsInference.TIF_COMPARE, which means compare layer.

  • TopsInference.TIF_CONDITION, which means condition layer.

  • TopsInference.TIF_NMS, which means non maximum suppression layer.

  • TopsInference.TIF_PAD, which means padding layer.

  • TopsInference.TIF_RANDOM, which means random generator layer.

  • TopsInference.TIF_REDUCE, which means reduce layer.

  • TopsInference.TIF_RESHAPE, which means reshape layer.

  • TopsInference.TIF_RESIZE, which means resize layer.

  • TopsInference.TIF_ROIALIGN, which means roi align layer, used in faster rcnn and r-fcn, etc.

  • TopsInference.TIF_SCATTER, which means scatter layer.

  • TopsInference.TIF_SIGMOID, which means sigmoid layer.

  • TopsInference.TIF_SLICE, which means slice layer.

  • TopsInference.TIF_TOPK, which means topk layer.

  • TopsInference.TIF_TRANSPOSE, which means transpose layer.

  • TopsInference.TIF_LOG_SOFTMAX, which means log softmax layer.

  • TopsInference.TIF_MVN, which means mean-variance normalization layer.

  • TopsInference.TIF_SOFTMAX, which means softmax layer.

  • TopsInference.TIF_UNKNOWN, which means unknown layer.

get_name

Get the layer name.

get_name() -> str

Returns

The layer name; the default name is "".

set_precision

Set the layer precision. This only takes effect in TopsInference.KFP16_MIX or TopsInference.KINT8_FP32_MIX mode.

set_precision(precision:Optional[TopsInference.TIF_FP32, TopsInference.TIF_FP16, TopsInference.TIF_INT8])

Parameters

precision

input. The layer precision to be set.

In TopsInference.KFP16_MIX mode:

  • TopsInference.TIF_FP32

  • TopsInference.TIF_FP16

In TopsInference.KINT8_FP32_MIX mode:

  • TopsInference.TIF_FP32

  • TopsInference.TIF_INT8

get_precision

Get the layer precision.

get_precision() -> DataType

Returns

The layer precision.

reset_precision

Reset the layer precision to default precision.

reset_precision()

Network

The internal representation of an ONNX model; please see the read function in Parser.

dump

Dump the network structure for debugging; the result is printed to the current terminal window.

dump()

get_layer_num

Get the layer number in current network.

get_layer_num() -> int

Returns

The layer number.

get_layer_by_index

Get the layer at the given index; the index must be less than the layer number.

get_layer_by_index(index:int) -> layer handle

Parameters

index

index. The index for getting layer, the index must be less than the layer number.

Returns

The Layer handle.

get_layer

Get the layer according to the layer name.

get_layer(name:str) -> layer handle

Parameters

name

input. The layer name.

Returns

The Layer handle.
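
A sketch that walks the parsed network and pins matmul layers to fp32; this only takes effect when the engine is later built with a mixed-precision flag (see set_build_flag in Optimizer).

demo code:

network = parser.read("model.onnx")   # see Parser
for i in range(network.get_layer_num()):
    layer = network.get_layer_by_index(i)
    if layer.get_type() == TopsInference.TIF_MATMUL:
        layer.set_precision(TopsInference.TIF_FP32)   # keep matmul in fp32 under KFP16_MIX
network.dump()                                        # print the structure for inspection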

Optimizer

Please redirect to TopsInference.create_optimizer for how to create an Optimizer object.

build

This function builds a network into an engine, which will be used to do inference.

build(network:network) -> Optional[engine handle, None]

Parameters

network

input. An internal model representation read by Parser.

Returns

An Engine handle, or None. When an error occurs, it will raise an exception.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_build_flag

This function sets a flag used when building the engine. A build flag assigns certain features to the current engine.

  • TopsInference.KDEFAULT: for default model precision inference.

  • TopsInference.KFP16_MIX: for fp16 & fp32 mixed precision inference.

  • TopsInference.KINT8_FP32_MIX: for int8 & fp32 mixed precision inference.

  • TopsInference.KFP16: for fp16 precision inference.

  • TopsInference.KREFIT: for enabling engine refit.

set_build_flag(flag:Optional[TopsInference.KFP16_MIX,TopsInference.KINT8_FP32_MIX,
                             TopsInference.KFP16,TopsInference.KDEFAULT,
                             TopsInference.KREFIT])

Parameters

flag

input. A building flag.
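
A minimal sketch of selecting fp16 precision before building; network is a parsed Network (see Parser).

demo code:

optimizer = TopsInference.create_optimizer()
optimizer.set_build_flag(TopsInference.KFP16)   # build the whole engine with fp16 precision
engine = optimizer.build(network)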

set_max_shape_range

Set the maximum input shapes of a dynamic-shape model.

set_max_shape_range(max_shape_dims:list)

Parameters

max_shape_dims

input. A JSON-style list. When setting the model's maximum input shapes, the key must be "main", and the number of shapes must equal the number of model inputs.

demo code:

max_shape_dim_setting = []
max_shape_dim = {}
max_shape_dim["main"] = [[100, 1, 900], [100, 1, 100]]
max_shape_dim_setting.append(max_shape_dim)
optimizer.set_max_shape_range(max_shape_dim_setting)

set_min_shape_range

Set the minimum input shapes of a dynamic-shape model.

set_min_shape_range(min_shape_dims:list)

Parameters

min_shape_dims

input. A JSON-style list. When setting the model's minimum input shapes, the key must be "main", and the number of shapes must equal the number of model inputs.

demo code:

min_shape_dim_setting = []
min_shape_dim = {}
min_shape_dim["main"] = [[100, 1, 900], [100, 1, 1]]
min_shape_dim_setting.append(min_shape_dim)
optimizer.set_min_shape_range(min_shape_dim_setting)

set_compile_options

Set optimizer compile options.

set_compile_options(options:dict)

Parameters

options

input. A dict of compile options.

demo code:

compile_options = {}
compile_options["max_dim_size"] = "65536"
compile_options['resource_mode'] = '1c12s'
optimizer = TopsInference.create_optimizer()
optimizer.set_compile_options(compile_options)

set_int8_calibrator

Set the int8 calibrator in KINT8_FP32_MIX mode.

set_int8_calibrator(calibrator:ICalibrator)

Parameters

calibrator

input. An ICalibrator object (IInt8EntropyCalibrator / IInt8MaxMinCalibrator / IInt8MaxMinEMACalibrator / IInt8PercentCalibrator).
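
The sketch below outlines a calibrator built on the ICalibrator interface documented later in this reference. Whether the calibrator classes can be subclassed from Python exactly like this, and the constructor arguments used here (a batch iterable and a cache file path), are assumptions for illustration; calibration_batches is a placeholder.

demo code:

import os
import numpy as np

class MyCalibrator(TopsInference.IInt8MaxMinCalibrator):
    def __init__(self, batches, cache_file):
        super().__init__()            # assumed base-class constructor
        self.batches = iter(batches)  # each batch: dict of input name -> numpy.ndarray
        self.cache_file = cache_file
        self.dev_bufs = []            # keep device buffers alive during calibration

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return []                 # empty list: no more calibration batches
        ptrs = []
        for name in names:
            arr = np.ascontiguousarray(batch[name])
            buf = TopsInference.mem_alloc(arr.nbytes)
            TopsInference.mem_h2d_copy(arr, buf, arr.nbytes)
            self.dev_bufs.append(buf)
            ptrs.append(buf)
        return ptrs

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

optimizer.set_build_flag(TopsInference.KINT8_FP32_MIX)
optimizer.set_int8_calibrator(MyCalibrator(calibration_batches, "calib.cache"))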

Engine

An Engine handle can be created by an Optimizer; it can also be created by loading an existing engine file, please refer to TopsInference.load.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

save_executable

This function saves an engine to local disk so that it can be loaded and reused next time.

save_executable(engine_file:str)

Parameters

engine_file

input. An engine file name to save.

demo code:

engine.save_executable("/path/to/your/file")
engine = TopsInference.load("/path/to/your/file")

run

This function can be used for doing inference.

run(input_tensor_list:list,
    output_tensor_list:list,
    buffer_type:Optional[TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST,TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE],
    py_stream=None:stream)

Parameters

input_tensor_list

input. Input tensor list.

output_tensor_list

input. Output tensor list.

buffer_type

input. Buffer type for input and output.

  • TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST indicates that input buffer and output buffer are on host.

  • TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE indicates that input buffer and output buffer are on device.

Attention

Currently, mixed buffer types such as IN_HOST_OUT_DEVICE or IN_DEVICE_OUT_HOST are not supported. When using the host buffer type, the input and output buffers should be numpy.ndarray objects.

py_stream

input. Used to do inference with async mode.

The default value is None, which runs in sync mode. A stream can be created by TopsInference.create_stream.

Please refer to Stream.

When doing inference with buffer_type equal to IN_HOST_OUT_HOST, async mode is not currently supported, which means py_stream must be kept as None.
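
A sketch of a synchronous call with host buffers. Whether run fills a pre-created (possibly empty) output list or requires pre-allocated arrays is not spelled out here, so the output handling below is an assumption; the input shape is a placeholder.

demo code:

import numpy as np

inputs = [np.random.rand(1, 3, 224, 224).astype(np.float32)]   # placeholder shape
outputs = []   # assumed to be filled by the call
engine.run(inputs, outputs,
           TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST)       # sync mode: py_stream stays None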

run_with_batch

Multi-threaded inference with different batches. Users can call run_with_batch to run inference on a dynamic batch with any specified batch size. This method automatically splits or merges the input data along the zeroth axis (the batch direction) into N parts through an enqueue operation. The N parts are stored in a queue and inferred concurrently by a resource pool of M clusters, and the results are automatically spliced together after inference completes.

run_with_batch(sample_nums:int, input_list:list, **kwargs:[output_list,py_stream,buffer_type]) -> future

Parameters

sample_nums

input. The number of samples.

input_list

input. A list constructed from the inputs.

kwargs

input. A map whose keys may include output_list, py_stream, and buffer_type.

If you create the outputs before calling run_with_batch, set output_list=outputs.

If output_list is None, it will be allocated automatically.

py_stream: The default value is None, which runs in sync mode. A stream can be created by TopsInference.create_stream.

Please refer to Stream.

If you want to use device-to-device (D2D) mode, set buffer_type=TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE.

Returns

A future object. Please refer to Future.
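
A sketch of batched inference through the Future interface; samples is a placeholder numpy array whose first axis is the batch dimension.

demo code:

future = engine.run_with_batch(samples.shape[0], [samples])
future.wait()            # block until all split batches have been inferred
results = future.get()   # list of numpy.ndarray, spliced along the batch axis
future.release()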

runV2

Inference with a specified device (cluster) and dynamic-shape models.

In static-shape mode, runV2 also supports dynamic batch. It automatically splits or merges the input data along the zeroth axis (the batch direction) into N parts through an enqueue operation. The N parts are stored in a queue and inferred concurrently by a resource pool of M clusters, and the results are automatically spliced together after inference completes.

runV2(input_list:list, **kwargs:[output_list,py_stream]) -> future

Parameters

input_list

input. A tensor or a list constructed from the inputs.

kwargs

input. A map whose keys may include output_list and py_stream.

If you create the outputs before calling runV2, set output_list=outputs.

If output_list is None, it will be allocated automatically.

py_stream: The default value is None, which runs in sync mode. A stream can be created by TopsInference.create_stream.

Please refer to Stream.

Returns

A future object. Please refer to Future.
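
A sketch of runV2 in sync mode; input_data is a placeholder numpy array matching the engine's input.

demo code:

future = engine.runV2([input_data])
if not future.status():   # optionally poll whether the result is ready
    future.wait()
outputs = future.get()    # auto-allocated list of numpy.ndarray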

get_input_num

Get the input number of engine.

get_input_num() -> int

Returns

The net input number.

get_output_num

Get the output number of engine.

get_output_num() -> int

Returns

The net output number.

get_max_input_shape

Get the index-th input maximum shape(numpy.array) of engine.

get_max_input_shape(index:int) -> numpy.array

Parameters

index

index. The index for input, the index must be less than the input number.

Returns

The index-th input maximum shape.

get_max_output_shape

Get the index-th output maximum shape(numpy.array) of engine.

get_max_output_shape(index:int) -> numpy.array

Parameters

index

index. The index for output, the index must be less than the output number.

Returns

The index-th output maximum shape.

get_min_input_shape

Get the index-th input minimum shape(numpy.array) of engine.

get_min_input_shape(index:int) -> numpy.array

Parameters

index

index. The index for input, the index must be less than the input number.

Returns

The index-th input minimum shape.

get_input_shape

Get the index-th input shape(numpy.array) of engine.

get_input_shape(index:int) -> numpy.array

Parameters

index

index. The index for input, the index must be less than the input number.

Returns

The index-th real input shape.

get_output_shape

Get the index-th output shape(numpy.array) of engine.

get_output_shape(index:int) -> numpy.array

Parameters

index

index. The index for output, the index must be less than the output number.

Returns

The index-th real output shape.

get_input_dtype

Get the index-th input data type(TopsInference.DataType) of engine.

get_input_dtype(index:int) -> TopsInference.DataType

Parameters

index

index. The index for input, the index must be less than the input number.

Returns

The index-th input type.

get_output_dtype

Get the index-th output data type(TopsInference.DataType) of engine.

get_output_dtype(index:int) -> TopsInference.DataType

Parameters

index

index. The index for output, the index must be less than the output number.

Returns

The index-th output type.

get_input_name

Get the index-th input name(ONNX name) of engine.

get_input_name(index:int) -> str

Parameters

index

index. The index for input, the index must be less than the input number.

Returns

The index-th input layer name.

get_output_name

Get the index-th output name(ONNX name) of engine.

get_output_name(index:int) -> str

Parameters

index

index. The index for output, the index must be less than the output number.

Returns

The index-th output layer name.

get_device_memory_size

Get the device memory size required by the GCU runtime.

get_device_memory_size() -> int

Returns

The memory size required by the engine. Returns 0 if the memory size cannot be obtained.
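
A sketch that prints the engine's I/O metadata using the getters above.

demo code:

for i in range(engine.get_input_num()):
    print("input", i, engine.get_input_name(i),
          engine.get_input_shape(i), engine.get_input_dtype(i))
for i in range(engine.get_output_num()):
    print("output", i, engine.get_output_name(i),
          engine.get_output_shape(i), engine.get_output_dtype(i))
print("device memory required:", engine.get_device_memory_size())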

Device

This class is used for Optimizer and Engine.

Before calling create_optimizer or loading an engine, the device must be initialized, and later released, by calling the functions described above (TopsInference.device, or TopsInference.set_device together with TopsInference.release_device).

Stream

This class is used for running with async mode. Please redirect to TopsInference.create_stream for how to create a Stream object.

synchronize

When you execute several operations on the same stream, call synchronize after the last operation to wait until all of the operations have finished.

synchronize()

Future

Future provides a mechanism to access the result of asynchronous operations.

release

Release future.

release()

get

Get output data.

get() -> list[numpy.ndarray]

Returns

The output data.

status

Get output data status.

status() -> bool

Returns

Returns true if the output data is ready, otherwise false.

wait

Wait until the output data is ready.

wait()

DeviceMemory

The GCU device memory buffer definition.

get_real_size

Get the device memory buffer real size.

get_real_size() -> int

set_shape

Set the shape of device memory buffer.

set_shape(shape:list) -> bool

get_real_shape

Get the device memory buffer real shape.

get_real_shape() -> list
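
A small sketch combining mem_alloc with the DeviceMemory helpers; interpreting the allocation size in bytes and attaching a logical shape afterwards are assumptions.

demo code:

buf = TopsInference.mem_alloc(1 * 3 * 224 * 224 * 4)   # room for fp32 elements, size assumed in bytes
buf.set_shape([1, 3, 224, 224])                        # attach a logical shape to the buffer
print(buf.get_real_shape(), buf.get_real_size())
TopsInference.mem_free(buf)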

ICalibrator

In KINT8_FP32_MIX mode, TopsInference provides multiple different calibrators that calculate the scale in different ways.

get_batch_size

Get the batch size used for calibration batches.

get_batch_size() -> int

get_algorithm

Get the algorithm used by this calibrator.

get_algorithm() -> CalibrationAlgoType

get_batch

Get a batch of input for calibration. The batch size of the input must match the batch size.

get_batch(names:list) -> list

Parameters

names

input. The names of the network inputs for each object in the bindings array.

Returns

A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.

read_calibration_cache

Load a calibration cache. Reading a cache is just like reading any other file in Python.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

read_calibration_cache() -> Optional[cache object, None]

Returns

A cache object or None if there is no data.

demo code:

def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()

write_calibration_cache

Save a calibration cache. Writing a cache is just like writing any other buffer in Python.

write_calibration_cache(cache)

Parameters

cache

input. The calibration cache to write.
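
A sketch symmetric to the read_calibration_cache demo above; self.cache_file is the same assumed attribute.

demo code:

def write_calibration_cache(self, cache):
    # Persist the cache so a later build can reuse it via read_calibration_cache.
    with open(self.cache_file, "wb") as f:
        f.write(cache)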

IInt8EntropyCalibrator

In KINT8_FP32_MIX mode, Entropy calibration chooses the tensor’s scale factor to optimize the quantized tensor’s information-theoretic content, and usually suppresses outliers in the distribution.

get_batch_size

Get the batch size used for calibration batches.

get_batch_size() -> int

get_algorithm

Get the algorithm used by this calibrator.

get_algorithm() -> CalibrationAlgoType

get_batch

Get a batch of input for calibration. The batch size of the input must match the batch size.

get_batch(names:list) -> list

Parameters

names

input. The names of the network inputs for each object in the bindings array.

Returns

A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.

read_calibration_cache

Load a calibration cache. Reading a cache is just like reading any other file in Python.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

read_calibration_cache() -> Optional[cache object, None]

Returns

A cache object or None if there is no data.

demo code:

def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()

write_calibration_cache

Save a calibration cache. Writing a cache is just like writing any other buffer in Python.

write_calibration_cache(cache)

Parameters

cache

input. The calibration cache to write.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_op_precision

Set the op precision used for calibration.

set_op_precision(op_name:str, dtype:TopsInference.DataType)

Parameters

op_name

input. Op name to be set.

dtype

input. Set the op to dtype precision.

set_op_algorithm

Set the op algorithm used for calibration.

set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)

Parameters

op_name

input. Op name to be set.

algorithm

input. Set the op to calibration algorithm.

set_op_threshold

Set the op threshold used for calibration.

set_op_threshold(op_name:str, threshold:float)

Parameters

op_name

input. Op name to be set.

threshold

input. Set the op to calibration threshold.

get_op_precision

Get the op precision used for calibration.

get_op_precision(op_name:str) -> TopsInference.DataType

Parameters

op_name

input. Op name to get.

Returns

The op dtype precision.

get_op_algorithm

Get the op algorithm used for calibration.

get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType

Parameters

op_name

input. Op name to get.

Returns

The op calibration algorithm.

get_op_threshold

Get the op threshold used for calibration.

get_op_threshold(op_name:str) -> float

Parameters

op_name

input. Op name to get.

Returns

The op calibration threshold.

IInt8MaxMinCalibrator

In KINT8_FP32_MIX mode, TopsInference provides multiple different calibrators that calculate the scale in different ways.

get_batch_size

Get the batch size used for calibration batches.

get_batch_size() -> int

get_algorithm

Get the algorithm used by this calibrator.

get_algorithm() -> CalibrationAlgoType

get_batch

Get a batch of input for calibration. The batch size of the input must match the batch size.

get_batch(names:list) -> list

Parameters

names

input. The names of the network inputs for each object in the bindings array.

Returns

A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.

read_calibration_cache

Load a calibration cache. Reading a cache is just like reading any other file in Python.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

read_calibration_cache() -> Optional[cache object, None]

Returns

A cache object or None if there is no data.

demo code:

def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()

write_calibration_cache

Save a calibration cache. Writing a cache is just like writing any other buffer in Python.

write_calibration_cache(cache)

Parameters

cache

input. The calibration cache to write.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_op_precision

Set the op precision used for calibration.

set_op_precision(op_name:str, dtype:TopsInference.DataType)

Parameters

op_name

input. Op name to be set.

dtype

input. Set the op to dtype precision.

set_op_algorithm

Set the op algorithm used for calibration.

set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)

Parameters

op_name

input. Op name to be set.

algorithm

input. Set the op to calibration algorithm.

set_op_threshold

Set the op threshold used for calibration.

set_op_threshold(op_name:str, threshold:float)

Parameters

op_name

input. Op name to be set.

threshold

input. Set the op to calibration threshold.

get_op_precision

Get the op precision used for calibration.

get_op_precision(op_name:str) -> TopsInference.DataType

Parameters

op_name

input. Op name to get.

Returns

The op dtype precision.

get_op_algorithm

Get the op algorithm used for calibration.

get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType

Parameters

op_name

input. Op name to get.

Returns

The op calibration algorithm.

get_op_threshold

Get the op threshold used for calibration.

get_op_threshold(op_name:str) -> float

Parameters

op_name

input. Op name to get.

Returns

The op calibration threshold.

IInt8MaxMinEMACalibrator

In KINT8_FP32_MIX mode, compared with the max-min calibrator, this algorithm uses an EMA scale to adjust the threshold value.

get_batch_size

Get the batch size used for calibration batches.

get_batch_size() -> int

get_algorithm

Get the algorithm used by this calibrator.

get_algorithm() -> CalibrationAlgoType

get_batch

Get a batch of input for calibration. The batch size of the input must match the batch size.

get_batch(names:list) -> list

Parameters

names

input. The names of the network inputs for each object in the bindings array.

Returns

A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.

read_calibration_cache

Load a calibration cache. Reading a cache is just like reading any other file in Python.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

read_calibration_cache() -> Optional[cache object, None]

Returns

A cache object or None if there is no data.

demo code:

def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()

write_calibration_cache

Save a calibration cache. Writing a cache is just like writing any other buffer in Python.

write_calibration_cache(cache)

Parameters

cache

input. The calibration cache to write.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_op_precision

Set the op precision used for calibration.

set_op_precision(op_name:str, dtype:TopsInference.DataType)

Parameters

op_name

input. Op name to be set.

dtype

input. Set the op to dtype precision.

set_op_algorithm

Set the op algorithm used for calibration.

set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)

Parameters

op_name

input. Op name to be set.

algorithm

input. Set the op to calibration algorithm.

set_op_threshold

Set the op threshold used for calibration.

set_op_threshold(op_name:str, threshold:float)

Parameters

op_name

input. Op name to be set.

threshold

input. Set the op to calibration threshold.

get_op_precision

Get the op precision used for calibration.

get_op_precision(op_name:str) -> TopsInference.DataType

Parameters

op_name

input. Op name to get.

Returns

The op dtype precision.

get_op_algorithm

Get the op algorithm used for calibration.

get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType

Parameters

op_name

input. Op name to get.

Returns

The op calibration algorithm.

get_op_threshold

Get the op threshold used for calibration.

get_op_threshold(op_name:str) -> float

Parameters

op_name

input. Op name to get.

Returns

The op calibration threshold.

IInt8PercentCalibrator

In KINT8_FP32_MIX mode, this algorithm uses a histogram percentile value as the threshold value.

get_batch_size

Get the batch size used for calibration batches.

get_batch_size() -> int

get_algorithm

Get the algorithm used by this calibrator.

get_algorithm() -> CalibrationAlgoType

get_batch

Get a batch of input for calibration. The batch size of the input must match the batch size.

get_batch(names:list) -> list

Parameters

names

input. The names of the network inputs for each object in the bindings array.

Returns

A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.

read_calibration_cache

Load a calibration cache. Reading a cache is just like reading any other file in Python.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

read_calibration_cache() -> Optional[cache object, None]

Returns

A cache object or None if there is no data.

demo code:

def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()

write_calibration_cache

Save a calibration cache. Writing a cache is just like writing any other buffer in Python.

write_calibration_cache(cache)

Parameters

cache

input. The calibration cache to write.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_op_precision

Set the op precision used for calibration.

set_op_precision(op_name:str, dtype:TopsInference.DataType)

Parameters

op_name

input. Op name to be set.

dtype

input. Set the op to dtype precision.

set_op_algorithm

Set the op algorithm used for calibration.

set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)

Parameters

op_name

input. Op name to be set.

algorithm

input. Set the op to calibration algorithm.

set_op_threshold

Set the op threshold used for calibration.

set_op_threshold(op_name:str, threshold:float)

Parameters

op_name

input. Op name to be set.

threshold

input. Set the op to calibration threshold.

get_op_precision

Get the op precision used for calibration.

get_op_precision(op_name:str) -> TopsInference.DataType

Parameters

op_name

input. Op name to get.

Returns

The op dtype precision.

get_op_algorithm

Get the op algorithm used for calibration.

get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType

Parameters

op_name

input. Op name to get.

Returns

The op calibration algorithm.

get_op_threshold

Get the op threshold used for calibration.

get_op_threshold(op_name:str) -> float

Parameters

op_name

input. Op name to get.

Returns

The op calibration threshold.

Refitter

Updates weights in an engine.

Please refer to TopsInference.create_refitter for how to create a Refitter object.

get_all_weights

Get names of all weights that could be refit.

get_all_weights() -> list

Returns

A list of layer names of the weights that could be refit.

get_missing_weights

Get names of missing weights.

get_missing_weights() -> list

Returns

A list of layer names of the weights that need to be updated.

set_named_weights

Specify new weights for the given name.

set_named_weights(name:str, weight:Optional[numpy.ndarray,Weights])

Parameters

name

input. The name of the layer to be updated.

weight

input. The new weight to update.

get_named_weights

Obtain weights of given name.

get_named_weights(name:str) -> Weights

Parameters

name

input. The name of the layer to get weights.

Returns

layer weights.

refit_engine

Updates associated engine.

refit_engine()
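
A refit sketch; new_weights is a placeholder dict mapping weight names to numpy arrays, and the engine is assumed to have been built with the TopsInference.KREFIT flag set via set_build_flag.

demo code:

refitter = TopsInference.create_refitter(engine)
for name in refitter.get_missing_weights():
    refitter.set_named_weights(name, new_weights[name])   # numpy.ndarray or Weights
refitter.refit_engine()                                    # apply the updates to the engine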

Weights

Weights used in IRefitter.

dtype

The data type of the weights.

weights.dtype

size

The size of weights.

weights.size

nbytes

The number of bytes used by the weights.

weights.nbytes

3. Appendix

Table 3.2 Revision History

Version   Description        Date
V2.0      Initial version    2022.01