1. User Guide

1.1. Core Concepts

TopsInference Workflow

When using TopsInference, only a few steps are needed (see the sketch after this list):

  • Create a Parser and read your ONNX model to obtain a Network.

  • Create an Optimizer and build the Network into an Engine.

  • Use the created Engine to run inference.
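
The following minimal sketch walks through these three steps. The model path, input name, and input shape are placeholders for illustration, not values defined by this document.

demo code:

import numpy as np
import TopsInference

with TopsInference.device(0, 0):
    # 1. Parse the ONNX model into a Network.
    parser = TopsInference.create_parser(TopsInference.ONNX_MODEL)
    parser.set_input_names(["input"])             # placeholder input name
    parser.set_input_shapes([[1, 3, 224, 224]])   # placeholder input shape
    network = parser.read("model.onnx")           # placeholder model path

    # 2. Build the Network into an Engine.
    optimizer = TopsInference.create_optimizer()
    engine = optimizer.build(network)
    engine.save_executable("model.exec")          # optional: reload later with TopsInference.load

    # 3. Run inference. runV2 returns a Future; get() waits and returns numpy outputs.
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = engine.runV2([data]).get()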

2. TopsInference API Reference

2.1. Attribute

TopsInference.__version__

The version of TopsInference.

TopsInference.__version__

TopsInference.KDEFAULT

For default precision inference. Please refer to set_build_flag.

TopsInference.KFP16

For fp16 precision inference. Please refer to set_build_flag.

TopsInference.KFP16_MIX

For fp32 and fp16 mixed precision inference. Please refer to set_build_flag.

TopsInference.KINT8_FP32_MIX

For fp32 and int8 mixed precision inference. Please refer to set_build_flag.

TopsInference.KREFIT

For refitting an engine. Please refer to set_build_flag.

TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST

When doing inference, input and output data is host memory.

TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE

When doing inference, input and output data is GCU device memory.

TopsInference.TIF_BOOL

TopsInference DataType: bool DataType.

TopsInference.TIF_INDEX

TopsInference DataType: index DataType.

TopsInference.TIF_INT8

TopsInference DataType: int8 DataType.

TopsInference.TIF_INT16

TopsInference DataType: int16 DataType.

TopsInference.TIF_INT32

TopsInference DataType: int32 DataType.

TopsInference.TIF_INT64

TopsInference DataType: int64 DataType.

TopsInference.TIF_UINT8

TopsInference DataType: uint8 DataType.

TopsInference.TIF_UINT16

TopsInference DataType: uint16 DataType.

TopsInference.TIF_UINT32

TopsInference DataType: uint32 DataType.

TopsInference.TIF_UINT64

TopsInference DataType: uint64 DataType.

TopsInference.TIF_FP16

TopsInference DataType: fp16 DataType.

TopsInference.TIF_FP32

TopsInference DataType: fp32 DataType.

TopsInference.TIF_BF16

TopsInference DataType: bf16 DataType.

TopsInference.TIF_INVALID

TopsInference DataType: invalid DataType.

TopsInference.CalibrationAlgoType

TopsInference Calibration Algorithm Type.

TopsInference.CalibrationAlgoType.KL_ENTROPY

TopsInference Calibration Algorithm Type: ENTROPY.

TopsInference.CalibrationAlgoType.MAX_MIN

TopsInference Calibration Algorithm Type: MAX_MIN.

TopsInference.CalibrationAlgoType.MAX_MIN_EMA

TopsInference Calibration Algorithm Type: MAX_MIN_EMA.

TopsInference.CalibrationAlgoType.PERCENTILE

TopsInference Calibration Algorithm Type: PERCENTILE.

TopsInference.CalibrationAlgoType.INVALID

Invalid TopsInference Calibration Algorithm Type.

2.2. Function

TopsInference.create_parser

This function creates a parser object according to the model_type.

TopsInference.create_parser(model_type:TopsInference.ONNX_MODEL) -> Optional[parser handle, None]

Parameters

model_type

input. The model_type can only be TopsInference.ONNX_MODEL.

Returns

A parser handle, or None.

Please refer to Parser.

TopsInference.create_optimizer

An optimizer object will be created by calling this function. The optimizer object is then used to build the parsed Network.

TopsInference.create_optimizer() -> Optional[optimizer handle]

Returns

An optimizer handle, or None.

Please refer to Optimizer.

TopsInference.load

An engine object will be created by calling this function. Then the engine object will be used to do inference.

TopsInference.load(engine_file:str) -> Optional[engine handle, None]

Parameters

engine_file

input. The engine file saved previously when building; please see Optimizer.

Returns

An engine handle, or None. When an error occurs, it will raise an exception.

Please refer to Engine.

TopsInference.device

Use "with TopsInference.device(card_id, cluster_id):" to set the device context. After the with-block finishes executing, the claimed devices are released. It has the same effect as TopsInference.set_device combined with TopsInference.release_device.

  1. In a multi-threaded program, a sub-thread exclusively uses the resources it claims if device() is called within that sub-thread.

  2. If device() is called in the main thread but not in a sub-thread, the sub-thread shares the cluster resources claimed by the main thread.

  3. If both the main thread and a sub-thread claim resources with device(), the sub-thread uses the resources it claimed itself.

  4. If some sub-threads claim resources with device() and others do not, each sub-thread individually follows rule 2 or 3 above according to whether it claimed resources.

TopsInference.device(card_id:int, cluster_id:Optional[int,list]) -> device_handle

Parameters

card_id

input. The card id.

cluster_id

input. The cluster id. Currently the maximum number of clusters is 6.

For i10, cluster_id is a list with a maximum length of 4; it can be [0], [0, 1], [0, 1, 2, 3], or any other combination of cluster ids from 0 to 3.

For i20, cluster_id is a list with a maximum length of 6; it can be [0], [0, 1], [0, 1, 2, 3, 4, 5], or any other combination of cluster ids from 0 to 5.

cluster_id can also be set to -1 as a shorthand for [0, 1, 2, 3, 4, 5] when you need to run inference on all 6 clusters.

Returns

A Device handle.

Attention

This handle is only effective within the current scope and is released automatically when leaving the scope, so everything must be finished inside that scope. The same device can only be set once, and you must not set any other device in a nested manner within the current scope.
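
A short sketch of the with-block usage; the engine path is a placeholder.

demo code:

with TopsInference.device(0, [0, 1]):
    # The claimed card/clusters are released automatically when the block exits.
    engine = TopsInference.load("/path/to/engine")
    # ... run inference here, inside the scope ...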

TopsInference.set_device

Specify the running device until it is released. Calls to set_device are isolated from each other across processes. For scoping behavior under multi-threading, please refer to TopsInference.device.

TopsInference.set_device(card_id:int, cluster_id:[int,list]) -> device_handle

Parameters

card_id

input. The card id.

cluster_id

input. The cluster id. Currently the maximum number of clusters is 6.

For i10, cluster_id is a list with a maximum length of 4; it can be [0], [0, 1], [0, 1, 2, 3], or any other combination of cluster ids from 0 to 3.

For i20, cluster_id is a list with a maximum length of 6; it can be [0], [0, 1], [0, 1, 2, 3, 4, 5], or any other combination of cluster ids from 0 to 5.

cluster_id can also be set to -1 as a shorthand for [0, 1, 2, 3, 4, 5] when you need to run inference on all 6 clusters.

Returns

A Device handle.

Attention

This handle remains effective until you call release_device to destroy it, and it must be released when it is no longer in use. The same device can only be set once, and you must not set any other device between set_device and release_device.

demo code:

handle = TopsInference.set_device(0, 0)
# TopsInference infer code
...
TopsInference.release_device(handle)

TopsInference.release_device

TopsInference.release_device(handle:device_handle)

Parameters

handle

input. The device handle to destroy.

Attention

If the device handle is created by calling set_device, then it must be released by calling release_device.

TopsInference.create_stream

Create a Stream to support running inference in async mode. By placing operations on the same stream, you can run them in non-blocking mode and then call synchronize to wait for all of them to finish.

TopsInference.create_stream() -> stream

TopsInference.mem_alloc

Allocate buffer on device.

TopsInference.mem_alloc(size:int) -> DeviceMemory

Parameters

size

input. The allocated buffer size.

Returns

A DeviceMemory buffer object. If the requested buffer size exceeds the maximum device memory, an exception is raised.

TopsInference.mem_free

Free buffer on device.

TopsInference.mem_free(ptr:buffer_ptr)

Parameters

ptr

input. The allocated buffer object.

TopsInference.mem_h2d_copy

Copy buffer from host to device with sync mode.

TopsInference.mem_h2d_copy(src:numpy.ndarray, dst:DeviceMemory, size:int)

Parameters

src

input. The source buffer object, which is a numpy.ndarray.

dst

input. The destination buffer object, which is allocated by calling mem_alloc.

size

input. The copied buffer size.

TopsInference.mem_d2h_copy

Copy buffer from device to host with sync mode.

TopsInference.mem_d2h_copy(src:DeviceMemory_buffer, dst:numpy.ndarray, size:int)

Parameters

src

input. The source buffer object, which is allocated by calling mem_alloc.

dst

input. The destination buffer object, which is a numpy.ndarray.

size

input. The copied buffer size.
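
A small sketch of a synchronous host-to-device round trip; treating the size arguments as byte counts is an assumption here.

demo code:

import numpy as np

host_in = np.ones((1, 3, 224, 224), dtype=np.float32)
dev_buf = TopsInference.mem_alloc(host_in.nbytes)               # allocate device memory
TopsInference.mem_h2d_copy(host_in, dev_buf, host_in.nbytes)    # host -> device
host_out = np.empty_like(host_in)
TopsInference.mem_d2h_copy(dev_buf, host_out, host_out.nbytes)  # device -> host
TopsInference.mem_free(dev_buf)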

TopsInference.mem_h2d_copy_async

Copy buffer from host to device with async mode.

TopsInference.mem_h2d_copy_async(src:numpy.ndarray, dst:DeviceMemory, size:int, stream:stream)

Parameters

src

input. The source buffer object, which is a numpy.ndarray.

dst

input. The destination buffer object, which is allocated by calling mem_alloc.

size

input. The copied buffer size.

stream

input. The stream handle on which the copy runs. When calling this function, the stream must be specified. Please refer to Stream.

TopsInference.mem_d2h_copy_async

Copy buffer from device to host with async mode. Please refer to Stream.

TopsInference.mem_d2h_copy_async(src:DeviceMemory, dst:numpy.ndarray, size:int, stream:stream)

Parameters

src

input. The source buffer object, which is allocated by calling mem_alloc.

dst

input. The destination buffer object, which is a numpy.ndarray.

size

input. The copied buffer size.

stream

input. The stream handle on which the copy runs. When calling this function, the stream must be specified. Please refer to Stream.
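
A sketch of the asynchronous variants on a single stream, reusing the buffers from the synchronous sketch above; synchronize waits for both queued copies to finish.

demo code:

stream = TopsInference.create_stream()
TopsInference.mem_h2d_copy_async(host_in, dev_buf, host_in.nbytes, stream)
TopsInference.mem_d2h_copy_async(dev_buf, host_out, host_out.nbytes, stream)
stream.synchronize()   # block until the queued copies complete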

TopsInference.create_refitter

A refitter object will be created by calling this function. Then the refitter object will be used to do refit.

TopsInference.create_refitter(engine:Engine) -> Optional[refitter handle, None]

Parameters

engine

input. The engine object to be refitted.

Returns

A Refitter handle, or None. When an error occurs, it will raise an exception.

Please refer to Refitter.

2.3. Class

Parser

Please refer to TopsInference.create_parser for how to create a Parser object.

read

This function reads an ONNX model and returns a Network handle, or raises an exception when the input model contains invalid data or unsupported operators.

Before reading the model, set_input_names, set_input_dtypes, set_input_shapes, set_output_names, and set_output_dtypes should be called to set the relevant attributes for building the current network.

read(self, model:str) -> Optional[network handle, None]

Parameters

model

input. The model file to parse.

Returns

A network handle, or None.

read_from_str

This function reads an ONNX model from a string and returns a Network handle, or raises an exception when the input model contains invalid data or unsupported operators.

Before reading the model, set_input_names, set_input_dtypes, set_input_shapes, set_output_names, and set_output_dtypes should be called to set the relevant attributes for building the current network.

read_from_str(model_data:str, model_size:int) -> Optional[network handle, None]

Parameters

model_data

input. The model strings to parse.

model_size

input. The model strings length to parse.

Returns

A network handle, or None.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_input_names

Set the input names before reading the model. If the model has multiple input nodes, all the names should be given in a list, such as ["a", "b"].

set_input_names(node_name:Optional[list,str])

Parameters

node_name

input. The input names list.

set_input_dtypes

Set the input data types before reading the model. If the model has multiple input nodes, all the data types should be given in a list, such as [TopsInference.TIF_FP32, TopsInference.TIF_FP32].

set_input_dtypes(node_dtype:list)

Parameters

node_dtype

input. The list of input TopsInference DataTypes, such as TopsInference.TIF_FP32, TopsInference.TIF_FP16, etc.

set_input_shapes

Set the input shapes before reading the model. When there are multiple inputs, the shapes should be given in a list, such as [[2, 3, 4], [6, 7, 8]].

set_input_shapes(node_shape:list)

Parameters

node_shape

input. The input shape list.

set_output_names

Set the output names before reading the model. If the model has multiple output nodes, all the names should be given in a list, such as ["a", "b"].

set_output_names(node_name:Optional[list,str])

Parameters

node_name

input. The output names list.

set_output_dtypes

Set the output data types before reading the model. If the model has multiple output nodes, all the data types should be given in a list, such as [TopsInference.TIF_FP32, TopsInference.TIF_FP32].

set_output_dtypes(node_dtype:list)

Parameters

node_dtype

input. The list of output TopsInference DataTypes, such as TopsInference.TIF_FP32, TopsInference.TIF_FP16, etc.

Attention

set_input_names, set_output_names, set_input_shapes, set_input_dtypes, and set_output_dtypes set attributes for the current network. They must be called before reading the model when parsing, as shown in the sketch below.
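
A configuration sketch for a hypothetical single-input, single-output model; the names, dtypes, shapes, and model path are placeholders.

demo code:

parser = TopsInference.create_parser(TopsInference.ONNX_MODEL)
parser.set_input_names(["input"])
parser.set_input_dtypes([TopsInference.TIF_FP32])
parser.set_input_shapes([[1, 3, 224, 224]])
parser.set_output_names(["output"])
parser.set_output_dtypes([TopsInference.TIF_FP32])
network = parser.read("model.onnx")   # attributes above must be set before read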

Layer

The layer definition, which constitutes the Network.

get_type

Get the layer type.

get_type() -> layer_type

Returns

The layer type, which can be:

  • TopsInference.TIF_DECONVOLUTION, which means deconvolution layer.

  • TopsInference.TIF_CONVOLUTION, which means convolution layer.

  • TopsInference.TIF_UNARY, which means unaryop operation layer.

  • TopsInference.TIF_TRANSCENDENTAL, which means transcendental layer.

  • TopsInference.TIF_ELEMENTWISE, which means elementwise operation layer.

  • TopsInference.TIF_SELECT, which means select layer.

  • TopsInference.TIF_POOLING, which means pooling layer.

  • TopsInference.TIF_BATCHNORM, which means batch normalization layer.

  • TopsInference.TIF_CONVERT, which means convert layer for converting between different data precision.

  • TopsInference.TIF_CONCAT, which means concat layer.

  • TopsInference.TIF_CONSTANT, which means constant layer.

  • TopsInference.TIF_SHUFFLE, which means shuffle layer.

  • TopsInference.TIF_ACTIVATION, which means activation layer.

  • TopsInference.TIF_ORDER, which means layer for sorting by a certain rule.

  • TopsInference.TIF_RNN, which means rnn layer.

  • TopsInference.TIF_GATHER, which means gather layer.

  • TopsInference.TIF_MATMUL, which means matmul layer.

  • TopsInference.TIF_COMPARE, which means compare layer.

  • TopsInference.TIF_CONDITION, which means condition layer.

  • TopsInference.TIF_NMS, which means non maximum suppression layer.

  • TopsInference.TIF_PAD, which means padding layer.

  • TopsInference.TIF_RANDOM, which means random generator layer.

  • TopsInference.TIF_REDUCE, which means reduce layer.

  • TopsInference.TIF_RESHAPE, which means reshape layer.

  • TopsInference.TIF_RESIZE, which means resize layer.

  • TopsInference.TIF_ROIALIGN, which means roi align layer, used in faster rcnn and r-fcn, etc.

  • TopsInference.TIF_SCATTER, which means scatter layer.

  • TopsInference.TIF_SIGMOID, which means sigmoid layer.

  • TopsInference.TIF_SLICE, which means slice layer.

  • TopsInference.TIF_TOPK, which means topk layer.

  • TopsInference.TIF_TRANSPOSE, which means transpose layer.

  • TopsInference.TIF_LOG_SOFTMAX, which means log softmax layer.

  • TopsInference.TIF_MVN, which means mean-variance normalization layer.

  • TopsInference.TIF_SOFTMAX, which means softmax layer.

  • TopsInference.TIF_UNKNOWN, which means unknown layer.

get_name

Get the layer name.

get_name() -> str

Returns

The layer name; the default name is "".

set_precision

Set the layer precision. This only takes effect in TopsInference.KFP16_MIX or TopsInference.KINT8_FP32_MIX mode.

set_precision(precision:Optional[TopsInference.TIF_FP32, TopsInference.TIF_FP16, TopsInference.TIF_INT8])

Parameters

precision

input. The layer precision to be set.

In TopsInference.KFP16_MIX mode:

  • TopsInference.TIF_FP32

  • TopsInference.TIF_FP16

In TopsInference.KINT8_FP32_MIX mode:

  • TopsInference.TIF_FP32

  • TopsInference.TIF_INT8

get_precision

Get the layer precision.

get_precision() -> DataType

Returns

The layer precision.

reset_precision

Reset the layer precision to default precision.

reset_precision()

Network

The internal representation of an ONNX model; please see the read function in Parser.

dump

Dump the network structure for debugging; the result is printed to the current terminal window.

dump()

get_layer_num

Get the layer number in current network.

get_layer_num() -> int

Returns

The layer number.

get_layer_by_index

Get the layer at the given index; the index must be less than the layer number.

get_layer_by_index(index:int) -> layer handle

Parameters

index

index. The index for getting layer, the index must be less than the layer number.

Returns

The Layer handle.

get_layer

Get the layer according to the layer name.

get_layer(name:str) -> layer handle

Parameters

name

input. The layer name.

Returns

The Layer handle.
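
A sketch that walks the parsed network and pins matmul layers to fp32; this only takes effect when the engine is later built with a mixed-precision flag (see set_build_flag in Optimizer).

demo code:

network = parser.read("model.onnx")   # see Parser
for i in range(network.get_layer_num()):
    layer = network.get_layer_by_index(i)
    if layer.get_type() == TopsInference.TIF_MATMUL:
        layer.set_precision(TopsInference.TIF_FP32)   # keep matmul in fp32 under KFP16_MIX
network.dump()                                        # print the structure for inspection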

Optimizer

Please redirect to TopsInference.create_optimizer for how to create an Optimizer object.

build

This function builds a network into an engine, which will be used to do inference.

build(network:network) -> Optional[engine handle, None]

Parameters

network

input. An internal model representation read by Parser.

Returns

An Engine handle, or None. When an error occurs, it will raise an exception.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_build_flag

This function sets a flag used when building the engine. A build flag assigns certain features to the current engine.

  • TopsInference.KDEFAULT: for default model precision inference.

  • TopsInference.KFP16_MIX: for fp16 & fp32 mixed precision inference.

  • TopsInference.KINT8_FP32_MIX: for int8 & fp32 mixed precision inference.

  • TopsInference.KFP16: for fp16 precision inference.

  • TopsInference.KREFIT: for enabling engine refit.

set_build_flag(flag:Optional[TopsInference.KFP16_MIX,TopsInference.KINT8_FP32_MIX,
                             TopsInference.KFP16,TopsInference.KDEFAULT,
                             TopsInference.KREFIT])

Parameters

flag

input. A building flag.
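
A minimal sketch of selecting fp16 precision before building; network is a parsed Network (see Parser).

demo code:

optimizer = TopsInference.create_optimizer()
optimizer.set_build_flag(TopsInference.KFP16)   # build the whole engine with fp16 precision
engine = optimizer.build(network)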

set_max_shape_range

Set the maximum input shapes of a dynamic-shape model.

set_max_shape_range(max_shape_dims:list)

Parameters

max_shape_dims

input. A JSON-style list. When setting the model's maximum input shapes, the key must be "main", and the number of shapes must equal the number of model inputs.

demo code:

max_shape_dim_setting = []
max_shape_dim = {}
max_shape_dim["main"] = [[100, 1, 900], [100, 1, 100]]
max_shape_dim_setting.append(max_shape_dim)
optimizer.set_max_shape_range(max_shape_dim_setting)

set_min_shape_range

Set the minimum input shapes of a dynamic-shape model.

set_min_shape_range(min_shape_dims:list)

Parameters

min_shape_dims

input. A JSON-style list. When setting the model's minimum input shapes, the key must be "main", and the number of shapes must equal the number of model inputs.

demo code:

min_shape_dim_setting = []
min_shape_dim = {}
min_shape_dim["main"] = [[100, 1, 900], [100, 1, 1]]
min_shape_dim_setting.append(min_shape_dim)
optimizer.set_min_shape_range(min_shape_dim_setting)

set_compile_options

Set optimizer compile options.

set_compile_options(options:dict)

Parameters

options

input. A dict of compile options.

demo code:

compile_options = {}
compile_options["max_dim_size"] = "65536"
compile_options['resource_mode'] = '1c12s'
optimizer = TopsInference.create_optimizer()
optimizer.set_compile_options(compile_options)

set_int8_calibrator

Set the int8 calibrator in KINT8_FP32_MIX mode.

set_int8_calibrator(calibrator:ICalibrator)

Parameters

calibrator

input. An ICalibrator object (IInt8EntropyCalibrator / IInt8MaxMinCalibrator / IInt8MaxMinEMACalibrator / IInt8PercentCalibrator).
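
The sketch below outlines a calibrator built on the ICalibrator interface documented later in this reference. Whether the calibrator classes can be subclassed from Python exactly like this, and the constructor arguments used here (a batch iterable and a cache file path), are assumptions for illustration; calibration_batches is a placeholder.

demo code:

import os
import numpy as np

class MyCalibrator(TopsInference.IInt8MaxMinCalibrator):
    def __init__(self, batches, cache_file):
        super().__init__()            # assumed base-class constructor
        self.batches = iter(batches)  # each batch: dict of input name -> numpy.ndarray
        self.cache_file = cache_file
        self.dev_bufs = []            # keep device buffers alive during calibration

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return []                 # empty list: no more calibration batches
        ptrs = []
        for name in names:
            arr = np.ascontiguousarray(batch[name])
            buf = TopsInference.mem_alloc(arr.nbytes)
            TopsInference.mem_h2d_copy(arr, buf, arr.nbytes)
            self.dev_bufs.append(buf)
            ptrs.append(buf)
        return ptrs

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

optimizer.set_build_flag(TopsInference.KINT8_FP32_MIX)
optimizer.set_int8_calibrator(MyCalibrator(calibration_batches, "calib.cache"))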

Engine

An Engine handle can be created by an Optimizer; it can also be created by loading an existing engine file, please refer to TopsInference.load.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

save_executable

This function saves an engine to local disk so that it can be loaded and reused next time.

save_executable(engine_file:str)

Parameters

engine_file

input. An engine file name to save.

demo code:

engine.save_executable("/path/to/your/file")
engine = TopsInference.load("/path/to/your/file")

run

This function can be used for doing inference.

run(input_tensor_list:list,
    output_tensor_list:list,
    buffer_type:Optional[TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST,TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE],
    py_stream=None:stream)

Parameters

input_tensor_list

input. Input tensor list.

output_tensor_list

input. Output tensor list.

buffer_type

input. Buffer type for input and output.

  • TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST indicates that input buffer and output buffer are on host.

  • TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE indicates that input buffer and output buffer are on device.

Attention

Currently, mixed buffer types such as IN_HOST_OUT_DEVICE or IN_DEVICE_OUT_HOST are not supported. When using the host buffer type, the input and output buffers should be numpy.ndarray objects.

py_stream

input. Used to do inference with async mode.

The default value is None, which runs in sync mode. A stream can be created by TopsInference.create_stream.

Please refer to Stream.

When doing inference with buffer_type equal to IN_HOST_OUT_HOST, async mode is not currently supported, which means py_stream must be kept as None.
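
A sketch of a synchronous call with host buffers. Whether run fills a pre-created (possibly empty) output list or requires pre-allocated arrays is not spelled out here, so the output handling below is an assumption; the input shape is a placeholder.

demo code:

import numpy as np

inputs = [np.random.rand(1, 3, 224, 224).astype(np.float32)]   # placeholder shape
outputs = []   # assumed to be filled by the call
engine.run(inputs, outputs,
           TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST)       # sync mode: py_stream stays None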

run_with_batch

Multi-threaded inference with different batches. Users can call run_with_batch to run inference on a dynamic batch with any specified batch size. This method automatically splits or merges the input data along the zeroth axis (the batch direction) into N parts through an enqueue operation. The N parts are stored in a queue and inferred concurrently by a resource pool of M clusters, and the results are automatically spliced together after inference completes.

run_with_batch(sample_nums:int, input_list:list, **kwargs:[output_list,py_stream,buffer_type]) -> future

Parameters

sample_nums

input. The number of samples.

input_list

input. A list constructed from the inputs.

kwargs

input. A map whose keys may include output_list, py_stream, and buffer_type.

If you create the outputs before calling run_with_batch, set output_list=outputs.

If output_list is None, it will be allocated automatically.

py_stream: The default value is None, which runs in sync mode. A stream can be created by TopsInference.create_stream.

Please refer to Stream.

If you want to use device-to-device (D2D) mode, set buffer_type=TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE.

Returns

A future object. Please refer to Future.
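
A sketch of batched inference through the Future interface; samples is a placeholder numpy array whose first axis is the batch dimension.

demo code:

future = engine.run_with_batch(samples.shape[0], [samples])
future.wait()            # block until all split batches have been inferred
results = future.get()   # list of numpy.ndarray, spliced along the batch axis
future.release()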

runV2

Inference with a specified device (cluster) and dynamic-shape models.

In static-shape mode, runV2 also supports dynamic batch. It automatically splits or merges the input data along the zeroth axis (the batch direction) into N parts through an enqueue operation. The N parts are stored in a queue and inferred concurrently by a resource pool of M clusters, and the results are automatically spliced together after inference completes.

runV2(input_list:list, **kwargs:[output_list,py_stream]) -> future

Parameters

input_list

input. A tensor or a list constructed from the inputs.

kwargs

input. A map whose keys may include output_list and py_stream.

If you create the outputs before calling runV2, set output_list=outputs.

If output_list is None, it will be allocated automatically.

py_stream: The default value is None, which runs in sync mode. A stream can be created by TopsInference.create_stream.

Please refer to Stream.

Returns

A future object. Please refer to Future.
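
A sketch of runV2 in sync mode; input_data is a placeholder numpy array matching the engine's input.

demo code:

future = engine.runV2([input_data])
if not future.status():   # optionally poll whether the result is ready
    future.wait()
outputs = future.get()    # auto-allocated list of numpy.ndarray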

get_input_num

Get the input number of engine.

get_input_num() -> int

Returns

The net input number.

get_output_num

Get the output number of engine.

get_output_num() -> int

Returns

The net output number.

get_max_input_shape

Get the index-th input maximum shape(numpy.array) of engine.

get_max_input_shape(index:int) -> numpy.array

Parameters

index

index. The index for input, the index must be less than the input number.

Returns

The index-th input maximum shape.

get_max_output_shape

Get the index-th output maximum shape(numpy.array) of engine.

get_max_output_shape(index:int) -> numpy.array

Parameters

index

index. The index for output, the index must be less than the output number.

Returns

The index-th output maximum shape.

get_min_input_shape

Get the index-th input minimum shape(numpy.array) of engine.

get_min_input_shape(index:int) -> numpy.array

Parameters

index

index. The index for input, the index must be less than the input number.

Returns

The index-th input minimum shape.

get_input_shape

Get the index-th input shape(numpy.array) of engine.

get_input_shape(index:int) -> numpy.array

Parameters

index

index. The index for input, the index must be less than the input number.

Returns

The index-th real input shape.

get_output_shape

Get the index-th output shape(numpy.array) of engine.

get_output_shape(index:int) -> numpy.array

Parameters

index

index. The index for output, the index must be less than the output number.

Returns

The index-th real output shape.

get_input_dtype

Get the index-th input data type(TopsInference.DataType) of engine.

get_input_dtype(index:int) -> TopsInference.DataType

Parameters

index

index. The index for input, the index must be less than the input number.

Returns

The index-th input type.

get_output_dtype

Get the index-th output data type(TopsInference.DataType) of engine.

get_output_dtype(index:int) -> TopsInference.DataType

Parameters

index

index. The index for output, the index must be less than the output number.

Returns

The index-th output type.

get_input_name

Get the index-th input name(ONNX name) of engine.

get_input_name(index:int) -> str

Parameters

index

index. The index for input, the index must be less than the input number.

Returns

The index-th input layer name.

get_output_name

Get the index-th output name(ONNX name) of engine.

get_output_name(index:int) -> str

Parameters

index

index. The index for output, the index must be less than the output number.

Returns

The index-th output layer name.

get_device_memory_size

Get the device memory size required by the GCU runtime.

get_device_memory_size() -> int

Returns

The memory size required by the engine. Returns 0 if the memory size cannot be obtained.
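
A sketch that prints the engine's I/O metadata using the getters above.

demo code:

for i in range(engine.get_input_num()):
    print("input", i, engine.get_input_name(i),
          engine.get_input_shape(i), engine.get_input_dtype(i))
for i in range(engine.get_output_num()):
    print("output", i, engine.get_output_name(i),
          engine.get_output_shape(i), engine.get_output_dtype(i))
print("device memory required:", engine.get_device_memory_size())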

Device

This class is used for Optimizer and Engine.

Before calling create_optimizer or loading an engine, the device must be initialized, and later released, by calling the functions described above (TopsInference.device, or TopsInference.set_device together with TopsInference.release_device).

Stream

This class is used for running with async mode. Please redirect to TopsInference.create_stream for how to create a Stream object.

synchronize

When you execute several operations on the same stream, call synchronize after the last operation to wait until all of the operations have finished.

synchronize()

Future

Future provides a mechanism to access the result of asynchronous operations.

release

Release future.

release()

get

Get output data.

get() -> list[numpy.ndarray]

Returns

The output data.

status

Get output data status.

status() -> bool

Returns

Returns true if the output data is ready, otherwise false.

wait

Wait until the output data is ready.

wait()

DeviceMemory

The GCU device memory buffer definition.

get_real_size

Get the device memory buffer real size.

get_real_size() -> int

set_shape

Set the shape of device memory buffer.

set_shape(shape:list) -> bool

get_real_shape

Get the device memory buffer real shape.

get_real_shape() -> list
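
A small sketch combining mem_alloc with the DeviceMemory helpers; interpreting the allocation size in bytes and attaching a logical shape afterwards are assumptions.

demo code:

buf = TopsInference.mem_alloc(1 * 3 * 224 * 224 * 4)   # room for fp32 elements, size assumed in bytes
buf.set_shape([1, 3, 224, 224])                        # attach a logical shape to the buffer
print(buf.get_real_shape(), buf.get_real_size())
TopsInference.mem_free(buf)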

ICalibrator

In KINT8_FP32_MIX mode, TopsInference provides multiple different calibrators that calculate the scale in different ways.

get_batch_size

Get the batch size used for calibration batches.

get_batch_size() -> int

get_algorithm

Get the algorithm used by this calibrator.

get_algorithm() -> CalibrationAlgoType

get_batch

Get a batch of input for calibration. The batch size of the input must match the batch size.

get_batch(names:list) -> list

Parameters

names

input. The names of the network inputs for each object in the bindings array.

Returns

A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.

read_calibration_cache

Load a calibration cache. Reading a cache is just like reading any other file in Python.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

read_calibration_cache() -> Optional[cache object, None]

Returns

A cache object or None if there is no data.

demo code:

def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()

write_calibration_cache

Save a calibration cache. Writing a cache is just like writing any other buffer in Python.

write_calibration_cache(cache)

Parameters

cache

input. The calibration cache to write.
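
A sketch symmetric to the read_calibration_cache demo above; self.cache_file is the same assumed attribute.

demo code:

def write_calibration_cache(self, cache):
    # Persist the cache so a later build can reuse it via read_calibration_cache.
    with open(self.cache_file, "wb") as f:
        f.write(cache)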

IInt8EntropyCalibrator

In KINT8_FP32_MIX mode, Entropy calibration chooses the tensor’s scale factor to optimize the quantized tensor’s information-theoretic content, and usually suppresses outliers in the distribution.

get_batch_size

Get the batch size used for calibration batches.

get_batch_size() -> int

get_algorithm

Get the algorithm used by this calibrator.

get_algorithm() -> CalibrationAlgoType

get_batch

Get a batch of input for calibration. The batch size of the input must match the batch size.

get_batch(names:list) -> list

Parameters

names

input. The names of the network inputs for each object in the bindings array.

Returns

A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.

read_calibration_cache

Load a calibration cache. Reading a cache is just like reading any other file in Python.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

read_calibration_cache() -> Optional[cache object, None]

Returns

A cache object or None if there is no data.

demo code:

def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()

write_calibration_cache

Save a calibration cache. Writing a cache is just like writing any other buffer in Python.

write_calibration_cache(cache)

Parameters

cache

input. The calibration cache to write.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_op_precision

Set the op precision used for calibration.

set_op_precision(op_name:str, dtype:TopsInference.DataType)

Parameters

op_name

input. Op name to be set.

dtype

input. Set the op to dtype precision.

set_op_algorithm

Set the op algorithm used for calibration.

set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)

Parameters

op_name

input. Op name to be set.

algorithm

input. Set the op to calibration algorithm.

set_op_threshold

Set the op threshold used for calibration.

set_op_threshold(op_name:str, threshold:float)

Parameters

op_name

input. Op name to be set.

threshold

input. Set the op to calibration threshold.

get_op_precision

Get the op precision used for calibration.

get_op_precision(op_name:str) -> TopsInference.DataType

Parameters

op_name

input. Op name to get.

Returns

The op dtype precision.

get_op_algorithm

Get the op algorithm used for calibration.

get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType

Parameters

op_name

input. Op name to get.

Returns

The op calibration algorithm.

get_op_threshold

Get the op threshold used for calibration.

get_op_threshold(op_name:str) -> float

Parameters

op_name

input. Op name to get.

Returns

The op calibration threshold.

IInt8MaxMinCalibrator

In KINT8_FP32_MIX mode, TopsInference provides multiple different calibrators that calculate the scale in different ways.

get_batch_size

Get the batch size used for calibration batches.

get_batch_size() -> int

get_algorithm

Get the algorithm used by this calibrator.

get_algorithm() -> CalibrationAlgoType

get_batch

Get a batch of input for calibration. The batch size of the input must match the batch size.

get_batch(names:list) -> list

Parameters

names

input. The names of the network inputs for each object in the bindings array.

Returns

A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.

read_calibration_cache

Load a calibration cache. Reading a cache is just like reading any other file in Python.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

read_calibration_cache() -> Optional[cache object, None]

Returns

A cache object or None if there is no data.

demo code:

def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()

write_calibration_cache

Save a calibration cache. Writing a cache is just like writing any other buffer in Python.

write_calibration_cache(cache)

Parameters

cache

input. The calibration cache to write.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_op_precision

Set the op precision used for calibration.

set_op_precision(op_name:str, dtype:TopsInference.DataType)

Parameters

op_name

input. Op name to be set.

dtype

input. Set the op to dtype precision.

set_op_algorithm

Set the op algorithm used for calibration.

set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)

Parameters

op_name

input. Op name to be set.

algorithm

input. Set the op to calibration algorithm.

set_op_threshold

Set the op threshold used for calibration.

set_op_threshold(op_name:str, threshold:float)

Parameters

op_name

input. Op name to be set.

threshold

input. Set the op to calibration threshold.

get_op_precision

Get the op precision used for calibration.

get_op_precision(op_name:str) -> TopsInference.DataType

Parameters

op_name

input. Op name to get.

Returns

The op dtype precision.

get_op_algorithm

Get the op algorithm used for calibration.

get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType

Parameters

op_name

input. Op name to get.

Returns

The op calibration algorithm.

get_op_threshold

Get the op threshold used for calibration.

get_op_threshold(op_name:str) -> float

Parameters

op_name

input. Op name to get.

Returns

The op calibration threshold.

IInt8MaxMinEMACalibrator

In KINT8_FP32_MIX mode, compared with the max-min calibrator, this algorithm uses an EMA scale to adjust the threshold value.

get_batch_size

Get the batch size used for calibration batches.

get_batch_size() -> int

get_algorithm

Get the algorithm used by this calibrator.

get_algorithm() -> CalibrationAlgoType

get_batch

Get a batch of input for calibration. The batch size of the input must match the batch size.

get_batch(names:list) -> list

Parameters

names

input. The names of the network inputs for each object in the bindings array.

Returns

A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.

read_calibration_cache

Load a calibration cache. Reading a cache is just like reading any other file in Python.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

read_calibration_cache() -> Optional[cache object, None]

Returns

A cache object or None if there is no data.

demo code:

def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()

write_calibration_cache

Save a calibration cache. Writing a cache is just like writing any other buffer in Python.

write_calibration_cache(cache)

Parameters

cache

input. The calibration cache to write.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_op_precision

Set the op precision used for calibration.

set_op_precision(op_name:str, dtype:TopsInference.DataType)

Parameters

op_name

input. Op name to be set.

dtype

input. Set the op to dtype precision.

set_op_algorithm

Set the op algorithm used for calibration.

set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)

Parameters

op_name

input. Op name to be set.

algorithm

input. Set the op to calibration algorithm.

set_op_threshold

Set the op threshold used for calibration.

set_op_threshold(op_name:str, threshold:float)

Parameters

op_name

input. Op name to be set.

threshold

input. Set the op to calibration threshold.

get_op_precision

Get the op precision used for calibration.

get_op_precision(op_name:str) -> TopsInference.DataType

Parameters

op_name

input. Op name to get.

Returns

The op dtype precision.

get_op_algorithm

Get the op algorithm used for calibration.

get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType

Parameters

op_name

input. Op name to get.

Returns

The op calibration algorithm.

get_op_threshold

Get the op threshold used for calibration.

get_op_threshold(op_name:str) -> float

Parameters

op_name

input. Op name to get.

Returns

The op calibration threshold.

IInt8PercentCalibrator

In KINT8_FP32_MIX mode, this algorithm uses a histogram percentile value as the threshold value.

get_batch_size

Get the batch size used for calibration batches.

get_batch_size() -> int

get_algorithm

Get the algorithm used by this calibrator.

get_algorithm() -> CalibrationAlgoType

get_batch

Get a batch of input for calibration. The batch size of the input must match the batch size.

get_batch(names:list) -> list

Parameters

names

input. The names of the network inputs for each object in the bindings array.

Returns

A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.

read_calibration_cache

Load a calibration cache. Reading a cache is just like reading any other file in Python.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

read_calibration_cache() -> Optional[cache object, None]

Returns

A cache object or None if there is no data.

demo code:

def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()

write_calibration_cache

Save a calibration cache. Writing a cache is just like writing any other buffer in Python.

write_calibration_cache(cache)

Parameters

cache

input. The calibration cache to write.

load_config

Load parameters from a config file that was previously saved by calling save_config.

load_config(config_file:str)

Parameters

config_file

input. The config file to load.

save_config

Save a config file to disk so it can be conveniently reused next time by calling load_config.

save_config(config_file:str)

Parameters

config_file

input. The config file to save.

set_op_precision

Set the op precision used for calibration.

set_op_precision(op_name:str, dtype:TopsInference.DataType)

Parameters

op_name

input. Op name to be set.

dtype

input. Set the op to dtype precision.

set_op_algorithm

Set the op algorithm used for calibration.

set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)

Parameters

op_name

input. Op name to be set.

algorithm

input. Set the op to calibration algorithm.

set_op_threshold

Set the op threshold used for calibration.

set_op_threshold(op_name:str, threshold:float)

Parameters

op_name

input. Op name to be set.

threshold

input. Set the op to calibration threshold.

get_op_precision

Get the op precision used for calibration.

get_op_precision(op_name:str) -> TopsInference.DataType

Parameters

op_name

input. Op name to get.

Returns

The op dtype precision.

get_op_algorithm

Get the op algorithm used for calibration.

get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType

Parameters

op_name

input. Op name to get.

Returns

The op calibration algorithm.

get_op_threshold

Get the op threshold used for calibration.

get_op_threshold(op_name:str) -> float

Parameters

op_name

input. Op name to get.

Returns

The op calibration threshold.

Refitter

Updates weights in an engine.

Please refer to TopsInference.create_refitter for how to create a Refitter object.

get_all_weights

Get names of all weights that could be refit.

get_all_weights() -> list

Returns

A list of layer names of the weights that could be refit.

get_missing_weights

Get names of missing weights.

get_missing_weights() -> list

Returns

A list of layer names of the weights that need to be updated.

set_named_weights

Specify new weights for the given name.

set_named_weights(name:str, weight:Optional[numpy.ndarray,Weights])

Parameters

name

input. The name of the layer to be updated.

weight

input. The new weight to update.

get_named_weights

Obtain weights of given name.

get_named_weights(name:str) -> Weights

Parameters

name

input. The name of the layer to get weights.

Returns

layer weights.

refit_engine

Updates associated engine.

refit_engine()
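
A refit sketch; new_weights is a placeholder dict mapping weight names to numpy arrays, and the engine is assumed to have been built with the TopsInference.KREFIT flag set via set_build_flag.

demo code:

refitter = TopsInference.create_refitter(engine)
for name in refitter.get_missing_weights():
    refitter.set_named_weights(name, new_weights[name])   # numpy.ndarray or Weights
refitter.refit_engine()                                    # apply the updates to the engine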

Weights

Weights used in IRefitter.

dtype

The data type of the weights.

weights.dtype

size

The size of weights.

weights.size

nbytes

The number of bytes used by the weights.

weights.nbytes

3. Appendix

Table 3.2 Revision History

Version   Description        Date
V2.0      Initial version    2022.01