1. User Guide¶
1.1. Core Concepts¶
TopsInference Workflow¶
When using TopsInference, only several steps are needed:
2. TopsInference API Reference¶
2.1. Attribute¶
TopsInference.__version__¶
The version of TopsInference.
TopsInference.__version__
TopsInference.KDEFAULT¶
For default precision inference. See set_build_flag.
TopsInference.KFP16¶
For fp16 precision inference. See set_build_flag.
TopsInference.KFP16_MIX¶
For mixed fp32 and fp16 precision inference. See set_build_flag.
TopsInference.KINT8_FP32_MIX¶
For mixed fp32 and int8 precision inference. See set_build_flag.
TopsInference.KREFIT¶
For refitting an engine. See set_build_flag.
TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST¶
When doing inference, input and output data is host memory.
TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE¶
When doing inference, input and output data is GCU device memory.
TopsInference.TIF_BOOL¶
TopsInference DataType: bool DataType.
TopsInference.TIF_INDEX¶
TopsInference DataType: index DataType.
TopsInference.TIF_INT8¶
TopsInference DataType: int8 DataType.
TopsInference.TIF_INT16¶
TopsInference DataType: int16 DataType.
TopsInference.TIF_INT32¶
TopsInference DataType: int32 DataType.
TopsInference.TIF_INT64¶
TopsInference DataType: int64 DataType.
TopsInference.TIF_UINT8¶
TopsInference DataType: uint8 DataType.
TopsInference.TIF_UINT16¶
TopsInference DataType: uint16 DataType.
TopsInference.TIF_UINT32¶
TopsInference DataType: uint32 DataType.
TopsInference.TIF_UINT64¶
TopsInference DataType: uint64 DataType.
TopsInference.TIF_FP16¶
TopsInference DataType: fp16 DataType.
TopsInference.TIF_FP32¶
TopsInference DataType: fp32 DataType.
TopsInference.TIF_BF16¶
TopsInference DataType: bf16 DataType.
TopsInference.TIF_INVALID¶
TopsInference DataType: invalid DataType.
TopsInference.CalibrationAlgoType¶
TopsInference Calibration Algorithm Type.
TopsInference.CalibrationAlgoType.KL_ENTROPY¶
TopsInference Calibration Algorithm Type: KL_ENTROPY.
TopsInference.CalibrationAlgoType.MAX_MIN¶
TopsInference Calibration Algorithm Type: MAX_MIN.
TopsInference.CalibrationAlgoType.MAX_MIN_EMA¶
TopsInference Calibration Algorithm Type: MAX_MIN_EMA.
TopsInference.CalibrationAlgoType.PERCENTILE¶
TopsInference Calibration Algorithm Type: PERCENTILE.
TopsInference.CalibrationAlgoType.INVALID¶
Invalid TopsInference Calibration Algorithm Type.
2.2. Function¶
TopsInference.create_parser¶
This function creates a parser object according to the model_type.
TopsInference.create_parser(model_type:TopsInference.ONNX_MODEL) -> Optional[parser handle, None]
Parameters
- model_type
input. The model_type can only be TopsInference.ONNX_MODEL
Returns
A parser handle, or None.
See Parser.
TopsInference.create_optimizer¶
Calling this function creates an optimizer object, which is then used to build the parsed Network.
TopsInference.create_optimizer() -> Optional[optimizer handle]
Returns
An optimizer handle, or None.
See Optimizer.
TopsInference.load¶
An engine object will be created by calling this function. Then the engine object will be used to do inference.
TopsInference.load(engine_file:str) -> Optional[engine handle, None]
Parameters
- engine_file
input. The engine file saved by an earlier build; see Optimizer.
Returns
An engine handle, or None. An exception is raised when an error occurs.
See Engine.
TopsInference.device¶
Use “with TopsInference.device(card_id, cluster_id):” to set the device context. When the with-block finishes, the devices that were set are released. This has the same effect as calling TopsInference.set_device and TopsInference.release_device.
- In a multi-threaded program, a sub-thread that calls device() uses the resources it claims exclusively.
- If device() is called in the main thread but not in a sub-thread, the sub-thread shares the cluster resources claimed by the main thread.
- If both the main thread and a sub-thread claim resources with device(), the sub-thread uses the resources it claimed itself.
- If some sub-threads claim resources with device() and others do not, each sub-thread follows the third or the second rule above, depending on whether it claimed resources itself.
TopsInference.device(card_id:int, cluster_id:Optional[int,list]) -> device_handle
Parameters
- card_id
input. The card id.
- cluster_id
input. The cluster id. Currently, the maximum number of clusters is 6.
Generally, for i10, cluster_id is a list with a maximum length of 4; it can be [0], [0, 1], [0, 1, 2, 3], or any other selection of ids from 0 to 3.
For i20, cluster_id is a list with a maximum length of 6; it can be [0], [0, 1], [0, 1, 2, 3, 4, 5], or any other selection of ids from 0 to 5.
cluster_id can also be set to -1 as shorthand for [0, 1, 2, 3, 4, 5] when you need to run inference with 6 clusters.
Returns
A Device handle.
Attention
This handle is only effective in the current scope and is released automatically when the scope is left, so all work must be finished inside that scope. The same device can only be set once, and you must not set another device in a nested with-block inside the current scope.
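The cluster_id rules above can be expressed as a small validation helper. This is only a sketch: normalize_cluster_ids and its max_clusters parameter are hypothetical names, not part of TopsInference, and the limits (4 for i10, 6 for i20) are taken from the parameter description.

```python
def normalize_cluster_ids(cluster_id, max_clusters=6):
    """Return cluster_id as a validated list of distinct cluster ids.

    Hypothetical helper, not a TopsInference function.
    max_clusters: 4 for i10, 6 for i20, per the parameter description.
    """
    if isinstance(cluster_id, int):
        # -1 is described as shorthand for all six clusters.
        ids = list(range(6)) if cluster_id == -1 else [cluster_id]
    else:
        ids = list(cluster_id)

    if not ids or len(ids) > max_clusters:
        raise ValueError("expected between 1 and %d cluster ids" % max_clusters)
    if len(set(ids)) != len(ids):
        raise ValueError("cluster ids must be distinct")
    for i in ids:
        if not 0 <= i < max_clusters:
            raise ValueError("cluster id %r is out of range" % (i,))
    return ids
```

Checking the ids before entering the with-block keeps an invalid request from reaching the device at all; whether the library performs the same checks itself is not stated here.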
TopsInference.set_device¶
Specify the running device until it is released. Calls to set_device in different processes are isolated from each other. For the scope under different threading conditions, see TopsInference.device.
TopsInference.set_device(card_id:int, cluster_id:[int,list]) -> device_handle
Parameters
- card_id
input. The card id.
- cluster_id
input. The cluster id. Currently, the maximum number of clusters is 6.
Generally, for i10, cluster_id is a list with a maximum length of 4; it can be [0], [0, 1], [0, 1, 2, 3], or any other selection of ids from 0 to 3.
For i20, cluster_id is a list with a maximum length of 6; it can be [0], [0, 1], [0, 1, 2, 3, 4, 5], or any other selection of ids from 0 to 5.
cluster_id can also be set to -1 as shorthand for [0, 1, 2, 3, 4, 5] when you need to run inference with 6 clusters.
Returns
A Device handle.
Attention
This handle stays effective until you call release_device to destroy it, and it must be released once it is no longer needed. The same device can only be set once, and you must not set another device between set_device and release_device.
demo code:
handle = TopsInference.set_device(0, 0)
# TopsInference infer code
...
TopsInference.release_device(handle)
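Because the handle must always be released, one way to pair the two calls is a try/finally wrapper. The following is a minimal sketch, not an official pattern: run_with_device and work are hypothetical names, and ti stands for the imported TopsInference module, passed in as an argument so that the sketch itself imports nothing.

```python
def run_with_device(ti, card_id, cluster_id, work):
    """Call work() between set_device and release_device.

    ti:   the imported TopsInference module (assumption: passed in by the caller).
    work: a zero-argument callable holding the inference code.
    """
    handle = ti.set_device(card_id, cluster_id)
    try:
        return work()
    finally:
        # Runs on normal return and when work() raises,
        # so the handle is released in both cases.
        ti.release_device(handle)
```

A call could look like run_with_device(TopsInference, 0, 0, my_inference), where my_inference is whatever function contains the inference code.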
TopsInference.release_device¶
TopsInference.release_device(handle:device_handle)
Parameters
- handle
input. The device handle to destroy.
Attention
If the device handle is created by calling set_device, then it must be released by calling release_device.
TopsInference.create_stream¶
Create a Stream to run inference in async mode. Operations placed on the same stream run in non-blocking mode; call synchronize to wait until all of them have finished.
TopsInference.create_stream() -> stream
TopsInference.mem_alloc¶
Allocate buffer on device.
TopsInference.mem_alloc(size:int) -> DeviceMemory
Parameters
- size
input. The allocated buffer size.
Returns
A buffer object of type DeviceMemory. An exception is raised when the requested buffer size exceeds the maximum device memory.
TopsInference.mem_free¶
Free buffer on device.
TopsInference.mem_free(ptr:buffer_ptr)
Parameters
- ptr
input. The allocated buffer object.
TopsInference.mem_h2d_copy¶
Copy buffer from host to device with sync mode.
TopsInference.mem_h2d_copy(src:numpy.ndarray, dst:DeviceMemory, size:int)
Parameters
- src
input. The source buffer object, which is a numpy.ndarray.
- dst
input. The destination buffer object, which is allocated by calling mem_alloc.
- size
input. The copied buffer size.
TopsInference.mem_d2h_copy¶
Copy buffer from device to host with sync mode.
TopsInference.mem_d2h_copy(src:DeviceMemory, dst:numpy.ndarray, size:int)
Parameters
- src
input. The source buffer object, which is allocated by calling mem_alloc.
- dst
input. The destination buffer object, which is a numpy.ndarray.
- size
input. The copied buffer size.
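The four synchronous memory functions can be combined into a host-to-device-to-host round trip. This is a sketch under stated assumptions: device_round_trip is a hypothetical name, ti stands for the imported TopsInference module, src and dst are numpy.ndarray objects of equal byte size, and size is taken to be a byte count (the reference only says "buffer size").

```python
def device_round_trip(ti, src, dst):
    """Copy src to a device buffer and back into dst.

    ti:  the imported TopsInference module (assumption: passed in by the caller).
    src: numpy.ndarray holding the data to send.
    dst: numpy.ndarray of the same byte size that receives the data.
    """
    size = src.nbytes  # assumption: sizes are expressed in bytes
    if dst.nbytes != size:
        raise ValueError("src and dst must have the same byte size")

    buf = ti.mem_alloc(size)
    try:
        ti.mem_h2d_copy(src, buf, size)  # host -> device, sync
        ti.mem_d2h_copy(buf, dst, size)  # device -> host, sync
    finally:
        ti.mem_free(buf)  # free the device buffer even if a copy fails
    return dst
```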
TopsInference.mem_h2d_copy_async¶
Copy buffer from host to device with async mode.
TopsInference.mem_h2d_copy_async(src:numpy.ndarray, dst:DeviceMemory, size:int, stream:stream)
Parameters
- src
input. The source buffer object, which is a numpy.ndarray.
- dst
input. The destination buffer object, which is allocated by calling mem_alloc.
- size
input. The copied buffer size.
- stream
input. The stream on which the copy runs. The stream must be specified when calling this function. See Stream.
TopsInference.mem_d2h_copy_async¶
Copy buffer from device to host with async mode. See Stream.
TopsInference.mem_d2h_copy_async(src:DeviceMemory, dst:numpy.ndarray, size:int, stream:stream)
Parameters
- src
input. The source buffer object, which is allocated by calling mem_alloc.
- dst
input. The destination buffer object, which is a numpy.ndarray.
- size
input. The copied buffer size.
- stream
input. The stream on which the copy runs. The stream must be specified when calling this function. See Stream.
TopsInference.create_refitter¶
A refitter object will be created by calling this function. Then the refitter object will be used to do refit.
TopsInference.create_refitter(engine:Engine) -> Optional[refitter handle, None]
Parameters
- engine
input. The engine object to be refitted.
Returns
A Refitter handle, or None. An exception is raised when an error occurs.
See Refitter.
2.3. Class¶
Parser¶
See TopsInference.create_parser for how to create a Parser object.
read¶
This function reads an ONNX model and returns a Network handle,
or raises an exception when the input model has invalid data or unsupported operators.
Before reading the model, call set_input_names, set_input_dtypes, set_input_shapes, set_output_names, and set_output_dtypes to set the relevant attributes for building the current network.
read(self, model:str) -> Optional[network handle, None]
Parameters
- model
input. The model file to parse.
Returns
A network handle, or None.
read_from_str¶
This function reads an ONNX model from a string and returns a Network handle,
or raises an exception when the input model has invalid data or unsupported operators.
Before reading the model, call set_input_names, set_input_dtypes, set_input_shapes, set_output_names, and set_output_dtypes to set the relevant attributes for building the current network.
read_from_str(model_data:str, model_size:int) -> Optional[network handle, None]
Parameters
- model_data
input. The model string to parse.
- model_size
input. The length of the model string to parse.
Returns
A network handle, or None.
load_config¶
Load parameters from a config file that was previously saved by calling save_config.
load_config(config_file:str)
Parameters
- config_file
input. The config file to load.
save_config¶
Save a config file to disk so that it can be reloaded later by calling load_config.
save_config(config_file:str)
Parameters
- config_file
input. The config file to save.
set_input_names¶
Set the input names before reading the model. If the model has multiple input nodes, pass all the names in a list, such as ["a", "b"].
set_input_names(node_name:Optional[list,str])
Parameters
- node_name
input. The input names list.
set_input_dtypes¶
Set the input data types before reading the model. If the model has multiple input nodes, pass all the data types in a list, such as [TopsInference.TIF_FP32, TopsInference.TIF_FP32].
set_input_dtypes(node_dtype:list)
Parameters
- node_dtype
input. The input TopsInference DataType list. Including TopsInference.TIF_FP32 and TopsInference.TIF_FP16 etc.
set_input_shapes¶
Set the input shapes before reading the model. If the model has multiple inputs, pass all the shapes in a list, such as [[2, 3, 4], [6, 7, 8]].
set_input_shapes(node_shape:list)
Parameters
- node_shape
input. The input shape list.
set_output_names¶
Set the output names before reading the model. If the model has multiple output nodes, pass all the names in a list, such as ["a", "b"].
set_output_names(node_name:Optional[list,str])
Parameters
- node_name
input. The output names list.
set_output_dtypes¶
Set the output data types before reading the model. If the model has multiple output nodes, pass all the data types in a list, such as [TopsInference.TIF_FP32, TopsInference.TIF_FP32].
set_output_dtypes(node_dtype:list)
Parameters
- node_dtype
input. The output TopsInference DataType list. Including TopsInference.TIF_FP32 and TopsInference.TIF_FP16 etc.
Attention
set_input_names, set_output_names, set_input_shapes, set_input_dtypes, set_output_dtypes are used for setting attributes for current network. They must be called before reading model when parsing.
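The required call order (all setters first, then read) can be sketched as one function. This is an illustration rather than an official recipe: parse_onnx is a hypothetical name, the tuple layout of inputs and outputs is an arbitrary choice made for the example, and ti stands for the imported TopsInference module.

```python
def parse_onnx(ti, model_path, inputs, outputs):
    """Configure a Parser, then read the model.

    ti:      the imported TopsInference module (assumption: passed in by the caller).
    inputs:  list of (name, dtype, shape) tuples, one per model input.
    outputs: list of (name, dtype) tuples, one per model output.
    """
    parser = ti.create_parser(ti.ONNX_MODEL)

    # All attributes are set before read(), as the note above requires.
    parser.set_input_names([name for name, _, _ in inputs])
    parser.set_input_dtypes([dtype for _, dtype, _ in inputs])
    parser.set_input_shapes([list(shape) for _, _, shape in inputs])
    parser.set_output_names([name for name, _ in outputs])
    parser.set_output_dtypes([dtype for _, dtype in outputs])

    return parser.read(model_path)
```

Keeping name, dtype, and shape together per input makes it harder for the three lists to drift out of step with one another.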
Layer¶
The layer definition, which constitutes the Network.
get_type¶
Get the layer type.
get_type() -> layer_type
Returns
The layer type, which can be:
TopsInference.TIF_DECONVOLUTION, which means deconvolution layer.
TopsInference.TIF_CONVOLUTION, which means convolution layer.
TopsInference.TIF_UNARY, which means unary operation layer.
TopsInference.TIF_TRANSCENDENTAL, which means transcendental layer.
TopsInference.TIF_ELEMENTWISE, which means elementwise operation layer.
TopsInference.TIF_SELECT, which means select layer.
TopsInference.TIF_POOLING, which means pooling layer.
TopsInference.TIF_BATCHNORM, which means batch normalization layer.
TopsInference.TIF_CONVERT, which means convert layer for converting between different data precision.
TopsInference.TIF_CONCAT, which means concat layer.
TopsInference.TIF_CONSTANT, which means constant layer.
TopsInference.TIF_SHUFFLE, which means shuffle layer.
TopsInference.TIF_ACTIVATION, which means activation layer.
TopsInference.TIF_ORDER, which means layer for sorting by a certain rule.
TopsInference.TIF_RNN, which means rnn layer.
TopsInference.TIF_GATHER, which means gather layer.
TopsInference.TIF_MATMUL, which means matmul layer.
TopsInference.TIF_COMPARE, which means compare layer.
TopsInference.TIF_CONDITION, which means condition layer.
TopsInference.TIF_NMS, which means non maximum suppression layer.
TopsInference.TIF_PAD, which means padding layer.
TopsInference.TIF_RANDOM, which means random generator layer.
TopsInference.TIF_REDUCE, which means reduce layer.
TopsInference.TIF_RESHAPE, which means reshape layer.
TopsInference.TIF_RESIZE, which means resize layer.
TopsInference.TIF_ROIALIGN, which means roi align layer, used in faster rcnn and r-fcn, etc.
TopsInference.TIF_SCATTER, which means scatter layer.
TopsInference.TIF_SIGMOID, which means sigmoid layer.
TopsInference.TIF_SLICE, which means slice layer.
TopsInference.TIF_TOPK, which means topk layer.
TopsInference.TIF_TRANSPOSE, which means transpose layer.
TopsInference.TIF_LOG_SOFTMAX, which means log softmax layer.
TopsInference.TIF_MVN, which means mean-variance normalization layer.
TopsInference.TIF_SOFTMAX, which means softmax layer.
TopsInference.TIF_UNKNOWN, which means unknown layer.
get_name¶
Get the layer name.
get_name() -> str
Returns
The layer name, the default name is “”.
set_precision¶
Set the layer precision. Works only in case of TopsInference.KFP16_MIX or TopsInference.KINT8_FP32_MIX.
set_precision(precision:Optional[TopsInference.TIF_FP32, TopsInference.TIF_FP16, TopsInference.TIF_INT8])
Parameters
- precision
input. The layer precision to be set.
In TopsInference.KFP16_MIX mode:
- TopsInference.TIF_FP32
- TopsInference.TIF_FP16
In TopsInference.KINT8_FP32_MIX mode:
- TopsInference.TIF_FP32
- TopsInference.TIF_INT8
get_precision¶
Get the layer precision.
get_precision() -> DataType
Returns
The layer precision.
reset_precision¶
Reset the layer precision to default precision.
reset_precision()
Network¶
The internal representation of an ONNX model; see the ‘read’ function in Parser.
dump¶
Dump the network structure for debugging, the result will be printed to current terminal window.
dump()
get_layer_num¶
Get the layer number in current network.
get_layer_num() -> int
Returns
The layer number.
get_layer_by_index¶
Get the layer according to the index, the index must be less than the layer number.
get_layer_by_index(index:int) -> layer handle
Parameters
- index
index. The index for getting layer, the index must be less than the layer number.
Returns
The Layer handle.
get_layer¶
Get the layer according to the layer name.
get_layer(name:str) -> layer handle
Parameters
- name
input. The layer name.
Returns
The Layer handle.
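The Network and Layer accessors combine naturally into a loop that adjusts the precision of selected layers. The sketch below assumes nothing beyond the methods documented above; set_precision_by_type is a hypothetical name. As noted under set_precision, the setting only takes effect in TopsInference.KFP16_MIX or TopsInference.KINT8_FP32_MIX mode.

```python
def set_precision_by_type(network, layer_types, precision):
    """Set the precision of every layer whose type is in layer_types.

    network:     a Network handle returned by Parser.read.
    layer_types: a collection of layer types, e.g. {TopsInference.TIF_SOFTMAX}.
    precision:   e.g. TopsInference.TIF_FP32.
    Returns the names of the layers that were changed.
    """
    changed = []
    for index in range(network.get_layer_num()):
        layer = network.get_layer_by_index(index)
        if layer.get_type() in layer_types:
            layer.set_precision(precision)
            changed.append(layer.get_name())
    return changed
```

A typical use would be keeping numerically sensitive layer types in fp32 while the rest of the network runs in mixed precision.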
Optimizer¶
See TopsInference.create_optimizer for how to create an Optimizer object.
build¶
This function builds an engine from a network. The engine is then used to do inference.
build(network:network) -> Optional[engine handle, None]
Parameters
- network
input. An internal model representation read by Parser.
Returns
An Engine handle, or None. When an error occurs, it will raise an exception.
load_config¶
Load parameters from a config file that was previously saved by calling save_config.
load_config(config_file:str)
Parameters
- config_file
input. The config file to load.
save_config¶
Save a config file to disk so that it can be reloaded later by calling load_config.
save_config(config_file:str)
Parameters
- config_file
input. The config file to save.
set_build_flag¶
This function sets a flag for building the engine. A build flag selects features of the engine being built.
TopsInference.KDEFAULT For default model precision inference.
TopsInference.KFP16_MIX For fp16 & fp32 mixed precision inference.
TopsInference.KINT8_FP32_MIX For int8 & fp32 mixed precision inference.
TopsInference.KFP16 For fp16 precision inference.
TopsInference.KREFIT For enabling refit of an engine.
set_build_flag(flag:Optional[TopsInference.KFP16_MIX,TopsInference.KINT8_FP32_MIX,
                             TopsInference.KFP16,TopsInference.KDEFAULT,
                             TopsInference.KREFIT])
Parameters
- flag
input. A building flag.
set_max_shape_range¶
Setting max shape of dynamic shape model input.
set_max_shape_range(max_shape_dims:list)
Parameters
- max_shape_dims
input. A JSON-style list. When setting the model max input shape, the key must be “main”, and the length of its value must equal the number of model inputs.
demo code:
max_shape_dim_setting = []
max_shape_dim = {}
max_shape_dim["main"] = [[100, 1, 900], [100, 1, 100]]
max_shape_dim_setting.append(max_shape_dim)
optimizer.set_max_shape_range(max_shape_dim_setting)
set_min_shape_range¶
Setting min shape of dynamic shape model input.
set_min_shape_range(min_shape_dims:list)
Parameters
- min_shape_dims
input. A JSON-style list. When setting the model min input shape, the key must be “main”, and the length of its value must equal the number of model inputs.
demo code:
min_shape_dim_setting = []
min_shape_dim = {}
min_shape_dim["main"] = [[100, 1, 900], [100, 1, 1]]
min_shape_dim_setting.append(min_shape_dim)
optimizer.set_min_shape_range(min_shape_dim_setting)
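Both settings share the same structure, so they can be built together and checked for consistency before they are handed to the optimizer. This is a sketch: shape_range_settings is a hypothetical helper, and the requirement that each minimum dimension not exceed the matching maximum is an assumption that the reference does not state explicitly.

```python
def shape_range_settings(min_shapes, max_shapes):
    """Build the arguments for set_min_shape_range and set_max_shape_range.

    min_shapes, max_shapes: one shape (list of ints) per model input.
    Returns (min_setting, max_setting), each in the [{"main": [...]}] form.
    """
    if len(min_shapes) != len(max_shapes):
        raise ValueError("min and max must describe the same number of inputs")
    for lo, hi in zip(min_shapes, max_shapes):
        if len(lo) != len(hi):
            raise ValueError("rank mismatch between %r and %r" % (lo, hi))
        # Assumption: every min dimension must be <= the matching max dimension.
        if any(a > b for a, b in zip(lo, hi)):
            raise ValueError("min shape %r exceeds max shape %r" % (lo, hi))

    min_setting = [{"main": [list(s) for s in min_shapes]}]
    max_setting = [{"main": [list(s) for s in max_shapes]}]
    return min_setting, max_setting
```

The two results can then be passed to optimizer.set_min_shape_range and optimizer.set_max_shape_range respectively.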
set_compile_options¶
Set the optimizer compile options.
set_compile_options(options:dict)
Parameters
- options
input. A dict of compile options.
demo code:
compile_options = {}
compile_options["max_dim_size"] = "65536"
compile_options['resource_mode'] = '1c12s'
optimizer = TopsInference.create_optimizer()
optimizer.set_compile_options(compile_options)
set_int8_calibrator¶
Setting int8 calibrator in KINT8_FP32_MIX mode.
set_int8_calibrator(calibrator:ICalibrator)
Parameters
- calibrator
input. An Object of ICalibrator(IInt8EntropyCalibrator/IInt8MaxMinCalibrator/IInt8MaxMinEMACalibrator/IInt8PercentCalibrator).
Engine¶
An Engine handle can be created by Optimizer. It can also be created by loading an existing engine file; see TopsInference.load.
load_config¶
Load parameters from a config file that was previously saved by calling save_config.
load_config(config_file:str)
Parameters
- config_file
input. The config file to load.
save_config¶
Save a config file to disk so that it can be reloaded later by calling load_config.
save_config(config_file:str)
Parameters
- config_file
input. The config file to save.
save_executable¶
This function saves an engine to local disk so that it can be loaded and reused later.
save_executable(engine_file:str)
Parameters
- engine_file
input. An engine file name to save.
demo code:
engine.save_executable("/path/to/your/file")
engine = TopsInference.load("/path/to/your/file")
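Since building can be skipped once an engine file exists, the save and load calls are often wrapped in a "load if present, otherwise build and save" helper. The following is a sketch only: load_or_build_engine and build_network are hypothetical names, ti stands for the imported TopsInference module, and the device is assumed to have been set already, as required before create_optimizer or load.

```python
import os


def load_or_build_engine(ti, engine_file, build_network):
    """Return an engine, reusing engine_file when it already exists.

    ti:            the imported TopsInference module (assumption: passed in).
    engine_file:   path used with save_executable and load.
    build_network: zero-argument callable returning a parsed Network
                   (hypothetical; it is only called when a build is needed).
    """
    if os.path.exists(engine_file):
        return ti.load(engine_file)

    optimizer = ti.create_optimizer()
    engine = optimizer.build(build_network())
    engine.save_executable(engine_file)  # cache for the next run
    return engine
```

The sketch does not detect a stale engine file; if the model or the build flags change, the file has to be deleted by the caller.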
run¶
This function can be used for doing inference.
run(input_tensor_list:list,
    output_tensor_list:list,
    buffer_type:Optional[TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST,TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE],
    py_stream:stream=None)
Parameters
- input_tensor_list
input. Input tensor list.
- output_tensor_list
input. Output tensor list.
- buffer_type
input. Buffer type for input and output.
TopsInference.TIF_ENGINE_RSC_IN_HOST_OUT_HOST indicates that input buffer and output buffer are on host.
TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE indicates that input buffer and output buffer are on device.
Attention
Currently, mixed buffer types such as IN_HOST_OUT_DEVICE or IN_DEVICE_OUT_HOST are not supported. When using the host buffer type, the input and output buffers should be numpy.ndarray objects.
- py_stream
input. Used to do inference in async mode.
The default value is None, which runs in sync mode. A stream can be created by TopsInference.create_stream.
See Stream.
When doing inference with buffer_type equal to IN_HOST_OUT_HOST, async mode is currently not supported, so py_stream must be left as None.
run_with_batch¶
Multi-threaded inference with different batches. Use run_with_batch to run dynamic-batch inference with any specified batch size. This method automatically splits or merges the input data along the zeroth axis (the batch dimension) into N parts through the enqueue operation. The N parts are stored in a queue and inferred concurrently by a resource pool of M clusters, and the results are automatically concatenated after inference completes.
run_with_batch(sample_nums:int, input_list:list, **kwargs:[output_list,py_stream,buffer_type]) -> future
Parameters
- sample_nums
input. The number of samples.
- input_list
input. A list constructed from the inputs.
- kwargs
input. Keyword arguments; the supported keys include output_list, py_stream, and buffer_type.
If you create the outputs before calling run_with_batch, set output_list=outputs.
If output_list is None, the output list is allocated automatically.
py_stream: The default value is None, which runs in sync mode. A stream can be created by TopsInference.create_stream.
See Stream.
To use D2D mode, set buffer_type=TopsInference.TIF_ENGINE_RSC_IN_DEVICE_OUT_DEVICE.
Returns
A future object. See Future.
runV2¶
Inference with a specified device (cluster) and a dynamic shape model.
In static shape mode, runV2 also supports dynamic batch. It automatically splits or merges the input data along the zeroth axis (the batch dimension) into N parts through the enqueue operation. The N parts are stored in a queue and inferred concurrently by a resource pool of M clusters, and the results are automatically concatenated after inference completes.
runV2(input_list:list, **kwargs:[output_list,py_stream]) -> future
Parameters
- input_list
input. A tensor, or a list constructed from the inputs.
- kwargs
input. Keyword arguments; the supported keys include output_list and py_stream.
If you create the outputs before calling runV2, set output_list=outputs.
If output_list is None, the output list is allocated automatically.
py_stream: The default value is None, which runs in sync mode. A stream can be created by TopsInference.create_stream.
See Stream.
Returns
A future object. See Future.
get_input_num¶
Get the input number of engine.
get_input_num() -> int
Returns
The net input number.
get_output_num¶
Get the output number of engine.
get_output_num() -> int
Returns
The net output number.
get_max_input_shape¶
Get the index-th input maximum shape(numpy.array) of engine.
get_max_input_shape(index:int) -> numpy.array
Parameters
- index
index. The index for input, the index must be less than the input number.
Returns
The index-th input maximum shape.
get_max_output_shape¶
Get the index-th output maximum shape(numpy.array) of engine.
get_max_output_shape(index:int) -> numpy.array
Parameters
- index
index. The index for output, the index must be less than the output number.
Returns
The index-th output maximum shape.
get_min_input_shape¶
Get the index-th input minimum shape(numpy.array) of engine.
get_min_input_shape(index:int) -> numpy.array
Parameters
- index
index. The index for input, the index must be less than the input number.
Returns
The index-th input minimum shape.
get_input_shape¶
Get the index-th input shape(numpy.array) of engine.
get_input_shape(index:int) -> numpy.array
Parameters
- index
index. The index for input, the index must be less than the input number.
Returns
The index-th real input shape.
get_output_shape¶
Get the index-th output shape(numpy.array) of engine.
get_output_shape(index:int) -> numpy.array
Parameters
- index
index. The index for output, the index must be less than the output number.
Returns
The index-th real output shape.
get_input_dtype¶
Get the index-th input data type(TopsInference.DataType) of engine.
get_input_dtype(index:int) -> TopsInference.DataType
Parameters
- index
index. The index for input, the index must be less than the input number.
Returns
The index-th input type.
get_output_dtype¶
Get the index-th output data type(TopsInference.DataType) of engine.
get_output_dtype(index:int) -> TopsInference.DataType
Parameters
- index
index. The index for output, the index must be less than the output number.
Returns
The index-th output type.
get_input_name¶
Get the index-th input name(ONNX name) of engine.
get_input_name(index:int) -> str
Parameters
- index
index. The index for input, the index must be less than the input number.
Returns
The index-th input layer name.
get_output_name¶
Get the index-th output name(ONNX name) of engine.
get_output_name(index:int) -> str
Parameters
- index
index. The index for output, the index must be less than the output number.
Returns
The index-th output layer name.
get_device_memory_size¶
Get the GCU device memory size that the engine requires at runtime.
get_device_memory_size() -> int
Returns
The memory size required by the engine. Returns 0 if the memory size cannot be obtained.
Device¶
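This class is used by Optimizer and Engine.
Before calling create_optimizer or loading an engine, the device must be initialized, and afterwards released, in one of two ways: call TopsInference.set_device and TopsInference.release_device,
or, use a “with TopsInference.device(card_id, cluster_id):” block.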
Stream¶
This class is used for running in async mode. See TopsInference.create_stream for how to create a Stream object.
synchronize¶
When you execute several operations on the same stream, call synchronize after the last operation to block until all of them have finished.
synchronize()
Future¶
Future provides a mechanism to access the result of asynchronous operations.
release¶
Release future.
release()
get¶
Get output data.
get() -> list[numpy.ndarray]
Returns
The output data.
status¶
Get output data status.
status() -> bool
Returns
If output data is ready, return true, otherwise return false.
wait¶
Wait until the output data is ready.
wait()
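The Future methods suggest a wait, get, release sequence for collecting results from run_with_batch or runV2. The sketch below is hedged accordingly: collect_outputs is a hypothetical name, and because the reference does not say whether the arrays returned by get() stay valid after release(), the sketch copies them first as a precaution.

```python
def collect_outputs(future):
    """Wait for a Future, copy its outputs, then release it.

    future: the object returned by run_with_batch or runV2.
    Returns a list with one copied output per model output.
    """
    try:
        future.wait()  # block until the output data is ready
        # Assumption: copying guards against release() invalidating the
        # buffers; drop the copy if the outputs are known to outlive the future.
        return [out.copy() for out in future.get()]
    finally:
        future.release()  # release even if wait() or get() raises
```

For non-blocking use, status() could be polled instead of calling wait(), with other work done between polls.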
DeviceMemory¶
The GCU device memory buffer definition.
get_real_size¶
Get the device memory buffer real size.
get_real_size() -> int
set_shape¶
Set the shape of device memory buffer.
set_shape(shape:list) -> bool
get_real_shape¶
Get the device memory buffer real shape.
get_real_shape() -> list
ICalibrator¶
In KINT8_FP32_MIX mode, TopsInference provides multiple different calibrators that calculate the scale in different ways.
get_batch_size¶
Get the batch size used for calibration batches.
get_batch_size() -> int
get_algorithm¶
Get the algorithm used by this calibrator.
get_algorithm() -> CalibrationAlgoType
get_batch¶
Get a batch of input for calibration. The batch size of the input must match the batch size returned by get_batch_size.
get_batch(names:list) -> list
Parameters
- names
input. The names of the network inputs for each object in the bindings array.
Returns
A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.
read_calibration_cache¶
Load a calibration cache. Reading a cache is just like reading any other file in Python.
Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.
read_calibration_cache() -> Optional[cache object, None]
Returns
A cache object or None if there is no data.
demo code:
def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()
write_calibration_cache¶
Save a calibration cache. Writing a cache is just like writing any other buffer in Python.
write_calibration_cache(cache)
Parameters
- cache
input. The calibration cache to write.
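The two cache methods are usually implemented as a pair around one file. The following sketch shows that pairing in plain Python; CalibrationCacheFile is a hypothetical class used only for illustration (in real use these would be methods of a calibrator subclass), and the cache is assumed to be a bytes-like object.

```python
import os


class CalibrationCacheFile:
    """File-backed read/write pair for a calibration cache (illustrative only)."""

    def __init__(self, cache_file):
        self.cache_file = cache_file

    def read_calibration_cache(self):
        # Reuse an existing cache instead of calibrating again.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None  # no cache yet, so calibration has to run

    def write_calibration_cache(self, cache):
        # Assumption: cache is bytes-like, hence the binary mode.
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

As the description above points out, the application is responsible for deleting the cache file when the network structure or the calibration data set changes.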
IInt8EntropyCalibrator¶
In KINT8_FP32_MIX mode, Entropy calibration chooses the tensor’s scale factor to optimize the quantized tensor’s information-theoretic content, and usually suppresses outliers in the distribution.
get_batch_size¶
Get the batch size used for calibration batches.
get_batch_size() -> int
get_algorithm¶
Get the algorithm used by this calibrator.
get_algorithm() -> CalibrationAlgoType
get_batch¶
Get a batch of input for calibration. The batch size of the input must match the batch size returned by get_batch_size.
get_batch(names:list) -> list
Parameters
- names
input. The names of the network inputs for each object in the bindings array.
Returns
A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.
read_calibration_cache¶
Load a calibration cache. Reading a cache is just like reading any other file in Python.
Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.
read_calibration_cache() -> Optional[cache object, None]
Returns
A cache object or None if there is no data.
demo code:
def read_calibration_cache(self):
    # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()
write_calibration_cache¶
Save a calibration cache. Writing a cache is just like writing any other buffer in Python.
write_calibration_cache(cache)
Parameters
- cache
input. The calibration cache to write.
load_config¶
Load parameters from a config file that was previously saved by calling save_config.
load_config(config_file:str)
Parameters
- config_file
input. The config file to load.
save_config¶
Save a config file to disk so that it can be reloaded later by calling load_config.
save_config(config_file:str)
Parameters
- config_file
input. The config file to save.
set_op_precision¶
Set the op precision used for calibration.
set_op_precision(op_name:str, dtype:TopsInference.DataType)
Parameters
- op_name
input. Op name to be set.
- dtype
input. The precision (data type) to assign to the op.
set_op_algorithm¶
Set the op algorithm used for calibration.
set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)
Parameters
- op_name
input. Op name to be set.
- algorithm
input. The calibration algorithm to assign to the op.
set_op_threshold¶
Set the op threshold used for calibration.
set_op_threshold(op_name:str, threshold:float)
Parameters
- op_name
input. Op name to be set.
- threshold
input. The calibration threshold to assign to the op.
get_op_precision¶
Get the op precision used for calibration.
get_op_precision(op_name:str) -> TopsInference.DataType
Parameters
- op_name
input. Op name to get.
Returns
The op dtype precision.
get_op_algorithm¶
Get the op algorithm used for calibration.
get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType
Parameters
- op_name
input. Op name to get.
Returns
The op calibration algorithm.
get_op_threshold¶
Get the op threshold used for calibration.
get_op_threshold(op_name:str) -> float
Parameters
- op_name
input. Op name to get.
Returns
The op calibration threshold.
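The per-op setters and getters above lend themselves to a table-driven pattern. The following is a sketch under stated assumptions: `apply_op_overrides` and `FakeCalibrator` are hypothetical names, the op names are invented, and the strings `"TIF_FP32"` and `"KL_ENTROPY"` merely stand in for the corresponding TopsInference.DataType and TopsInference.CalibrationAlgoType values. `FakeCalibrator` only records what it is given so that the example runs without TopsInference; it says nothing about how a real calibrator treats ops that were never set.

```python
from typing import Any, Dict


def apply_op_overrides(calibrator: Any,
                       overrides: Dict[str, Dict[str, Any]]) -> None:
    """Apply per-op settings through the documented set_op_* methods."""
    for op_name, settings in overrides.items():
        if "dtype" in settings:
            calibrator.set_op_precision(op_name, settings["dtype"])
        if "algorithm" in settings:
            calibrator.set_op_algorithm(op_name, settings["algorithm"])
        if "threshold" in settings:
            calibrator.set_op_threshold(op_name, settings["threshold"])


class FakeCalibrator:
    """Stand-in that stores settings in dicts; not a TopsInference class."""

    def __init__(self):
        self._precision, self._algorithm, self._threshold = {}, {}, {}

    def set_op_precision(self, op_name, dtype):
        self._precision[op_name] = dtype

    def set_op_algorithm(self, op_name, algorithm):
        self._algorithm[op_name] = algorithm

    def set_op_threshold(self, op_name, threshold):
        self._threshold[op_name] = threshold

    def get_op_precision(self, op_name):
        return self._precision.get(op_name)

    def get_op_algorithm(self, op_name):
        return self._algorithm.get(op_name)

    def get_op_threshold(self, op_name):
        return self._threshold.get(op_name)


calib = FakeCalibrator()
apply_op_overrides(calib, {
    "softmax_0": {"dtype": "TIF_FP32"},                      # keep in fp32
    "conv_1": {"algorithm": "KL_ENTROPY", "threshold": 6.0},
})
```

Keeping the overrides in one dictionary makes it easy to review which ops deviate from the calibrator defaults.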
IInt8MaxMinCalibrator¶
In KINT8_FP32_MIX mode, compared with the max-min calibrator, this algorithm uses an EMA scale to adjust the threshold value.
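The contrast drawn above can be illustrated numerically. This is a sketch only: it assumes that "ema" refers to an exponential moving average of per-batch thresholds, and both the decay value and the update rule are illustrative assumptions rather than the formula TopsInference uses.

```python
from typing import List


def max_min_threshold(batches: List[List[float]]) -> float:
    """Largest absolute value seen across all batches."""
    return max(max(abs(x) for x in batch) for batch in batches)


def ema_threshold(batches: List[List[float]], decay: float = 0.9) -> float:
    """Exponential moving average of each batch's largest absolute value."""
    ema = None
    for batch in batches:
        peak = max(abs(x) for x in batch)
        # The first batch seeds the average; later batches are blended in.
        ema = peak if ema is None else decay * ema + (1.0 - decay) * peak
    return ema


batches = [[0.5, -1.0], [0.8, -1.2], [0.9, 20.0], [1.1, -0.7]]
plain = max_min_threshold(batches)              # dominated by the outlier 20.0
smoothed = ema_threshold(batches, decay=0.5)    # the outlier is damped
```

With these inputs the plain maximum is 20.0, while the moving average settles at 5.825, showing how a single outlier batch has less influence on a smoothed threshold.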
get_batch_size¶
Get the batch size used for calibration batches.
get_batch_size() -> int
get_algorithm¶
Get the algorithm used by this calibrator.
get_algorithm() -> CalibrationAlgoType
get_batch¶
Get a batch of input for calibration. The batch size of the input must match the value returned by get_batch_size.
get_batch(names:list) -> list
Parameters
- names
input. The names of the network inputs for each object in the bindings array.
Returns
A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.
read_calibration_cache¶
Load a calibration cache. Reading a cache is just like reading any other file in Python.
Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.
read_calibration_cache() -> Optional[cache object, None]
Returns
A cache object or None if there is no data.
demo code:
import os  # module-level import required by os.path.exists
def read_calibration_cache(self):
    # If a cache file exists, use it instead of calibrating again.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()
    # No cache available: return None explicitly.
    return None
write_calibration_cache¶
Save a calibration cache. Writing a cache is just like writing any other buffer in Python.
write_calibration_cache(cache)
Parameters
- cache
input. The calibration cache to write.
load_config¶
Load parameters from a config file that was previously saved by calling save_config.
load_config(config_file:str)
Parameters
- config_file
input. The config file to load.
save_config¶
Save the current parameters to a config file on disk so that they can be restored later by calling load_config.
save_config(config_file:str)
Parameters
- config_file
input. The config file to save.
set_op_precision¶
Set the op precision used for calibration.
set_op_precision(op_name:str, dtype:TopsInference.DataType)
Parameters
- op_name
input. Op name to be set.
- dtype
input. The precision (data type) to assign to the op.
set_op_algorithm¶
Set the op algorithm used for calibration.
set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)
Parameters
- op_name
input. Op name to be set.
- algorithm
input. The calibration algorithm to assign to the op.
set_op_threshold¶
Set the op threshold used for calibration.
set_op_threshold(op_name:str, threshold:float)
Parameters
- op_name
input. Op name to be set.
- threshold
input. The calibration threshold to assign to the op.
get_op_precision¶
Get the op precision used for calibration.
get_op_precision(op_name:str) -> TopsInference.DataType
Parameters
- op_name
input. Op name to get.
Returns
The op dtype precision.
get_op_algorithm¶
Get the op algorithm used for calibration.
get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType
Parameters
- op_name
input. Op name to get.
Returns
The op calibration algorithm.
get_op_threshold¶
Get the op threshold used for calibration.
get_op_threshold(op_name:str) -> float
Parameters
- op_name
input. Op name to get.
Returns
The op calibration threshold.
IInt8MaxMinEMACalibrator¶
In KINT8_FP32_MIX mode, TopsInference provides multiple different calibrators that calculate the scale in different ways.
get_batch_size¶
Get the batch size used for calibration batches.
get_batch_size() -> int
get_algorithm¶
Get the algorithm used by this calibrator.
get_algorithm() -> CalibrationAlgoType
get_batch¶
Get a batch of input for calibration. The batch size of the input must match the value returned by get_batch_size.
get_batch(names:list) -> list
Parameters
- names
input. The names of the network inputs for each object in the bindings array.
Returns
A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.
read_calibration_cache¶
Load a calibration cache. Reading a cache is just like reading any other file in Python.
Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.
read_calibration_cache() -> Optional[cache object, None]
Returns
A cache object or None if there is no data.
demo code:
import os  # module-level import required by os.path.exists
def read_calibration_cache(self):
    # If a cache file exists, use it instead of calibrating again.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()
    # No cache available: return None explicitly.
    return None
write_calibration_cache¶
Save a calibration cache. Writing a cache is just like writing any other buffer in Python.
write_calibration_cache(cache)
Parameters
- cache
input. The calibration cache to write.
load_config¶
Load parameters from a config file that was previously saved by calling save_config.
load_config(config_file:str)
Parameters
- config_file
input. The config file to load.
save_config¶
Save the current parameters to a config file on disk so that they can be restored later by calling load_config.
save_config(config_file:str)
Parameters
- config_file
input. The config file to save.
set_op_precision¶
Set the op precision used for calibration.
set_op_precision(op_name:str, dtype:TopsInference.DataType)
Parameters
- op_name
input. Op name to be set.
- dtype
input. The precision (data type) to assign to the op.
set_op_algorithm¶
Set the op algorithm used for calibration.
set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)
Parameters
- op_name
input. Op name to be set.
- algorithm
input. The calibration algorithm to assign to the op.
set_op_threshold¶
Set the op threshold used for calibration.
set_op_threshold(op_name:str, threshold:float)
Parameters
- op_name
input. Op name to be set.
- threshold
input. The calibration threshold to assign to the op.
get_op_precision¶
Get the op precision used for calibration.
get_op_precision(op_name:str) -> TopsInference.DataType
Parameters
- op_name
input. Op name to get.
Returns
The op dtype precision.
get_op_algorithm¶
Get the op algorithm used for calibration.
get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType
Parameters
- op_name
input. Op name to get.
Returns
The op calibration algorithm.
get_op_threshold¶
Get the op threshold used for calibration.
get_op_threshold(op_name:str) -> float
Parameters
- op_name
input. Op name to get.
Returns
The op calibration threshold.
IInt8PercentCalibrator¶
In KINT8_FP32_MIX mode, this algorithm uses a histogram percentile value as the threshold value.
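As an illustration of a histogram percentile threshold, the sketch below bins absolute activation values and returns the upper edge of the bin at which the cumulative count first reaches the requested percentile. The function name, the bin count, and the percentile are illustrative assumptions; this reference does not state the exact procedure TopsInference follows.

```python
from typing import List


def percentile_threshold(values: List[float], percentile: float = 99.9,
                         num_bins: int = 2048) -> float:
    """Upper edge of the histogram bin where the cumulative count of
    absolute values first reaches the requested percentile."""
    magnitudes = [abs(v) for v in values]
    top = max(magnitudes)
    if top == 0.0:
        return 0.0
    width = top / num_bins
    counts = [0] * num_bins
    for m in magnitudes:
        # The maximum lies exactly on the last edge; clamp it into the
        # last bin instead of indexing past the end.
        counts[min(int(m / width), num_bins - 1)] += 1
    target = len(magnitudes) * percentile / 100.0
    running = 0
    for i, count in enumerate(counts):
        running += count
        if running >= target:
            return (i + 1) * width
    return top


# 99 small values in the range 0..9 plus a single outlier at 100.
values = [float(k % 10) for k in range(99)] + [100.0]
threshold = percentile_threshold(values, percentile=99.0, num_bins=100)
```

Here the 99th percentile yields a threshold of 10.0 even though the largest value is 100.0, so the quantization range is not stretched by one outlier.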
get_batch_size¶
Get the batch size used for calibration batches.
get_batch_size() -> int
get_algorithm¶
Get the algorithm used by this calibrator.
get_algorithm() -> CalibrationAlgoType
get_batch¶
Get a batch of input for calibration. The batch size of the input must match the value returned by get_batch_size.
get_batch(names:list) -> list
Parameters
- names
input. The names of the network inputs for each object in the bindings array.
Returns
A list of device memory pointers set to the memory containing each network input data, or an empty list if there are no more batches for calibration.
read_calibration_cache¶
Load a calibration cache. Reading a cache is just like reading any other file in Python.
Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.
read_calibration_cache() -> Optional[cache object, None]
Returns
A cache object or None if there is no data.
demo code:
import os  # module-level import required by os.path.exists
def read_calibration_cache(self):
    # If a cache file exists, use it instead of calibrating again.
    if os.path.exists(self.cache_file):
        with open(self.cache_file, "rb") as f:
            return f.read()
    # No cache available: return None explicitly.
    return None
write_calibration_cache¶
Save a calibration cache. Writing a cache is just like writing any other buffer in Python.
write_calibration_cache(cache)
Parameters
- cache
input. The calibration cache to write.
load_config¶
Load parameters from a config file that was previously saved by calling save_config.
load_config(config_file:str)
Parameters
- config_file
input. The config file to load.
save_config¶
Save the current parameters to a config file on disk so that they can be restored later by calling load_config.
save_config(config_file:str)
Parameters
- config_file
input. The config file to save.
set_op_precision¶
Set the op precision used for calibration.
set_op_precision(op_name:str, dtype:TopsInference.DataType)
Parameters
- op_name
input. Op name to be set.
- dtype
input. The precision (data type) to assign to the op.
set_op_algorithm¶
Set the op algorithm used for calibration.
set_op_algorithm(op_name:str, algorithm:TopsInference.CalibrationAlgoType)
Parameters
- op_name
input. Op name to be set.
- algorithm
input. The calibration algorithm to assign to the op.
set_op_threshold¶
Set the op threshold used for calibration.
set_op_threshold(op_name:str, threshold:float)
Parameters
- op_name
input. Op name to be set.
- threshold
input. The calibration threshold to assign to the op.
get_op_precision¶
Get the op precision used for calibration.
get_op_precision(op_name:str) -> TopsInference.DataType
Parameters
- op_name
input. Op name to get.
Returns
The op dtype precision.
get_op_algorithm¶
Get the op algorithm used for calibration.
get_op_algorithm(op_name:str) -> TopsInference.CalibrationAlgoType
Parameters
- op_name
input. Op name to get.
Returns
The op calibration algorithm.
get_op_threshold¶
Get the op threshold used for calibration.
get_op_threshold(op_name:str) -> float
Parameters
- op_name
input. Op name to get.
Returns
The op calibration threshold.
Refitter¶
Updates weights in an engine.
See TopsInference.create_refitter for how to create a Refitter object.
get_all_weights¶
Get names of all weights that could be refit.
get_all_weights() -> list
Returns
A list of layer names of the weights that could be refit.
get_missing_weights¶
Get names of missing weights.
get_missing_weights() -> list
Returns
A list of layer names whose weights still need to be updated.
set_named_weights¶
Specify new weights for the layer with the given name.
set_named_weights(name:str, weight:Union[numpy.ndarray, Weights])
Parameters
- name
input. The name of the layer to be updated.
- weight
input. The new weight to update.
get_named_weights¶
Get the weights of the layer with the given name.
get_named_weights(name:str) -> Weights
Parameters
- name
input. The name of the layer to get weights.
Returns
The weights of the layer.
refit_engine¶
Updates associated engine.
refit_engine()
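The Refitter methods above suggest a set, check, apply sequence. The sketch below is hypothetical: `refit_with` is an illustrative helper, the layer names are invented, and `FakeRefitter` only mimics the documented method names so that the flow runs without TopsInference. In particular, the fake never reports missing weights, which may differ from how a real Refitter behaves.

```python
from typing import Any, Dict


def refit_with(refitter: Any, new_weights: Dict[str, Any]) -> None:
    """Set the given weights, verify nothing is missing, then refit."""
    refittable = set(refitter.get_all_weights())
    for name, weight in new_weights.items():
        if name not in refittable:
            raise KeyError(f"{name!r} is not a refittable weight")
        refitter.set_named_weights(name, weight)
    missing = refitter.get_missing_weights()
    if missing:
        raise ValueError(f"weights still missing: {missing}")
    refitter.refit_engine()


class FakeRefitter:
    """Stand-in using the documented method names; not the real class."""

    def __init__(self, engine_weights):
        self.engine = dict(engine_weights)   # weights currently in the engine
        self.pending = {}                    # weights set but not yet applied

    def get_all_weights(self):
        return list(self.engine)

    def get_missing_weights(self):
        return []                            # the fake never reports any

    def set_named_weights(self, name, weight):
        self.pending[name] = weight

    def get_named_weights(self, name):
        return self.engine[name]

    def refit_engine(self):
        self.engine.update(self.pending)
        self.pending.clear()


fake = FakeRefitter({"conv1": [0.0, 0.0], "fc": [0.0]})
refit_with(fake, {"conv1": [1.0, 2.0]})
```

Checking get_missing_weights before calling refit_engine surfaces an incomplete update as an explicit error in the application rather than leaving it to the refit call.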
Weights¶
Weights used by Refitter.
dtype¶
The data type of the weights.
weights.dtype
size¶
The size of weights.
weights.size
nbytes¶
The number of bytes used by the weights.
weights.nbytes
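The three attributes are presumably related in the usual way, with nbytes equal to size multiplied by the width of one element. This reference does not state that explicitly, so the sketch below is an assumption; `ELEMENT_BYTES` and `expected_nbytes` are hypothetical names, and the string keys stand in for the TIF_* data type constants listed in the Attribute section.

```python
# Width in bytes of one element for the fixed-width TIF_* data types.
ELEMENT_BYTES = {
    "TIF_INT8": 1, "TIF_UINT8": 1,
    "TIF_INT16": 2, "TIF_UINT16": 2, "TIF_FP16": 2, "TIF_BF16": 2,
    "TIF_INT32": 4, "TIF_UINT32": 4, "TIF_FP32": 4,
    "TIF_INT64": 8, "TIF_UINT64": 8,
}


def expected_nbytes(dtype_name: str, size: int) -> int:
    """Bytes needed for `size` elements of the named data type,
    assuming size counts elements rather than bytes."""
    return ELEMENT_BYTES[dtype_name] * size


# A 64 x 3 x 7 x 7 convolution kernel has 9408 elements.
kernel_elements = 64 * 3 * 7 * 7
fp32_bytes = expected_nbytes("TIF_FP32", kernel_elements)
fp16_bytes = expected_nbytes("TIF_FP16", kernel_elements)
```

A check of this kind can help catch a data type mismatch before new weights are handed to set_named_weights.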
3. Appendix¶
| Version | Description | Date | 
|---|---|---|
| V2.0 | Initial version | 2022.01 |