5.2. TopsIDEAS gcu debug

Description

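When the output of a whole model mismatches, this tool compares the results layer by layer against onnxruntime-cpu and traverses the graph to find the smallest subgraph that the GCU computes incorrectly.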

Note: this tool can take a long time to run on large models.
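The search strategy of the default `--mode linear` (grow the subgraph from the input until the check fails, then delete nodes from the input until it passes) can be illustrated on a toy example. The following is only a sketch under simplifying assumptions, not the tool's implementation: the graph is flattened to an ordered list of node names, and `passes` is a hypothetical callback standing in for "run this subgraph on the GCU and compare it with onnxruntime-cpu".

```python
from typing import Callable, List, Sequence


def find_failing_window(
    nodes: Sequence[str],
    passes: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return a small contiguous run of nodes that still fails the check.

    `passes` is a hypothetical stand-in for compiling a subgraph, running it,
    and comparing its outputs with a reference backend.
    """
    # Phase 1: add nodes one at a time from the input side until the check fails.
    end = None
    for i in range(1, len(nodes) + 1):
        if not passes(nodes[:i]):
            end = i
            break
    if end is None:
        return []  # the whole graph passes, nothing to report

    # Phase 2: delete nodes from the input side for as long as the
    # remaining subgraph still fails; stop just before it would pass.
    start = 0
    while start + 1 < end and not passes(nodes[start + 1:end]):
        start += 1
    return list(nodes[start:end])


# Toy graph: the (made-up) failure only shows up when pool1 feeds conv2.
toy_nodes = ["conv1", "relu1", "pool1", "conv2", "softmax"]


def toy_passes(sub: Sequence[str]) -> bool:
    return not ("pool1" in sub and "conv2" in sub)


print(find_failing_window(toy_nodes, toy_passes))
```

Each call to `passes` corresponds to one compile-and-run cycle, so the number of cycles grows with the number of nodes, which is consistent with the note above about long run times on large models.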

Command line

Usage

usage: topsideas gcu debug [-h] --input_onnx INPUT_ONNX [--inputs INPUT_META [INPUT_META ...]] [--input_value_range INPUT_VALUE_RANGE [INPUT_VALUE_RANGE ...]]
                           [--fp16] [--int8] [--calibration_cache CALIBRATION_CACHE] [--device DEVICE] [--cluster CLUSTER] [--fp32_layers [FP32_LAYERS ...]]
                           [--min_shapes MIN_SHAPES [MIN_SHAPES ...]] [--max_shapes MAX_SHAPES [MAX_SHAPES ...]] [--resource_mode RESOURCE_MODE]
                           [--compile_options COMPILE_OPTIONS] [--save_path SAVE_PATH] [--rtol RTOL] [--atol ATOL] [--ntol NTOL] [--cos_sim COS_SIM]
                           [--mode {model,quick,linear}] [--log_path LOG_PATH] [--inputs_npz INPUTS_NPZ] [--seed SEED] [--try_all]

Parameters

:::{table} topsideas gcu debug parameter list
:widths: 8 32 20 45

| short | long | default | help |
| --- | --- | --- | --- |
| `-h` | `--help` | | Show this help message and exit. |
| | `--input_onnx` | `None` | Path to the original ONNX file. |
| | `--inputs` | `[]` | Overwrite input shapes or data types. Format: `--inputs NAME:SHAPE:DTYPE`. For example: `--inputs input1 input2:[1,3,224,224]:float32 input3:int32 input4:[]`. If omitted, the current model inputs are used. |
| | `--input_value_range` | `[]` | Overwrite the range of the random input values. Format: `--input_value_range NAME:[MIN,MAX]`. |
| | `--fp16` | | Enable fp16 mixed precision. Only works with the topsinference backend. |
| | `--int8` | | Enable int8 quantization. Only works with the topsinference backend. |
| | `--calibration_cache` | `None` | `None` |
| | `--device` | `0` | Device id. |
| | `--cluster` | `[0]` | Cluster id. |
| | `--fp32_layers` | `None` | Set layers as fp32 (topsinference). |
| | `--min_shapes` | `[]` | Minimum input shapes. Format: `--min_shapes NAME:SHAPE`. For example: `--min_shapes input1:[1,3,224,224] input2:[1,3,224,224]`. |
| | `--max_shapes` | `[]` | Maximum input shapes. Format: `--max_shapes NAME:SHAPE`. For example: `--max_shapes input1:[1,3,224,224] input2:[1,3,224,224]`. |
| | `--resource_mode` | `None` | TopsInference compile option; see the TopsInference docs for more information. |
| | `--compile_options` | `{}` | TopsInference compile option; see the TopsInference docs for more information. |
| | `--save_path` | `./test_cases` | Path where the failing ONNX subgraph test cases are saved. |
| | `--rtol` | `0.01` | Relative tolerance. |
| | `--atol` | `0.01` | Absolute tolerance. |
| | `--ntol` | `0` | Mismatch number tolerance, e.g. `ntol=0.01` means 1% mismatch is allowed. |
| | `--cos_sim` | `0` | Use cosine similarity instead of numerical tolerance. |
| | `--mode` | `linear` | Iteration mode, one of `model`, `quick`, `linear`. `model`: run inference on the whole model and compare only the model outputs. `quick`: first add all tensors to the outputs and check them, then search backward from the failed tensors. `linear`: first add nodes one by one from the input in DFS order until the check fails, then delete nodes one by one from the input until the check passes. |
| | `--log_path` | `None` | Path where the inference logs are saved. |
| | `--inputs_npz` | `None` | Path to a saved real `sampleN_inputs.npz`. |
| | `--seed` | `None` | Seed used to generate random data. Defaults to `None`, which uses the time as the seed. Ignored if `--inputs_npz` is given. |
| | `--try_all` | | Do not stop when a mismatch is found; keep searching until the model outputs are reached. Only supported with `--mode=linear` and `--mode=quick`. |

:::
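To make the roles of `--rtol`, `--atol`, `--ntol` and `--cos_sim` concrete, here is a minimal, dependency-free sketch of how such checks are commonly defined. The exact formulas used by the tool are not documented here, so the sketch makes two assumptions: the element-wise criterion follows the `numpy.isclose` convention `|actual - expected| <= atol + rtol * |expected|`, and `ntol` is the fraction of elements allowed to violate it. The function names are hypothetical.

```python
import math
from typing import Sequence


def within_tolerance(
    actual: Sequence[float],
    expected: Sequence[float],
    rtol: float = 0.01,
    atol: float = 0.01,
    ntol: float = 0.0,
) -> bool:
    """Element-wise tolerance check over flattened tensors.

    Assumption: an element mismatches when
    |actual - expected| > atol + rtol * |expected|,
    and at most ntol * len(expected) elements may mismatch.
    """
    if len(actual) != len(expected):
        raise ValueError("tensors must have the same number of elements")
    mismatched = sum(
        1 for a, e in zip(actual, expected)
        if abs(a - e) > atol + rtol * abs(e)
    )
    return mismatched <= ntol * len(expected)


def cosine_similarity(actual: Sequence[float], expected: Sequence[float]) -> float:
    """Cosine of the angle between two flattened tensors."""
    dot = sum(a * e for a, e in zip(actual, expected))
    norm_a = math.sqrt(sum(a * a for a in actual))
    norm_e = math.sqrt(sum(e * e for e in expected))
    if norm_a == 0.0 or norm_e == 0.0:
        # Convention chosen for this sketch: two all-zero tensors count as identical.
        return 1.0 if norm_a == norm_e else 0.0
    return dot / (norm_a * norm_e)


reference = [1.0, 2.0, 3.0, 4.0]
device = [1.0, 2.0, 3.0, 5.0]  # one of four elements is clearly off

print(within_tolerance(device, reference))             # strict: no mismatch allowed
print(within_tolerance(device, reference, ntol=0.25))  # up to 25% may mismatch
print(cosine_similarity(device, reference))
```

A cosine-similarity check looks at the overall direction of the output tensor rather than at individual elements, so a threshold close to 1 can tolerate small, evenly spread errors that a strict element-wise check would reject.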

Example

topsideas gcu debug --input_onnx Inception-v1.onnx --fp16 --inputs input:[4,3,224,224] --cos_sim=0.9999
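When the command is generated from a script, the `NAME:SHAPE:DTYPE` specs can be assembled programmatically. The sketch below uses only the Python standard library; `input_spec` is a hypothetical helper, not part of topsideas. It also quotes the arguments, which can matter because some shells (zsh, for instance) treat square brackets as glob patterns.

```python
import shlex
from typing import Optional, Sequence


def input_spec(
    name: str,
    shape: Optional[Sequence[int]] = None,
    dtype: Optional[str] = None,
) -> str:
    """Build one NAME[:SHAPE][:DTYPE] argument for --inputs (hypothetical helper)."""
    parts = [name]
    if shape is not None:
        parts.append("[" + ",".join(str(dim) for dim in shape) + "]")
    if dtype is not None:
        parts.append(dtype)
    return ":".join(parts)


argv = [
    "topsideas", "gcu", "debug",
    "--input_onnx", "Inception-v1.onnx",
    "--fp16",
    "--inputs", input_spec("input", [4, 3, 224, 224]),
    "--cos_sim=0.9999",
]

# shlex.join quotes any argument containing shell-special characters.
command = shlex.join(argv)
print(command)
```

The `argv` list can also be passed directly to `subprocess.run(argv)`, in which case no shell is involved and no quoting is needed.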

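The console prints the traversal and inference progress for each ONNX subgraph and reports whether that subgraph has a numerical error:
:::{figure-md} 子图数值计算错误

Numerical error in a subgraph
:::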

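After the traversal finishes, the failing subgraphs and their input and output data are saved to a folder with the following structure:
:::{figure-md} 保存结构示例

Example of the saved folder structure
:::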

For example, this is one possible failing subgraph of Inception-v1.onnx:
:::{figure-md} 错误子图示例

Example of a failing subgraph
:::