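1. Introduction
TopsRider v3.3.112 applies to S60-series devices. The new and changed features and the bug fixes described below are relative to the previous release, TopsRider v3.2.204.
2. Feature Improvements
2.1 New and Changed Basic Features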
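- The TopsRider run package now includes installation packages for libtorch_gcu, yunchang, Tensorflow_gcu 2.9/2.13, and the TopsGraph2 beta.
- Torch 2.1 has been removed from the TopsRider run package.
- Tensorflow_gcu supports the Tanh and Pow operators.
- Tensorflow_gcu supports an interface for placing output memory on the device.
- Tensorflow_gcu supports the session run stream interface.
- Tensorflow_gcu supports the Resource Gather op.
- Xdit is upgraded to 0.3.2 and supports multi-card parallelism for CogVideox-5B i2v.
- The vLLM framework is upgraded to 0.6.1.post2 and supports LoRA.
- TopsCompressor GCU w8a16 quantization supports the GPTQ scheme.
- The TopsGraph2 beta supports the opensora model.
- The ffmpeg tar package adds a symbol-table installation package.
- Low-power mode LP2 is disabled by default.
- The driver adds TLinux 4.2 to its supported operating systems.
- TopsPlatform 1.2.4.9:
- SSM FW strengthens protection against SIP exceptions through the firewall.
- The driver optimizes MQD locking for highly concurrent multi-process workloads, which improves system stability under multi-task concurrency, and improves exception handling when the number of MQDs exceeds the limit.
- The driver adds process permission checks to prevent kernel restarts caused by invalid null or dangling pointers.
- To prevent misuse, the Logic ID is no longer shown in the efsmi –dmon and –pmon views.
- Runtime provides APIs for custom asynchronous allocation, and the event implementation is refactored to improve event record performance.
- Runtime adds checks on resource parameters during kernel launch.
- TopsProfiler adds the -delay and -duration options, which start and stop profiling dynamically as needed.
- TopsVisualProfiler adds a timeline thumbnail feature.
- TopsProfiler adds the -timeunit option, which controls the time unit of GCU events in console output.
- Topsprof no longer collects kernel implicit-parameter information by default when operator profiling is enabled (to collect it, use the topsprof -enable-extra-activities runtime/kernel_meta option).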
2.2 Newly Supported Models
2.2.1 LLM-W8A8
Model | Framework | Data type | Cards |
qwen1.5-32B | vLLM 0.6.1.post2 | w8a8c8 | 2 |
llama3.1-70B | vLLM 0.6.1.post2 | w8a8c8 | 4 |
deepseek-moe-16b-base | vLLM 0.6.1.post2 | w8a8c8 | 1 |
baichuan2-13B | vLLM 0.6.1.post2 | w8a8c8 | 1 |
2.2.2 Multimodal
Model | Framework | Data type | Cards |
llama3-llava-next-8b-hf | vLLM 0.6.1.post2 | FP16 | 1 |
Phi-3-vision-128k-instruct | vLLM 0.6.1.post2 | BF16 | 1 |
Qwen2-VL-7B-Instruct-GPTQ-Int4 | vLLM 0.6.1.post2 | w4a16 | 1 |
Qwen2-VL-2B-Instruct | vLLM 0.6.1.post2 | FP16 | 1 |
MiniCPM-V 2.6 | vLLM 0.6.1.post2 | FP16 | 1 |
2.2.3 Video Generation
Model | Framework | Data type | Cards |
CogVideox-5b-t2v | PyTorch2.3+xdit0.3.3 | FP16 | 1/2/4 |
CogVideox-5B-i2v (multi-card parallel) | PyTorch2.3+xdit0.3.3 | BF16 | 1/2 |
2.2.4 Traditional Models
Model | Framework | Data type | Cards |
OCR | topsInference | FP16 | 1 |
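3. API Changes
Relative to the previous release, TopsRider v3.2.204, the API changes in runtime, operators, and hlir_build are listed below. For the details of each API, see the corresponding API manual.
3.1 Runtime API Changes
3.1.1 Added APIs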
- typeref:typename:TOPS_PUBLIC_API topsError_t topsDeviceGetDefaultMemPool
- typeref:typename:TOPS_PUBLIC_API topsError_t topsDeviceSetMemPool
3.1.2 Removed APIs
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalAlloc
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalAvailableNumGet
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalFree
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalMaxNumGet
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalRead
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalWrite
3.2 Operator API Changes
3.2.1 Added APIs
- namespace:topsaten TOPSATEN_EXPORT topsatenAddmmActTwice
- namespace:topsaten TOPSATEN_EXPORT topsatenAvgPool3d
- namespace:topsaten TOPSATEN_EXPORT topsatenCorrcoef
- namespace:topsaten TOPSATEN_EXPORT topsatenFftC2c
- namespace:topsaten TOPSATEN_EXPORT topsatenFftR2c
- namespace:topsaten TOPSATEN_EXPORT topsatenIm2col
- namespace:topsaten TOPSATEN_EXPORT topsatenIstft
- namespace:topsaten TOPSATEN_EXPORT topsatenLinalgMatrixExp
- namespace:topsaten TOPSATEN_EXPORT topsatenMaxPool3d
- namespace:topsaten TOPSATEN_EXPORT topsatenScatterAdd
- namespace:topsaten TOPSATEN_EXPORT topsatenStft
- namespace:topsaten TOPSATEN_EXPORT topsatenThnnFusedLstmCell
- namespace:topsexts TOPSATEN_EXPORT topsextsAttnLayerFusion
- namespace:topsexts TOPSATEN_EXPORT topsextsBottleneckNoBN
- namespace:topsexts TOPSATEN_EXPORT topsextsConcatConvBiasActivationFusion
- namespace:topsexts TOPSATEN_EXPORT topsextsConcatConvBiasActivationFusionV2
- namespace:topsexts TOPSATEN_EXPORT topsextsConv2dActivationDepthToSpace
- namespace:topsexts TOPSATEN_EXPORT topsextsConv2dDepthToSpace
- namespace:topsexts TOPSATEN_EXPORT topsextsConvAddActivationDepthToSpace
- namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationAdd
- namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationFusionV2
- namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationFusionV2DtS
- namespace:topsexts TOPSATEN_EXPORT topsextsConvScaledBiasActivation
- namespace:topsexts TOPSATEN_EXPORT topsextsFusedBiasAct
- namespace:topsexts TOPSATEN_EXPORT topsextsInplaceAbn
- namespace:topsexts TOPSATEN_EXPORT topsextsMaxPool2d
- namespace:topsexts TOPSATEN_EXPORT topsextsMulAddAdd
- namespace:topsexts TOPSATEN_EXPORT topsextsMulAddAddRelu
- namespace:topsexts TOPSATEN_EXPORT topsextsMultiHeadAttentionFusion
- namespace:topsexts TOPSATEN_EXPORT topsextsMultiLayerPerceptronFusion
- namespace:topsexts TOPSATEN_EXPORT topsextsResizeConvBiasActivationFusion
- namespace:topsexts TOPSATEN_EXPORT topsextsSubAddDivMulAddClampSubAdd
- namespace:topsexts TOPSATEN_EXPORT topsextsTemperature
- namespace:topsexts TOPSATEN_EXPORT topsextsTemperatureSoftmax
- namespace:topsexts TOPSATEN_EXPORT topsextsUpfirdn2d
3.2.2 Updated APIs
- namespace:topsaten TOPSATEN_EXPORT topsatenAddmmActivation
- namespace:topsaten TOPSATEN_EXPORT topsatenBaddbmmActivation
- namespace:topsaten TOPSATEN_EXPORT topsatenLinearActivation
- namespace:topsaten TOPSATEN_EXPORT topsatenLinearQuantActivation
- namespace:topsaten TOPSATEN_EXPORT topsatenMatmulActivation
- namespace:topsaten TOPSATEN_EXPORT topsatenMedian
- namespace:topsaten TOPSATEN_EXPORT topsatenMultiHeadAttentionFusion
- namespace:topsaten TOPSATEN_EXPORT topsatenMultiLayerPerceptronFusion
- namespace:topsaten TOPSATEN_EXPORT topsatenNanmedian
- namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductEfficientAttention
- namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductFlashAttention
- namespace:topsaten TOPSATEN_EXPORT topsatenScatter
- namespace:topsaten TOPSATEN_EXPORT topsatenTop_k_top_p
- namespace:topsaten TOPSATEN_EXPORT topsatenTrapz
3.3 hlir_build
3.3.1 Removed APIs
- namespace:builder::Op AddQuant
- namespace:builder::Op AllReduce
- namespace:builder::Op AveragePoolQuant
- namespace:builder::Op ConcatQuant
- namespace:builder::Op ConvBias
- namespace:builder::Op ConvQuant
- namespace:builder::Op ConvTransposeQuant
- namespace:builder::Op DeformConv
- namespace:builder::Op DequantizeLinearQuant
- namespace:builder::Op DotGeneralBias
- namespace:builder::Op DotGeneralBiasQuant
- namespace:builder::Op GeneralSplit
- namespace:builder::Op GlobalAveragePoolQuant
- namespace:builder::Op LayerNormInference
- namespace:builder::Op MulQuant
- namespace:builder::Op PartialReduce
- namespace:builder::Op QuantConvert
- namespace:builder::Op QuantizeLinearQuant
- namespace:builder::shared_ptr SequentialMergeBuilder
- namespace:builder::Op SubQuant
4. FW Information
FW | Version |
S60 SSM FW | Boot FW 33.6.5, Runtime FW 33.6.5.28 |
AP | 1.1.6 |
SP | 2.11.88 |
VPU | 3.1.5 |
5. Component Information
5.1 TopsRider run Package Components
- topsplatform
  - TopsPlatform_1.2.4.12-ee064d_deb_amd64.run
- tgi
  - ai_framework/text-generation-inference/src
- sentence-transformers
  - sentence_transformers-2.7.0+gcu.3.2.20240805-py3-none-any.whl
- vllm
  - vllm-0.6.1.post2+gcu.3.2.20241230-cp38-abi3-linux_x86_64.whl
- topscompressor
  - topscompressor-3.3.20241224-py3.8-none-any.whl
  - topscompressor-3.3.20241224-py3.10-none-any.whl
- yunchang
  - yunchang-0.3.6+gcu.3.2.20241212-py3-none-any.whl
- xfuser
  - xfuser-0.3.3+gcu.3.1.20241212-py3.10-none-any.whl
- topsideas
  - topsideas-3.2.20241115-cp38-cp38-linux_x86_64.whl
  - topsideas-3.2.20241115-cp310-cp310-linux_x86_64.whl
- onnxruntime_gcu
  - onnxruntime_gcu-1.9.1+3.1.0-cp38-cp38-linux_x86_64.whl
- tops-extension
  - tops_extension-3.2.20241219-cp310-cp310-linux_x86_64.whl
- tensorflow_2.13
  - tensorflow_gcu-2.13.1+3.3.0-cp38-cp38-linux_x86_64.whl
  - tensorflow_gcu-2.13.1+3.3.0-cp310-cp310-linux_x86_64.whl
- tensorflow_2.9
  - tensorflow_gcu-2.9.0+3.3.0-cp38-cp38-linux_x86_64.whl
  - tensorflow_gcu-2.9.0+3.3.0-cp310-cp310-linux_x86_64.whl
- paddle-custom-gcu
  - paddle_custom_gcu-3.0.0b1+3.3.0-cp310-cp310-linux_x86_64.whl
- onnxruntime_gcu
  - onnxruntime_gcu-1.9.1+3.1.0-cp310-cp310-linux_x86_64.whl
- xformers
  - xformers-0.0.25+gcu.3.2.20241220-cp310-cp310-linux_x86_64.whl
- tops-extension
  - tops_extension-3.2.20241219-cp38-cp38-linux_x86_64.whl
- topsgraph
  - topsgraph_0.1.20241124-1_amd64.deb
- topsfactor
  - topsfactor_3.3.112-1_amd64.deb
- topsaten
  - topsaten_3.2.20241227-1_amd64.deb
- tops-sdk
  - tops-sdk_3.3.112-1_amd64.deb
- tops-inference
  - tops-inference_3.3.112-1_amd64.deb
- eccl
  - eccl_3.1.20241213-1_amd64.deb
- eccl-tests
  - eccl-tests_3.1.20241213-1_amd64.deb
- topsgraph-py
  - topsgraph-0.1.20241124-py3.10-none-any.whl
- xformers
  - xformers-0.0.25+gcu.3.2.20241220-cp38-cp38-linux_x86_64.whl
- TopsInference
  - TopsInference-3.3.112-py3.10-none-any.whl
  - TopsInference-3.3.112-py3.8-none-any.whl
- torch-gcu-2
  - torch_gcu-2.3.0_3.2.3_x86_64.run
- fast-diffusers
  - fast_diffusers-0.29.2+gcu.3.2.20250102-py3.8-none-any.whl
  - fast_diffusers-0.29.2+gcu.3.2.20250102-py3.10-none-any.whl
- fast-diffusers-utils
  - fast_diffusers_utils-0.29.2+gcu.3.2.20250102-py3.8-none-any.whl
  - fast_diffusers_utils-0.29.2+gcu.3.2.20250102-py3.10-none-any.whl
- libtorch
  - ai_framework/torch_gcu/libtorch_gcu
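Several components in the list above ship one wheel per Python version, marked with a cp38/cp310 or py3.8/py3.10 tag in the file name. The following is a minimal sketch of picking the wheels that match a given interpreter, assuming the file names follow the `name-version-pytag-abi-platform.whl` layout seen above. The helper names `python_tag` and `wheels_for` are hypothetical and are not part of TopsRider. Wheels tagged `abi3` (such as the vllm wheel) are also usable on later CPython versions and are not handled here.

```python
import sys

# File names copied from the component list above (a subset).
WHEELS = [
    "topscompressor-3.3.20241224-py3.8-none-any.whl",
    "topscompressor-3.3.20241224-py3.10-none-any.whl",
    "topsideas-3.2.20241115-cp38-cp38-linux_x86_64.whl",
    "topsideas-3.2.20241115-cp310-cp310-linux_x86_64.whl",
    "yunchang-0.3.6+gcu.3.2.20241212-py3-none-any.whl",
]


def python_tag(filename):
    """Return the python tag, the third field from the end of the wheel name."""
    stem = filename[: -len(".whl")]
    return stem.split("-")[-3]


def wheels_for(filenames, major, minor):
    """Hypothetical helper: keep wheels whose python tag fits major.minor."""
    accepted = {
        f"cp{major}{minor}",   # CPython-specific tag, e.g. cp310
        f"py{major}.{minor}",  # version-specific tag used by some wheels above
        f"py{major}",          # generic pure-Python tag, e.g. py3
    }
    return [name for name in filenames if python_tag(name) in accepted]


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    for name in wheels_for(WHEELS, major, minor):
        print(name)
```

Run under Python 3.8 or 3.10, the script prints the wheels from the sample list that fit that interpreter; under any other version only the generic `py3` wheel remains.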
5.2 Components Outside the TopsRider run Package
- ffmpeg-gcu
  - ffmpeg-gcu_1.2.3.7-20241120-n4.4-1_amd64.deb
  - ffmpeg-gcu-1.2.3.7-20241120-n4.4-1.x86_64.rpm
  - ffmpeg-gcu_1.2.3.7-20241120-n4.4-1_amd64-dbgsym.ddeb
- TopsVisualProfiler
  - TopsVisualProfiler_1.2.4.12-ee064d_win64.zip
- Application run
  - TopsRider_i3x_3.3.112_application.run
5.3 TopsRider ddeb Package Components
- TopsRider_3.3.112_ddeb_amd64.run
- eccl_3.1.20241213-1_amd64-dbgsym.ddeb
- eccl-tests_3.1.20241213-1_amd64-dbgsym.ddeb
- topsaten_3.2.20241227-1_amd64-dbgsym.ddeb
- topscv_1.2.2.15-20241112-1_amd64-dbgsym.ddeb
- topsfactor_3.3.112-1_amd64-dbgsym.ddeb
- tops-inference_3.3.112-1_amd64-dbgsym.ddeb
- TopsPlatform_1.2.4.12-ee064d_ddeb_amd64.run
- tops-sdk_3.3.112-1_amd64-dbgsym.ddeb
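6. Operating System and Python Support
6.1 Compatibility Notes
- Host environment: only the Enflame Driver has been adapted for compatibility with this OS environment; Docker runs Ubuntu.
- Docker environment: the software stack has been adapted and tested; the Host must run the same OS.
6.2 Supported Operating Systems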
OS | Architecture | Kernel version | GCC | GLIBC | Notes |
Ubuntu 20.04.z (z<=5) | x86 | 5.4 & 5.11 & 5.13 & 5.15 | 9.3 | 2.31 | Host & Docker |
Ubuntu 22.04.z (z<=1) | x86 | 5.15 | 11.2 | 2.35 | Host & Docker |
Kylin v10 | x86 | 4.19.0 | 7.3 | 2.28 | Only the driver is adapted, on Host |
UOS 20 Server | x86 | 4.19.0 | 7.3 | 2.28 | |
openEuler | x86 | 5.10.0 | 10.3.1 | 2.34 | |
Anolis OS 8.2 QU2 | x86 | 4.18.0 | 8.3.1 | 2.28 | |
Anolis OS 8.6 | x86 | 4.19.90 | 7.3.0 | 2.28 | |
TLinux 4.2 | x86 | 6.6.30 | 12.3.1 | 2.38 | |
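The kernel column of the table above can be checked against a running host before installation. The following is a minimal sketch, assuming that each listed kernel version is meant as a prefix of the running kernel release (for example, 5.15 covering 5.15.0-91-generic); that interpretation, the `SUPPORTED_KERNELS` mapping, and the `kernel_matches` helper are assumptions of this example and are not part of TopsRider.

```python
import platform

# Kernel versions copied from the table above. Treating them as prefixes of
# the running kernel release is an assumption of this sketch.
SUPPORTED_KERNELS = {
    "Ubuntu 20.04": ["5.4", "5.11", "5.13", "5.15"],
    "Ubuntu 22.04": ["5.15"],
    "Kylin v10": ["4.19.0"],
    "UOS 20 Server": ["4.19.0"],
    "openEuler": ["5.10.0"],
    "Anolis OS 8.2 QU2": ["4.18.0"],
    "Anolis OS 8.6": ["4.19.90"],
    "TLinux 4.2": ["6.6.30"],
}


def kernel_matches(release, supported):
    """Return True if the numeric part of `release` starts with a listed version.

    Comparison is done per dotted component, so "5.4" does not match "5.40".
    """
    numeric = release.split("-")[0]  # "5.15.0-91-generic" -> "5.15.0"
    parts = numeric.split(".")
    for version in supported:
        wanted = version.split(".")
        if parts[: len(wanted)] == wanted:
            return True
    return False


if __name__ == "__main__":
    release = platform.release()
    for os_name, kernels in SUPPORTED_KERNELS.items():
        if kernel_matches(release, kernels):
            print(f"kernel {release} matches the entry for {os_name}")
```

The script only compares kernel versions; it does not identify the distribution, so a match for several rows (for example 4.19.0) still needs the OS name to be confirmed separately.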
6.3 Supported Python Versions
Python 3.8 (TopsInference inference models), Python 3.10
7. Documentation Updates
7.1 Added Documents
"TopsGraph Python API Reference", "paddle-custom-gcu3.0 User Manual", "paddle-custom-gcu3.0 Operator Support List", "Node Problem Detector User Manual"
7.2 Removed Documents
"torch_GCU2.1 User Manual", "Library Kernel API Reference"
8. Usage Limitations
The ARM platform currently covers only single-card environments. The covered models are:
vllm: llama2 7b, llama2 13b
Typical traditional models: resnet50 v1.5, yolov5m, vit_b, vit_l, bert