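TopsRider v3.3.112 applies to S60-series devices. The new/changed features and bug fixes listed below are relative to the previous release, TopsRider v3.2.204.
2. Feature Enhancements
2.1 New and Changed Basic Features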
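- The TopsRider run package adds installation packages for libtorch_gcu, yunchang, Tensorflow_gcu 2.9/2.13, and the TopsGraph2 beta
- The TopsRider run package no longer includes Torch 2.1
- Tensorflow_gcu adds support for the Tanh and Pow operators
- Tensorflow_gcu supports an interface that places output memory on the device
- Tensorflow_gcu supports the session run stream interface
- Tensorflow_gcu supports the ResourceGather op
- Xdit upgraded to 0.3.2, with multi-card parallel support for CogVideox-5B i2v
- vLLM upgraded to 0.6.1.post2, with LoRA support
- TopsCompressor GCU quantization: w8a16 supports the GPTQ scheme
- The TopsGraph2 beta supports the opensora model
- The ffmpeg tar package adds a symbol-table installation package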
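- TopsPlatform 1.2.4.9:
- SSM FW strengthens protection against SIP exceptions through the firewall
- The driver optimizes MQD locking for multi-process, high-concurrency scenarios, improving system stability under concurrent multi-task workloads, and improves exception handling when the number of MQDs exceeds the limit
- The driver adds process permission checks to prevent kernel reboots caused by invalid null or wild pointers
- To avoid user error, the LogicID shown on the efsmi --dmon and --pmon pages has been removed
- Runtime provides APIs for custom asynchronous allocation, and events have been refactored to improve event-record performance
- Runtime adds checks on resource parameters during kernel launch
- TopsProfiler adds the --delay and --duration options, which control when profiling starts and stops
- TopsVisualProfiler adds a timeline thumbnail feature
- TopsProfiler adds the --timeunit option, which controls the time unit of GCU events in console output
- With operator profiling enabled, Topsprof no longer collects kernel implicit-parameter information by default (if that information is needed, enable it with topsprof --enable-extra-activities runtime/kernel_meta)
- Low-power mode LP2 is disabled by default
- The driver adds TLinux 4.2 to its supported operating systems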
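The profiler options listed above can be combined on one command line. The following is a minimal sketch that only assembles such a command. The option names come from these notes; everything else is an assumption: that each option takes a space-separated value, that the profiled application follows the options, the units of --delay and --duration, and the placeholder script name infer.py. Check the TopsProfiler manual for the actual syntax.

```python
import shlex


def build_topsprof_cmd(app_cmd, delay=None, duration=None,
                       timeunit=None, kernel_meta=False):
    """Assemble a topsprof argument list from the options named in the notes.

    The argument layout (option, value, then the application command) is an
    assumption, not taken from the TopsProfiler manual.
    """
    cmd = ["topsprof"]
    if delay is not None:
        cmd += ["--delay", str(delay)]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if timeunit is not None:
        cmd += ["--timeunit", str(timeunit)]
    if kernel_meta:
        # Re-enables the kernel implicit-parameter collection that is now off by default.
        cmd += ["--enable-extra-activities", "runtime/kernel_meta"]
    return cmd + list(app_cmd)


# infer.py is a hypothetical application; the numeric values are placeholders.
cmd = build_topsprof_cmd(["python3", "infer.py"], delay=5, duration=10, kernel_meta=True)
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run once the syntax has been confirmed against the installed tool.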
2.2 Newly Supported Models
2.2.1 LLM
Model | Framework | Data type | Cards |
qwen1.5-32B | vLLM 0.6.1.post2 | w8a8c8 | 2 |
llama3.1-70B | vLLM 0.6.1.post2 | w8a8c8 | 4 |
deepseek-moe-16b-base | vLLM 0.6.1.post2 | w8a8c8 | 1 |
baichuan2-13B | vLLM 0.6.1.post2 | w8a8c8 | 1 |
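As an illustration of how the card counts in the table might be used, the sketch below maps each row to vLLM engine arguments. It assumes that the GCU build keeps the upstream vLLM Python API (LLM, SamplingParams, tensor_parallel_size) and that the card count corresponds to the tensor-parallel size. Any GCU-specific device or w8a8c8 quantization arguments are not given in these notes and are deliberately left out; model_path is a placeholder.

```python
# Card counts copied from the LLM table above.
CARDS = {
    "qwen1.5-32B": 2,
    "llama3.1-70B": 4,
    "deepseek-moe-16b-base": 1,
    "baichuan2-13B": 1,
}


def engine_kwargs(model_name, model_path):
    """Map a table row to vLLM engine arguments (assumed mapping: cards -> TP size)."""
    return {"model": model_path, "tensor_parallel_size": CARDS[model_name]}


def run(model_name, model_path, prompt):
    """Generate a short completion; requires a working vLLM installation."""
    # Imported lazily so engine_kwargs() stays usable without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(model_name, model_path))
    outputs = llm.generate([prompt], SamplingParams(max_tokens=32))
    return outputs[0].outputs[0].text
```

Calling run("llama3.1-70B", "/path/to/model", "Hello") would then request four cards, matching the table row.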
2.2.2 Multimodal
Model | Framework | Data type | Cards |
llama3-llava-next-8b-hf | vLLM 0.6.1.post2 | FP16 | 1 |
Phi-3-vision-128k-instruct | vLLM 0.6.1.post2 | BF16 | 1 |
Qwen2-VL-7B-Instruct-GPTQ-Int4 | vLLM 0.6.1.post2 | w4a16 | 1 |
Qwen2-VL-2B-Instruct | vLLM 0.6.1.post2 | FP16 | 1 |
MiniCPM-V 2.6 | vLLM 0.6.1.post2 | FP16 | 1 |
2.2.3 Video Generation
Model | Framework | Data type | Cards |
CogVideox-5b-t2v | PyTorch2.3+xdit0.3.3 | FP16 | 1/2/4 |
CogVideox-5B-i2v (multi-card parallel) | PyTorch2.3+xdit0.3.3 | BF16 | 1/2 |
2.2.4 Traditional Models
Model | Framework | Data type | Cards |
OCR | topsInference | FP16 | 1 |
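3. API Changes
The runtime, operator, and hlir_build API changes relative to the previous release, TopsRider v3.2.204, are listed below. For details of each API, refer to the corresponding API manual.
3.1 Runtime API Changes
3.1.1 New APIs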
- typeref:typename:TOPS_PUBLIC_API topsError_t topsDeviceGetDefaultMemPool
- typeref:typename:TOPS_PUBLIC_API topsError_t topsDeviceSetMemPool
3.1.2 Removed APIs
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalAlloc
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalAvailableNumGet
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalFree
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalMaxNumGet
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalRead
- typeref:typename:TOPS_PUBLIC_API topsError_t topsKernelSignalWrite
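Before upgrading, it can help to find code that still references the removed runtime functions. The sketch below is a plain text search over C/C++ sources, not an official migration tool. The API names are taken from the list above; the file suffixes and function names in the script are illustrative choices.

```python
import re
from pathlib import Path

# Runtime functions removed in this release, as listed above.
REMOVED_RUNTIME_APIS = [
    "topsKernelSignalAlloc",
    "topsKernelSignalAvailableNumGet",
    "topsKernelSignalFree",
    "topsKernelSignalMaxNumGet",
    "topsKernelSignalRead",
    "topsKernelSignalWrite",
]
PATTERN = re.compile(r"\b(" + "|".join(REMOVED_RUNTIME_APIS) + r")\b")


def find_removed_calls(text):
    """Return (line_number, api_name) pairs for each removed API named in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(root, suffixes=(".c", ".cc", ".cpp", ".h", ".hpp")):
    """Scan source files under root; return {path: hits} for files with matches."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits = find_removed_calls(path.read_text(errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report
```

Because the search is textual, it also reports mentions in comments and strings, so each hit still needs a manual look.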
3.2 Operator API Changes
3.2.1 New APIs
- namespace:topsaten TOPSATEN_EXPORT topsatenAddmmActTwice
- namespace:topsaten TOPSATEN_EXPORT topsatenAvgPool3d
- namespace:topsaten TOPSATEN_EXPORT topsatenCorrcoef
- namespace:topsaten TOPSATEN_EXPORT topsatenFftC2c
- namespace:topsaten TOPSATEN_EXPORT topsatenFftR2c
- namespace:topsaten TOPSATEN_EXPORT topsatenIm2col
- namespace:topsaten TOPSATEN_EXPORT topsatenIstft
- namespace:topsaten TOPSATEN_EXPORT topsatenLinalgMatrixExp
- namespace:topsaten TOPSATEN_EXPORT topsatenMaxPool3d
- namespace:topsaten TOPSATEN_EXPORT topsatenScatterAdd
- namespace:topsaten TOPSATEN_EXPORT topsatenStft
- namespace:topsaten TOPSATEN_EXPORT topsatenThnnFusedLstmCell
- namespace:topsexts TOPSATEN_EXPORT topsextsAttnLayerFusion
- namespace:topsexts TOPSATEN_EXPORT topsextsBottleneckNoBN
- namespace:topsexts TOPSATEN_EXPORT topsextsConcatConvBiasActivationFusion
- namespace:topsexts TOPSATEN_EXPORT topsextsConcatConvBiasActivationFusionV2
- namespace:topsexts TOPSATEN_EXPORT topsextsConv2dActivationDepthToSpace
- namespace:topsexts TOPSATEN_EXPORT topsextsConv2dDepthToSpace
- namespace:topsexts TOPSATEN_EXPORT topsextsConvAddActivationDepthToSpace
- namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationAdd
- namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationFusionV2
- namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationFusionV2DtS
- namespace:topsexts TOPSATEN_EXPORT topsextsConvScaledBiasActivation
- namespace:topsexts TOPSATEN_EXPORT topsextsFusedBiasAct
- namespace:topsexts TOPSATEN_EXPORT topsextsInplaceAbn
- namespace:topsexts TOPSATEN_EXPORT topsextsMaxPool2d
- namespace:topsexts TOPSATEN_EXPORT topsextsMulAddAdd
- namespace:topsexts TOPSATEN_EXPORT topsextsMulAddAddRelu
- namespace:topsexts TOPSATEN_EXPORT topsextsMultiHeadAttentionFusion
- namespace:topsexts TOPSATEN_EXPORT topsextsMultiLayerPerceptronFusion
- namespace:topsexts TOPSATEN_EXPORT topsextsResizeConvBiasActivationFusion
- namespace:topsexts TOPSATEN_EXPORT topsextsSubAddDivMulAddClampSubAdd
- namespace:topsexts TOPSATEN_EXPORT topsextsTemperature
- namespace:topsexts TOPSATEN_EXPORT topsextsTemperatureSoftmax
- namespace:topsexts TOPSATEN_EXPORT topsextsUpfirdn2d
3.2.2 Updated APIs
- namespace:topsaten TOPSATEN_EXPORT topsatenAddmmActivation
- namespace:topsaten TOPSATEN_EXPORT topsatenBaddbmmActivation
- namespace:topsaten TOPSATEN_EXPORT topsatenLinearActivation
- namespace:topsaten TOPSATEN_EXPORT topsatenLinearQuantActivation
- namespace:topsaten TOPSATEN_EXPORT topsatenMatmulActivation
- namespace:topsaten TOPSATEN_EXPORT topsatenMedian
- namespace:topsaten TOPSATEN_EXPORT topsatenMultiHeadAttentionFusion
- namespace:topsaten TOPSATEN_EXPORT topsatenMultiLayerPerceptronFusion
- namespace:topsaten TOPSATEN_EXPORT topsatenNanmedian
- namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductEfficientAttention
- namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductFlashAttention
- namespace:topsaten TOPSATEN_EXPORT topsatenScatter
- namespace:topsaten TOPSATEN_EXPORT topsatenTop_k_top_p
- namespace:topsaten TOPSATEN_EXPORT topsatenTrapz
3.3 hlir_build
3.3.1 Removed APIs
- namespace:builder::Op AddQuant
- namespace:builder::Op AllReduce
- namespace:builder::Op AveragePoolQuant
- namespace:builder::Op ConcatQuant
- namespace:builder::Op ConvBias
- namespace:builder::Op ConvQuant
- namespace:builder::Op ConvTransposeQuant
- namespace:builder::Op DeformConv
- namespace:builder::Op DequantizeLinearQuant
- namespace:builder::Op DotGeneralBias
- namespace:builder::Op DotGeneralBiasQuant
- namespace:builder::Op GeneralSplit
- namespace:builder::Op GlobalAveragePoolQuant
- namespace:builder::Op LayerNormInference
- namespace:builder::Op MulQuant
- namespace:builder::Op PartialReduce
- namespace:builder::Op QuantConvert
- namespace:builder::Op QuantizeLinearQuant
- namespace:builder::shared_ptr SequentialMergeBuilder
- namespace:builder::Op SubQuant
4. FW Information
FW | Version |
S60 SSM FW | Boot FW 33.6.5, Runtime FW 33.6.5.28 |
AP | 1.1.6 |
SP | 2.11.88 |
VPU | 3.1.5 |
5. Component Information
5.1 Components in the TopsRider run Package
Package Name | File |
topsplatform | TopsPlatform_1.2.4.12-ee064d_deb_amd64.run |
tgi | ai_framework/text-generation-inference/src |
sentence-transformers | sentence_transformers-2.7.0+gcu.3.2.20240805-py3-none-any.whl |
vllm | vllm-0.6.1.post2+gcu.3.2.20241230-cp38-abi3-linux_x86_64.whl |
topscompressor | topscompressor-3.3.20241224-py3.8-none-any.whl |
topscompressor | topscompressor-3.3.20241224-py3.10-none-any.whl |
yunchang | yunchang-0.3.6+gcu.3.2.20241212-py3-none-any.whl |
xfuser | xfuser-0.3.3+gcu.3.1.20241212-py3.10-none-any.whl |
topsideas | topsideas-3.2.20241115-cp38-cp38-linux_x86_64.whl |
topsideas | topsideas-3.2.20241115-cp310-cp310-linux_x86_64.whl |
onnxruntime_gcu | onnxruntime_gcu-1.9.1+3.1.0-cp38-cp38-linux_x86_64.whl |
tops-extension | tops_extension-3.2.20241219-cp310-cp310-linux_x86_64.whl |
tensorflow_2.13 | tensorflow_gcu-2.13.1+3.3.0-cp38-cp38-linux_x86_64.whl |
tensorflow_2.13 | tensorflow_gcu-2.13.1+3.3.0-cp310-cp310-linux_x86_64.whl |
tensorflow_2.9 | tensorflow_gcu-2.9.0+3.3.0-cp38-cp38-linux_x86_64.whl |
tensorflow_2.9 | tensorflow_gcu-2.9.0+3.3.0-cp310-cp310-linux_x86_64.whl |
paddle-custom-gcu | paddle_custom_gcu-3.0.0b1+3.3.0-cp310-cp310-linux_x86_64.whl |
onnxruntime_gcu | onnxruntime_gcu-1.9.1+3.1.0-cp310-cp310-linux_x86_64.whl |
xformers | xformers-0.0.25+gcu.3.2.20241220-cp310-cp310-linux_x86_64.whl |
tops-extension | tops_extension-3.2.20241219-cp38-cp38-linux_x86_64.whl |
topsgraph | topsgraph_0.1.20241124-1_amd64.deb |
topsfactor | topsfactor_3.3.112-1_amd64.deb |
topsaten | topsaten_3.2.20241227-1_amd64.deb |
tops-sdk | tops-sdk_3.3.112-1_amd64.deb |
tops-inference | tops-inference_3.3.112-1_amd64.deb |
eccl | eccl_3.1.20241213-1_amd64.deb |
eccl-tests | eccl-tests_3.1.20241213-1_amd64.deb |
topsgraph-py | topsgraph-0.1.20241124-py3.10-none-any.whl |
xformers | xformers-0.0.25+gcu.3.2.20241220-cp38-cp38-linux_x86_64.whl |
TopsInference | TopsInference-3.3.112-py3.10-none-any.whl |
TopsInference | TopsInference-3.3.112-py3.8-none-any.whl |
torch-gcu-2 | torch_gcu-2.3.0_3.2.3_x86_64.run |
fast-diffusers | fast_diffusers-0.29.2+gcu.3.2.20250102-py3.8-none-any.whl |
fast-diffusers | fast_diffusers-0.29.2+gcu.3.2.20250102-py3.10-none-any.whl |
fast-diffusers-utils | fast_diffusers_utils-0.29.2+gcu.3.2.20250102-py3.8-none-any.whl |
fast-diffusers-utils | fast_diffusers_utils-0.29.2+gcu.3.2.20250102-py3.10-none-any.whl |
libtorch | ai_framework/torch_gcu/libtorch_gcu |
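Several components in this table ship one wheel per Python version, so the right file depends on the interpreter in use. The sketch below reads the Python and ABI tags from a wheel filename and compares them with the running interpreter. It is a simplified check, not a replacement for pip's own tag matching. Treating the non-standard "py3.8" / "py3.10" tags as exact Python versions is an assumption based on how the files are named here.

```python
import sys


def wheel_matches(filename, version=None):
    """Return True if the wheel's Python tag fits the given (major, minor) version.

    Defaults to the running interpreter. Only the tag styles that appear in
    the table above are handled.
    """
    major, minor = version or sys.version_info[:2]
    # Wheel names end in <python tag>-<abi tag>-<platform tag>.whl
    python_tag, abi_tag = filename[: -len(".whl")].split("-")[-3:-1]
    if python_tag == "py3":
        return major == 3
    if abi_tag == "abi3":
        # Stable-ABI wheels work on the tagged CPython version and newer.
        return python_tag.startswith("cp3") and (major, minor) >= (3, int(python_tag[3:]))
    return python_tag in (f"cp{major}{minor}", f"py{major}.{minor}")


candidates = [
    "tensorflow_gcu-2.13.1+3.3.0-cp38-cp38-linux_x86_64.whl",
    "tensorflow_gcu-2.13.1+3.3.0-cp310-cp310-linux_x86_64.whl",
]
print([name for name in candidates if wheel_matches(name)])
```

On an interpreter other than Python 3.8 or 3.10 the printed list is empty, which matches the supported versions stated in these notes.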
5.2 Components Outside the TopsRider run Package
Package Name | File |
ffmpeg-gcu | ffmpeg-gcu_1.2.3.7-20241120-n4.4-1_amd64.deb |
ffmpeg-gcu | ffmpeg-gcu-1.2.3.7-20241120-n4.4-1.x86_64.rpm |
ffmpeg-gcu | ffmpeg-gcu_1.2.3.7-20241120-n4.4-1_amd64-dbgsym.ddeb |
TopsVisualProfiler | TopsVisualProfiler_1.2.4.12-ee064d_win64.zip |
Application run | TopsRider_i3x_3.3.112_application.run |
5.3 Components in the TopsRider ddeb Package
Package Name | File |
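TopsRider_3.3.112_ddeb_amd64.run | eccl_3.1.20241213-1_amd64-dbgsym.ddeb |
TopsRider_3.3.112_ddeb_amd64.run | eccl-tests_3.1.20241213-1_amd64-dbgsym.ddeb |
TopsRider_3.3.112_ddeb_amd64.run | topsaten_3.2.20241227-1_amd64-dbgsym.ddeb |
TopsRider_3.3.112_ddeb_amd64.run | topscv_1.2.2.15-20241112-1_amd64-dbgsym.ddeb |
TopsRider_3.3.112_ddeb_amd64.run | topsfactor_3.3.112-1_amd64-dbgsym.ddeb |
TopsRider_3.3.112_ddeb_amd64.run | tops-inference_3.3.112-1_amd64-dbgsym.ddeb |
TopsRider_3.3.112_ddeb_amd64.run | TopsPlatform_1.2.4.12-ee064d_ddeb_amd64.run |
TopsRider_3.3.112_ddeb_amd64.run | tops-sdk_3.3.112-1_amd64-dbgsym.ddeb |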
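6. Operating System and Python Support
6.1 Compatibility Notes
- Host environment: only the Enflame Driver is adapted for compatibility with this OS environment; Docker runs Ubuntu
- Docker environment: the software stack has been adapted and tested; a Host with the same OS is required
6.2 Supported Operating Systems
Operating system | Architecture | Kernel version | GCC | GLIBC | Notes |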
Ubuntu 20.04.z(z<=5) | x86 | 5.4 & 5.11 & 5.13 & 5.15 | 9.3 | 2.31 | Host & Docker |
Ubuntu 22.04.z (z<=1) | x86 | 5.15 | 11.2 | 2.35 | Host & Docker |
Kylin v10 | x86 | 4.19.0 | 7.3 | 2.28 | Driver only, adapted on Host |
UOS 20 Server | x86 | 4.19.0 | 7.3 | 2.28 | |
openEuler | x86 | 5.10.0 | 10.3.1 | 2.34 | |
Anolis OS 8.2 QU2 | x86 | 4.18.0 | 8.3.1 | 2.28 | |
Anolis OS 8.6 | x86 | 4.19.90 | 7.3.0 | 2.28 | |
TLinux 4.2 | x86 | 6.6.30 | 12.3.1 | 2.38 | |
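A quick way to see which table rows a given host could correspond to is to compare its kernel release and glibc version with the values listed. The sketch below does only that. It is a heuristic based on the table, not the vendor's compatibility check, and it ignores the distribution name, the GCC version, and the point-release limits on the Ubuntu rows. The row names in the code are English spellings of the table entries.

```python
import platform

# (name, kernel version prefixes, glibc version), copied from the table above.
SUPPORTED = [
    ("Ubuntu 20.04", ("5.4", "5.11", "5.13", "5.15"), "2.31"),
    ("Ubuntu 22.04", ("5.15",), "2.35"),
    ("Kylin v10", ("4.19.0",), "2.28"),
    ("UOS 20 Server", ("4.19.0",), "2.28"),
    ("openEuler", ("5.10.0",), "2.34"),
    ("Anolis OS 8.2 QU2", ("4.18.0",), "2.28"),
    ("Anolis OS 8.6", ("4.19.90",), "2.28"),
    ("TLinux 4.2", ("6.6.30",), "2.38"),
]


def _has_prefix(release, prefix):
    # "5.4" should match "5.4.0-150-generic" but not "5.40.1".
    return (release == prefix
            or release.startswith(prefix + ".")
            or release.startswith(prefix + "-"))


def matching_rows(kernel_release, glibc):
    """Return the table rows whose kernel prefix and glibc version both match."""
    return [name for name, kernels, libc in SUPPORTED
            if glibc == libc and any(_has_prefix(kernel_release, k) for k in kernels)]


if __name__ == "__main__":
    # libc_ver() returns an empty version string on non-glibc systems.
    print(matching_rows(platform.release(), platform.libc_ver()[1]))
```

An empty result means the host does not match any listed combination; a non-empty result is only a hint, since several rows share the same kernel and glibc values.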
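6.3 Supported Python Versions
Python 3.8 (for TopsInference inference models), Python 3.10
7. Documentation Updates
7.1 Added Documents
"TopsGraph Python API Reference"
7.2 Removed Documents
"torch_GCU2.1 User Manual", "Library Kernel API Reference"
8. Usage Limitations
The ARM platform currently covers single-card environments only. The covered models are:
vllm: llama2 7b, llama2 13b
Typical traditional models: resnet50 v1.5, yolov5m, vit_b, vit_l, bert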