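1. Introduction
TopsRider v3.4.107 applies to S60-series devices. The new and modified features and the bug fixes described below are the main changes relative to the previous release, TopsRider v3.3.112.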
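2. Feature enhancements
2.1 New and modified basic features
- TopsRider run package
  - Added the --toolkit option as an alias of --container; it is listed in the basic help output
- Torch-gcu
  - Added support for Torch-gcu 2.5.1
  - Removed support for Torch 2.3
- Operators
  - The Sort operator supports BF16/FP32
  - The Aten Copy operator adds optimizations for specific patterns
- ECCL
  - Supports collective communication statistics
  - Supports multi-channel send/recv
- Xdit
  - Upgraded to v0.4.1
  - Removed v0.3.3
- TopsPlatform
  - Host support for kernel 6.8.0-ubuntu22.04.5.x86_64 + gcc 12.3.0
  - Added an efml interface for querying records of GCU exceptions and errors
  - Added SIP utilization to the efsmi -dmon output
  - Added the number of DBEs on each memory controller to the efsmi -q output
  - Added a heartbeat feature: an automatic reset is triggered when the FW stops responding
  - Added a switch that clears L1 before a 3.0 kernel launch; it is off by default and provided as a debug feature
  - TopsProfiler adds the --timeunit option, which controls the time unit of GCU events in console output
  - TopsVisualProfiler can display Runtime memcpy/launchKernel traces together with their related GCU events
  - Topsprof adds a kernel filter mode; the --kernel-name and --kernel-id parameters select which GCU kernels to profile
  - TopsVisualProfiler supports opening vpd files directly by double-clicking
  - TopsVisualProfiler adds a horizontal timeline navigation map
  - Added Findtops
    - Added the tops language: CMake now recognizes and supports the .tops programming language
    - .tops source files can be added to a target and built with the topscc compiler
    - The .tops compilation flow can be customized through variables such as CMAKE_TOPS_COMPILER and CMAKE_TOPS_FLAGS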
2.2 Newly supported models
2.2.1 LLM
Model | Framework | Data type | Cards |
DeepSeek-R1-Distill-Qwen-1.5B | vllm-0.6.1.post2 | bf16 | 1 |
DeepSeek-R1-Distill-Qwen-7B | vllm-0.6.1.post2 | bf16 | 1 |
DeepSeek-R1-Distill-Llama-8B | vllm-0.6.1.post2 | bf16 | 1 |
DeepSeek-R1-Distill-Qwen-14B | vllm-0.6.1.post2 | bf16 | 1 |
DeepSeek-R1-Distill-Qwen-32B | vllm-0.6.1.post2 | bf16 | 4 |
DeepSeek-R1-Distill-Llama-70B | vllm-0.6.1.post2 | bf16 | 8 |
LLaMa3.3-70B-Instruct | vllm-0.6.1.post2 | bf16 | 8 |
qwen2.5-0.5b-instruct | vllm-0.6.1.post2 | bf16 | 1 |
qwen2.5-vl-3b | vllm-0.7.2 | bf16 | 1 |
qwen2.5-72b-instruct-gptq-int8 | vllm-0.6.1.post2 | w8a16 | 4 |
qwen2.5-32b-instruct-gptq-int8 | vllm-0.6.1.post2 | w8a16 | 2 |
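The card counts in the table can be turned into launch parameters. The following is a minimal sketch, not an official launcher: it assumes that the card count corresponds to vLLM's tensor_parallel_size, that bf16 rows map to dtype "bfloat16", and that the w8a16 GPTQ rows use quantization="gptq". The helper name build_llm_kwargs is hypothetical; any GCU-specific arguments the vllm build may require are not covered here.

```python
# Rows copied from the LLM table: model -> (framework, data type, cards).
LLM_TABLE = {
    "DeepSeek-R1-Distill-Qwen-1.5B": ("vllm-0.6.1.post2", "bf16", 1),
    "DeepSeek-R1-Distill-Qwen-7B": ("vllm-0.6.1.post2", "bf16", 1),
    "DeepSeek-R1-Distill-Llama-8B": ("vllm-0.6.1.post2", "bf16", 1),
    "DeepSeek-R1-Distill-Qwen-14B": ("vllm-0.6.1.post2", "bf16", 1),
    "DeepSeek-R1-Distill-Qwen-32B": ("vllm-0.6.1.post2", "bf16", 4),
    "DeepSeek-R1-Distill-Llama-70B": ("vllm-0.6.1.post2", "bf16", 8),
    "LLaMa3.3-70B-Instruct": ("vllm-0.6.1.post2", "bf16", 8),
    "qwen2.5-0.5b-instruct": ("vllm-0.6.1.post2", "bf16", 1),
    "qwen2.5-vl-3b": ("vllm-0.7.2", "bf16", 1),
    "qwen2.5-72b-instruct-gptq-int8": ("vllm-0.6.1.post2", "w8a16", 4),
    "qwen2.5-32b-instruct-gptq-int8": ("vllm-0.6.1.post2", "w8a16", 2),
}


def build_llm_kwargs(model_name, model_path=None):
    """Return keyword arguments for vllm.LLM (hypothetical helper).

    Assumption: the "Cards" column equals the tensor-parallel degree.
    """
    _framework, dtype, cards = LLM_TABLE[model_name]
    kwargs = {
        "model": model_path or model_name,
        "tensor_parallel_size": cards,
    }
    if dtype == "bf16":
        kwargs["dtype"] = "bfloat16"
    elif dtype == "w8a16":
        # Assumption: the GPTQ int8 checkpoints are loaded via GPTQ quantization.
        kwargs["quantization"] = "gptq"
    return kwargs


if __name__ == "__main__":
    kwargs = build_llm_kwargs("DeepSeek-R1-Distill-Qwen-32B")
    print(kwargs)
    # With vLLM installed, the arguments could be passed on like this:
    #   from vllm import LLM
    #   llm = LLM(**kwargs)
```

Keeping the table as data makes it easy to check a deployment script against the release notes when the card counts change in a later version.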
2.2.2 Multimodal
Model | Framework | Data type | Cards |
internVL-2.5-78b | vllm-0.6.1.post2 | bf16 | 8 |
internVL2-8b | vllm-0.6.1.post2 | bf16 | 1 |
internVL2.5-2b | vllm-0.6.1.post2 | bf16 | 1 |
2.2.3 Video generation
Model | Framework | Data type | Cards |
SD-3.5-large | PyTorch2.5.1 | fp16 | 1 |
HunyuanVideo | xdit-0.4.0 | bf16 | 2/4/8 |
deepseek-JanusPro | PyTorch2.5.1 | bf16 | 1 |
3. API changes
Relative to TopsRider v3.3.112, the runtime and operator API changes are listed below. For details of each API, refer to the corresponding API manual.
3.1 Runtime API changes
3.1.1 New APIs
- typeref:typename:TOPS_PUBLIC_API topsError_t topsExecutableGetRuntimeOutputShapeV2
- typeref:typename:TOPS_PUBLIC_API topsError_t topsProfilerStart
- typeref:typename:TOPS_PUBLIC_API topsError_t topsProfilerStop
3.2 Operator API changes
3.2.1 New APIs
- namespace:topsaten TOPSATEN_EXPORT topsatenAddQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenAddReluQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenAvgPool2dQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenCol2im
- namespace:topsaten TOPSATEN_EXPORT topsatenConcatQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenConv1d
- namespace:topsaten TOPSATEN_EXPORT topsatenConv3d
- namespace:topsaten TOPSATEN_EXPORT topsatenConvQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenConvTranspose1d
- namespace:topsaten TOPSATEN_EXPORT topsatenConvTranspose3d
- namespace:topsaten TOPSATEN_EXPORT topsatenConvolutionQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenFlushNan
- namespace:topsaten TOPSATEN_EXPORT topsatenGeluQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenLeakyReluQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenMaxPool3dWithIndices
- namespace:topsaten TOPSATEN_EXPORT topsatenMulQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenNllLoss
- namespace:topsaten TOPSATEN_EXPORT topsatenRandintGetOffset
- namespace:topsaten TOPSATEN_EXPORT topsatenRandomGetOffset
- namespace:topsaten TOPSATEN_EXPORT topsatenReplicationPad1d
- namespace:topsaten TOPSATEN_EXPORT topsatenReplicationPad3d
- namespace:topsaten TOPSATEN_EXPORT topsatenReshapeBs
- namespace:topsaten TOPSATEN_EXPORT topsatenRngUniformGetOffset
- namespace:topsaten TOPSATEN_EXPORT topsatenRoiAlign
- namespace:topsaten TOPSATEN_EXPORT topsatenRoiPool
- namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductEfficientAttentionGetOffset
- namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductFlashAttentionGetOffset
- namespace:topsaten TOPSATEN_EXPORT topsatenSoftmaxQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenSubQuant
- namespace:topsaten TOPSATEN_EXPORT topsatenTileBS
- namespace:topsaten TOPSATEN_EXPORT topsatenUpsampleBicubic2dAa
- namespace:topsaten TOPSATEN_EXPORT topsatenUpsampleNearest1d
- namespace:topsaten TOPSATEN_EXPORT topsatenUpsampleNearestExact1d
- namespace:topsaten TOPSATEN_EXPORT topsatenUpsampleNearestExact2d
- namespace:topsexts TOPSATEN_EXPORT topsextsDynamicSplit
- namespace:topsexts TOPSATEN_EXPORT topsextsSiluAndMul
- namespace:topsexts TOPSATEN_EXPORT topsextsSum
- namespace:topsfa TOPSATEN_EXPORT topsfaFlashAttnBwd
- namespace:topspaddle TOPSATEN_EXPORT topspaddleConvScaledBiasActivation
- namespace:topspaddle TOPSATEN_EXPORT topspaddleConvTransposeActivation
- namespace:topste TOPSATEN_EXPORT topsteAdam
- namespace:topste TOPSATEN_EXPORT topsteAdamCapturable
- namespace:topste TOPSATEN_EXPORT topsteAdamCapturableMaster
- namespace:topste TOPSATEN_EXPORT topsteBlasGemmQuant
- namespace:topste TOPSATEN_EXPORT topsteDelayedScaling
- namespace:topste TOPSATEN_EXPORT topsteDelayedScalingAfterReduction
- namespace:topste TOPSATEN_EXPORT topsteL2Norm
- namespace:topste TOPSATEN_EXPORT topsteMultilTensorScale
- namespace:topste TOPSATEN_EXPORT topsteRmsNormFwdFP8
- namespace:topste TOPSATEN_EXPORT topsteTranspose
- namespace:topste TOPSATEN_EXPORT topsteUnscaleL2Norm
- namespace:topstf TOPSATEN_EXPORT topstfMatrixDiagPartV3
- namespace:topstf TOPSATEN_EXPORT topstfMatrixDiagV3
- namespace:topstf TOPSATEN_EXPORT topstfMatrixTriangularSolve
- namespace:topstf TOPSATEN_EXPORT topstfOneHot
- namespace:topsvllm TOPSATEN_EXPORT topsvllmConcatAndCacheMla
- namespace:topsvllm TOPSATEN_EXPORT topsvllmDotBiasQuant
- namespace:topsvllm TOPSATEN_EXPORT topsvllmDynamicScaledInt8Quant
- namespace:topsvllm TOPSATEN_EXPORT topsvllmFusedDotBiasScaledQuant
- namespace:topsvllm TOPSATEN_EXPORT topsvllmGetEPIndices
- namespace:topsvllm TOPSATEN_EXPORT topsvllmGroupedTopk
- namespace:topsvllm TOPSATEN_EXPORT topsvllmPagedAttentionDotQuantV2
- namespace:topsaten TOPSATEN_EXPORT topsatenAmpForeachNonFiniteCheckAndUnscale
- namespace:topsaten TOPSATEN_EXPORT topsatenAmpUpdateScale
- namespace:topsaten TOPSATEN_EXPORT topsatenElementwiseFusion
- namespace:topsaten TOPSATEN_EXPORT topsatenForeachNorm
- namespace:topsaten TOPSATEN_EXPORT topsatenKthvalue
- namespace:topsaten TOPSATEN_EXPORT topsatenRandom
- namespace:topsaten TOPSATEN_EXPORT topsatenRemainder
- namespace:topsaten TOPSATEN_EXPORT topsatenRngUniform
- namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductEfficientAttention
- namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductEfficientAttentionBackward
- namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductFlashAttention
- namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductFlashAttentionBackward
- namespace:topsaten TOPSATEN_EXPORT topsatenTrapz
- namespace:topstf TOPSATEN_EXPORT topstfScatterNDUpdate
- namespace:topsvllm TOPSATEN_EXPORT topsvllmInvokeFusedMoeNonGatherQuantKernel
- namespace:topsvllm TOPSATEN_EXPORT topsvllmInvokeFusedMoeQuantKernel
- namespace:topsvllm TOPSATEN_EXPORT topsvllmMoeAlignBlockSize
- namespace:topsvllm TOPSATEN_EXPORT topsvllmPagedAttentionV1
- namespace:topsaten TOPSATEN_EXPORT topsatenDotBiasQuant
- namespace:topsexts TOPSATEN_EXPORT topsextSiluAndMul
3.2.2 Updated APIs
- namespace:topsaten TOPSATEN_EXPORT topsatenGather
- namespace:topste TOPSATEN_EXPORT topsteFP8Dequantize
- namespace:topste TOPSATEN_EXPORT topsteFP8Quantize
- namespace:topsvllm TOPSATEN_EXPORT topsvllmPagedAttentionDotQuantV1
- namespace:topsvllm TOPSATEN_EXPORT topsvllmPagedAttentionV2
- namespace:topsaten TOPSATEN_EXPORT topsatenFusedBiasAct
- namespace:topsaten TOPSATEN_EXPORT topsatenInplaceAbn
- namespace:topsaten TOPSATEN_EXPORT topsatenUpfirdn2d
4. FW information
FW | Version |
S60 SSM FW | Boot FW 33.6.5, Runtime FW 33.6.5.30 |
AP | 1.1.6 |
SP | 3.1.2 |
VPU | 3.1.5 |
5. Component information
5.1 Components in the TopsRider run package
- eccl
  - eccl_3.4.20250416-1_amd64.deb
- eccl-tests
  - eccl-tests_3.4.20250416-1_amd64.deb
- fast-diffusers/fast-diffusers-utils
  - fast_diffusers-0.29.2+gcu.3.2.20250327-py3.10-none-any.whl
  - fast_diffusers_utils-0.29.2+gcu.3.2.20250327-py3.10-none-any.whl
- libtorch
  - libtorch_gcu-2.5.0+3.3.1.zip
- onnxruntime_gcu
  - onnxruntime_gcu-1.9.1+3.1.0-cp38-cp38-linux_x86_64.whl
  - onnxruntime_gcu-1.9.1+3.1.0-cp310-cp310-linux_x86_64.whl
- paddle-custom-gcu
  - paddle_custom_gcu-3.0.0b1+3.4.0-cp310-cp310-linux_x86_64.whl
- sentence-transformers
  - sentence_transformers-2.7.0+gcu.3.2.20240805-py3-none-any.whl
- tensorflow_2.13
  - tensorflow_gcu-2.13.1+3.4.0-cp38-cp38-linux_x86_64.whl
  - tensorflow_gcu-2.13.1+3.4.0-cp310-cp310-linux_x86_64.whl
- tensorflow_2.9
  - tensorflow_gcu-2.9.0+3.4.0-cp38-cp38-linux_x86_64.whl
  - tensorflow_gcu-2.9.0+3.4.0-cp310-cp310-linux_x86_64.whl
- tgi
  - text-generation-inference_2.2.0+gcu.3.4.107.tar.gz
- topsaten
  - topsaten_3.3.20250402-1_amd64.deb
- topscompressor
  - topscompressor-3.3.20250327-py3.10-none-any.whl
- tops-extension
  - tops_extension-3.2.20250311+torch.2.5.1-cp310-cp310-linux_x86_64.whl
- topsfactor
  - topsfactor_3.4.107-1_amd64.deb
- topsgraph
  - topsgraph_3.4.0-1_amd64.deb
- topsgraph-py
  - topsgraph-3.4.0-cp310-cp310-linux_x86_64.whl
- topsideas
  - topsideas-3.2.20241115-cp310-cp310-linux_x86_64.whl
- TopsInference
  - TopsInference-3.4.107-py3.10-none-any.whl
  - TopsInference-3.4.107-py3.8-none-any.whl
  - tops-inference_3.4.107-1_amd64.deb
- topsplatform
  - TopsPlatform_1.4.0.606-e9069e_deb_amd64.run
- tops-sdk
  - tops-sdk_3.4.107-1_amd64.deb
- torch-gcu-2
  - torch_gcu-2.5.x_2.5.1+3.4.0_x86_64.run
- triton-gcu
  - triton-gcu_0.3.0.4-1_amd64.deb
  - triton_gcu-0.3.0.4-py3.10-none-any.whl
- Vllm 0.6.1
  - vllm-0.6.1.post2+torch.2.5.1.gcu.3.2.20250311-cp39-abi3-linux_x86_64.whl
- vllm-gcu 0.7.2
  - vllm_gcu-0.7.2+3.4.20250318-cp39-abi3-linux_x86_64.whl
- xformers
  - xformers-0.0.25+torch.2.5.1.gcu.3.2.20250315-cp310-cp310-linux_x86_64.whl
  - xformers-0.0.28.post3+torch.2.5.1.gcu.3.2.20250317-cp310-cp310-linux_x86_64.whl
- xfuser
  - xfuser-0.4.1+gcu.3.3.20250331-py3.10-none-any.whl
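Several components ship separate wheels per Python version, so it can help to filter the list by interpreter. The sketch below relies only on the standard wheel filename layout (name-version-python tag-abi tag-platform tag). It makes two assumptions: the non-standard "py3.8" / "py3.10" tags used by some packages here are treated as meaning exactly that Python version, and compound tags such as "py2.py3" are not handled because none appear in the list. The function names are illustrative.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into (name, version, python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    # The last three fields are always the python, abi, and platform tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return parts[0], parts[1], python_tag, abi_tag, platform_tag


def tag_version(python_tag):
    """Turn 'cp310', 'py3.10', or 'py3' into (major, minor); minor may be None."""
    digits = python_tag[2:].replace(".", "")
    major = int(digits[0])
    minor = int(digits[1:]) if len(digits) > 1 else None
    return major, minor


def supports(filename, version):
    """Return True if the wheel is expected to install on Python `version` (major, minor)."""
    _name, _ver, python_tag, abi_tag, _plat = wheel_tags(filename)
    major, minor = tag_version(python_tag)
    if major != version[0]:
        return False
    if minor is None:
        return True  # e.g. 'py3': any Python 3
    if abi_tag == "abi3":
        return version[1] >= minor  # stable ABI: this minor version or newer
    return version[1] == minor


if __name__ == "__main__":
    wheels = [
        "onnxruntime_gcu-1.9.1+3.1.0-cp38-cp38-linux_x86_64.whl",
        "onnxruntime_gcu-1.9.1+3.1.0-cp310-cp310-linux_x86_64.whl",
        "vllm_gcu-0.7.2+3.4.20250318-cp39-abi3-linux_x86_64.whl",
        "TopsInference-3.4.107-py3.8-none-any.whl",
    ]
    current = tuple(sys.version_info[:2])
    for name in wheels:
        if supports(name, current):
            print(name)
```

This checks only the Python and ABI tags; the platform tag (for example linux_x86_64) still has to match the host separately.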
5.2 Components outside the TopsRider run package
- ffmpeg-gcu
  - ffmpeg-gcu-1.2.4.3-n4.4-1.tar.gz
- TopsVisualProfiler
  - TopsVisualProfiler_1.4.0.606-e9069e_win64.zi
- Application run
  - TopsRider_i3x_3.4.107_application.run
5.3 Components in the TopsRider ddeb package
- TopsRider_3.4.107_ddeb_amd64.run
  - eccl_3.4.20250416-1_amd64-dbgsym.ddeb
  - eccl-tests_3.4.20250416-1_amd64-dbgsym.ddeb
  - topsaten_3.3.20250402-1_amd64-dbgsym.ddeb
  - topscv_1.2.4.1-20250205-1_amd64-dbgsym.ddeb
  - topsfactor_3.4.107-1_amd64-dbgsym.ddeb
  - tops-inference_3.4.107-1_amd64-dbgsym.ddeb
  - TopsPlatform_1.4.0.606-e9069e_ddeb_amd64.run
  - tops-sdk_3.4.107-1_amd64-dbgsym.ddeb
  - triton-gcu_0.3.0.4-1_amd64-dbgsym.ddeb
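6. Operating system and Python support
6.1 Compatibility notes
- Host environment: only the Enflame Driver has been adapted for compatibility with this OS environment; Docker runs Ubuntu
- Docker environment: the software stack has been adapted and tested; the host must run the same OS
6.2 Supported operating systems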
Operating system | Architecture | Kernel version | GCC | GLIBC | Notes |
Ubuntu 20.04.z (z<=5) | x86 | 5.4 & 5.11 & 5.13 & 5.15 | 9.3 | 2.31 | Host & Docker |
Ubuntu 22.04.z (z<=4) | x86 | 5.15 | 11.2 | 2.35 | Host & Docker |
Ubuntu 22.04.z | x86 | 6.8 | 12.3 | 2.35 | Driver only, adapted on Host |
Kylin v10 | x86 | 4.19.0 | 7.3 | 2.28 | |
UOS 20 Server | x86 | 4.19.0 | 7.3 | 2.28 | |
openEuler | x86 | 5.10.0 | 10.3.1 | 2.34 | |
Anolis OS 8.2 QU2 | x86 | 4.18.0 | 8.3.1 | 2.28 | |
Anolis OS 8.6 | x86 | 4.19.90 | 7.3.0 | 2.28 | |
TLinux 4.2 | x86 | 6.6.30 | 12.3.1 | 2.38 | |
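A quick way to compare a host against this table is to look at its kernel release and glibc version. The following is a rough sketch under stated assumptions: the kernel column is treated as a version prefix, the GLIBC column as an exact match, and the row labels are shortened for readability. Kernel and glibc alone cannot identify the distribution, so several rows may match and the result is only a first-pass hint, not a support statement.

```python
import platform

# (label, kernel version prefixes, GLIBC) copied from the table.
SUPPORTED = [
    ("Ubuntu 20.04", ("5.4", "5.11", "5.13", "5.15"), "2.31"),
    ("Ubuntu 22.04", ("5.15",), "2.35"),
    ("Ubuntu 22.04 (driver only, Host)", ("6.8",), "2.35"),
    ("Kylin v10", ("4.19.0",), "2.28"),
    ("UOS 20 Server", ("4.19.0",), "2.28"),
    ("openEuler", ("5.10.0",), "2.34"),
    ("Anolis OS 8.2 QU2", ("4.18.0",), "2.28"),
    ("Anolis OS 8.6", ("4.19.90",), "2.28"),
    ("TLinux 4.2", ("6.6.30",), "2.38"),
]


def kernel_matches(release, prefix):
    """True if `release` equals `prefix` or continues it with '.' or '-'."""
    return (
        release == prefix
        or release.startswith(prefix + ".")
        or release.startswith(prefix + "-")
    )


def matching_rows(kernel_release, glibc_version):
    """Return labels of table rows compatible with the given kernel and glibc."""
    rows = []
    for label, kernels, glibc in SUPPORTED:
        if glibc_version != glibc:
            continue
        if any(kernel_matches(kernel_release, k) for k in kernels):
            rows.append(label)
    return rows


if __name__ == "__main__":
    release = platform.release()           # e.g. "5.15.0-91-generic"
    _lib, glibc = platform.libc_ver()      # e.g. ("glibc", "2.35")
    print(release, glibc, matching_rows(release, glibc))
```

An empty result means the host's kernel and glibc combination does not appear in the table.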
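6.3 Supported Python versions
Python 3.8 (TopsInference inference framework only), Python 3.10
7. Documentation updates
7.1 Added documents
"torch_GCU2.5 User Manual"
7.2 Removed documents
"torch_GCU2.3 User Manual"
8. Usage restrictions
The ARM platform currently covers single-card environments only. The covered models are:
vllm: llama2 7b, llama2 13b
Typical traditional models: resnet50 v1.5, yolov5m, vit_b, vit_l, bert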