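1. Introduction

TopsRider v3.4.107 applies to S60-series devices. The new/changed features and bug fixes described below are the main changes relative to the previous release, TopsRider v3.3.112.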

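2. Feature Improvements

2.1 New/Changed Basic Features

  • TopsRider run package
    • Added the --toolkit option as an alias of --container; it is listed in the basic help output
  • Torch-gcu
    • Added support for Torch-gcu 2.5.1
    • Removed support for Torch 2.3
  • Operators
    • The Sort operator supports BF16/FP32
    • The Aten Copy operator is optimized for specific patterns
  • ECCL
    • Supports statistics for collective communication
    • Supports multiple channels for send/recv
  • Xdit
    • Upgraded to v0.4.1
    • Removed v0.3.3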
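  • TopsPlatform
    • The Host supports kernel 6.8.0-ubuntu22.04.5.x86_64 + gcc 12.3.0
    • Added an efml interface for querying records of GCU exceptions and errors
    • efsmi -dmon now shows SIP utilization
    • efsmi -q now shows the number of DBEs on each memory controller
    • Added a heartbeat feature: an automatic reset is triggered when the FW does not respond
    • Added a switch that clears L1 before a kernel launch on 3.0; it is off by default and is provided as a debug feature
    • TopsProfiler supports a new --timeunit option that controls the time unit of GCU events in console output
    • TopsVisualProfiler can now display Runtime memcpy/launchKernel traces together with their associated GCU events
    • Topsprof adds a kernel filter mode: the GCU kernels to profile can be selected with the --kernel-name and --kernel-id parameters
    • TopsVisualProfiler can open vpd files directly by double-clicking
    • TopsVisualProfiler adds a horizontal navigation map for the timeline
    • Added Findtops
      • Added the tops language: CMake can now recognize and support the .tops programming language
      • .tops source files can be added to a target and built with the topscc compiler
      • The .tops compilation flow can be customized through variables such as CMAKE_TOPS_COMPILER and CMAKE_TOPS_FLAGS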

2.2 Newly Supported Models

2.2.1 LLM

| Model | Framework | Data type | Cards |
|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | vllm-0.6.1.post2 | bf16 | 1 |
| DeepSeek-R1-Distill-Qwen-7B | vllm-0.6.1.post2 | bf16 | 1 |
| DeepSeek-R1-Distill-Llama-8B | vllm-0.6.1.post2 | bf16 | 1 |
| DeepSeek-R1-Distill-Qwen-14B | vllm-0.6.1.post2 | bf16 | 1 |
| DeepSeek-R1-Distill-Qwen-32B | vllm-0.6.1.post2 | bf16 | 4 |
| DeepSeek-R1-Distill-Llama-70B | vllm-0.6.1.post2 | bf16 | 8 |
| LLaMa3.3-70B-Instruct | vllm-0.6.1.post2 | bf16 | 8 |
| qwen2.5-0.5b-instruct | vllm-0.6.1.post2 | bf16 | 1 |
| qwen2.5-vl-3b | vllm-0.7.2 | bf16 | 1 |
| qwen2.5-72b-instruct-gptq-int8 | vllm-0.6.1.post2 | w8a16 | 4 |
| qwen2.5-32b-instruct-gptq-int8 | vllm-0.6.1.post2 | w8a16 | 2 |
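
The card counts listed for the LLM models can be carried into a launch script as a simple lookup. The sketch below is illustrative only: the helper names `cards_required` and `fits` are hypothetical, and treating the listed card count as the minimum number of GCUs to reserve is an assumption, not something these notes state.

```python
# Card counts transcribed from the LLM table in these release notes.
CARDS_BY_MODEL = {
    "DeepSeek-R1-Distill-Qwen-1.5B": 1,
    "DeepSeek-R1-Distill-Qwen-7B": 1,
    "DeepSeek-R1-Distill-Llama-8B": 1,
    "DeepSeek-R1-Distill-Qwen-14B": 1,
    "DeepSeek-R1-Distill-Qwen-32B": 4,
    "DeepSeek-R1-Distill-Llama-70B": 8,
    "LLaMa3.3-70B-Instruct": 8,
    "qwen2.5-0.5b-instruct": 1,
    "qwen2.5-vl-3b": 1,
    "qwen2.5-72b-instruct-gptq-int8": 4,
    "qwen2.5-32b-instruct-gptq-int8": 2,
}

# Case-insensitive view of the same table.
_LOOKUP = {name.lower(): cards for name, cards in CARDS_BY_MODEL.items()}


def cards_required(model_name: str) -> int:
    """Return the card count listed for model_name (hypothetical helper)."""
    try:
        return _LOOKUP[model_name.lower()]
    except KeyError:
        raise ValueError(f"{model_name!r} is not in the v3.4.107 LLM table") from None


def fits(model_name: str, available_cards: int) -> bool:
    """Assumption: the listed count is the minimum number of cards needed."""
    return available_cards >= cards_required(model_name)
```

A launcher could call `fits(model, n)` before starting a job and fail early with a clear message instead of partway through model loading.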

2.2.2 Multimodal

| Model | Framework | Data type | Cards |
|---|---|---|---|
| internVL-2.5-78b | vllm-0.6.1.post2 | bf16 | 8 |
| internVL2-8b | vllm-0.6.1.post2 | bf16 | 1 |
| internVL2.5-2b | vllm-0.6.1.post2 | bf16 | 1 |

2.2.3 Video Generation

| Model | Framework | Data type | Cards |
|---|---|---|---|
| SD-3.5-large | PyTorch2.5.1 | fp16 | 1 |
| HunyuanVideo | xdit-0.4.0 | bf16 | 2/4/8 |
| deepseek-JanusPro | PyTorch2.5.1 | bf16 | 1 |

3. API Changes

The runtime and operator API changes relative to TopsRider v3.3.112 are listed below. For details of each API, refer to the corresponding API manual.

3.1 Runtime API Changes

3.1.1 New APIs

  1. typeref:typename:TOPS_PUBLIC_API topsError_t topsExecutableGetRuntimeOutputShapeV2
  2. typeref:typename:TOPS_PUBLIC_API topsError_t topsProfilerStart
  3. typeref:typename:TOPS_PUBLIC_API topsError_t topsProfilerStop

3.2 Operator API Changes

3.2.1 New APIs

  1. namespace:topsaten TOPSATEN_EXPORT topsatenAddQuant
  2. namespace:topsaten TOPSATEN_EXPORT topsatenAddReluQuant
  3. namespace:topsaten TOPSATEN_EXPORT topsatenAvgPool2dQuant
  4. namespace:topsaten TOPSATEN_EXPORT topsatenCol2im
  5. namespace:topsaten TOPSATEN_EXPORT topsatenConcatQuant
  6. namespace:topsaten TOPSATEN_EXPORT topsatenConv1d
  7. namespace:topsaten TOPSATEN_EXPORT topsatenConv3d
  8. namespace:topsaten TOPSATEN_EXPORT topsatenConvQuant
  9. namespace:topsaten TOPSATEN_EXPORT topsatenConvTranspose1d
  10. namespace:topsaten TOPSATEN_EXPORT topsatenConvTranspose3d
  11. namespace:topsaten TOPSATEN_EXPORT topsatenConvolutionQuant
  12. namespace:topsaten TOPSATEN_EXPORT topsatenFlushNan
  13. namespace:topsaten TOPSATEN_EXPORT topsatenGeluQuant
  14. namespace:topsaten TOPSATEN_EXPORT topsatenLeakyReluQuant
  15. namespace:topsaten TOPSATEN_EXPORT topsatenMaxPool3dWithIndices
  16. namespace:topsaten TOPSATEN_EXPORT topsatenMulQuant
  17. namespace:topsaten TOPSATEN_EXPORT topsatenNllLoss
  18. namespace:topsaten TOPSATEN_EXPORT topsatenRandintGetOffset
  19. namespace:topsaten TOPSATEN_EXPORT topsatenRandomGetOffset
  20. namespace:topsaten TOPSATEN_EXPORT topsatenReplicationPad1d
  21. namespace:topsaten TOPSATEN_EXPORT topsatenReplicationPad3d
  22. namespace:topsaten TOPSATEN_EXPORT topsatenReshapeBs
  23. namespace:topsaten TOPSATEN_EXPORT topsatenRngUniformGetOffset
  24. namespace:topsaten TOPSATEN_EXPORT topsatenRoiAlign
  25. namespace:topsaten TOPSATEN_EXPORT topsatenRoiPool
  26. namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductEfficientAttentionGetOffset
  27. namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductFlashAttentionGetOffset
  28. namespace:topsaten TOPSATEN_EXPORT topsatenSoftmaxQuant
  29. namespace:topsaten TOPSATEN_EXPORT topsatenSubQuant
  30. namespace:topsaten TOPSATEN_EXPORT topsatenTileBS
  31. namespace:topsaten TOPSATEN_EXPORT topsatenUpsampleBicubic2dAa
  32. namespace:topsaten TOPSATEN_EXPORT topsatenUpsampleNearest1d
  33. namespace:topsaten TOPSATEN_EXPORT topsatenUpsampleNearestExact1d
  34. namespace:topsaten TOPSATEN_EXPORT topsatenUpsampleNearestExact2d
  35. namespace:topsexts TOPSATEN_EXPORT topsextsDynamicSplit
  36. namespace:topsexts TOPSATEN_EXPORT topsextsSiluAndMul
  37. namespace:topsexts TOPSATEN_EXPORT topsextsSum
  38. namespace:topsfa TOPSATEN_EXPORT topsfaFlashAttnBwd
  39. namespace:topspaddle TOPSATEN_EXPORT topspaddleConvScaledBiasActivation
  40. namespace:topspaddle TOPSATEN_EXPORT topspaddleConvTransposeActivation
  41. namespace:topste TOPSATEN_EXPORT topsteAdam
  42. namespace:topste TOPSATEN_EXPORT topsteAdamCapturable
  43. namespace:topste TOPSATEN_EXPORT topsteAdamCapturableMaster
  44. namespace:topste TOPSATEN_EXPORT topsteBlasGemmQuant
  45. namespace:topste TOPSATEN_EXPORT topsteDelayedScaling
  46. namespace:topste TOPSATEN_EXPORT topsteDelayedScalingAfterReduction
  47. namespace:topste TOPSATEN_EXPORT topsteL2Norm
  48. namespace:topste TOPSATEN_EXPORT topsteMultilTensorScale
  49. namespace:topste TOPSATEN_EXPORT topsteRmsNormFwdFP8
  50. namespace:topste TOPSATEN_EXPORT topsteTranspose
  51. namespace:topste TOPSATEN_EXPORT topsteUnscaleL2Norm
  52. namespace:topstf TOPSATEN_EXPORT topstfMatrixDiagPartV3
  53. namespace:topstf TOPSATEN_EXPORT topstfMatrixDiagV3
  54. namespace:topstf TOPSATEN_EXPORT topstfMatrixTriangularSolve
  55. namespace:topstf TOPSATEN_EXPORT topstfOneHot
  56. namespace:topsvllm TOPSATEN_EXPORT topsvllmConcatAndCacheMla
  57. namespace:topsvllm TOPSATEN_EXPORT topsvllmDotBiasQuant
  58. namespace:topsvllm TOPSATEN_EXPORT topsvllmDynamicScaledInt8Quant
  59. namespace:topsvllm TOPSATEN_EXPORT topsvllmFusedDotBiasScaledQuant
  60. namespace:topsvllm TOPSATEN_EXPORT topsvllmGetEPIndices
  61. namespace:topsvllm TOPSATEN_EXPORT topsvllmGroupedTopk
  62. namespace:topsvllm TOPSATEN_EXPORT topsvllmPagedAttentionDotQuantV2
  63. namespace:topsaten TOPSATEN_EXPORT topsatenAmpForeachNonFiniteCheckAndUnscale
  64. namespace:topsaten TOPSATEN_EXPORT topsatenAmpUpdateScale
  65. namespace:topsaten TOPSATEN_EXPORT topsatenElementwiseFusion
  66. namespace:topsaten TOPSATEN_EXPORT topsatenForeachNorm
  67. namespace:topsaten TOPSATEN_EXPORT topsatenKthvalue
  68. namespace:topsaten TOPSATEN_EXPORT topsatenRandom
  69. namespace:topsaten TOPSATEN_EXPORT topsatenRemainder
  70. namespace:topsaten TOPSATEN_EXPORT topsatenRngUniform
  71. namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductEfficientAttention
  72. namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductEfficientAttentionBackward
  73. namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductFlashAttention
  74. namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductFlashAttentionBackward
  75. namespace:topsaten TOPSATEN_EXPORT topsatenTrapz
  76. namespace:topstf TOPSATEN_EXPORT topstfScatterNDUpdate
  77. namespace:topsvllm TOPSATEN_EXPORT topsvllmInvokeFusedMoeNonGatherQuantKernel
  78. namespace:topsvllm TOPSATEN_EXPORT topsvllmInvokeFusedMoeQuantKernel
  79. namespace:topsvllm TOPSATEN_EXPORT topsvllmMoeAlignBlockSize
  80. namespace:topsvllm TOPSATEN_EXPORT topsvllmPagedAttentionV1
  81. namespace:topsaten TOPSATEN_EXPORT topsatenDotBiasQuant
  82. namespace:topsexts TOPSATEN_EXPORT topsextSiluAndMul

3.2.2 Updated APIs

  1. namespace:topsaten TOPSATEN_EXPORT topsatenGather
  2. namespace:topste TOPSATEN_EXPORT topsteFP8Dequantize
  3. namespace:topste TOPSATEN_EXPORT topsteFP8Quantize
  4. namespace:topsvllm TOPSATEN_EXPORT topsvllmPagedAttentionDotQuantV1
  5. namespace:topsvllm TOPSATEN_EXPORT topsvllmPagedAttentionV2
  6. namespace:topsaten TOPSATEN_EXPORT topsatenFusedBiasAct
  7. namespace:topsaten TOPSATEN_EXPORT topsatenInplaceAbn
  8. namespace:topsaten TOPSATEN_EXPORT topsatenUpfirdn2d

4. FW Information

| FW | Version |
|---|---|
| S60 SSM FW | Boot FW 33.6.5, Runtime FW 33.6.5.30 |
| AP | 1.1.6 |
| SP | 3.1.2 |
| VPU | 3.1.5 |

5. Component Information

5.1 Components in the TopsRider run Package

  • eccl
    • eccl_3.4.20250416-1_amd64.deb
  • eccl-tests
    • eccl-tests_3.4.20250416-1_amd64.deb
  • fast-diffusers/fast-diffusers-utils
    • fast_diffusers-0.29.2+gcu.3.2.20250327-py3.10-none-any.whl
    • fast_diffusers_utils-0.29.2+gcu.3.2.20250327-py3.10-none-any.whl
  • libtorch
    • libtorch_gcu-2.5.0+3.3.1.zip
  • onnxruntime_gcu
    • onnxruntime_gcu-1.9.1+3.1.0-cp38-cp38-linux_x86_64.whl
    • onnxruntime_gcu-1.9.1+3.1.0-cp310-cp310-linux_x86_64.whl
  • paddle-custom-gcu
    • paddle_custom_gcu-3.0.0b1+3.4.0-cp310-cp310-linux_x86_64.whl
  • sentence-transformers
    • sentence_transformers-2.7.0+gcu.3.2.20240805-py3-none-any.whl
  • tensorflow_2.13
    • tensorflow_gcu-2.13.1+3.4.0-cp38-cp38-linux_x86_64.whl
    • tensorflow_gcu-2.13.1+3.4.0-cp310-cp310-linux_x86_64.whl
  • tensorflow_2.9
    • tensorflow_gcu-2.9.0+3.4.0-cp38-cp38-linux_x86_64.whl
    • tensorflow_gcu-2.9.0+3.4.0-cp310-cp310-linux_x86_64.whl
  • tgi
    • text-generation-inference_2.2.0+gcu.3.4.107.tar.gz
  • topsaten
    • topsaten_3.3.20250402-1_amd64.deb
  • topscompressor
    • topscompressor-3.3.20250327-py3.10-none-any.whl
  • tops-extension
    • tops_extension-3.2.20250311+torch.2.5.1-cp310-cp310-linux_x86_64.whl
  • topsfactor
    • topsfactor_3.4.107-1_amd64.deb
  • topsgraph
    • topsgraph_3.4.0-1_amd64.deb
  • topsgraph-py
    • topsgraph-3.4.0-cp310-cp310-linux_x86_64.whl
  • topsideas
    • topsideas-3.2.20241115-cp310-cp310-linux_x86_64.whl
  • TopsInference
    • TopsInference-3.4.107-py3.10-none-any.whl
    • TopsInference-3.4.107-py3.8-none-any.whl
    • tops-inference_3.4.107-1_amd64.deb
  • topsplatform
    • TopsPlatform_1.4.0.606-e9069e_deb_amd64.run
  • tops-sdk
    • tops-sdk_3.4.107-1_amd64.deb
  • torch-gcu-2
    • torch_gcu-2.5.x_2.5.1+3.4.0_x86_64.run
  • triton-gcu
    • triton-gcu_0.3.0.4-1_amd64.deb
    • triton_gcu-0.3.0.4-py3.10-none-any.whl
  • Vllm 0.6.1
    • vllm-0.6.1.post2+torch.2.5.1.gcu.3.2.20250311-cp39-abi3-linux_x86_64.whl
  • vllm-gcu 0.7.2
    • vllm_gcu-0.7.2+3.4.20250318-cp39-abi3-linux_x86_64.whl
  • xformers
    • xformers-0.0.25+torch.2.5.1.gcu.3.2.20250315-cp310-cp310-linux_x86_64.whl
    • xformers-0.0.28.post3+torch.2.5.1.gcu.3.2.20250317-cp310-cp310-linux_x86_64.whl
  • xfuser
    • xfuser-0.4.1+gcu.3.3.20250331-py3.10-none-any.whl
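
The wheel file names in this list follow the standard wheel naming layout (name, version, optional build tag, Python tag, ABI tag, platform tag), with the GCU build details carried in the local version label after `+`. The sketch below splits such a name into its fields, which can help when checking that an installed package matches this release. `WheelInfo` and `parse_wheel` are hypothetical helper names and are not part of TopsRider.

```python
from typing import NamedTuple


class WheelInfo(NamedTuple):
    name: str
    version: str       # public version, e.g. "0.6.1.post2"
    local: str         # local version label after "+", "" if absent
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel(filename: str) -> WheelInfo:
    """Split a wheel file name into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
    elif len(parts) == 6:
        # The optional third field is a build tag; it is ignored here.
        name, version, _build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    public, _, local = version.partition("+")
    return WheelInfo(name, public, local, py, abi, plat)


info = parse_wheel(
    "vllm-0.6.1.post2+torch.2.5.1.gcu.3.2.20250311-cp39-abi3-linux_x86_64.whl"
)
```

Here `info.version` holds the upstream version and `info.local` holds the Torch and GCU build label, so the two can be compared separately.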

5.2 Components Outside the TopsRider run Package

  • ffmpeg-gcu
    • ffmpeg-gcu-1.2.4.3-n4.4-1.tar.gz
  • TopsVisualProfiler
    • TopsVisualProfiler_1.4.0.606-e9069e_win64.zi
  • Application run
    • TopsRider_i3x_3.4.107_application.run

5.3 Components in the TopsRider ddeb Package

  • TopsRider_3.4.107_ddeb_amd64.run
    • eccl_3.4.20250416-1_amd64-dbgsym.ddeb
    • eccl-tests_3.4.20250416-1_amd64-dbgsym.ddeb
    • topsaten_3.3.20250402-1_amd64-dbgsym.ddeb
    • topscv_1.2.4.1-20250205-1_amd64-dbgsym.ddeb
    • topsfactor_3.4.107-1_amd64-dbgsym.ddeb
    • tops-inference_3.4.107-1_amd64-dbgsym.ddeb
    • TopsPlatform_1.4.0.606-e9069e_ddeb_amd64.run
    • tops-sdk_3.4.107-1_amd64-dbgsym.ddeb
    • triton-gcu_0.3.0.4-1_amd64-dbgsym.ddeb

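6. Operating System and Python Support

6.1 Compatibility Notes

  • Host environment: only the Enflame Driver has been adapted for compatibility with this OS environment; Docker runs Ubuntu
  • Docker environment: the software stack has been adapted and tested; a Host running the same OS is required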

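6.2 Supported Operating Systems

| Operating system | Architecture | Kernel version | GCC | GLIBC | Notes |
|---|---|---|---|---|---|
| Ubuntu 20.04.z (z<=5) | x86 | 5.4 & 5.11 & 5.13 & 5.15 | 9.3 | 2.31 | Host & Docker |
| Ubuntu 22.04.z (z<=4) | x86 | 5.15 | 11.2 | 2.35 | Host & Docker |
| Ubuntu 22.04.z | x86 | 6.8 | 12.3 | 2.35 | Only the driver has been adapted, on the Host |
| Kylin v10 | x86 | 4.19.0 | 7.3 | 2.28 | |
| UOS 20 Server | x86 | 4.19.0 | 7.3 | 2.28 | |
| openEuler | x86 | 5.10.0 | 10.3.1 | 2.34 | |
| Anolis OS 8.2 QU2 | x86 | 4.18.0 | 8.3.1 | 2.28 | |
| Anolis OS 8.6 | x86 | 4.19.90 | 7.3.0 | 2.28 | |
| TLinux 4.2 | x86 | 6.6.30 | 12.3.1 | 2.38 | |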
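
A host can be compared against this list before installation. The sketch below is a simplification under stated assumptions: it compares only the major.minor kernel series, the OS labels are free-form strings chosen for this example, and `kernel_series`, `is_listed`, and `check_this_host` are hypothetical helpers, not part of TopsRider.

```python
import platform

# major.minor kernel series per OS, transcribed from the support list.
# The OS labels are chosen for this example only.
LISTED_KERNEL_SERIES = {
    "Ubuntu 20.04": {"5.4", "5.11", "5.13", "5.15"},
    "Ubuntu 22.04": {"5.15", "6.8"},  # 6.8: only the driver, on the Host
    "Kylin v10": {"4.19"},
    "UOS 20 Server": {"4.19"},
    "openEuler": {"5.10"},
    "Anolis OS 8.2 QU2": {"4.18"},
    "Anolis OS 8.6": {"4.19"},
    "TLinux 4.2": {"6.6"},
}


def kernel_series(release: str) -> str:
    """Reduce a release string such as '5.15.0-91-generic' to '5.15'."""
    return ".".join(release.split(".")[:2])


def is_listed(os_name: str, release: str) -> bool:
    """True if the kernel series of `release` is listed for `os_name`."""
    return kernel_series(release) in LISTED_KERNEL_SERIES.get(os_name, set())


def check_this_host(os_name: str) -> bool:
    """Check the running kernel; the caller supplies the OS label."""
    return is_listed(os_name, platform.release())
```

Matching on the series alone does not verify GCC or GLIBC versions, so a positive result is a first filter rather than a full compatibility check.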

6.3 Supported Python Versions

Python 3.8 (TopsInference inference framework only), Python 3.10
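
An installer or test script can check the interpreter against these two versions before importing any GCU package. This is a minimal sketch; `support_note` is a hypothetical helper, and the note strings only paraphrase the statement above.

```python
import sys

# Supported interpreter versions and the restriction stated for each.
SUPPORTED_PYTHON = {
    (3, 8): "TopsInference inference framework only",
    (3, 10): "no restriction stated",
}


def support_note(version_info=sys.version_info):
    """Return the note for a (major, minor, ...) version, or None if unlisted."""
    return SUPPORTED_PYTHON.get((version_info[0], version_info[1]))
```

Calling `support_note()` with no argument checks the running interpreter; a result of `None` means the version is not listed for this release.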

7. Documentation Updates

7.1 Added Documents

torch_GCU2.5 User Manual

7.2 Removed Documents

torch_GCU2.3 User Manual

8. Usage Limitations

The ARM platform currently covers only single-card environments. The covered models are:
vllm: llama2 7b, llama2 13b
Typical traditional models: resnet50 v1.5, yolov5m, vit_b, vit_l, bert
