TopsRider v3.3.112 applies to S60 series devices. The new/modified features and bug fixes described below are changes relative to the previous release, TopsRider v3.2.204.

2. Feature Enhancements

2.1 New/Modified Basic Features

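  • The TopsRider run package adds installation packages for libtorch_gcu, yunchang, Tensorflow_gcu 2.9/2.13, and the TopsGraph2 beta version
  • The TopsRider run package removes Torch 2.1
  • Tensorflow_gcu adds support for the Tanh and Pow operators
  • Tensorflow_gcu supports an interface for placing output memory on the device
  • Tensorflow_gcu supports the session run stream interface
  • Tensorflow_gcu supports the ResourceGather op
  • Xdit is upgraded to 0.3.2, with support for multi-card parallel i2v on CogVideox-5B
  • The vLLM framework is upgraded to 0.6.1.post2, with LoRA support
  • TopsCompressor GCU w8a16 quantization supports the GPTQ scheme
  • The TopsGraph2 beta version supports the opensora model
  • The ffmpeg tar package adds a symbol table installation package
  • TopsPlatform 1.2.4.9:
    • SSM FW strengthens protection against SIP exceptions through the firewall
    • The driver optimizes locking for MQD multi-process, high-concurrency scenarios, which improves system stability under concurrent multitasking, and improves exception handling when the number of MQDs exceeds the limit
    • The driver adds process permission checks to avoid kernel reboots caused by invalid null or wild pointers
    • To prevent user error, LogicID is removed from the efsmi --dmon and --pmon displays
    • Runtime provides APIs for custom asynchronous allocation, and event is refactored to improve event record performance
    • Runtime adds checks on resource parameters during kernel launch
    • TopsProfiler adds the --delay and --duration options, which start and stop profiling dynamically on demand
    • TopsVisualProfiler adds a timeline thumbnail feature
    • TopsProfiler adds the --timeunit option, which controls the time unit of GCU events in console output
    • With operator profiling enabled, topsprof no longer collects kernel implicit parameter information by default (if this information is needed, enable it with topsprof --enable-extra-activities runtime/kernel_meta)
  • Low-power LP2 is disabled by default
  • Driver OS support adds TLinux 4.2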

2.2 Newly Supported Models

2.2.1 LLM

Model                  Framework         Data type  Cards
qwen1.5-32B            vLLM 0.6.1.post2  w8a8c8     2
llama3.1-70B           vLLM 0.6.1.post2  w8a8c8     4
deepseek-moe-16b-base  vLLM 0.6.1.post2  w8a8c8     1
baichuan2-13B          vLLM 0.6.1.post2  w8a8c8     1

2.2.2 Multimodal

Model                           Framework         Data type  Cards
llama3-llava-next-8b-hf         vLLM 0.6.1.post2  FP16       1
Phi-3-vision-128k-instruct      vLLM 0.6.1.post2  BF16       1
Qwen2-VL-7B-Instruct-GPTQ-Int4  vLLM 0.6.1.post2  w4a16      1
Qwen2-VL-2B-Instruct            vLLM 0.6.1.post2  FP16       1
MiniCPM-V 2.6                   vLLM 0.6.1.post2  FP16       1

2.2.3 Video Generation

Model                                    Framework             Data type  Cards
CogVideox-5b-t2v                         PyTorch2.3+xdit0.3.3  FP16       1/2/4
CogVideox-5B-i2v (multi-card parallel)   PyTorch2.3+xdit0.3.3  BF16       1/2

2.2.4 Traditional Models

Model  Framework      Data type  Cards
OCR    topsInference  FP16       1

3. API Changes

Relative to the previous release, TopsRider v3.2.204, the API changes to runtime, operators, and hlir_build are listed below. For details of each API, see the corresponding API manual.

3.1 Runtime API Changes

3.1.1 New APIs

  1. TOPS_PUBLIC_API topsError_t topsDeviceGetDefaultMemPool
  2. TOPS_PUBLIC_API topsError_t topsDeviceSetMemPool

3.1.2 Removed APIs

  1. TOPS_PUBLIC_API topsError_t topsKernelSignalAlloc
  2. TOPS_PUBLIC_API topsError_t topsKernelSignalAvailableNumGet
  3. TOPS_PUBLIC_API topsError_t topsKernelSignalFree
  4. TOPS_PUBLIC_API topsError_t topsKernelSignalMaxNumGet
  5. TOPS_PUBLIC_API topsError_t topsKernelSignalRead
  6. TOPS_PUBLIC_API topsError_t topsKernelSignalWrite
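
When upgrading from v3.2.204, it can help to check whether existing code still references the six removed topsKernelSignal* functions. The following is a minimal sketch of such a check, not an official migration tool. The API names come from the list above; the function name find_removed_api_uses and the sample C++ fragment (including its call arguments) are hypothetical and used only for illustration.

```python
import re

# Runtime APIs listed as removed in section 3.1.2 of these release notes.
REMOVED_RUNTIME_APIS = (
    "topsKernelSignalAlloc",
    "topsKernelSignalAvailableNumGet",
    "topsKernelSignalFree",
    "topsKernelSignalMaxNumGet",
    "topsKernelSignalRead",
    "topsKernelSignalWrite",
)

# Match whole identifiers only, so a longer name that merely contains
# one of the removed names is not reported.
_PATTERN = re.compile(r"\b(" + "|".join(REMOVED_RUNTIME_APIS) + r")\b")


def find_removed_api_uses(source_text):
    """Return (line_number, api_name) pairs for every removed API referenced."""
    hits = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical source fragment; the argument lists are made up.
    sample = "topsKernelSignalAlloc(&sig);\nint x = 0;\ntopsKernelSignalFree(sig);\n"
    for line_number, name in find_removed_api_uses(sample):
        print(f"line {line_number}: {name} was removed in v3.3.112")
```

To scan a real project, read each source file and pass its contents to find_removed_api_uses. The sketch does a plain text search, so it also reports names that appear in comments or strings.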

3.2 Operator API Changes

3.2.1 New APIs

  1. namespace:topsaten TOPSATEN_EXPORT topsatenAddmmActTwice
  2. namespace:topsaten TOPSATEN_EXPORT topsatenAvgPool3d
  3. namespace:topsaten TOPSATEN_EXPORT topsatenCorrcoef
  4. namespace:topsaten TOPSATEN_EXPORT topsatenFftC2c
  5. namespace:topsaten TOPSATEN_EXPORT topsatenFftR2c
  6. namespace:topsaten TOPSATEN_EXPORT topsatenIm2col
  7. namespace:topsaten TOPSATEN_EXPORT topsatenIstft
  8. namespace:topsaten TOPSATEN_EXPORT topsatenLinalgMatrixExp
  9. namespace:topsaten TOPSATEN_EXPORT topsatenMaxPool3d
  10. namespace:topsaten TOPSATEN_EXPORT topsatenScatterAdd
  11. namespace:topsaten TOPSATEN_EXPORT topsatenStft
  12. namespace:topsaten TOPSATEN_EXPORT topsatenThnnFusedLstmCell
  13. namespace:topsexts TOPSATEN_EXPORT topsextsAttnLayerFusion
  14. namespace:topsexts TOPSATEN_EXPORT topsextsBottleneckNoBN
  15. namespace:topsexts TOPSATEN_EXPORT topsextsConcatConvBiasActivationFusion
  16. namespace:topsexts TOPSATEN_EXPORT topsextsConcatConvBiasActivationFusionV2
  17. namespace:topsexts TOPSATEN_EXPORT topsextsConv2dActivationDepthToSpace
  18. namespace:topsexts TOPSATEN_EXPORT topsextsConv2dDepthToSpace
  19. namespace:topsexts TOPSATEN_EXPORT topsextsConvAddActivationDepthToSpace
  20. namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationAdd
  21. namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationFusionV2
  22. namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationFusionV2DtS
  23. namespace:topsexts TOPSATEN_EXPORT topsextsConvScaledBiasActivation
  24. namespace:topsexts TOPSATEN_EXPORT topsextsFusedBiasAct
  25. namespace:topsexts TOPSATEN_EXPORT topsextsInplaceAbn
  26. namespace:topsexts TOPSATEN_EXPORT topsextsMaxPool2d
  27. namespace:topsexts TOPSATEN_EXPORT topsextsMulAddAdd
  28. namespace:topsexts TOPSATEN_EXPORT topsextsMulAddAddRelu
  29. namespace:topsexts TOPSATEN_EXPORT topsextsMultiHeadAttentionFusion
  30. namespace:topsexts TOPSATEN_EXPORT topsextsMultiLayerPerceptronFusion
  31. namespace:topsexts TOPSATEN_EXPORT topsextsResizeConvBiasActivationFusion
  32. namespace:topsexts TOPSATEN_EXPORT topsextsSubAddDivMulAddClampSubAdd
  33. namespace:topsexts TOPSATEN_EXPORT topsextsTemperature
  34. namespace:topsexts TOPSATEN_EXPORT topsextsTemperatureSoftmax
  35. namespace:topsexts TOPSATEN_EXPORT topsextsUpfirdn2d

3.2.2 Updated APIs

  1. namespace:topsaten TOPSATEN_EXPORT topsatenAddmmActivation
  2. namespace:topsaten TOPSATEN_EXPORT topsatenBaddbmmActivation
  3. namespace:topsaten TOPSATEN_EXPORT topsatenLinearActivation
  4. namespace:topsaten TOPSATEN_EXPORT topsatenLinearQuantActivation
  5. namespace:topsaten TOPSATEN_EXPORT topsatenMatmulActivation
  6. namespace:topsaten TOPSATEN_EXPORT topsatenMedian
  7. namespace:topsaten TOPSATEN_EXPORT topsatenMultiHeadAttentionFusion
  8. namespace:topsaten TOPSATEN_EXPORT topsatenMultiLayerPerceptronFusion
  9. namespace:topsaten TOPSATEN_EXPORT topsatenNanmedian
  10. namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductEfficientAttention
  11. namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductFlashAttention
  12. namespace:topsaten TOPSATEN_EXPORT topsatenScatter
  13. namespace:topsaten TOPSATEN_EXPORT topsatenTop_k_top_p
  14. namespace:topsaten TOPSATEN_EXPORT topsatenTrapz

3.3 hlir_build

3.3.1 Removed APIs

  1. namespace:builder::Op AddQuant
  2. namespace:builder::Op AllReduce
  3. namespace:builder::Op AveragePoolQuant
  4. namespace:builder::Op ConcatQuant
  5. namespace:builder::Op ConvBias
  6. namespace:builder::Op ConvQuant
  7. namespace:builder::Op ConvTransposeQuant
  8. namespace:builder::Op DeformConv
  9. namespace:builder::Op DequantizeLinearQuant
  10. namespace:builder::Op DotGeneralBias
  11. namespace:builder::Op DotGeneralBiasQuant
  12. namespace:builder::Op GeneralSplit
  13. namespace:builder::Op GlobalAveragePoolQuant
  14. namespace:builder::Op LayerNormInference
  15. namespace:builder::Op MulQuant
  16. namespace:builder::Op PartialReduce
  17. namespace:builder::Op QuantConvert
  18. namespace:builder::Op QuantizeLinearQuant
  19. namespace:builder::shared_ptr SequentialMergeBuilder
  20. namespace:builder::Op SubQuant

4. FW Information

FW          Version
S60 SSM FW  Boot FW 33.6.5, Runtime FW 33.6.5.28
AP          1.1.6
SP          2.11.88
VPU         3.1.5

5. Component Information

5.1 Components in the TopsRider run Package

Package Name           File
topsplatform           TopsPlatform_1.2.4.12-ee064d_deb_amd64.run
tgi                    ai_framework/text-generation-inference/src
sentence-transformers  sentence_transformers-2.7.0+gcu.3.2.20240805-py3-none-any.whl
vllm                   vllm-0.6.1.post2+gcu.3.2.20241230-cp38-abi3-linux_x86_64.whl
topscompressor         topscompressor-3.3.20241224-py3.8-none-any.whl
                       topscompressor-3.3.20241224-py3.10-none-any.whl
yunchang               yunchang-0.3.6+gcu.3.2.20241212-py3-none-any.whl
xfuser                 xfuser-0.3.3+gcu.3.1.20241212-py3.10-none-any.whl
topsideas              topsideas-3.2.20241115-cp38-cp38-linux_x86_64.whl
                       topsideas-3.2.20241115-cp310-cp310-linux_x86_64.whl
onnxruntime_gcu        onnxruntime_gcu-1.9.1+3.1.0-cp38-cp38-linux_x86_64.whl
tops-extension         tops_extension-3.2.20241219-cp310-cp310-linux_x86_64.whl
tensorflow_2.13        tensorflow_gcu-2.13.1+3.3.0-cp38-cp38-linux_x86_64.whl
                       tensorflow_gcu-2.13.1+3.3.0-cp310-cp310-linux_x86_64.whl
tensorflow_2.9         tensorflow_gcu-2.9.0+3.3.0-cp38-cp38-linux_x86_64.whl
                       tensorflow_gcu-2.9.0+3.3.0-cp310-cp310-linux_x86_64.whl
paddle-custom-gcu      paddle_custom_gcu-3.0.0b1+3.3.0-cp310-cp310-linux_x86_64.whl
onnxruntime_gcu        onnxruntime_gcu-1.9.1+3.1.0-cp310-cp310-linux_x86_64.whl
xformers               xformers-0.0.25+gcu.3.2.20241220-cp310-cp310-linux_x86_64.whl
tops-extension         tops_extension-3.2.20241219-cp38-cp38-linux_x86_64.whl
topsgraph              topsgraph_0.1.20241124-1_amd64.deb
topsfactor             topsfactor_3.3.112-1_amd64.deb
topsaten               topsaten_3.2.20241227-1_amd64.deb
tops-sdk               tops-sdk_3.3.112-1_amd64.deb
tops-inference         tops-inference_3.3.112-1_amd64.deb
eccl                   eccl_3.1.20241213-1_amd64.deb
eccl-tests             eccl-tests_3.1.20241213-1_amd64.deb
topsgraph-py           topsgraph-0.1.20241124-py3.10-none-any.whl
xformers               xformers-0.0.25+gcu.3.2.20241220-cp38-cp38-linux_x86_64.whl
TopsInference          TopsInference-3.3.112-py3.10-none-any.whl
                       TopsInference-3.3.112-py3.8-none-any.whl
torch-gcu-2            torch_gcu-2.3.0_3.2.3_x86_64.run
fast-diffusers         fast_diffusers-0.29.2+gcu.3.2.20250102-py3.8-none-any.whl
                       fast_diffusers-0.29.2+gcu.3.2.20250102-py3.10-none-any.whl
fast-diffusers-utils   fast_diffusers_utils-0.29.2+gcu.3.2.20250102-py3.8-none-any.whl
                       fast_diffusers_utils-0.29.2+gcu.3.2.20250102-py3.10-none-any.whl
libtorch               ai_framework/torch_gcu/libtorch_gcu
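
Several packages above ship one wheel per Python version, distinguished by the Python tag in the file name (cp38, cp310, py3.8, py3.10, or py3). The sketch below shows one way to pick the files that match a given interpreter version. It is an illustration only: the helper names wheel_tags and wheel_matches are hypothetical, the parsing assumes the file names follow the layout seen in this table, and it does not reproduce pip's own compatibility resolution.

```python
import re
import sys


def wheel_tags(filename):
    """Return the (python_tag, abi_tag) fields of a wheel file name.

    Wheel names end in -{python tag}-{abi tag}-{platform tag}.whl,
    so the fields are read from the right-hand side.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    return parts[-3], parts[-2]


def wheel_matches(filename, major, minor):
    """Report whether a wheel's tags fit Python major.minor."""
    python_tag, abi_tag = wheel_tags(filename)
    if python_tag == "py3":
        # Generic Python 3 wheel.
        return major == 3
    # Accept both "cp310" and the dotted "py3.10" form used in this table.
    found = re.fullmatch(r"(?:cp|py)(\d)\.?(\d+)", python_tag)
    if found is None:
        return False
    tag_version = (int(found.group(1)), int(found.group(2)))
    if abi_tag == "abi3":
        # Stable-ABI wheels work on the tagged version and later ones.
        return (major, minor) >= tag_version
    return (major, minor) == tag_version


if __name__ == "__main__":
    # A few file names copied from the table above.
    candidates = [
        "vllm-0.6.1.post2+gcu.3.2.20241230-cp38-abi3-linux_x86_64.whl",
        "topsideas-3.2.20241115-cp38-cp38-linux_x86_64.whl",
        "topsideas-3.2.20241115-cp310-cp310-linux_x86_64.whl",
        "TopsInference-3.3.112-py3.8-none-any.whl",
        "TopsInference-3.3.112-py3.10-none-any.whl",
    ]
    major, minor = sys.version_info[:2]
    for name in candidates:
        if wheel_matches(name, major, minor):
            print(name)
```

Reading the fields from the right keeps the parsing independent of how many dots or plus signs appear in the version part of the name.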

5.2 Components Outside the TopsRider run Package

Package Name        File
ffmpeg-gcu          ffmpeg-gcu_1.2.3.7-20241120-n4.4-1_amd64.deb
                    ffmpeg-gcu-1.2.3.7-20241120-n4.4-1.x86_64.rpm
                    ffmpeg-gcu_1.2.3.7-20241120-n4.4-1_amd64-dbgsym.ddeb
TopsVisualProfiler  TopsVisualProfiler_1.2.4.12-ee064d_win64.zip
Application run     TopsRider_i3x_3.3.112_application.run

5.3 Components in the TopsRider ddeb Package

Package Name                      File
TopsRider_3.3.112_ddeb_amd64.run  eccl_3.1.20241213-1_amd64-dbgsym.ddeb
                                  eccl-tests_3.1.20241213-1_amd64-dbgsym.ddeb
                                  topsaten_3.2.20241227-1_amd64-dbgsym.ddeb
                                  topscv_1.2.2.15-20241112-1_amd64-dbgsym.ddeb
                                  topsfactor_3.3.112-1_amd64-dbgsym.ddeb
                                  tops-inference_3.3.112-1_amd64-dbgsym.ddeb
                                  TopsPlatform_1.2.4.12-ee064d_ddeb_amd64.run
                                  tops-sdk_3.3.112-1_amd64-dbgsym.ddeb

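6. Operating System and Python Support

6.1 Adaptation Notes

  • Host environment: only the Enflame Driver is adapted for compatibility with this OS environment; Docker runs Ubuntu
  • Docker environment: the software stack has been adaptation-tested; a Host with the same OS is required

6.2 Supported Operating Systems

OS                      Arch  Kernel                    GCC     GLIBC  Notes
Ubuntu 20.04.z (z<=5)   x86   5.4 & 5.11 & 5.13 & 5.15  9.3     2.31   Host & Docker
Ubuntu 22.04.z (z<=1)   x86   5.15                      11.2    2.35   Host & Docker
Kylin v10               x86   4.19.0                    7.3     2.28   Only the driver has been adapted on the Host
UOS 20 Server           x86   4.19.0                    7.3     2.28
openEuler               x86   5.10.0                    10.3.1  2.34
Anolis OS 8.2 QU2       x86   4.18.0                    8.3.1   2.28
Anolis OS 8.6           x86   4.19.90                   7.3.0   2.28
TLinux 4.2              x86   6.6.30                    12.3.1  2.38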
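
As a quick sanity check before installation, the kernel and GLIBC versions of a host can be compared against the support list above. The sketch below is illustrative only: the SUPPORTED rows are transcribed by hand from that list, the helper names kernel_matches and matching_rows are hypothetical, and matching by kernel version prefix is an assumption about how the listed versions should be interpreted, not a rule stated in these notes.

```python
import platform

# (OS name, supported kernel version prefixes, GLIBC version),
# transcribed from the support list in section 6.2.
SUPPORTED = [
    ("Ubuntu 20.04", ("5.4", "5.11", "5.13", "5.15"), "2.31"),
    ("Ubuntu 22.04", ("5.15",), "2.35"),
    ("Kylin v10", ("4.19.0",), "2.28"),
    ("UOS 20 Server", ("4.19.0",), "2.28"),
    ("openEuler", ("5.10.0",), "2.34"),
    ("Anolis OS 8.2 QU2", ("4.18.0",), "2.28"),
    ("Anolis OS 8.6", ("4.19.90",), "2.28"),
    ("TLinux 4.2", ("6.6.30",), "2.38"),
]


def kernel_matches(release, prefix):
    """True if a kernel release string starts with the given version prefix.

    The prefix must be followed by "." or "-" (or end the string), so
    that "5.1" does not match "5.15.0" and "4.19.0" does not match "4.19.90".
    """
    return release == prefix or release.startswith((prefix + ".", prefix + "-"))


def matching_rows(kernel_release, glibc_version):
    """Return the names of listed OS rows consistent with the given versions."""
    return [
        name
        for name, kernels, glibc in SUPPORTED
        if glibc == glibc_version
        and any(kernel_matches(kernel_release, k) for k in kernels)
    ]


if __name__ == "__main__":
    # platform.libc_ver() returns a (library, version) pair;
    # both are empty strings when the C library cannot be determined.
    _, glibc_version = platform.libc_ver()
    rows = matching_rows(platform.release(), glibc_version)
    print(rows if rows else "no matching row in the support list")
```

Several rows share the same kernel and GLIBC values, so the check can return more than one candidate; it narrows the list but does not identify the distribution.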

6.3 Supported Python Versions

Python 3.8 (TopsInference inference models), Python 3.10

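7. Documentation Updates

7.1 Added Documents

"TopsGraph Python API Reference"

7.2 Removed Documents

"torch_GCU2.1 User Manual", "Library Kernel API Reference"

8. Usage Limitations

The ARM platform currently covers single-card environments only. The covered models are:
vllm: llama2 7b, llama2 13b
Typical traditional models: resnet50 v1.5, yolov5m, vit_b, vit_l, bert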
