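1. Introduction

TopsRider v3.3.112 applies to S60-series devices. The new and modified features and the bug fixes described below are changes relative to the previous release, TopsRider v3.2.204.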

2. Feature enhancements

2.1 New and modified basic features

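  • The TopsRider run package now includes installers for libtorch_gcu, yunchang, Tensorflow_gcu 2.9/2.13, and the TopsGraph2 beta
  • Torch 2.1 has been removed from the TopsRider run package
  • Tensorflow_gcu adds support for the Tanh and Pow operators
  • Tensorflow_gcu supports an interface that keeps output memory on the device
  • Tensorflow_gcu supports the session run stream interface
  • Tensorflow_gcu supports the Resource Gather op
  • Xdit upgraded to 0.3.2, with support for multi-card parallel i2v on CogVideox-5B
  • vLLM framework upgraded to 0.6.1.post2, with LoRA support
  • TopsCompressor GCU w8a16 quantization supports the GPTQ scheme
  • TopsGraph2 beta supports the opensora model
  • The ffmpeg tar package adds a symbol-table installation package
  • Low-power mode LP2 is disabled by default
  • Driver OS support adds TLinux 4.2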
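  • TopsPlatform 1.2.4.9:
    • SSM FW strengthens protection against sip exceptions through the firewall
    • The driver optimizes MQD locking for highly concurrent multi-process scenarios, improving system stability under concurrent multi-task workloads, and improves exception handling when the number of MQDs exceeds the limit
    • The driver adds process permission checks to prevent kernel reboots caused by invalid null pointers or wild pointers
    • To prevent user error, the Logic ID is no longer shown in the efsmi --dmon and --pmon displays
    • Runtime provides APIs for custom asynchronous allocation, and events are refactored to improve event record performance
    • Runtime adds checks on resource parameters during kernel launch
    • TopsProfiler adds the -delay and -duration options, which start and stop profiling dynamically on demand
    • TopsVisualProfiler adds a timeline thumbnail feature
    • TopsProfiler adds the -timeunit option, which controls the time unit of GCU events in console output
    • With operator profiling enabled, Topsprof no longer collects kernel implicit-parameter information by default (to collect it, enable it with topsprof -enable-extra-activities runtime/kernel_meta)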

2.2 Newly supported models

2.2.1 LLM-W8A8

Model name             Framework         Data type  Cards
qwen1.5-32B            vLLM 0.6.1.post2  w8a8c8     2
llama3.1-70B           vLLM 0.6.1.post2  w8a8c8     4
deepseek-moe-16b-base  vLLM 0.6.1.post2  w8a8c8     1
baichuan2-13B          vLLM 0.6.1.post2  w8a8c8     1

2.2.2 Multimodal

Model name                      Framework         Data type  Cards
llama3-llava-next-8b-hf         vLLM 0.6.1.post2  FP16       1
Phi-3-vision-128k-instruct      vLLM 0.6.1.post2  BF16       1
Qwen2-VL-7B-Instruct-GPTQ-Int4  vLLM 0.6.1.post2  w4a16      1
Qwen2-VL-2B-Instruct            vLLM 0.6.1.post2  FP16       1
MiniCPM-V 2.6                   vLLM 0.6.1.post2  FP16       1

2.2.3 Video generation

Model name                              Framework             Data type  Cards
CogVideox-5b-t2v                        PyTorch2.3+xdit0.3.3  FP16       1/2/4
CogVideox-5B-i2v (multi-card parallel)  PyTorch2.3+xdit0.3.3  BF16       1/2

2.2.4 Traditional models

Model name  Framework      Data type  Cards
OCR         topsInference  FP16       1

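3. API changes

The runtime, operator, and hlir_build API changes relative to the previous release, TopsRider v3.2.204, are listed below. For the details of each API, refer to the corresponding API manual.

3.1 Runtime API changes

3.1.1 Added APIs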

  1. TOPS_PUBLIC_API topsError_t topsDeviceGetDefaultMemPool
  2. TOPS_PUBLIC_API topsError_t topsDeviceSetMemPool

3.1.2 Removed APIs

  1. TOPS_PUBLIC_API topsError_t topsKernelSignalAlloc
  2. TOPS_PUBLIC_API topsError_t topsKernelSignalAvailableNumGet
  3. TOPS_PUBLIC_API topsError_t topsKernelSignalFree
  4. TOPS_PUBLIC_API topsError_t topsKernelSignalMaxNumGet
  5. TOPS_PUBLIC_API topsError_t topsKernelSignalRead
  6. TOPS_PUBLIC_API topsError_t topsKernelSignalWrite
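
Code that still calls any of the removed topsKernelSignal* functions will need updating before it is built against this release. As a rough aid, the following Python sketch searches source text for the six names listed above. The helper names (find_removed_calls, scan_tree) and the file-suffix list are illustrative assumptions, not part of TopsRider.

```python
import re
from pathlib import Path

# Names copied from the "Removed APIs" list in the release notes.
REMOVED_RUNTIME_APIS = (
    "topsKernelSignalAlloc",
    "topsKernelSignalAvailableNumGet",
    "topsKernelSignalFree",
    "topsKernelSignalMaxNumGet",
    "topsKernelSignalRead",
    "topsKernelSignalWrite",
)

# Match whole identifiers only, so longer names that merely contain
# one of the removed names are not reported.
_PATTERN = re.compile(r"\b(" + "|".join(REMOVED_RUNTIME_APIS) + r")\b")


def find_removed_calls(text):
    """Return (line_number, api_name) pairs for each removed name in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(root, suffixes=(".c", ".cc", ".cpp", ".h", ".hpp")):
    """Scan files under root; the suffix list is an assumption, adjust as needed."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits = find_removed_calls(path.read_text(errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report
```

Calling scan_tree on a project directory returns a mapping from file path to the matches found in that file. The search is purely textual, so names that appear in comments or strings are reported as well.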

3.2 Operator API changes

3.2.1 Added APIs

  1. namespace:topsaten TOPSATEN_EXPORT topsatenAddmmActTwice
  2. namespace:topsaten TOPSATEN_EXPORT topsatenAvgPool3d
  3. namespace:topsaten TOPSATEN_EXPORT topsatenCorrcoef
  4. namespace:topsaten TOPSATEN_EXPORT topsatenFftC2c
  5. namespace:topsaten TOPSATEN_EXPORT topsatenFftR2c
  6. namespace:topsaten TOPSATEN_EXPORT topsatenIm2col
  7. namespace:topsaten TOPSATEN_EXPORT topsatenIstft
  8. namespace:topsaten TOPSATEN_EXPORT topsatenLinalgMatrixExp
  9. namespace:topsaten TOPSATEN_EXPORT topsatenMaxPool3d
  10. namespace:topsaten TOPSATEN_EXPORT topsatenScatterAdd
  11. namespace:topsaten TOPSATEN_EXPORT topsatenStft
  12. namespace:topsaten TOPSATEN_EXPORT topsatenThnnFusedLstmCell
  13. namespace:topsexts TOPSATEN_EXPORT topsextsAttnLayerFusion
  14. namespace:topsexts TOPSATEN_EXPORT topsextsBottleneckNoBN
  15. namespace:topsexts TOPSATEN_EXPORT topsextsConcatConvBiasActivationFusion
  16. namespace:topsexts TOPSATEN_EXPORT topsextsConcatConvBiasActivationFusionV2
  17. namespace:topsexts TOPSATEN_EXPORT topsextsConv2dActivationDepthToSpace
  18. namespace:topsexts TOPSATEN_EXPORT topsextsConv2dDepthToSpace
  19. namespace:topsexts TOPSATEN_EXPORT topsextsConvAddActivationDepthToSpace
  20. namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationAdd
  21. namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationFusionV2
  22. namespace:topsexts TOPSATEN_EXPORT topsextsConvBiasActivationFusionV2DtS
  23. namespace:topsexts TOPSATEN_EXPORT topsextsConvScaledBiasActivation
  24. namespace:topsexts TOPSATEN_EXPORT topsextsFusedBiasAct
  25. namespace:topsexts TOPSATEN_EXPORT topsextsInplaceAbn
  26. namespace:topsexts TOPSATEN_EXPORT topsextsMaxPool2d
  27. namespace:topsexts TOPSATEN_EXPORT topsextsMulAddAdd
  28. namespace:topsexts TOPSATEN_EXPORT topsextsMulAddAddRelu
  29. namespace:topsexts TOPSATEN_EXPORT topsextsMultiHeadAttentionFusion
  30. namespace:topsexts TOPSATEN_EXPORT topsextsMultiLayerPerceptronFusion
  31. namespace:topsexts TOPSATEN_EXPORT topsextsResizeConvBiasActivationFusion
  32. namespace:topsexts TOPSATEN_EXPORT topsextsSubAddDivMulAddClampSubAdd
  33. namespace:topsexts TOPSATEN_EXPORT topsextsTemperature
  34. namespace:topsexts TOPSATEN_EXPORT topsextsTemperatureSoftmax
  35. namespace:topsexts TOPSATEN_EXPORT topsextsUpfirdn2d

3.2.2 Updated APIs

  1. namespace:topsaten TOPSATEN_EXPORT topsatenAddmmActivation
  2. namespace:topsaten TOPSATEN_EXPORT topsatenBaddbmmActivation
  3. namespace:topsaten TOPSATEN_EXPORT topsatenLinearActivation
  4. namespace:topsaten TOPSATEN_EXPORT topsatenLinearQuantActivation
  5. namespace:topsaten TOPSATEN_EXPORT topsatenMatmulActivation
  6. namespace:topsaten TOPSATEN_EXPORT topsatenMedian
  7. namespace:topsaten TOPSATEN_EXPORT topsatenMultiHeadAttentionFusion
  8. namespace:topsaten TOPSATEN_EXPORT topsatenMultiLayerPerceptronFusion
  9. namespace:topsaten TOPSATEN_EXPORT topsatenNanmedian
  10. namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductEfficientAttention
  11. namespace:topsaten TOPSATEN_EXPORT topsatenScaledDotProductFlashAttention
  12. namespace:topsaten TOPSATEN_EXPORT topsatenScatter
  13. namespace:topsaten TOPSATEN_EXPORT topsatenTop_k_top_p
  14. namespace:topsaten TOPSATEN_EXPORT topsatenTrapz

3.3 hlir_build API changes

3.3.1 Removed APIs

  1. namespace:builder::Op AddQuant
  2. namespace:builder::Op AllReduce
  3. namespace:builder::Op AveragePoolQuant
  4. namespace:builder::Op ConcatQuant
  5. namespace:builder::Op ConvBias
  6. namespace:builder::Op ConvQuant
  7. namespace:builder::Op ConvTransposeQuant
  8. namespace:builder::Op DeformConv
  9. namespace:builder::Op DequantizeLinearQuant
  10. namespace:builder::Op DotGeneralBias
  11. namespace:builder::Op DotGeneralBiasQuant
  12. namespace:builder::Op GeneralSplit
  13. namespace:builder::Op GlobalAveragePoolQuant
  14. namespace:builder::Op LayerNormInference
  15. namespace:builder::Op MulQuant
  16. namespace:builder::Op PartialReduce
  17. namespace:builder::Op QuantConvert
  18. namespace:builder::Op QuantizeLinearQuant
  19. namespace:builder::shared_ptr SequentialMergeBuilder
  20. namespace:builder::Op SubQuant

4. FW information

FW          Version
S60 SSM FW  Boot FW 33.6.5, Runtime FW 33.6.5.28
AP          1.1.6
SP          2.11.88
VPU         3.1.5

5. Component information

5.1 Components in the TopsRider run package

  • topsplatform
    • TopsPlatform_1.2.4.12-ee064d_deb_amd64.run
  • tgi
    • ai_framework/text-generation-inference/src
  • sentence-transformers
    • sentence_transformers-2.7.0+gcu.3.2.20240805-py3-none-any.whl
  • vllm
    • vllm-0.6.1.post2+gcu.3.2.20241230-cp38-abi3-linux_x86_64.whl
  • topscompressor
    • topscompressor-3.3.20241224-py3.8-none-any.whl
    • topscompressor-3.3.20241224-py3.10-none-any.whl
  • yunchang
    • yunchang-0.3.6+gcu.3.2.20241212-py3-none-any.whl
  • xfuser
    • xfuser-0.3.3+gcu.3.1.20241212-py3.10-none-any.whl
  • topsideas
    • topsideas-3.2.20241115-cp38-cp38-linux_x86_64.whl
    • topsideas-3.2.20241115-cp310-cp310-linux_x86_64.whl
  • onnxruntime_gcu
    • onnxruntime_gcu-1.9.1+3.1.0-cp38-cp38-linux_x86_64.whl
  • tops-extension
    • tops_extension-3.2.20241219-cp310-cp310-linux_x86_64.whl
  • tensorflow_2.13
    • tensorflow_gcu-2.13.1+3.3.0-cp38-cp38-linux_x86_64.whl
    • tensorflow_gcu-2.13.1+3.3.0-cp310-cp310-linux_x86_64.whl
  • tensorflow_2.9
    • tensorflow_gcu-2.9.0+3.3.0-cp38-cp38-linux_x86_64.whl
    • tensorflow_gcu-2.9.0+3.3.0-cp310-cp310-linux_x86_64.whl
  • paddle-custom-gcu
    • paddle_custom_gcu-3.0.0b1+3.3.0-cp310-cp310-linux_x86_64.whl
  • onnxruntime_gcu
    • onnxruntime_gcu-1.9.1+3.1.0-cp310-cp310-linux_x86_64.whl
  • xformers
    • xformers-0.0.25+gcu.3.2.20241220-cp310-cp310-linux_x86_64.whl
  • tops-extension
    • tops_extension-3.2.20241219-cp38-cp38-linux_x86_64.whl
  • topsgraph
    • topsgraph_0.1.20241124-1_amd64.deb
  • topsfactor
    • topsfactor_3.3.112-1_amd64.deb
  • topsaten
    • topsaten_3.2.20241227-1_amd64.deb
  • tops-sdk
    • tops-sdk_3.3.112-1_amd64.deb
  • tops-inference
    • tops-inference_3.3.112-1_amd64.deb
  • eccl
    • eccl_3.1.20241213-1_amd64.deb
  • eccl-tests
    • eccl-tests_3.1.20241213-1_amd64.deb
  • topsgraph-py
    • topsgraph-0.1.20241124-py3.10-none-any.whl
  • xformers
    • xformers-0.0.25+gcu.3.2.20241220-cp38-cp38-linux_x86_64.whl
  • TopsInference
    • TopsInference-3.3.112-py3.10-none-any.whl
    • TopsInference-3.3.112-py3.8-none-any.whl
  • torch-gcu-2
    • torch_gcu-2.3.0_3.2.3_x86_64.run
  • fast-diffusers
    • fast_diffusers-0.29.2+gcu.3.2.20250102-py3.8-none-any.whl
    • fast_diffusers-0.29.2+gcu.3.2.20250102-py3.10-none-any.whl
  • fast-diffusers-utils
    • fast_diffusers_utils-0.29.2+gcu.3.2.20250102-py3.8-none-any.whl
    • fast_diffusers_utils-0.29.2+gcu.3.2.20250102-py3.10-none-any.whl
  • libtorch
    • ai_framework/torch_gcu/libtorch_gcu

5.2 Components outside the TopsRider run package

  • ffmpeg-gcu
    • ffmpeg-gcu_1.2.3.7-20241120-n4.4-1_amd64.deb
    • ffmpeg-gcu-1.2.3.7-20241120-n4.4-1.x86_64.rpm
    • ffmpeg-gcu_1.2.3.7-20241120-n4.4-1_amd64-dbgsym.ddeb
  • TopsVisualProfiler
    • TopsVisualProfiler_1.2.4.12-ee064d_win64.zip
  • Application run
    • TopsRider_i3x_3.3.112_application.run

5.3 Components in the TopsRider ddeb package

  • TopsRider_3.3.112_ddeb_amd64.run
    • eccl_3.1.20241213-1_amd64-dbgsym.ddeb
    • eccl-tests_3.1.20241213-1_amd64-dbgsym.ddeb
    • topsaten_3.2.20241227-1_amd64-dbgsym.ddeb
    • topscv_1.2.2.15-20241112-1_amd64-dbgsym.ddeb
    • topsfactor_3.3.112-1_amd64-dbgsym.ddeb
    • tops-inference_3.3.112-1_amd64-dbgsym.ddeb
    • TopsPlatform_1.2.4.12-ee064d_ddeb_amd64.run
    • tops-sdk_3.3.112-1_amd64-dbgsym.ddeb

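6. Operating system and Python support

6.1 Adaptation notes

  • Host environment: only the Enflame Driver has been adapted for compatibility with this OS environment; Docker runs Ubuntu
  • Docker environment: the software stack has been adapted and tested; a Host running the same OS is required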

6.2 Supported operating systems

OS name                 Arch  Kernel version            GCC     GLIBC  Notes
Ubuntu 20.04.z (z<=5)   x86   5.4 & 5.11 & 5.13 & 5.15  9.3     2.31   Host & Docker
Ubuntu 22.04.z (z<=1)   x86   5.15                      11.2    2.35   Host & Docker
Kylin v10               x86   4.19.0                    7.3     2.28   Only the driver has been adapted, on the Host
UOS 20 Server           x86   4.19.0                    7.3     2.28
openEuler               x86   5.10.0                    10.3.1  2.34
Anolis OS 8.2 QU2       x86   4.18.0                    8.3.1   2.28
Anolis OS 8.6           x86   4.19.90                   7.3.0   2.28
TLinux 4.2              x86   6.6.30                    12.3.1  2.38

6.3 Supported Python versions

Python 3.8 (TopsInference inference models), Python 3.10
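
The version constraints in this section can be turned into a quick environment check before installation. The Python sketch below flags an interpreter other than 3.8 or 3.10 and a GLIBC version that does not appear in the OS support table. It is only a heuristic: a matching GLIBC version does not by itself mean the OS is supported. The function name preflight is an illustrative assumption.

```python
import platform
import sys

# Python versions listed in section 6.3.
SUPPORTED_PYTHONS = {(3, 8), (3, 10)}
# GLIBC versions that appear in the OS support table in section 6.2.
LISTED_GLIBC = {"2.28", "2.31", "2.34", "2.35", "2.38"}


def preflight(python_version, glibc_version):
    """Return warning strings; an empty list means nothing was flagged."""
    warnings = []
    major_minor = tuple(python_version[:2])
    if major_minor not in SUPPORTED_PYTHONS:
        warnings.append("Python %d.%d is not in the supported list" % major_minor)
    if glibc_version not in LISTED_GLIBC:
        warnings.append("GLIBC %r does not match any listed OS" % glibc_version)
    return warnings


if __name__ == "__main__":
    # platform.libc_ver() reports the C library the interpreter is linked with.
    libc_name, libc_version = platform.libc_ver()
    for message in preflight(sys.version_info, libc_version):
        print("warning:", message)
```

Run as a script, it prints one warning per mismatch and nothing when both checks pass.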

7. Documentation updates

7.1 Added documents

"TopsGraph Python API Reference", "paddle-custom-gcu 3.0 User Guide", "paddle-custom-gcu 3.0 Operator Support List", "Node Problem Detector User Guide"

7.2 Removed documents

"torch_GCU 2.1 User Guide", "Library Kernel API Reference"

8. Usage limitations

The ARM platform currently covers single-card environments only. The covered models are:
vllm: llama2 7b, llama2 13b
Typical traditional models: resnet50 v1.5, yolov5m, vit_b, vit_l, bert
