4.3. Qwen Series

Introduction

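The Qwen series is a family of large language models developed by Tongyi Lab, part of Alibaba Group. Its core product name is "Tongyi Qianwen" (通义千问), and the series covers a wide range of scenarios, from basic language understanding and text generation to multimodal tasks and code generation.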

Qwen2.5-VL-3B-Instruct

Inference and performance testing of this model require one Enflame GCU.

Model download

url: Qwen2.5-VL-3B-Instruct
branch: main
commit id: 66285546d2b821cf421d4f5eb2576359d3770cd3

Download the contents at the URL above into a local folder named Qwen2.5-VL-3B-Instruct.
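One way to fetch the model pinned to the commit above is sketched below, assuming the URL points to the Hugging Face repository `Qwen/Qwen2.5-VL-3B-Instruct` (an assumption; the page shows only the link text) and that the commit id is a revision in that repository. It uses `huggingface_hub.snapshot_download`; the helper names `download_kwargs` and `download_model` are hypothetical.

```python
# Minimal sketch: download the model snapshot at a pinned commit.
# ASSUMPTION: the model is hosted on Hugging Face as "Qwen/Qwen2.5-VL-3B-Instruct";
# change REPO_ID if the URL above points somewhere else.
REPO_ID = "Qwen/Qwen2.5-VL-3B-Instruct"  # assumed repository id
REVISION = "66285546d2b821cf421d4f5eb2576359d3770cd3"  # commit id listed above
LOCAL_DIR = "Qwen2.5-VL-3B-Instruct"  # target folder named in the text


def download_kwargs(local_dir: str = LOCAL_DIR) -> dict:
    """Arguments for huggingface_hub.snapshot_download, pinned to REVISION."""
    return {"repo_id": REPO_ID, "revision": REVISION, "local_dir": local_dir}


def download_model(local_dir: str = LOCAL_DIR) -> str:
    """Download the pinned snapshot into local_dir and return the local path."""
    # Imported lazily so download_kwargs() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(local_dir))
```

Calling `download_model()` once (for example from a Python shell) populates the Qwen2.5-VL-3B-Instruct folder; pinning `revision` keeps the files identical to the tested commit even if the branch moves.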

Requirements

python3 -m pip install transformers==4.48.3 datasets==3.5.0

export ENFLAME_TORCH_GCU_ENABLE_AUTO_MIGRATION=1

Notes:

Environment requirements: Python 3.10; transformers==4.48.3.

The environment variable ENFLAME_TORCH_GCU_ENABLE_AUTO_MIGRATION=1 enables the automatic migration feature of torch-gcu.
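Before starting the server, the requirements above can be checked from Python. The sketch below compares the interpreter version, the installed `transformers` and `datasets` versions (the pins from the pip command), and the environment variable; `check_environment` is a hypothetical helper, and its arguments exist only so the checks can be exercised with fake values.

```python
import importlib.metadata
import os
import sys

# Version pins taken from the requirements above.
PINS = {"transformers": "4.48.3", "datasets": "3.5.0"}
ENV_VAR = "ENFLAME_TORCH_GCU_ENABLE_AUTO_MIGRATION"


def check_environment(py_version=None, env=None, get_version=importlib.metadata.version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    py_version = tuple(sys.version_info[:2]) if py_version is None else tuple(py_version)
    env = os.environ if env is None else env
    problems = []
    if py_version != (3, 10):
        problems.append(f"expected Python 3.10, found {py_version[0]}.{py_version[1]}")
    for package, wanted in PINS.items():
        try:
            found = get_version(package)
        except importlib.metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found != wanted:
            problems.append(f"expected {package}=={wanted}, found {found}")
    if env.get(ENV_VAR) != "1":
        problems.append(f"{ENV_VAR} is not set to 1")
    return problems


# Usage: print every problem found in the current environment.
# for problem in check_environment():
#     print(problem)
```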

Online inference example

Start the server

python3 -m sglang.launch_server --model-path [ path of Qwen2.5-VL-3B-Instruct ] --host 0.0.0.0 --port 8089  --dp-size 1 --tp-size 1 --trust-remote-code

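Notes:

--port: can be set to any unused port on the local machine.

--context-length: sets the model's maximum context length (prompt tokens plus generated tokens). It is optional; if omitted, the value from the model's configuration is used.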
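Loading the model can take a while, so a client may want to wait until the server is ready before sending requests. The sketch below polls a `/health` endpoint, assuming the server exposes one (recent SGLang versions do; verify against your build). `wait_for_server` and `http_ok` are hypothetical helpers, and `probe` is injectable only to make the loop testable.

```python
import time
import urllib.request


def http_ok(url: str) -> bool:
    """Return True if a GET on url answers with HTTP 200, False on any connection/HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, timeouts and refused connections are all OSErrors
        return False


def wait_for_server(base_url: str, timeout: float = 600.0, interval: float = 5.0, probe=http_ok) -> bool:
    """Poll base_url + '/health' until it succeeds (True) or timeout seconds pass (False)."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(base_url.rstrip("/") + "/health"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage, with the port from the launch command above:
# wait_for_server("http://localhost:8089")
```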

Send a request from the client

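import requests

# The port must match the --port value used when starting the server.
url = "http://localhost:8089/v1/chat/completions"
data = {
    # Use the same value that was passed to --model-path.
    "model": "[ path of Qwen2.5-VL-3B-Instruct ]",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}

response = requests.post(url, json=data, timeout=300)
response.raise_for_status()  # fail loudly on HTTP errors
# OpenAI-compatible response: the generated text is in choices[0].message.content.
print(response.json()["choices"][0]["message"]["content"])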
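Qwen2.5-VL-3B-Instruct is a vision-language model, while the request above is text only. A minimal sketch of an image request follows, assuming the server accepts OpenAI-style `image_url` content parts carrying base64 data URLs (the chat format SGLang's OpenAI-compatible API uses for vision models). `build_image_request`, the `max_tokens` value, and `example.jpg` are illustrative choices, not part of the original example.

```python
import base64


def build_image_request(model: str, image_bytes: bytes, question: str, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style chat request with one inline (base64) image followed by a text question."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": 128,  # illustrative cap on generated tokens
    }


# Usage (example.jpg is a placeholder file name):
# import requests
# with open("example.jpg", "rb") as f:
#     payload = build_image_request("[ path of Qwen2.5-VL-3B-Instruct ]", f.read(), "Describe this image.")
# print(requests.post("http://localhost:8089/v1/chat/completions", json=payload, timeout=300).json())
```

Inlining the image as a data URL avoids requiring the server to reach an external image host.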

Performance testing

As in the online example, first start the SGLang server.

Start the server

python3 -m sglang.launch_server --model-path [ path of Qwen2.5-VL-3B-Instruct ] --host 0.0.0.0 --port 8089  --dp-size 1 --tp-size 1 --trust-remote-code

Run the benchmark

python3 -m sglang.gcu_bench_serving --backend sglang --dataset-name random --random-range-ratio 1.0 --num-prompts 16 --random-input-len 1024 --random-output-len 1024   --host 0.0.0.0 --port 8089  --dataset-path [ path of dataset ]
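The size of the workload configured above follows from simple arithmetic: with `--random-range-ratio 1.0`, 16 prompts of 1024 input and 1024 output tokens give 16 × 1024 = 16384 tokens in each direction. The sketch below assumes fixed lengths and that every request generates its full output length; `workload` and `output_throughput` are hypothetical helpers, and the 64-second duration in the usage line is a made-up value for illustration.

```python
def workload(num_prompts: int, input_len: int, output_len: int) -> dict:
    """Token totals for one run, assuming fixed lengths (--random-range-ratio 1.0)."""
    return {
        "input_tokens": num_prompts * input_len,
        "output_tokens": num_prompts * output_len,
        "total_tokens": num_prompts * (input_len + output_len),
    }


def output_throughput(output_tokens: int, duration_s: float) -> float:
    """Generated tokens per second over the wall-clock duration of the run."""
    return output_tokens / duration_s


# Usage: the parameters from the command above, with a hypothetical 64 s run.
totals = workload(num_prompts=16, input_len=1024, output_len=1024)
rate = output_throughput(totals["output_tokens"], 64.0)  # 16384 / 64 = 256.0 tokens/s
```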

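Notes:

--dataset-path specifies where the dataset is stored; if it is omitted, the dataset is read from /tmp/ by default.

The ShareGPT dataset is used by default:
file name: ShareGPT_V3_unfiltered_cleaned_split.json;
download URL: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/blob/main/ShareGPT_V3_unfiltered_cleaned_split.json;
use --dataset-name to select a different dataset.

Use --num-prompts, --random-input-len, --random-output-len, and similar parameters to customize the test scale and the input/output lengths. With --random-range-ratio 1.0, every request uses exactly the specified input and output lengths. For the full list of parameters and their usage, see the official documentation and the comments in the script.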
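The ShareGPT link above is a Hugging Face "blob" URL, which serves a web page; the raw file is served from the matching "resolve" URL. The sketch below converts the URL and downloads the file into /tmp, the default location mentioned above. `raw_url` and `fetch_sharegpt` are hypothetical helpers, and the file is large, so the download only runs when `fetch_sharegpt()` is called.

```python
import os
import urllib.request

BLOB_URL = (
    "https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/"
    "blob/main/ShareGPT_V3_unfiltered_cleaned_split.json"
)


def raw_url(blob_url: str) -> str:
    """Turn a Hugging Face 'blob' (web page) URL into the 'resolve' URL that serves the raw file."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def fetch_sharegpt(dest_dir: str = "/tmp") -> str:
    """Download the ShareGPT file into dest_dir unless it already exists; return its path."""
    path = os.path.join(dest_dir, BLOB_URL.rsplit("/", 1)[-1])
    if not os.path.exists(path):
        urllib.request.urlretrieve(raw_url(BLOB_URL), path)
    return path
```

If the file is saved somewhere other than /tmp, pass that location with --dataset-path.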

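Accuracy validation

MMMU is used here as an example of accuracy validation. For more details, or for other benchmarks, see the README file in each benchmark's folder under sglang/benchmark.

Dataset preparation

Download sources:
- url: MMMU
  The contents at this URL are downloaded into the local MMMU/MMMU folder when the accuracy test script benchmark/mmmu/bench_sglang.py is run.
- Download the SGLang source code: https://github.com/sgl-project/sglang

Start the server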

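# --model-path: model selection; --mem-fraction-static: fraction of device memory reserved for model weights and the KV cache pool
# --port: network configuration (must match the --port passed to the accuracy test)
python3 -m sglang.launch_server \
--model-path [ path of Qwen2.5-VL-3B-Instruct ] \
--mem-fraction-static 0.8 --port 30000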

Run the accuracy test

python3 sglang/benchmark/mmmu/bench_sglang.py --port 30000 --concurrency 16 --dataset-path [ path of MMMU ]

View the results

# The test results are saved to a file named val_sglang.json in the current directory.
grep -oP '"acc": \K\d+\.\d+' ./val_sglang.json
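`grep -P` is not available in every grep build (for example, the BSD grep shipped with macOS). A Python alternative is sketched below, assuming val_sglang.json is valid JSON whose accuracy figures are stored under keys named "acc", which is what the grep pattern implies; the exact nesting is not documented here, so the sketch searches the whole structure. `find_acc` and `load_acc` are hypothetical helpers.

```python
import json


def find_acc(obj) -> list:
    """Recursively collect every numeric value stored under an "acc" key, in document order."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "acc" and isinstance(value, (int, float)) and not isinstance(value, bool):
                found.append(float(value))
            else:
                found.extend(find_acc(value))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_acc(item))
    return found


def load_acc(path: str = "./val_sglang.json") -> list:
    """Read the results file written by the accuracy test and return its "acc" values."""
    with open(path, encoding="utf-8") as f:
        return find_acc(json.load(f))
```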