vLLM-gcu User Manual
vLLM-gcu is a version of vLLM adapted for the Enflame S60 GCU, used to run inference for various LLMs on Enflame GCUs.
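Since vLLM-gcu is an adaptation of upstream vLLM, offline inference should follow vLLM's standard Python API. The sketch below is a minimal example, assuming the model path (a placeholder here) points to one of the models listed in Section 5.1 and that no gcu-specific launch arguments are needed beyond what the installation chapter covers:

```python
# Minimal offline-inference sketch using vLLM's standard Python API.
# Assumptions: the model path is a placeholder (substitute any model from
# the supported list in Section 5.1); gcu-specific setup, if any, is
# covered by the installation chapter of this manual.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Load a supported model from a local path (placeholder path).
llm = LLM(model="/path/to/llama2-7b")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```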
Table of Contents
- 1. Copyright Notice
- 2. Overview
- 3. Installation
- 4. Usage Guide
- 5. Model Inference and Performance Evaluation Guide
- 5.1. List of Supported Large Language Models
- 5.2. baichuan2
- 5.3. chatglm2/3
- 5.4. codellama
- 5.5. dbrx
- 5.6. deepseek
  - deepseek-llm-67b-base
  - deepseek-llm-67b-chat
  - deepseek-moe-16b-base-w4a16
  - deepseek-moe-16b-chat
  - deepseek-coder-6.7b-base
  - DeepSeek-V2-Lite-Chat
  - deepseek-moe-16b-base-w8a8c8
  - DeepSeek-R1-Distill-Qwen-1.5B
  - DeepSeek-R1-Distill-Qwen-7B
  - DeepSeek-R1-Distill-Qwen-14B
  - DeepSeek-R1-Distill-Qwen-32B
  - DeepSeek-R1-Distill-Llama-8B
  - DeepSeek-R1-Distill-Llama-70B
- 5.7. gemma
- 5.8. glm4
- 5.9. iFlytekSpark
- 5.10. internlm
- 5.11. llama
  - llama2-7b
  - llama2-70b
  - Meta-Llama-3-8B
  - Meta-Llama-3-70B
  - llama2-7b-w8a16_gptq
  - llama2-13b-w8a16_gptq
  - llama2-70b-w8a16_gptq
  - llama3-8b-w8a16_gptq
  - llama3-70b-w8a16_gptq
  - Meta-Llama-3.1-8B-Instruct
  - llama2-7b-w4a16
  - Meta-Llama-3.1-70B-Instruct
  - llama3-70b-w4a16
  - llama2-7b-w4a16c8
  - llama2-70b-w4a16c8
  - Llama-2-13B-chat-GPTQ
  - llama2-70b-w8a8c8
  - Meta-Llama-3.1-70B-Instruct-w4a16
  - llama2_7b_chat_w8a8c8
  - Meta-Llama-3.1-70B-Instruct_W8A8C8
  - Llama-3.3-70B-Instruct
- 5.12. Mistral
- 5.13. Qwen
  - Qwen1.5-7B
  - Qwen1.5-32B
  - Qwen1.5-72B-Chat
  - Qwen1.5-14B-Chat-w8a16_gptq
  - Qwen1.5-32B-w8a16_gptq
  - Qwen1.5-MoE-A2.7B
  - Qwen2-7B
  - Qwen-7B-Instruct
  - Qwen2-72B-padded-w8a16_gptq
  - Qwen2-72B-Instruct
  - Qwen2-1.5B-Instruct
  - Qwen1.5-4B-Chat
  - Qwen1.5-32B-Chat-w8a16_gptq
  - Qwen1.5-72B-w8a16_gptq
  - Qwen1.5-72B-Chat-w8a16_gptq
  - Qwen1.5-32B-w4a16
  - qwen2-72b-instruct-gptq-int4
  - qwen2-72b-instruct-gptq-int8
  - Qwen1.5-32B-w4a16c8
  - Qwen2-72B-Instruct-w4a16c8
  - qwen1.5-72b-chat-awq
  - Qwen2-57B-A14B
  - Qwen1.5-110B-Chat-w8a16_gptq
  - Qwen1.5-32B-Chat-w4a16c8
  - Qwen2-72B-w8a8c8
  - Qwen1.5-32B-w8a8c8
  - Qwen2.5-32B-Instruct-GPTQ-Int8_w8a16
  - Qwen2.5-72B-Instruct-GPTQ-Int8_w8a16
  - Qwen2.5-0.5B-Instruct
- 5.14. starcoder
- 5.15. SUS-Chat
- 5.16. WizardCoder
- 5.17. Yi
- 5.18. Ziya-Coding
- 6. Multimodal Models