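Multi-Card Inference for DiT-Series Models

Overview

This document describes how to run multi-card text2image inference for sdxl and hunyuandit1.2 on Enflame GCU using native PyTorch.

Environment setup

The following steps assume Python 3.10. Install the required dependencies first:

Install the base environment: follow the TopsRider Software Stack Installation Manual to install the TopsRider software stack.

Install torch_gcu

Note: installing torch_gcu-2.3.0 automatically installs torch 2.3.0.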
```
pip3 install torch_gcu-2.3.0*-cp310-cp310-linux_x86_64.whl
```

Install xfuser

pip3 install xfuser-0.2+gcu.*-py3.10-none-any.whl

If other dependencies are missing, install them as prompted.
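
Before launching a multi-card job, it can help to confirm that the packages are visible to the interpreter. The following is a minimal sketch, not part of TopsRider or xfuser; the module names `torch`, `torch_gcu`, and `xfuser` are assumed from the wheel names above, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import sys

# Module names assumed from the wheel file names; adjust if your install differs.
REQUIRED_MODULES = ("torch", "torch_gcu", "xfuser")


def check_environment(modules=REQUIRED_MODULES):
    """Report whether Python is 3.10 and whether each module can be located.

    find_spec only locates the package; it does not import it, so this check
    does not initialize any device.
    """
    report = {"python==3.10": sys.version_info[:2] == (3, 10)}
    for name in modules:
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```

Any entry reported as `MISSING` points at the installation step to repeat.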

stable-diffusion-xl multi-card inference

Prepare the model

Download the pretrained model:

Download all files from stable-diffusion-xl-base-1.0 into your model directory, referred to below as path_to_model_dir.
- branch: main
- commit id: 4621659
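
An incomplete download is a common cause of load failures, so a quick check of the model directory may be worthwhile. The sketch below assumes the checkpoint uses the standard Diffusers layout, where `model_index.json` lists each component as a `[library, class_name]` pair and each component lives in a subfolder of the same name. `missing_components` is a hypothetical helper, not an xfuser function.

```python
import json
from pathlib import Path


def missing_components(model_dir):
    """List component subfolders named in model_index.json that are absent on disk."""
    root = Path(model_dir)
    index_file = root / "model_index.json"
    if not index_file.is_file():
        return ["model_index.json"]
    index = json.loads(index_file.read_text(encoding="utf-8"))
    missing = []
    for name, value in index.items():
        # Components are stored as [library, class_name]; skip metadata keys
        # (leading underscore), scalar options, and unset [null, null] entries.
        if name.startswith("_") or not isinstance(value, list) or None in value:
            continue
        if not (root / name).is_dir():
            missing.append(name)
    return missing
```

Calling `missing_components(path_to_model_dir)` should return an empty list when every listed component folder is present.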

Run inference

Text2Image two-card patch parallel inference

torchrun --nproc_per_node=2 -m xfuser.xfuser_utils.example.stable_diffuision_xl.sdxl_example \
--model_dir ${path_to_model_dir} \
--device 'gcu' \
--prompt 'photo of an astronaut riding a horse on mars' \
--negative_prompt '' \
--seed 12345 \
--denoising_steps 30 \
--scheduler 'ddim' \
--warmup_steps 4 \
--sync_mode corrected_async_gn \
--parallelism patch \
--image_height 1024 \
--image_width 1024 \
--output_dir './results/sdxl-base/text2img'
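
To give a feel for what `--parallelism patch` implies for the command above, the sketch below splits the latent rows of a 1024-pixel-high image evenly across two ranks. It is an illustration only, not the xfuser implementation: it assumes an even row-wise split and the usual SDXL VAE downscale factor of 8, and `patch_rows` is a hypothetical helper.

```python
def patch_rows(image_height, world_size, vae_scale=8):
    """Return one (start, end) latent-row range per rank for an even split."""
    latent_height = image_height // vae_scale
    if latent_height % world_size != 0:
        raise ValueError("latent height must divide evenly across ranks")
    rows = latent_height // world_size
    return [(rank * rows, (rank + 1) * rows) for rank in range(world_size)]


# Two cards, 1024-pixel height: 128 latent rows, 64 per rank.
print(patch_rows(1024, 2))  # [(0, 64), (64, 128)]
```

Under this assumption, the image height should be chosen so that the latent height divides evenly by the value passed to `--nproc_per_node`.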

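hunyuandit1.2 multi-card inference

Prepare the model

Download the pretrained model:

Download all files from HunyuanDiT-v1.2-Diffusers into your model directory, referred to below as path_to_model_dir.
- branch: main
- commit id: 5e96094e0ad19e7f475de8711f03634ca0ccc40c

Run inference

Text2Image two-card CFG parallel inference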

torchrun --nproc_per_node=2 -m xfuser.xfuser_utils.example.hunyuandit.hunyuandit_example \
--model ${path_to_model_dir} \
--device 'gcu' \
--prompt 'photo of an astronaut riding a horse on mars' \
--negative_prompt '' \
--seed 12345 \
--num_inference_steps 30 \
--warmup_steps 4 \
--use_cfg_parallel \
--pipefusion_parallel_degree 1 \
--height 1024 \
--width 1024 \
--no_use_resolution_binning \
--output_dir './results/hunyuandit/text2img'

Text2Image four-card pipefusion parallel inference

torchrun --nproc_per_node=4 -m xfuser.xfuser_utils.example.hunyuandit.hunyuandit_example \
--model ${path_to_model_dir} \
--device 'gcu' \
--prompt 'photo of an astronaut riding a horse on mars' \
--negative_prompt '' \
--seed 12345 \
--num_inference_steps 30 \
--warmup_steps 4 \
--use_cfg_parallel \
--pipefusion_parallel_degree 2 \
--height 1024 \
--width 1024 \
--no_use_resolution_binning \
--output_dir './results/hunyuandit/text2img'
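
The two hunyuandit commands above differ only in `--nproc_per_node` and `--pipefusion_parallel_degree`. The sketch below shows the arithmetic that relates them, under two assumptions that are consistent with both commands but not stated in this document: CFG parallel contributes a factor of 2, and no other parallel dimension is enabled. `required_processes` is a hypothetical helper.

```python
def required_processes(use_cfg_parallel, pipefusion_parallel_degree):
    """Process count implied by the flags, assuming world size = cfg x pipefusion."""
    # CFG parallel runs the conditional and unconditional branches on separate
    # ranks, so it is assumed to contribute a factor of 2 when enabled.
    cfg_degree = 2 if use_cfg_parallel else 1
    return cfg_degree * pipefusion_parallel_degree


print(required_processes(True, 1))  # 2, matches --nproc_per_node=2
print(required_processes(True, 2))  # 4, matches --nproc_per_node=4
```

If the product of the degrees and `--nproc_per_node` disagree, the launch configuration is worth rechecking before running.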

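Multi-Card Performance Evaluation for DiT-Series Models

Overview

This document describes how to benchmark multi-card text2image performance for sdxl and hunyuandit1.2 on Enflame GCU using native PyTorch.

Environment setup

The following steps assume Python 3.10. Install the required dependencies first:

Install the base environment: follow the TopsRider Software Stack Installation Manual to install the TopsRider software stack.

Install torch_gcu

Note: installing torch_gcu-2.3.0 automatically installs torch 2.3.0.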
```
pip3 install torch_gcu-2.3.0*-cp310-cp310-linux_x86_64.whl
```

Install xfuser

pip3 install xfuser-0.2+gcu.*-py3.10-none-any.whl

If other dependencies are missing, install them as prompted.

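stable-diffusion-xl multi-card performance evaluation

Prepare the model

Download the pretrained model:

Download all files from stable-diffusion-xl-base-1.0 into your model directory, referred to below as path_to_model_dir.
- branch: main
- commit id: 4621659

Run the performance evaluation

Text2Image two-card patch parallel performance evaluation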

torchrun --nproc_per_node=2 -m xfuser.xfuser_utils.benchmark.stable_diffuision_xl.benchmark_test_stable_diffuision_xl_txt2img \
--model_dir ${path_to_model_dir} \
--device 'gcu' \
--prompt 'photo of an astronaut riding a horse on mars' \
--negative_prompt '' \
--seed 12345 \
--denoising_steps 30 \
--scheduler 'ddim' \
--warmup_steps 4 \
--sync_mode corrected_async_gn \
--parallelism patch \
--image_height 1024 \
--image_width 1024 \
--output_dir './results/sdxl-base/text2img' \
--warmup_count 2 \
--eval_count 3
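
The `--warmup_count` and `--eval_count` flags suggest the common benchmarking pattern of discarding a few untimed runs and then timing the remainder. The sketch below illustrates that pattern only; it is an assumption about what the flags mean, not the xfuser benchmark code, and `benchmark` and `run_once` are hypothetical names.

```python
import statistics
import time


def benchmark(run_once, warmup_count=2, eval_count=3):
    """Time eval_count calls of run_once after warmup_count untimed calls.

    run_once must block until the device has finished its work; otherwise the
    measured time covers only the launch, not the computation.
    """
    for _ in range(warmup_count):
        run_once()  # untimed: absorbs one-off costs such as compilation and caching
    latencies = []
    for _ in range(eval_count):
        start = time.perf_counter()
        run_once()
        latencies.append(time.perf_counter() - start)
    return {
        "runs": len(latencies),
        "mean_s": statistics.mean(latencies),
        "min_s": min(latencies),
    }
```

With the values used in the command above, the pipeline would run five times in total and only the last three runs would contribute to the reported latency.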

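hunyuandit1.2 multi-card performance evaluation

Prepare the model

Download the pretrained model:

Download all files from HunyuanDiT-v1.2-Diffusers into your model directory, referred to below as path_to_model_dir.
- branch: main
- commit id: 5e96094e0ad19e7f475de8711f03634ca0ccc40c

Run the performance evaluation

Text2Image two-card CFG parallel performance evaluation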

torchrun --nproc_per_node=2 -m xfuser.xfuser_utils.benchmark.hunyuandit.benchmark_test_hunyuandit_txt2img \
--model ${path_to_model_dir} \
--device 'gcu' \
--prompt 'photo of an astronaut riding a horse on mars' \
--negative_prompt '' \
--seed 12345 \
--num_inference_steps 30 \
--warmup_steps 4 \
--use_cfg_parallel \
--pipefusion_parallel_degree 1 \
--height 1024 \
--width 1024 \
--no_use_resolution_binning \
--output_dir './results/hunyuandit/text2img' \
--warmup_count 2 \
--eval_count 3

Text2Image four-card pipefusion parallel performance evaluation

torchrun --nproc_per_node=4 -m xfuser.xfuser_utils.benchmark.hunyuandit.benchmark_test_hunyuandit_txt2img \
--model ${path_to_model_dir} \
--device 'gcu' \
--prompt 'photo of an astronaut riding a horse on mars' \
--negative_prompt '' \
--seed 12345 \
--num_inference_steps 30 \
--warmup_steps 4 \
--use_cfg_parallel \
--pipefusion_parallel_degree 2 \
--height 1024 \
--width 1024 \
--no_use_resolution_binning \
--output_dir './results/hunyuandit/text2img' \
--warmup_count 2 \
--eval_count 3