V2.1

Overview

This document describes how to use the provided code to run inference and evaluation for text2image and image2image tasks with stable diffusion v1_5/v2_base/v2_1, both with and without lora fusion.

The supported configurations are:

sd_v1_5 supports:
- 512x512, 512x680, 680x512, 512x1024, 576x1024, 1024x576 (6 resolutions, Image Height x Image Width)
- lora and non-lora
- text2img and image2image
sd_v2_base supports:
- 512x512 (1 resolution)
- non-lora
- text2img and image2image
sd_v2_1 supports:
- 768x768 (1 resolution)
- lora and non-lora
- text2img and image2image

Inference steps

The following steps assume Python 3.10. Install the required dependencies first:

Install TopsInference (the Python 3.10 build):

pip3 install --force-reinstall TopsInference-*-py3.10-*.whl

Install the stable_diffusion package and its dependencies:
```
pip3 install /path/to/stable/diffusion/whl --extra-index-url https://download.pytorch.org/whl/cpu
```
Note: make sure the specified versions, torch 1.13.1+cpu and diffusers 0.21.0, are the ones in use; other versions will cause errors.
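
Because the note above pins exact versions, a quick check before going further can save a failed run. The following is a minimal sketch, not part of the stable_diffusion package: the helper names `version_mismatches` and `installed_versions` are hypothetical, and it assumes the packages are registered under the distribution names `torch` and `diffusers`.

```python
from importlib import metadata

# Versions pinned by the note above.
REQUIRED = {"torch": "1.13.1+cpu", "diffusers": "0.21.0"}


def version_mismatches(installed, required=REQUIRED):
    """Return {name: (installed, required)} for every package that differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in required.items()
        if installed.get(name) != wanted
    }


def installed_versions(names):
    """Look up installed versions; a package that is not installed maps to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# An empty dict means the environment matches the pinned versions.
print(version_mismatches(installed_versions(REQUIRED)))
```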

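Step 1: Download files

To test the stable diffusion features, obtain the sd v1_5/v2_base/v2_1 onnx models from technical support, then use the following command as a reference to place the onnx models and other related files where you want to store them: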
```
path_to_onnx_models=/your/onnx/path
```
Obtain the sd_v1_5 weight file for the lora-related layers from technical support and move it to ${path_to_onnx_models}/v1-5/huggingface_diffusers/unet:
```
mv partial_refitting_layers.safetensors ${path_to_onnx_models}/v1-5/huggingface_diffusers/unet
```

Obtain the sd_v2_1 weight file for the lora-related layers from technical support and move it to ${path_to_onnx_models}/v2-1/huggingface_diffusers/unet:

mv partial_refitting_layers.safetensors ${path_to_onnx_models}/v2-1/huggingface_diffusers/unet

The resulting directory structure looks like this:

/your/onnx/path
├── v1-5
│   └── huggingface_diffusers
│       ├── model_index.json
│       ├── safety_checker
│       │   └── stable_diffusion_v1-5_safety_checker-hf-op13-fp32.onnx
│       ├── scheduler
│       │   └── scheduler_config.json
│       ├── stable_diffusion_v1-5-huggingface_diffusers-op14-fp32-N.md
│       ├── text_encoder
│       │   └── stable_diffusion_v1-5_text_encoder-huggingface-diffusers-op14-fp32-N.onnx
│       ├── tokenizer
│       │   ├── merges.txt
│       │   ├── special_tokens_map.json
│       │   ├── tokenizer_config.json
│       │   └── vocab.json
│       ├── unet
│       │   ├── partial_refitting_layers.safetensors
│       │   ├── stable_diffusion_v1-5_unet-huggingface-diffusers-op14-fp32-N.onnx
│       │   └── stable_diffusion_v1-5_unet-huggingface-diffusers-op14-fp32-N.weights
│       ├── vae_decoder
│       │   └── stable_diffusion_v1-5_vae_decoder-huggingface-diffusers-op14-fp32-N.onnx
│       └── vae_encoder
│           └── stable_diffusion_v1-5_vae_encoder-huggingface-diffusers-op14-fp32-N.onnx
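
Before compiling, it can help to confirm that the layout matches the tree above. This is a minimal sketch under stated assumptions: the helper `missing_components` is hypothetical, and the list of expected entries is simply read off the v1-5 tree shown above rather than taken from the package.

```python
from pathlib import Path

# Entries read off the v1-5 tree above (an assumption, not a package constant).
EXPECTED = [
    "model_index.json",
    "scheduler/scheduler_config.json",
    "text_encoder",
    "tokenizer/vocab.json",
    "unet",
    "vae_decoder",
    "vae_encoder",
]


def missing_components(model_root, expected=EXPECTED):
    """Return the expected entries that do not exist under model_root."""
    root = Path(model_root)
    return [entry for entry in expected if not (root / entry).exists()]


# Prints the entries that are absent; an empty list means the layout is complete.
print(missing_components("/your/onnx/path/v1-5/huggingface_diffusers"))
```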

Prepare the pokemon lora and unpack it:

The sd_v1_5_lora directory structure is as follows:

sd_v1_5_lora
├── mengx_girlmix_lora
│   └── unet
│       └── adapter_model.safetensors
└── pokemon
    └── unet
        └── adapter_model.safetensors

Step 2: Compile the engine

Once the onnx models and related files above are ready, run the following command to compile the engine:

python3 -m stable_diffusion.scripts.stable_diffusion.generate_onnx_engine \
--model "" \
--model_type sd_v1_5 \
--output_path ${path_to_onnx_models}/v1-5/huggingface_diffusers \
--gcu 0 \
--resolutions 512x512 \
--need_refit_engine only_refit

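Where:

--model: If you already have the onnx files and only need to compile the engine, set --model to an empty string. To export onnx, set --model to the directory containing the torch model. If there is no local torch model, set --model to one of [runwayml/stable-diffusion-v1-5, stabilityai/stable-diffusion-2-base, stabilityai/stable-diffusion-2-1]; the pretrained model is looked up in the cache first and, if it is not there, downloaded from Hugging Face and cached locally.
--model_type: One of [sd_v1_5, sd_v2_base, sd_v2_1]. Choose the value that matches the --model setting.
--export_onnx: Controls whether the onnx files are re-exported. If you already have the onnx files and do not need to re-export them, omit this flag. To export onnx from a torch model, set --model correctly and add --export_onnx.
--output_path: The path where the generated engine and any exported onnx files are stored.
--resolutions: The resolutions of the engines to compile. sd_v1_5 supports {512x512, 512x680, 512x1024, 576x1024, 680x512, 1024x576}; separate multiple resolutions with an ASCII comma ",". sd_v2_base supports 512x512; sd_v2_1 supports 768x768.
--need_refit_engine: One of [only_refit, only_unrefit, both]. only_refit compiles only the refittable engine (required for lora fusion), only_unrefit compiles only the non-refittable engine, and both compiles both kinds. The default is only_refit.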
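
The resolution rules above can be captured in a small lookup, which is handy for catching an unsupported --resolutions value before a long compile starts. This is a sketch only: `SUPPORTED` and `unsupported_resolutions` are hypothetical names, and the table is copied from the parameter description above.

```python
# Supported resolutions per model type, copied from the --resolutions text above.
SUPPORTED = {
    "sd_v1_5": {"512x512", "512x680", "512x1024", "576x1024", "680x512", "1024x576"},
    "sd_v2_base": {"512x512"},
    "sd_v2_1": {"768x768"},
}


def unsupported_resolutions(model_type, resolutions):
    """Split a comma-separated --resolutions value and return the invalid items."""
    allowed = SUPPORTED[model_type]
    requested = [item.strip() for item in resolutions.split(",") if item.strip()]
    return [item for item in requested if item not in allowed]


print(unsupported_resolutions("sd_v1_5", "512x512,680x512"))  # nothing rejected
print(unsupported_resolutions("sd_v2_1", "512x512,768x768"))  # 512x512 rejected
```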

Taking sd_v1_5 at the 512x512 and 680x512 resolutions as an example, the resulting directory structure looks like this:

${path_to_onnx_models}/v1-5/huggingface_diffusers
      ├── 512x512
      │   ├── model_index.json
      │   ├── safety_checker
      │   ├── scheduler
      │   │   └── scheduler_config.json
      │   ├── text_encoder
      │   │   ├── stable_diffusion_v1-5_text_encoder-huggingface-diffusers-op14-fp16_mix-N.bin
      │   │   └── stable_diffusion_v1-5_text_encoder-huggingface-diffusers-op14-fp16_mix-N-refit.bin
      │   ├── tokenizer
      │   │   ├── merges.txt
      │   │   ├── special_tokens_map.json
      │   │   ├── tokenizer_config.json
      │   │   └── vocab.json
      │   ├── unet
      │   │   ├── layer_mapping.json
      │   │   ├── partial_refitting_layers.safetensors
      │   │   ├── stable_diffusion_v1-5_unet-huggingface-diffusers-op14-fp16_mix-N.bin
      │   │   └── stable_diffusion_v1-5_unet-huggingface-diffusers-op14-fp16_mix-N-refit.bin
      │   ├── vae_decoder
      │   │   └── stable_diffusion_v1-5_vae_decoder-huggingface-diffusers-op14-fp16_mix-N.bin
      │   └── vae_encoder
      │       └── stable_diffusion_v1-5_vae_encoder-huggingface-diffusers-op14-fp16_mix-N.bin
      ├── 680x512
      │   ├── model_index.json
      │   ├── safety_checker
      │   ├── scheduler
      │   │   └── scheduler_config.json
      │   ├── text_encoder
      │   │   ├── stable_diffusion_v1-5_text_encoder-huggingface-diffusers-op14-fp16_mix-N.bin
      │   │   └── stable_diffusion_v1-5_text_encoder-huggingface-diffusers-op14-fp16_mix-N-refit.bin
      │   ├── tokenizer
      │   │   ├── merges.txt
      │   │   ├── special_tokens_map.json
      │   │   ├── tokenizer_config.json
      │   │   └── vocab.json
      │   ├── unet
      │   │   ├── layer_mapping.json
      │   │   ├── partial_refitting_layers.safetensors
      │   │   ├── stable_diffusion_v1-5_unet-huggingface-diffusers-op14-fp16_mix-N.bin
      │   │   └── stable_diffusion_v1-5_unet-huggingface-diffusers-op14-fp16_mix-N-refit.bin
      │   ├── vae_decoder
      │   │   └── stable_diffusion_v1-5_vae_decoder-huggingface-diffusers-op14-fp16_mix-N.bin
      │   └── vae_encoder
      │       └── stable_diffusion_v1-5_vae_encoder-huggingface-diffusers-op14-fp16_mix-N.bin
      ├── model_index.json
      ├── safety_checker
      │   └── stable_diffusion_v1-5_safety_checker-hf-op13-fp32.onnx
      ├── scheduler
      │   └── scheduler_config.json
      ├── text_encoder
      │   └── stable_diffusion_v1-5_text_encoder-huggingface-diffusers-op14-fp32-N.onnx
      ├── tokenizer
      │   ├── merges.txt
      │   ├── special_tokens_map.json
      │   ├── tokenizer_config.json
      │   └── vocab.json
      ├── unet
      │   ├── partial_refitting_layers.safetensors
      │   ├── stable_diffusion_v1-5_unet-huggingface-diffusers-op14-fp32-N.onnx
      │   └── stable_diffusion_v1-5_unet-huggingface-diffusers-op14-fp32-N.weights
      ├── vae_decoder
      │   └── stable_diffusion_v1-5_vae_decoder-huggingface-diffusers-op14-fp32-N.onnx
      └── vae_encoder
          └── stable_diffusion_v1-5_vae_encoder-huggingface-diffusers-op14-fp32-N.onnx

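Step 3: Run inference

Run the scripts below to try text2image and image2image inference, both native and with lora fusion:

First, set the model and lora paths (note that the lora path ends at the pokemon directory, not at adapter_model.safetensors), as well as the path where images are saved:

Taking sd_v1_5 as an example: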

model_base=${path_to_onnx_models}/v1-5/huggingface_diffusers
path_to_pokemon_lora=/path/to/sd_v1_5_lora/pokemon/
gcu_results=./results/text2image/tops_gcu_sd_v1_5_temp

Text2Image task

Run inference with sd_v1_5/sd_v2_base/sd_v2_1:

python3 -m stable_diffusion.examples.stable_diffusion.demo_text2image_topsinference \
--model ${model_base} \
--lora ${path_to_pokemon_lora}:1.0 \
--image_num 16 \
--prompt "cute dragon creature" \
--negative_prompt "nsfw" \
--output ${gcu_results} \
--gcu 0 \
--model_type sd_v1_5 \
--image_height 512 \
--image_width 512 \
--platform general \
--scheduler ddim \
--denoising_steps 20 \
--seed 1111111

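Note: if --lora is omitted or set to '', native inference is run. If --lora is set, inference is run after the lora has been fused.

Where:

--model: The engine and related configuration files generated in the previous steps.
--lora: [optional] The lora path and its fusion strength, separated by a colon (:).
--model_type: One of the 3 sd versions [sd_v1_5, sd_v2_base, sd_v2_1].
--image_height: Height of the generated image; must match the resolution of the path given by --model.
--image_width: Width of the generated image; must match the resolution of the path given by --model.

For the other parameters and their meanings, run python3 -m stable_diffusion.examples.stable_diffusion.demo_text2image_topsinference -h.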
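
To make the path:strength format of --lora concrete, here is a minimal sketch of how such a value could be split. `parse_lora_spec` is a hypothetical helper, not the package's own parser; it assumes the strength is the text after the last colon.

```python
def parse_lora_spec(spec):
    """Split 'path:strength' into (path, float strength).

    An empty spec means no lora (native inference) and returns None.
    """
    if not spec:
        return None
    path, sep, strength = spec.rpartition(":")
    if not sep or not path:
        raise ValueError(f"expected 'path:strength', got {spec!r}")
    return path, float(strength)


print(parse_lora_spec("/path/to/sd_v1_5_lora/pokemon/:1.0"))
print(parse_lora_spec(""))
```

Splitting on the last colon rather than the first keeps the sketch usable even if the path itself contains a colon.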

Image2Image task

Run inference with sd_v1_5/sd_v2_base/sd_v2_1:

python3 -m stable_diffusion.examples.stable_diffusion.demo_image2image_topsinference \
--model ${model_base} \
--lora ${path_to_pokemon_lora}:1.0 \
--image_num 16 \
--prompt "cute dragon creature" \
--negative_prompt "nsfw" \
--output ${gcu_results} \
--gcu 0 \
--model_type sd_v1_5 \
--image_height 512 \
--image_width 512 \
--platform general \
--scheduler ddim \
--denoising_steps 20 \
--seed 1111111 \
--init_image ${init_img_path} \
--prompt_strength 0.8

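Note: if --lora is omitted or set to '', native inference is run. If --lora is set, inference is run after the lora has been fused.

Where:

--model: The engine and related configuration files generated in the previous steps.
--lora: [optional] The lora path and its fusion strength, separated by a colon (:).
--model_type: One of the 3 sd versions [sd_v1_5, sd_v2_base, sd_v2_1].
--image_height: Height of the generated image; must match the resolution of the path given by --model.
--image_width: Width of the generated image; must match the resolution of the path given by --model.
--init_image: [optional] The initial image for the Image2Image task. If it is not given, an image is downloaded from the network and used as the initial image.
--prompt_strength: How much the initial image is changed; the value lies in [0, 1].

For the other parameters and their meanings, run python3 -m stable_diffusion.examples.stable_diffusion.demo_image2image_topsinference -h.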
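
As an illustration of what --prompt_strength means in practice: in the diffusers image2image pipelines, the strength scales how many of the scheduled denoising steps are actually run on the noised initial image. The sketch below assumes this demo follows the same convention, which the text above does not confirm; `effective_steps` is a hypothetical name.

```python
def effective_steps(denoising_steps, prompt_strength):
    """Steps actually run in img2img under the diffusers convention (assumed)."""
    if not 0.0 <= prompt_strength <= 1.0:
        raise ValueError("prompt_strength must be within [0, 1]")
    return min(int(denoising_steps * prompt_strength), denoising_steps)


# Values from the command above: 20 steps at strength 0.8.
print(effective_steps(20, 0.8))
```

Under that assumption, a strength near 0 leaves the initial image almost unchanged, while a strength of 1 runs every step and largely ignores it.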

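Controlnet-SD-V1.5

Overview

This document describes how to use the provided code to run the text2image and image2image tasks for each controlnet of Stable Diffusion Controlnet. Each controlnet currently supports 5 resolutions (Image Height x Image Width): 512x512, 512x680, 680x512, 576x1024, 1024x576.

Inference steps

The following steps assume Python 3.10. Install the required dependencies first:

Install TopsInference (the Python 3.10 build):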

pip3 install --force-reinstall TopsInference-*-py3.10-*.whl

Install the stable_diffusion package and its dependencies:
```
pip3 install /path/to/stable/diffusion/whl --extra-index-url https://download.pytorch.org/whl/cpu
```
Note: make sure the specified versions, torch 1.13.1+cpu and diffusers 0.21.0, are the ones in use; other versions will cause errors.

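Step 1: Download files

To test the controlnet features, obtain the controlnet onnx models from technical support, then use the following command as a reference to place the onnx models and other related files where you want to store them: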

path_to_onnx_models=/your/onnx/path

The resulting directory structure looks like this:

/your/onnx/path
└── v1-5_controlnet
    └── huggingface_diffusers
        ├── controlnet
        │   ├── stable_diffusion_v1-5_controlnet-scribble-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-canny-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-depth-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-inpaint-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-lineart_anime-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-lineart-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-mlsd-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-normalbae-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-openpose-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-seg-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-shuffle-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-softedge-huggingface-diffusers-op14-fp32-N.onnx
        │   └── stable_diffusion_v1-5_controlnet_v1-1-tile-huggingface-diffusers-op14-fp32-N.onnx
        ├── controlnet_aux
        │   ├── controlnet_aux-hed-v006-op13-fp32-N.onnx
        │   ├── controlnet_aux-lineartAnime-v006-op13-fp32-N.onnx
        │   ├── controlnet_aux-lineart-v006-op13-fp32-N.onnx
        │   ├── controlnet_aux-mlsd-v006-op13-fp32-N.onnx
        │   ├── controlnet_aux-openpose-v006-op13-fp32-N.onnx
        │   └── controlnet_aux-pidinet_model-v006-op13-fp32-N.onnx
        ├── safety_checker
        │   └── stable_diffusion_v1-5_safety_checker-hf-op14-fp32.onnx
        ├── text_encoder
        │   └── stable_diffusion_v1-5_clip-huggingface-diffusers-op14-fp32-N.onnx
        ├── unet_controlnet
        │   ├── layer_mapping.json
        │   ├── partial_refitting_layers.safetensors
        │   ├── stable_diffusion_v1-5_UnetControlnet-huggingface-diffusers-op14-fp32-N.onnx
        │   └── stable_diffusion_v1-5_UnetControlnet-huggingface-diffusers-op14-fp32-N.weights
        ├── vae_decoder
        │   └── stable_diffusion_v1-5_vae_decoder-huggingface-diffusers-op14-fp32-N.onnx
        └── vae_encoder
            └── stable_diffusion_v1-5_vae_encoder-huggingface-diffusers-op14-fp32-N.onnx

* Prepare the pokemon lora and unpack it:

  The sd_v1_5_lora directory structure is as follows:

  ``` bash
  sd_v1_5_lora
  ├── mengx_girlmix_lora
  │   └── unet
  │       └── adapter_model.safetensors
  └── pokemon
      └── unet
          └── adapter_model.safetensors
  ```

Step 2: Compile the engine

Once the onnx models and related files above are ready, run the following command to compile the engine:

python3 -m stable_diffusion.scripts.controlnet.sd1_5.generate_onnx_engine \
--controlnet_dir ${path_to_onnx_models}/v1-5_controlnet/huggingface_diffusers \
--platform general \
--export_type engine \
--controlnet_type lineart_anime \
--image_height -1 \
--image_width 512

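The parameters are described below:

--controlnet_dir: As in the command above, the path that holds the onnx and other files; the generated engines are also stored under controlnet_dir.
--platform: One of [general, maas]; the default is general. If you are using the onnx files obtained above, only general can be used. If you re-export onnx starting from the torch model, either general or maas can be chosen.
--export_type: One of [onnx, engine, both]. both exports onnx first and then compiles the engine from it. If you already have the onnx files above and only need to compile the engine, set it to engine. To re-export onnx starting from the torch model, set it to onnx or both as needed.
--controlnet_type: The controlnet type to compile. Currently supported: [scribble,openpose,canny,mlsd,lineart_anime,lineart,depth,softedge,normalbae,tile,shuffle,all]. The default is all, which compiles the engines for every controlnet; set a single type to compile only that controlnet's engine.
--image_height: Height of the resolution to compile; supported values are [-1, 512, 576, 680, 1024]. -1 compiles all supported resolutions. Each controlnet currently supports 5 resolutions: 512x512, 512x680, 680x512, 576x1024, 1024x576. To compile a single resolution, set this to the desired value.
--image_width: Width of the resolution to compile; supported values are [-1, 512, 576, 680, 1024]. -1 compiles all supported resolutions. Each controlnet currently supports 5 resolutions: 512x512, 512x680, 680x512, 576x1024, 1024x576. To compile a single resolution, set this to the desired value.

Note: to re-export onnx, some or all of the following parameters must also be set:

--model: To re-export onnx, set --model to the directory containing the sd v1.5 torch model and set --export_type to onnx or both as needed. If there is no local sd v1.5 torch model, keep the default value of --model, "runwayml/stable-diffusion-v1-5"; the program then downloads the sd v1.5 torch model automatically and caches it locally for later use.
--openpose_model, --lineart_model, and similar parameters: when exporting the onnx for the corresponding controlnet, set these to the directory containing the corresponding torch model. If there is no local copy, keep the default values; the program then downloads the corresponding torch model automatically and caches it locally for later use.
--openpose_detector, --lineart_detector, and similar parameters: when exporting the onnx for the corresponding controlnet, set these to the directory containing the corresponding torch model. If there is no local copy, keep the default values; the program then downloads the corresponding torch model automatically and caches it locally for later use.

After all 5 resolutions of the lineart_anime controlnet have been compiled, the resulting directory structure looks like this: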

/your/onnx/path
└── v1-5_controlnet
    └── huggingface_diffusers
        ├── controlnet
        │   ├── stable_diffusion_v1-5_controlnet-scribble-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-canny-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-depth-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-inpaint-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-lineart_anime-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-lineart-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-mlsd-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-normalbae-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-openpose-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-seg-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-shuffle-huggingface-diffusers-op14-fp32-N.onnx
        │   ├── stable_diffusion_v1-5_controlnet_v1-1-softedge-huggingface-diffusers-op14-fp32-N.onnx
        │   └── stable_diffusion_v1-5_controlnet_v1-1-tile-huggingface-diffusers-op14-fp32-N.onnx
        ├── controlnet_aux
        │   ├── controlnet_aux-hed-v006-op13-fp32-N.onnx
        │   ├── controlnet_aux-lineartAnime-v006-op13-fp32-N.onnx
        │   ├── controlnet_aux-lineart-v006-op13-fp32-N.onnx
        │   ├── controlnet_aux-mlsd-v006-op13-fp32-N.onnx
        │   ├── controlnet_aux-openpose-v006-op13-fp32-N.onnx
        │   └── controlnet_aux-pidinet_model-v006-op13-fp32-N.onnx
        ├── controlNet_lineart_anime
        │   ├── 1024x576
        │   │   └── diffusion_pytorch_model_tops_engine.bin
        │   ├── 512x512
        │   │   └── diffusion_pytorch_model_tops_engine.bin
        │   ├── 512x680
        │   │   └── diffusion_pytorch_model_tops_engine.bin
        │   ├── 576x1024
        │   │   └── diffusion_pytorch_model_tops_engine.bin
        │   └── 680x512
        │       └── diffusion_pytorch_model_tops_engine.bin
        ├── lineartAnime
        │   └── diffusion_pytorch_model_tops_engine.bin
        ├── safety_checker
        │   └── stable_diffusion_v1-5_safety_checker-hf-op14-fp32.onnx
        ├── text_encoder
        │   ├── diffusion_pytorch_model_tops_engine_refit.bin
        │   └── stable_diffusion_v1-5_clip-huggingface-diffusers-op14-fp32-N.onnx
        ├── unet_controlnet
        │   ├── 1024x576
        │   │   └── diffusion_pytorch_model_tops_engine_refit.bin
        │   ├── 512x512
        │   │   └── diffusion_pytorch_model_tops_engine_refit.bin
        │   ├── 512x680
        │   │   └── diffusion_pytorch_model_tops_engine_refit.bin
        │   ├── 576x1024
        │   │   └── diffusion_pytorch_model_tops_engine_refit.bin
        │   ├── 680x512
        │   │   └── diffusion_pytorch_model_tops_engine_refit.bin
        │   ├── layer_mapping.json
        │   ├── partial_refitting_layers.safetensors
        │   ├── stable_diffusion_v1-5_UnetControlnet-huggingface-diffusers-op14-fp32-N.onnx
        │   └── stable_diffusion_v1-5_UnetControlnet-huggingface-diffusers-op14-fp32-N.weights
        ├── vae
        │   ├── vae_decoder
        │   │   ├── 1024x576
        │   │   │   └── diffusion_pytorch_model_tops_engine.bin
        │   │   ├── 512x512
        │   │   │   └── diffusion_pytorch_model_tops_engine.bin
        │   │   ├── 512x680
        │   │   │   └── diffusion_pytorch_model_tops_engine.bin
        │   │   ├── 576x1024
        │   │   │   └── diffusion_pytorch_model_tops_engine.bin
        │   │   └── 680x512
        │   │       └── diffusion_pytorch_model_tops_engine.bin
        │   └── vae_encoder
        │       ├── 1024x576
        │       │   └── diffusion_pytorch_model_tops_engine.bin
        │       ├── 512x512
        │       │   └── diffusion_pytorch_model_tops_engine.bin
        │       ├── 512x680
        │       │   └── diffusion_pytorch_model_tops_engine.bin
        │       ├── 576x1024
        │       │   └── diffusion_pytorch_model_tops_engine.bin
        │       └── 680x512
        │           └── diffusion_pytorch_model_tops_engine.bin
        ├── vae_decoder
        │   └── stable_diffusion_v1-5_vae_decoder-huggingface-diffusers-op14-fp32-N.onnx
        └── vae_encoder
            └── stable_diffusion_v1-5_vae_encoder-huggingface-diffusers-op14-fp32-N.onnx

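Step 3: Run inference

Functional verification

Run the scripts below to try text2image inference, both native and with lora fusion:

First, set the model and lora paths (note that the lora path ends at the pokemon directory, not at adapter_model.safetensors):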

model_base=${path_to_onnx_models}/v1-5_controlnet/huggingface_diffusers/
path_to_pokemon_lora=/path/to/sd_v1_5_lora/pokemon/

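Running text2image on a GCU device

When any controlnet pipeline runs on a GCU, lora fusion can be enabled by adding --lora ${path_to_pokemon_lora}:alpha to the commands below, where alpha is the lora fusion coefficient, for example 0.9.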

openpose

Run the script below to perform inference on a GCU device using the openpose control mode:

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference --controlnet $model_base --platform general --seed 66 --controlnet_type openpose --output ./output_gcu_openpose_images/ --image_num 5 --device gcu

The script writes 5 images, generated with the TopsInference framework, to the output_gcu_openpose_images folder.

scribble

Run the script below to perform inference on a GCU device using the scribble control mode:

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference --controlnet $model_base --platform general --seed 55 --controlnet_type scribble --output ./output_gcu_scribble_images/ --image_num 5 --device gcu --prompt "bag"

The script writes 5 images, generated with the TopsInference framework, to the output_gcu_scribble_images folder.

canny

Run the script below to perform inference on a GCU device using the canny control mode:

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference --controlnet $model_base --platform general --seed 55 --controlnet_type canny --output ./output_gcu_canny_images/ --image_num 5 --device gcu --prompt "bird"

The script writes 5 images, generated with the TopsInference framework, to the output_gcu_canny_images folder.

mlsd

Run the script below to perform inference on a GCU device using the mlsd control mode:

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference --controlnet $model_base --platform general --seed 32 --controlnet_type mlsd --output ./output_gcu_mlsd_images/ --image_num 5 --device gcu --prompt "royal chamber with fancy bed"

The script writes 5 images, generated with the TopsInference framework, to the output_gcu_mlsd_images folder.

lineart_anime

Run the script below to perform inference on a GCU device using the lineart_anime control mode:

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference --controlnet $model_base --platform general --seed 32 --controlnet_type lineart_anime --output ./output_gcu_lineart_anime_images/ --image_num 5 --device gcu --prompt "A warrior girl in the jungle"

The script writes 5 images, generated with the TopsInference framework, to the output_gcu_lineart_anime_images folder.

lineart

Run the script below to perform inference on a GCU device using the lineart control mode:

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference --controlnet $model_base --platform general --seed 55 --controlnet_type lineart --output ./output_gcu_lineart_images/ --image_num 5 --device gcu --prompt "A warrior girl in the jungle"

The script writes 5 images, generated with the TopsInference framework, to the output_gcu_lineart_images folder.

depth

Run the script below to perform inference on a GCU device using the depth control mode:

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference --controlnet $model_base --platform general --seed 55 --controlnet_type depth --output ./output_gcu_depth_images/ --image_num 5 --device gcu --prompt "Stormtrooper's lecture in beautiful lecture hall"

The script writes 5 images, generated with the TopsInference framework, to the output_gcu_depth_images folder.

softedge

Run the script below to perform inference on a GCU device using the softedge control mode:

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference --controlnet $model_base --platform general --seed 55 --controlnet_type softedge --output ./output_gcu_softedge_images/ --image_num 5 --device gcu --prompt "royal chamber with fancy bed"

The script writes 5 images, generated with the TopsInference framework, to the output_gcu_softedge_images folder.

normalbae

Run the script below to perform inference on a GCU device using the normalbae control mode:

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference --controlnet $model_base --platform general --seed 55 --controlnet_type normalbae --output ./output_gcu_normalbae_images/ --image_num 5 --device gcu --prompt "A head full of roses"

The script writes 5 images, generated with the TopsInference framework, to the output_gcu_normalbae_images folder.

tile

Run the script below to perform inference on a GCU device using the tile control mode:

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference --controlnet $model_base --platform general --seed 55 --controlnet_type tile --output ./output_gcu_tile_images/ --image_num 5 --device gcu --prompt "best quality"

The script writes 5 images, generated with the TopsInference framework, to the output_gcu_tile_images folder.
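
The per-controlnet commands above differ only in --controlnet_type, --seed, --output, and --prompt, so they can be generated from a table. The following sketch builds the argument lists without running them; `build_command` and `RUNS` are hypothetical names, and the values are copied from the first four commands above.

```python
import shlex

MODULE = "stable_diffusion.examples.controlnet.sd1_5.demo_text2img_controlnet_topsinference"

# (controlnet_type, seed, prompt) from the commands above; None means no --prompt.
RUNS = [
    ("openpose", 66, None),
    ("scribble", 55, "bag"),
    ("canny", 55, "bird"),
    ("mlsd", 32, "royal chamber with fancy bed"),
]


def build_command(model_base, controlnet_type, seed, prompt=None, image_num=5):
    """Return the argv list for one text2image controlnet run on a GCU."""
    argv = [
        "python3", "-m", MODULE,
        "--controlnet", model_base,
        "--platform", "general",
        "--seed", str(seed),
        "--controlnet_type", controlnet_type,
        "--output", f"./output_gcu_{controlnet_type}_images/",
        "--image_num", str(image_num),
        "--device", "gcu",
    ]
    if prompt is not None:
        argv += ["--prompt", prompt]
    return argv


for controlnet_type, seed, prompt in RUNS:
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(build_command("/your/onnx/path", controlnet_type, seed, prompt)))
```

Each argv list could also be passed directly to `subprocess.run(argv, check=True)` to execute the runs one after another.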

Other features

Inference with multiple Controlnets

Run the code below to perform Multi-Controlnet inference on a GCU:

openpose_image_path="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/controlnet/person_pose.png"
canny_image_path="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/controlnet/landscape_canny_masked.png"

python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_text2img_multi_controlnet_topsinference  \
--controlnet $model_base \
--platform general \
--seed 66 \
--controlnet_type  openpose canny  \
--output ./output_gcu_images/ \
--image_num 5  \
--device gcu \
--scheduler UniPC \
--need_preprocess False False  \
--controlnet_conditioning_scale 0.8 0.6  \
--control_image  $openpose_image_path $canny_image_path \
--prompt "a giant standing in a fantasy landscape, best quality"
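
In the command above, --controlnet_type, --need_preprocess, --controlnet_conditioning_scale, and --control_image each take one value per controlnet. Assuming the values are matched by position (which the example suggests but the text does not state), a quick length check can catch a mismatch early. `check_multi_controlnet_args` is a hypothetical helper, and the image file names are placeholders.

```python
def check_multi_controlnet_args(controlnet_types, need_preprocess, scales, control_images):
    """Pair up per-controlnet values by position, or raise if the counts differ."""
    lengths = {
        "controlnet_type": len(controlnet_types),
        "need_preprocess": len(need_preprocess),
        "controlnet_conditioning_scale": len(scales),
        "control_image": len(control_images),
    }
    if len(set(lengths.values())) != 1:
        raise ValueError(f"per-controlnet arguments differ in length: {lengths}")
    return list(zip(controlnet_types, need_preprocess, scales, control_images))


pairs = check_multi_controlnet_args(
    ["openpose", "canny"], [False, False], [0.8, 0.6], ["pose.png", "canny.png"]
)
for row in pairs:
    print(row)
```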

Inference with Img2Img

Run the code below to perform Controlnet Img2Img inference on a GCU:

control_image="https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png"
init_image="https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
python3 -m stable_diffusion.examples.controlnet.sd1_5.demo_img2img_controlnet_topsinference \
--controlnet $model_base \
--controlnet_type canny \
--scheduler UniPC \
--image_num 4 \
--control_image $control_image \
--need_preprocess False \
--prompt "futuristic-looking woman" \
--output "./output/img2img" \
--gcu 0 \
--image_height 512 \
--image_width 512 \
--denoising_steps 20 \
--seed 32 \
--guidance_scale 7.5 \
--strength 0.8 \
--init_image $init_image \
--device "gcu" \
--lora ""

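Notes

The base model is currently always StableDiffusion1.5.
If you start from step 1, several pretrained models must be downloaded from Hugging Face; it is recommended to obtain the ONNX models and begin verification from step 2.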

SD-XL-1.0

This document describes how to use the provided code to run inference with stable diffusion XL 1.0.

Supported scope:

- 1024x1024, 1024x768, 768x1024 (3 resolutions, Image Height x Image Width)
- lora and non-lora
- text2img and image2image
- text2img + refiner

Inference steps

Step 1: Install the packages

The following steps assume Python 3.10. Install the required dependencies first:

Install TopsInference (the Python 3.10 build):

pip3 install --force-reinstall TopsInference-*-py3.10-*.whl

Install the stable_diffusion package and its dependencies:

pip3 install /path/to/stable/diffusion/whl --extra-index-url https://download.pytorch.org/whl/cpu

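Step 2: Prepare the models

The onnx models are laid out as shown below. If you cannot find the onnx files, obtain the onnx models and other related files from customer support.

Directory structure of the stable diffusion XL 1.0 onnx models: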

sd-xl-1.0/
└── huggingface_diffusers
    ├── model_index.json
    ├── scheduler
    │   └── scheduler_config.json
    ├── text_encoder
    │   ├── config.json
    │   └── stable-diffusion-xl-base-1.0-text_encoder-huggingface-diffusers_0.21.0-op14-fp32-N.onnx
    ├── text_encoder_2
    │   ├── config.json
    │   ├── stable-diffusion-xl-base-1.0-text_encoder_2-huggingface-diffusers_0.21.0-op14-fp32-N.onnx
    │   └── stable-diffusion-xl-base-1.0-text_encoder_2-huggingface-diffusers_0.21.0-op14-fp32-N.weights
    ├── tokenizer
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── tokenizer_2
    │   ├── merges.txt
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── unet
    │   ├── config.json
    │   ├── stable-diffusion-xl-base-1.0-unet-huggingface-diffusers_0.21.0-op14-fp32-N.onnx
    │   ├── stable-diffusion-xl-base-1.0-unet-huggingface-diffusers_0.21.0-op14-fp32-N.weights
    │   ├── stable-diffusion-xl-refiner-1.0-unet-huggingface-diffusers_0.21.0-op14-fp32-N.onnx
    │   └── stable-diffusion-xl-refiner-1.0-unet-huggingface-diffusers_0.21.0-op14-fp32-N.weights
    ├── vae_decoder
    │   ├── config.json
    │   └── stable-diffusion-xl-base-1.0-vae_decoder-huggingface-diffusers_0.21.0-op14-fp16-N.onnx
    └── vae_encoder
        ├── config.json
        └── stable-diffusion-xl-base-1.0-vae_encoder-huggingface-diffusers_0.21.0-op14-fp16-N.onnx

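Compile the engine:

Once the onnx models and related files above are ready, run the following commands to compile the engines, where '/path/to/model' is the parent directory of scheduler, text_encoder, and the other directories in the structure above: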

python3 -m stable_diffusion.scripts.stable_diffusion_xl.generate_onnx_engine --base_model /path/to/model --model_type sd_xl_base_1_0 --gcu 0 --refit_mode need_refit_engine --output_path /path/to/output --resoltions 1024x1024 1024x768
python3 -m stable_diffusion.scripts.stable_diffusion_xl.generate_onnx_engine --base_model /path/to/model --model_type sd_xl_refiner_1_0 --gcu 0 --refit_mode need_refit_engine --output_path /path/to/output --resoltions 1024x1024 1024x768

The parameters are described below:

--base_model: root path to the stable diffusion models
--model_type: sd_xl_base_1_0 or sd_xl_refiner_1_0
--gcu: which card to use, e.g. 0 or 1
--refit_mode: controls whether the refittable unet and text encoder engines are compiled; one of [need_refit_engine, non_refit_engine]. need_refit_engine compiles the refittable engine, non_refit_engine compiles the non-refittable engine
--output_path: root path for the output
--resoltions: image resolutions supported by the engine, chosen from [1024x1024, 1024x768, 768x1024] in hxw order; separate multiple resolutions with spaces
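
Because the resolutions are given in hxw order, the first number is the height. A small sketch of converting one entry into the --image_height and --image_width values used at inference time follows; the helper name `size_args` is hypothetical.

```python
# Resolutions listed in the parameter description above, in hxw order.
SDXL_RESOLUTIONS = ("1024x1024", "1024x768", "768x1024")


def size_args(resolution):
    """Turn 'HxW' into the matching --image_height/--image_width arguments."""
    if resolution not in SDXL_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    height, width = (int(part) for part in resolution.split("x"))
    return ["--image_height", str(height), "--image_width", str(width)]


print(size_args("1024x768"))
```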

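Step 3: Run inference

To run inference with stable diffusion XL 1.0, use the commands below as a reference, where '/path/to/model' is the parent directory of scheduler, text_encoder, and the other directories in the structure above:

text2img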

python3 -m stable_diffusion.examples.stable_diffusion_xl.demo_text2image_topsinference \
--model /path/to/model \
--model_type sd_xl_base_1_0 \
--platform general \
--gcu 0 \
--image_num 2 \
--image_height 1024 \
--image_width 1024 \
--prompt "a beautiful photograph of Mt. Fuji during cherry blossom" \
--negative_prompt "low quality, ugly" \
--seed 42 \
--scheduler "ddim" \
--denoising_steps 20 \
--guidance_scale 7 \
--output /path/to/output_dir

The parameters are described below:

--model: root path to the stable diffusion models
--model_type: sd_xl_base_1_0 for the SD-XL-base-1.0 model
--platform: general or maas; the two use different directory structures for the model files
--gcu: which card to use, e.g. 0 or 1
--image_num: number of images generated from one prompt
--image_height: height of the generated image; should match the resolution specified by the --model path
--image_width: width of the generated image; should match the resolution specified by the --model path
--prompt: text prompt(s) to guide image generation
--prompt_2: optional, text prompt(s) to guide image generation
--negative_prompt: negative prompt(s) to guide image generation
--negative_prompt_2: optional, negative prompt(s) to guide image generation
--seed: integer seed for generating random data
--denoising_steps: number of steps to run the unet
--scheduler: scheduler name
--guidance_scale: guidance scale (CFG scale)
--output: target directory for the generated images

img2img

python3 -m stable_diffusion.examples.stable_diffusion_xl.demo_image2image_topsinference \
--model /path/to/model \
--model_type sd_xl_base_1_0 \
--platform general \
--gcu 0 \
--image_height 1024 \
--image_width 1024 \
--init_image /path/your_test_image \
--image_num 2 \
--prompt "a beautiful photograph of Mt. Fuji during cherry blossom" \
--negative_prompt "low quality, ugly" \
--seed 42 \
--scheduler "ddim" \
--denoising_steps 20 \
--guidance_scale 7 \
--output /path/to/output_dir

The parameters are described below:

--model: root path to the stable diffusion models
--model_type: sd_xl_base_1_0 for the SD-XL-base-1.0 model
--platform: general or maas; the two use different directory structures for the model files
--gcu: which card to use, e.g. 0 or 1
--init_image: reference image for img2img
--image_num: number of images generated from one prompt
--image_height: height of the generated image; should match the resolution specified by the --model path
--image_width: width of the generated image; should match the resolution specified by the --model path
--prompt: text prompt(s) to guide image generation
--prompt_2: optional, text prompt(s) to guide image generation
--negative_prompt: negative prompt(s) to guide image generation
--negative_prompt_2: optional, negative prompt(s) to guide image generation
--seed: integer seed for generating random data
--denoising_steps: number of steps to run the unet
--scheduler: scheduler name
--guidance_scale: guidance scale (CFG scale)
--output: target directory for the generated images

text2img + refiner

python3 -m stable_diffusion.examples.stable_diffusion_xl.demo_text2image_and_refiner_topsinference \
--model /path/to/model \
--model_type sd_xl_base_1_0 \
--platform general \
--gcu 0 \
--image_num 2 \
--image_height 1024 \
--image_width 1024 \
--prompt "a beautiful photograph of Mt. Fuji during cherry blossom" \
--negative_prompt "low quality, ugly" \
--seed 42 \
--scheduler "ddim" \
--denoising_steps 20 \
--guidance_scale 7 \
--high_noise_frac 0.8 \
--output /path/to/output_dir

The parameters are described below:

model: root path to the Stable Diffusion models
model_type: sd_xl_base_1_0 for the SD-XL-base-1.0 model
platform: general or maas; selects one of two directory layouts for the model files
gcu: index of the card to use, e.g. 0 or 1
image_num: number of images generated per prompt
image_height: height of the generated image; should match the resolution specified by the --model path
image_width: width of the generated image; should match the resolution specified by the --model path
prompt: text prompt(s) to guide image generation
prompt_2: optional, text prompt(s) to guide image generation
negative_prompt: negative prompt(s) to guide image generation
negative_prompt_2: optional, negative prompt(s) to guide image generation
seed: integer seed for random number generation
denoising_steps: number of UNet denoising steps to run
scheduler: scheduler name
guidance_scale: guidance scale, also known as CFG scale
high_noise_frac: fraction of the inference steps that run on the base model
output: target directory for the generated images
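
As a rough illustration of how high_noise_frac divides the work, the sketch below assumes the convention of the upstream diffusers SDXL base + refiner workflow: the base model handles the first high_noise_frac share of the denoising steps and the refiner handles the rest. The helper split_steps is hypothetical and is not part of the stable_diffusion package; the rounding used by the demo itself may differ.

```python
def split_steps(denoising_steps: int, high_noise_frac: float) -> tuple[int, int]:
    """Return (base_steps, refiner_steps) for a base + refiner run.

    Assumption: the base model runs the first `high_noise_frac` share of the
    steps and the refiner runs the remainder.
    """
    if not 0.0 <= high_noise_frac <= 1.0:
        raise ValueError("high_noise_frac must be within [0, 1]")
    base_steps = round(denoising_steps * high_noise_frac)
    return base_steps, denoising_steps - base_steps


# Values from the command above: 20 steps with high_noise_frac 0.8.
print(split_steps(20, 0.8))  # (16, 4)
```

Under this assumption, raising high_noise_frac gives more steps to the base model and fewer to the refiner.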

SD-XL-T2I-Adapter¶

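This document describes how to run text2image inference with Stable Diffusion XL T2I-Adapter using the provided code.

Supported scope:

3 resolutions: 1024x1024, 1024x768, 768x1024 (Image Height x Image Width)

Inference steps¶

Step 1: Install the packages¶

The following steps assume Python 3.10. Install the required dependencies first:

Install TopsInference (the Python 3.10 build)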

pip3 install --force-reinstall TopsInference-*-py3.10-*.whl

Install the stable_diffusion package and its dependencies

pip3 install /path/to/stable/diffusion/whl --extra-index-url https://download.pytorch.org/whl/cpu

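Step 2: Prepare the models¶

The ONNX models are laid out as shown below. If you cannot find the ONNX models, obtain them and the other related files from customer support.

Directory structure of the Stable Diffusion XL base 1.0 ONNX models: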

sd-xl-base-1.0/
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   └── stable-diffusion-xl-base-1.0-text_encoder-huggingface-diffusers_0.21.0-op14-fp32-N.onnx
├── text_encoder_2
│   ├── stable-diffusion-xl-base-1.0-text_encoder_2-huggingface-diffusers_0.21.0-op14-fp32-N.onnx
│   └── stable-diffusion-xl-base-1.0-text_encoder_2-huggingface-diffusers_0.21.0-op14-fp32-N.weights
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── tokenizer_2
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet_adapter
│   ├── stable-diffusion-xl-base-1.0-unet_adapter-huggingface-diffusers_0.21.0-op14-fp32-N.onnx
│   └── stable-diffusion-xl-base-1.0-unet_adapter-huggingface-diffusers_0.21.0-op14-fp32-N.weights
└── vae
    ├── vae_decoder
    │   └── stable-diffusion-xl-base-1.0-vae_decoder-huggingface-diffusers_0.21.0-op14-fp16-N.onnx
    └── vae_encoder
        └── stable-diffusion-xl-base-1.0-vae_encoder-huggingface-diffusers_0.21.0-op14-fp16-N.onnx

Directory structure of the xl-adapter ONNX models:

xl-adapter
├── adapter_canny
│   └── stable_diffusion_xl-adapter_canny-hf-op14-fp32-N.onnx
├── adapter_depth_midas
│   └── stable_diffusion_xl-adapter_depth-hf-op14-fp32-N.onnx
├── adapter_depth_zoe
│   └── stable_diffusion_xl-adapter_zoe-hf-op14-fp32-N.onnx
├── adapter_lineart
│   └── stable_diffusion_xl-adapter_lineart-hf-op14-fp32-N.onnx
├── adapter_openpose
│   └── stable_diffusion_xl-adapter_openpose-hf-op14-fp32-N.onnx
└── adapter_sketch
    └── stable_diffusion_xl-adapter_sketch-hf-op14-fp32-N.onnx

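Build the engines:

Once the ONNX models and the other related files above are in place, run the command below to build the engines, where:

--adapter_type selects the control type; valid values are canny, openpose, lineart, sketch, depth_midas, depth_zoe
--resolution selects the resolution of the generated engine; valid values are [0, 1, 2], where 0 means [768x1024], 1 means [1024x768], and 2 means [1024x1024]
--refit_mode indicates whether the compiled engine supports lora; only the non_refit_engine option is supported for now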

python3.10 generate_onnx_engine.py  --resolution 0  --adapter_type "canny"  --refit_mode non_refit_engine
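
The --resolution index can be looked up from the target image size. The sketch below only encodes the mapping listed above; RESOLUTION_INDEX and resolution_index are hypothetical names, not part of generate_onnx_engine.py. It assumes the bracketed resolutions follow the same Image Height x Image Width order used elsewhere in this document.

```python
# Mapping from the option description: (height, width) -> --resolution index.
# Assumption: the bracketed resolutions are written as height x width.
RESOLUTION_INDEX = {
    (768, 1024): 0,
    (1024, 768): 1,
    (1024, 1024): 2,
}


def resolution_index(image_height: int, image_width: int) -> int:
    """Return the --resolution value for a supported image size."""
    try:
        return RESOLUTION_INDEX[(image_height, image_width)]
    except KeyError:
        supported = ", ".join(f"{h}x{w}" for h, w in RESOLUTION_INDEX)
        raise ValueError(
            f"unsupported resolution {image_height}x{image_width}; "
            f"supported: {supported}"
        ) from None


print(resolution_index(1024, 1024))  # 2
```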

Step 3: Run inference¶

To run inference with XL T2I-Adapter, use the following command:

text2img¶

python3.10 -m stable_diffusion.examples.t2i_adapter.demo_xl_t2i_adapter_topsinference   \
--model ./stable-diffusion-xl-base-1.0 \
--adapter_model ./xl-adapter \
--output ./output_openpose_pic \
--need_preprocess 1 \
--image_height 1024 \
--image_width 1024  \
--image_path https://huggingface.co/Adapter/t2iadapter/resolve/main/people.jpg  \
--gcu 1   \
--prompt "A couple, 4k photo, highly detailed"   \
--adapter_type "openpose"  \
--image_num 2  \
--denoising_steps 30 \
--negative_prompt "extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured"   \
--seed 1024 \
--adapter_conditioning_scale 1.0 \
--adapter_conditioning_factor 1.0 \
--controlnet_aux_pretrained "lllyasviel/Annotators"

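The parameters are described below:

model: path to the stable-diffusion-xl-base-1.0 model directory
adapter_model: path to the xl-adapter model directory
gcu: id of the gcu card used for inference
adapter_type: type of Adapter to apply; currently supported: [canny, openpose, sketch, lineart, depth_midas, depth_zoe]
image_num: number of images to generate
image_path: feature image fed to the Adapter
image_height: height of the generated image
image_width: width of the generated image
prompt: positive text prompt for the generated image
prompt_2: optional, positive text prompt for the generated image
negative_prompt: negative text prompt for the generated image
negative_prompt_2: optional, negative text prompt for the generated image
seed: random seed
denoising_steps: number of denoising steps used to generate the image
scheduler: scheduler name
guidance_scale: how strongly the text prompt influences the generated image
output: folder in which to save the images
need_preprocess: whether the feature image needs to be preprocessed
adapter_conditioning_factor: fraction of the timesteps during which the Adapter is applied. With 0.0 the Adapter is not applied at all; with 1.0 it is applied at every timestep.
adapter_conditioning_scale: how strongly the Adapter influences the final generated image
controlnet_aux_pretrained: source of the model used by the preprocessor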
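
To make adapter_conditioning_factor concrete, the sketch below lists the denoising steps on which the Adapter would be active. It assumes the rule used by the upstream diffusers SDXL adapter pipeline, where the Adapter is applied only to the first int(steps * factor) steps; the helper adapter_active_steps is hypothetical, and this package's implementation may differ.

```python
def adapter_active_steps(
    denoising_steps: int, adapter_conditioning_factor: float
) -> list[int]:
    """Return the step indices on which the Adapter features are applied.

    Assumption: the Adapter is applied to the earliest steps only, up to
    int(denoising_steps * adapter_conditioning_factor).
    """
    cutoff = int(denoising_steps * adapter_conditioning_factor)
    return [step for step in range(denoising_steps) if step < cutoff]


print(len(adapter_active_steps(30, 1.0)))  # 30: applied at every step
print(adapter_active_steps(30, 0.0))       # []: never applied
print(adapter_active_steps(10, 0.5))       # [0, 1, 2, 3, 4]
```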

DeepDanBooru¶

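Overview¶

This document describes how to run DeepDanBooru inference using the provided code.

Inference steps¶

The following steps assume Python 3.10. Install the required dependencies first:

Install TopsInference (the Python 3.10 build)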

pip3 install --force-reinstall TopsInference-*-py3.10-*.whl

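Step 1: Download the ONNX model and generate the engine¶

If you cannot find the ONNX model, obtain it and the other related files from customer support, then generate the engines from the ONNX file.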

python3 -m stable_diffusion.scripts.deepdanbooru.generate_onnx_engine --model path_to_onnx_model --output_path path_to_output

The parameters are described below:

model: path to the ONNX model, URL of the .pt weights, or path to the .pt weights
output_path: output directory

Step 2: Run inference¶

Run the following script to perform deepdanbooru inference:

python3 -m stable_diffusion.examples.deepdanbooru.demo_deepdanbooru_topsinference --model_dir path_to_output --infer_type topsinference --img_dir path_to_img --gcu 0

The parameters are described below:

model_dir: the output directory from Step 1
img_dir: directory of images used for testing

Notes¶

None at this time.

Esrgan¶

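Overview¶

This document describes how to run esrgan x4 inference using the provided code.

Supported scope:

512x512 resolution

Inference steps¶

The following steps assume Python 3.10. Install the required dependencies first:

Install TopsInference (the Python 3.10 build)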

pip3 install --force-reinstall TopsInference-*-py3.10-*.whl

Install the stable_diffusion package and its dependencies

pip3 install /path/to/stable/diffusion/whl --extra-index-url https://download.pytorch.org/whl/cpu

Step 1: Prepare the ONNX model and generate the engine¶

Prepare esrgan-x4-mmediting-op13-fp32-N.onnx. If you cannot find the ONNX model, obtain it and the other related files from customer support.

Compile the engines:

python3 -m stable_diffusion.scripts.upscale.esrgan.generate_engine --model path_to_onnx_model --output_dir path_to_save_engine

The parameters are described below:

model: path to the esrgan ONNX model
output_dir: directory in which to save the model engine; an engines folder is created automatically

Step 2: Run inference¶

Run the following script to upscale images with the esrgan engine:

python3 -m stable_diffusion.examples.upscale.esrgan.demo_esrgan_topsinference \
--model_path 'esrgan-x4-op13-fp32-512.bin' \
--input_images 'inputs_imgs' \
--output 'outputs_imgs' \
--gcu 0

Real Esrgan¶

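Overview¶

This document describes how to run real esrgan x4 inference using the provided code.

Supported scope:

512x512 resolution

Inference steps¶

The following steps assume Python 3.10. Install the required dependencies first:

Install TopsInference (the Python 3.10 build)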

pip3 install --force-reinstall TopsInference-*-py3.10-*.whl

Install the stable_diffusion package and its dependencies

pip3 install /path/to/stable/diffusion/whl --extra-index-url https://download.pytorch.org/whl/cpu

Step 1: Prepare the ONNX model and generate the engine¶

Prepare realesrgan-x4-op13-fp32-N.onnx. If you cannot find the ONNX model, obtain it and the other related files from customer support.

python3 -m stable_diffusion.scripts.upscale.real_esrgan.generate_engine --model path_to_onnx_model --output_dir path_to_save_engine

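The parameters are described below:

model: directory containing the real esrgan ONNX model
output_dir: directory in which to save the model engine; an engines folder is created automatically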

Step 2: Run inference¶

Run the following script to upscale images with the real esrgan engine:

python3 -m stable_diffusion.examples.upscale.real_esrgan.demo_real_esrgan_topsinference \
--model_path 'realesrgan-x4-op13-fp32-512.bin' \
--input_dir 'inputs_imgs' \
--output_dir 'outputs_imgs' \
--output_scale 4 \
--gcu 0

stable_diffusion_x2_latent_upscaler¶

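This document describes how to run 2x upscaling inference based on Stable Diffusion using the provided code.

Supported scope:

512x512 resolution

Inference steps¶

Step 1: Install the packages¶

The following steps assume Python 3.10. Install the required dependencies first:

Install TopsInference (the Python 3.10 build)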

pip3 install --force-reinstall TopsInference-*-py3.10-*.whl

Install the stable_diffusion package and its dependencies
```
pip3 install /path/to/stable/diffusion/whl --extra-index-url https://download.pytorch.org/whl/cpu
```
Note: make sure the specified torch and diffusers versions are installed; other versions will cause errors.

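Step 2: Prepare the models and config files¶

Obtain the ONNX models and config files

To test the Stable Diffusion based upscale feature, obtain the relevant ONNX models and config files from technical support, and place them in the location where you want to keep them.

Once the ONNX files and config files are ready, organize them in the directory structures shown below.

ONNX model directory structure: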

onnx/
├── stable_diffusion_x2_latent_upscaler_text_encoder-huggingface-diffusers-op14-fp32-N.onnx
├── stable_diffusion_x2_latent_upscaler_unet-huggingface-diffusers-op14-fp32-N.onnx
├── stable_diffusion_x2_latent_upscaler_vae_decoder-huggingface-diffusers-op14-fp32-N.onnx
└── stable_diffusion_x2_latent_upscaler_vae_encoder-huggingface-diffusers-op14-fp32-N.onnx

Config file directory structure:

x2/
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   └── config.json
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   └── config.json
└── vae
    └── config.json
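
Before running inference, it can help to confirm that the config folder matches the tree above. The sketch below is a minimal check using only the file names shown in that tree; X2_CONFIG_FILES and missing_config_files are hypothetical names, not part of the stable_diffusion package, and the folder name "x2" in the usage line is assumed to be relative to the current directory.

```python
from pathlib import Path

# Relative paths taken from the x2 config tree shown above.
X2_CONFIG_FILES = [
    "scheduler/scheduler_config.json",
    "text_encoder/config.json",
    "tokenizer/merges.txt",
    "tokenizer/special_tokens_map.json",
    "tokenizer/tokenizer_config.json",
    "tokenizer/vocab.json",
    "unet/config.json",
    "vae/config.json",
]


def missing_config_files(config_root, expected=X2_CONFIG_FILES):
    """Return the expected relative paths that are not files under config_root."""
    root = Path(config_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumed location of the config folder; adjust to your own path.
    missing = missing_config_files("x2")
    print("missing files:", missing or "none")
```

The same function can check another layout by passing a different list as expected.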

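Compile the engines:

Engine compilation is embedded in the inference flow, so the engines are compiled automatically the first time inference runs. To recompile a single engine later, delete the corresponding existing engine file and it will be regenerated.

Step 3: Run inference¶

To run inference with stable_diffusion_x2_latent_upscaler, use the following command: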

python3 -m stable_diffusion.examples.upscale.stable_diffusion_x2_latent_upscaler.demo_latent_upscale_x2 \
--backend gcu \
--prompt "" \
--num_inference_steps 5 \
--guidance_scale 0.0 \
--seed 42 \
--input_image imgs/a1.png \
--model_dir onnx \
--output_dir imgs \
--pipeline_config_folder ../common/configs/x2 \
--card_id 0 \
--cluster_id 0 \
--num_warmup_runs 0 \
--log_dir ./log/ \
--benchmark

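Where:

--backend accepts one of 3 values: [gcu, torch, onnxruntime].
--model_dir is the path to the onnx folder, reorganized as described above after downloading from jfrog.
--pipeline_config_folder is the path to the config folder, reorganized as described above after downloading from jfrog.
--num_warmup_runs is the number of warmup runs performed before the benchmark.
--benchmark enables benchmarking, which prints the inference time of each model and of the whole pipeline.

For the other parameters and their meanings, run python3 -m stable_diffusion.examples.upscale.stable_diffusion_x2_latent_upscaler.demo_latent_upscale_x2 -h.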

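Step 4: Accuracy verification¶

After upscaling the images in the dataset, the RMSE can be used to judge the quality difference between images generated by different backends. Images generated with the torch backend are generally taken as the baseline, and the RMSE between them and the gcu-generated images is evaluated.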

python3 -m stable_diffusion.scripts.upscale.common.compare_images \
--gcu_image_dir stable_diffusion_x2_latent_upscaler/imgs/gcu \
--torch_image_dir stable_diffusion_x2_latent_upscaler/imgs/torch

The computed RMSE value is printed to the console and to the log.
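
For reference, RMSE is the square root of the mean squared difference between corresponding pixel values. The sketch below shows the standard definition on flattened pixel sequences using only the standard library; it is not the code of compare_images, which may load, scale, or aggregate the images differently. The function name rmse and the sample values are illustrative.

```python
import math
from typing import Sequence


def rmse(pixels_a: Sequence[float], pixels_b: Sequence[float]) -> float:
    """Root mean squared error between two equally sized pixel sequences."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("both images must have the same number of values")
    if not pixels_a:
        raise ValueError("images must not be empty")
    squared_error = sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b))
    return math.sqrt(squared_error / len(pixels_a))


# Toy example: flattened pixel values of a baseline image and a candidate image.
baseline = [0, 128, 255, 64]
candidate = [0, 130, 251, 64]
print(rmse(baseline, candidate))  # sqrt(20 / 4) = sqrt(5), about 2.236
```

An RMSE of 0 means the two images are identical; larger values indicate a larger average per-pixel deviation from the baseline.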

stable_diffusion_x4_upscaler¶

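This document describes how to run 4x upscaling inference based on Stable Diffusion using the provided code.

Supported scope:

512x512 resolution

Inference steps¶

Step 1: Install the packages¶

The following steps assume Python 3.10. Install the required dependencies first:

Install TopsInference (the Python 3.10 build)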

pip3 install --force-reinstall TopsInference-*-py3.10-*.whl

Install the stable_diffusion package and its dependencies
```
pip3 install /path/to/stable/diffusion/whl --extra-index-url https://download.pytorch.org/whl/cpu
```
Note: make sure the specified torch and diffusers versions are installed; other versions will cause errors.

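Step 2: Prepare the models and config files¶

Obtain the ONNX models and config files

To test the Stable Diffusion based upscale feature, obtain the relevant ONNX models and config files from technical support, and place them in the location where you want to keep them.

Once the ONNX files and config files are ready, organize them in the directory structures shown below.

ONNX model directory structure: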

onnx/
├── stable_diffusion_x4_upscaler_text_encoder-huggingface-diffusers-op14-fp32-N.onnx
├── stable_diffusion_x4_upscaler_unet-huggingface-diffusers-op14-fp32-N.onnx
└── stable_diffusion_x4_upscaler_vae_decoder-huggingface-diffusers-op14-fp32-N.onnx

Config file directory structure:

x4/
├── low_res_scheduler
│   └── scheduler_config.json
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   └── config.json
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   └── config.json
└── vae
    └── config.json

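Compile the engines:

Engine compilation is embedded in the inference flow, so the engines are compiled automatically the first time inference runs. To recompile a single engine later, delete the corresponding existing engine file and it will be regenerated.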
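
Deleting an engine file so that it is rebuilt on the next run can be scripted. The sketch below is a generic helper under stated assumptions: this document does not say where the engine files are written or how they are named, so the directory and the glob pattern are left as parameters, and remove_engine_files is a hypothetical name that is not part of the stable_diffusion package.

```python
from pathlib import Path


def remove_engine_files(engine_dir: str, pattern: str) -> list[str]:
    """Delete files in engine_dir whose names match pattern.

    Returns the names of the removed files so the caller can see exactly
    which engines will be recompiled on the next inference run.
    """
    removed = []
    for path in sorted(Path(engine_dir).glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


# Example call (hypothetical directory and file pattern, adjust to your setup):
# remove_engine_files("/path/to/engines", "*unet*")
```

Check the returned list before rerunning inference, since a pattern that is too broad would also remove engines you did not intend to rebuild.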

Step 3: Run inference¶

To run inference with stable_diffusion_x4_upscaler, use the following command:

python3 -m stable_diffusion.examples.upscale.stable_diffusion_x4_upscaler.demo_latent_upscale_x4 \
--backend gcu \
--prompt "a dog" \
--num_inference_steps 5 \
--seed 42 \
--input_image imgs/a2.png \
--model_dir onnx \
--output_dir imgs \
--pipeline_config_folder ../common/configs/x4 \
--card_id 0 \
--cluster_id 0 \
--num_warmup_runs 0 \
--log_dir ./log/ \
--benchmark

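Where:

--backend accepts one of 3 values: [gcu, torch, onnxruntime].
--model_dir is the path to the onnx folder, reorganized as described above after downloading from jfrog.
--pipeline_config_folder is the path to the config folder, reorganized as described above after downloading from jfrog.
--num_warmup_runs is the number of warmup runs performed before the benchmark.
--benchmark enables benchmarking, which prints the inference time of each model and of the whole pipeline.

For the other parameters and their meanings, run python3 -m stable_diffusion.examples.upscale.stable_diffusion_x4_upscaler.demo_latent_upscale_x4 -h.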

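Step 4: Accuracy verification¶

After upscaling the images in the dataset, the RMSE can be used to judge the quality difference between images generated by different backends. Images generated with the torch backend are generally taken as the baseline, and the RMSE between them and the gcu-generated images is evaluated.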

python3 -m stable_diffusion.scripts.upscale.common.compare_images \
--gcu_image_dir stable_diffusion_x4_upscaler/imgs/gcu \
--torch_image_dir stable_diffusion_x4_upscaler/imgs/torch

The computed RMSE value is printed to the console and to the log.