August 23, 2022

Deploy Stable Diffusion for AI Image Generation

In the past few months, I tried almost all popular text-to-image AI generation models/products, such as Dall-E 2, MidJourney, Disco Diffusion, Stable Diffusion, etc. Stable Diffusion checkpoint was just released a few days ago. I deployed one on my old GPU server and record my notes here for people who may also want to try. Machine creativity is a quite interesting research area for IS scholars and I jotted down some potential research topics in the end of this post as well.

I first spent a few hours trying to set up Stable Diffusion on Mac M1 and failed - I cannot install the packages properly, e.g., version not found, dependency issues, etc. I found some successful attempts here but have no time to try them yet.

I ended up setting up Stable Diffusion on my old GPU server running Ubuntu and here are my notes.

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.6 LTS
Release:	18.04
Codename:	bionic
nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:1A:00.0 Off |                  N/A |
| 30%   27C    P8    20W / 250W |      1MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:68:00.0 Off |                  N/A |
| 30%   26C    P8    19W / 250W |     73MiB / 11016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
sudo apt install links
links www.google.com

rename the checkpoint file to model.ckpt and put it in the following folder (create a new one):

mkdir -p models/ldm/stable-diffusion-v1/

A side note on estimated training cost based on the reported GPU usage and the related AWS price I found:

Price of p4d.24xlarge instance with 8 A100 with 40G VRAM:

The training would cost between 225,000 USD and 600,000 USD.

name: ldm
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.8.5
  - pip=20.3
  - cudatoolkit=10.2
  - pytorch=1.11.0
...
conda env create -f environment.yaml
conda activate ldm

Now, Stable Diffusion is ready to go and let’s see what AI will create based on the following text:

A car in 2050 designed by Antoni Gaudi

python optimizedSD/optimized_txt2img.py --prompt "a car in 2050 designed by Antoni Gaudi" --H 512 --W 512 --seed 27 --n_iter 2 --n_samples 10 --ddim_steps 50

This whole area is relatively new and there are many potential interesting research topics, e.g.,

Anyway, out of the 20 generated images from the prompt above, the following are my top 3: