 
Canny Edge Control - Top, Sample - Below
 
This repository implements ControlNet in PyTorch for diffusion models. As of now, the repo provides code to do the following:
- Training and Inference of Unconditional DDPM on MNIST dataset
- Training and Inference of ControlNet with DDPM on MNIST using canny edges
- Training and Inference of Unconditional Latent Diffusion Model on CelebHQ dataset (resized to 128x128, with latent images being 32x32)
- Training and Inference of ControlNet with Unconditional Latent Diffusion Model on CelebHQ using canny edges
For the autoencoder of the Latent Diffusion Model, I provide training and inference code for a VAE.
- Create a new conda environment with Python 3.10, then run the commands below
- conda activate <environment_name>
- git clone https://github.com/explainingai-code/ControlNet-PyTorch.git
- cd ControlNet-PyTorch
- pip install -r requirements.txt
- Download the LPIPS weights by opening this link in a browser (don't use cURL or wget) https://github.com/richzhang/PerceptualSimilarity/blob/master/lpips/weights/v0.1/vgg.pth and downloading the raw file. Place the downloaded weights file at models/weights/v0.1/vgg.pth
To set up the MNIST dataset, follow https://github.com/explainingai-code/Pytorch-VAE#data-preparation
Ensure the directory structure is as follows:
ControlNet-PyTorch
    -> data
        -> mnist
            -> train
                -> images
                    -> *.png
            -> test
                -> images
                    -> *.png
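Before training, it can help to confirm that the images landed where the configs expect them. The following is a minimal sketch, not part of the repo: `count_images` is a hypothetical helper, and the paths simply mirror the MNIST layout shown above.

```python
from pathlib import Path


def count_images(root, split_dirs, pattern):
    """Return {relative_dir: number of matching files} for each expected directory."""
    root = Path(root)
    counts = {}
    for rel in split_dirs:
        folder = root / rel
        # A missing folder is reported as -1 so it stands out from an empty one.
        counts[rel] = len(list(folder.glob(pattern))) if folder.is_dir() else -1
    return counts


if __name__ == "__main__":
    # Paths follow the MNIST layout shown above; run from the repository root.
    report = count_images("data/mnist", ["train/images", "test/images"], "*.png")
    for rel, n in report.items():
        status = "MISSING" if n < 0 else f"{n} png files"
        print(f"data/mnist/{rel}: {status}")
```

The same helper should work for other layouts by changing the root, the sub-directories and the glob pattern.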
To set up CelebHQ, simply download the images from the official CelebAMask-HQ repository.
Ensure the directory structure is as follows:
ControlNet-PyTorch
    -> data
        -> CelebAMask-HQ
            -> CelebA-HQ-img
                -> *.jpg
The config files let you play with different components of DDPM and autoencoder training:
- `config/mnist.yaml` - Configuration used for the MNIST dataset
- `config/celebhq.yaml` - Configuration used for the CelebHQ dataset
Relevant configuration parameters
Most parameters are self-explanatory, but below I mention a couple that are specific to this repo.
- `autoencoder_acc_steps`: Number of gradient accumulation steps, for when the image size is too large to fit a larger batch size in memory
- `save_latents`: Enable this to save the latents during inference of the autoencoder. That way DDPM training will be faster
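To illustrate what gradient accumulation buys you, here is a small pure-Python sketch (not the repo's training loop). It assumes `autoencoder_acc_steps` counts how many micro-batches are accumulated before one optimizer step, and that each micro-batch loss is divided by that count; both are assumptions about the implementation. The function names and the toy scalar model are hypothetical.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over micro-batches, scaling each by 1/acc_steps."""
    chunks = [
        (xs[i:i + micro_batch_size], ys[i:i + micro_batch_size])
        for i in range(0, len(xs), micro_batch_size)
    ]
    acc_steps = len(chunks)  # plays the role of autoencoder_acc_steps (assumption)
    total = 0.0
    for cx, cy in chunks:
        # Dividing by acc_steps keeps the result on the scale of one full-batch step.
        total += grad_mse(w, cx, cy) / acc_steps
    return total, acc_steps


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
full = grad_mse(0.5, xs, ys)
acc, steps = accumulated_grad(0.5, xs, ys, micro_batch_size=2)
print(steps, abs(full - acc) < 1e-9)
```

With equally sized micro-batches, the accumulated gradient matches the full-batch gradient, so a batch of 8 can be emulated with four passes of 2 images each.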
The repo provides training and inference for MNIST (Unconditional DDPM) and CelebHQ (Unconditional LDM), and for ControlNet with both of these variations using canny edges.
For working on your own dataset:
- Create your own config and have the path in the config point to your images (look at `celebhq.yaml` for guidance)
- Create your own dataset class which will just collect all the filenames and return the image and its hint in its `__getitem__` method. Look at `mnist_dataset.py` or `celeb_dataset.py` for guidance
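A dataset class along these lines could look like the sketch below. It is a hypothetical outline, not the code in `mnist_dataset.py` or `celeb_dataset.py`: the class name, the constructor arguments other than `return_hints`, and the injected `load_image` and `make_hint` callables are all assumptions made so the example stays free of third-party imports.

```python
import glob
import os


class HintImageDataset:
    """Hypothetical dataset: collects image paths and returns (image, hint) pairs."""

    def __init__(self, im_path, load_image, make_hint, im_ext="png", return_hints=True):
        # Collect every file with the given extension under im_path, recursively.
        pattern = os.path.join(im_path, "**", f"*.{im_ext}")
        self.images = sorted(glob.glob(pattern, recursive=True))
        self.load_image = load_image    # callable: path -> image
        self.make_hint = make_hint      # callable: image -> hint (e.g. canny edges)
        self.return_hints = return_hints

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.load_image(self.images[index])
        if self.return_hints:
            # ControlNet training needs the hint alongside the image.
            return image, self.make_hint(image)
        return image
```

With OpenCV installed, `load_image` could be `cv2.imread` and `make_hint` could be something like `lambda im: cv2.Canny(im, 100, 200)`, where the thresholds are illustrative values rather than the ones used in this repo. A real implementation would also convert the arrays to tensors and normalize them.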
Once the config and dataset are set up:
- For training and inference of Unconditional DDPM follow this section
- For training and inference of ControlNet with Unconditional DDPM follow this section
- Train the autoencoder on your dataset using this section
- For training and inference of Unconditional LDM follow this section
- For training and inference of ControlNet with Unconditional LDM follow this section
- For training DDPM on MNIST, ensure the right path is set in `mnist.yaml`
- For training DDPM on your own dataset:
    - Create your own config and have the path point to images (look at `celebhq.yaml` for guidance)
    - Create your own dataset class, similar to `celeb_dataset.py`
- Call the desired dataset class in the training file
- To train DDPM, run `python -m tools.train_ddpm --config config/mnist.yaml` with the desired config file
- For inference, run `python -m tools.sample_ddpm --config config/mnist.yaml` with the right config file to generate samples
- For training ControlNet, ensure the right path is set in `mnist.yaml`
- For training ControlNet with DDPM on your own dataset:
    - Create your own config and have the path point to images (look at `celebhq.yaml` for guidance)
    - Create your own dataset class, similar to `celeb_dataset.py`
- Call the desired dataset class in the training file
- Ensure `return_hints` is passed as `True` in the dataset class initialization
- To train ControlNet with DDPM, run `python -m tools.train_ddpm_controlnet --config config/mnist.yaml` with the desired config file
- For inference, run `python -m tools.sample_ddpm_controlnet --config config/mnist.yaml` with the right config file to generate DDPM samples using canny hints
- For training the autoencoder on CelebHQ, ensure the right path is set in `celebhq.yaml`
- For training the autoencoder on your own dataset:
    - Create your own config and have the path point to images (look at `celebhq.yaml` for guidance)
    - Create your own dataset class, similar to `celeb_dataset.py`
- Call the desired dataset class in the training file
- To train the autoencoder, run `python -m tools.train_vae --config config/celebhq.yaml` with the desired config file
- For inference, make sure `save_latent` is `True` in the config
- For inference, run `python -m tools.infer_vae --config config/celebhq.yaml` with the right config file to generate reconstructions and save latents
Train the autoencoder first and set up the dataset accordingly.
For training the unconditional LDM, ensure the right dataset is used in `train_ldm_vae.py`.
- `python -m tools.train_ldm_vae --config config/celebhq.yaml` for training the unconditional LDM using the right config
- `python -m tools.sample_ldm_vae --config config/celebhq.yaml` for generating images using the trained LDM
- For training ControlNet with CelebHQ, ensure the right path is set in `celebhq.yaml`
- For training ControlNet with LDM on your own dataset:
    - Create your own config and have the path point to images (look at `celebhq.yaml` for guidance)
    - Create your own dataset class, similar to `celeb_dataset.py`
- Ensure the autoencoder and LDM have already been trained
- Call the desired dataset class in the training file
- Ensure `return_hints` is passed as `True` in the dataset class initialization
- Ensure `down_sample_factor` is correctly computed in the model initialization
- To train ControlNet with LDM, run `python -m tools.train_ldm_controlnet --config config/celebhq.yaml` with the desired config file
- For inference with ControlNet, run `python -m tools.sample_ldm_controlnet --config config/celebhq.yaml` with the right config file to generate LDM samples using canny hints
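As a sanity check on `down_sample_factor`, the sketch below computes the ratio between image size and latent size. It assumes the factor is the spatial ratio by which the image-resolution canny hint must shrink to match the latent, which is an interpretation of the parameter rather than a statement about the repo's code; `compute_down_sample_factor` is a hypothetical helper.

```python
def compute_down_sample_factor(image_size, latent_size):
    """Return how many times smaller the latent is than the image along one side."""
    if image_size % latent_size != 0:
        raise ValueError("image_size must be an integer multiple of latent_size")
    return image_size // latent_size


# CelebHQ setting from this README: 128x128 images, 32x32 latents.
factor = compute_down_sample_factor(128, 32)
print(factor)  # 4
```

If your own autoencoder uses a different number of downsampling stages, the latent size changes and this value should change with it.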
Outputs will be saved according to the configuration present in the yaml files.
For every run, a folder named after the `task_name` key in the config will be created.
During training of the autoencoder, the following output will be saved:
- Latest autoencoder and discriminator checkpoints in the `task_name` directory
- Sample reconstructions in `task_name/vae_autoencoder_samples`
During inference of the autoencoder, the following output will be saved:
- Reconstructions for random images in `task_name`
- Latents will be saved in `task_name/vae_latent_dir_name` if mentioned in the config
During training and inference of unconditional DDPM or LDM, the following output will be saved:
- During training, the latest checkpoint is saved in the `task_name` directory
- During sampling, an unconditional sampled image grid for all timesteps is saved in `task_name/samples/*.png`. The final decoded generated image will be `x0_0.png`. Images from `x0_999.png` to `x0_1.png` will be latent image predictions of the denoising process from T=999 to T=1. The generated image is at T=0
During training and inference of ControlNet with DDPM/LDM, the following output will be saved:
- During training, the latest checkpoint is saved in the `task_name` directory
- During sampling, randomly selected hints and generated samples will be saved in `task_name/hint.png` and `task_name/controlnet_samples/*.png`. The final decoded generated image will be `x0_0.png`. Images from `x0_999.png` to `x0_1.png` will be latent image predictions of the denoising process from T=999 to T=1. The generated image is at T=0
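If you want to view the saved samples in denoising order (for example to build an animation), note that a plain alphabetical sort puts `x0_10.png` before `x0_2.png`. A small sketch that sorts by the numeric timestep instead is shown below; the helper names are hypothetical and only the `x0_<t>.png` naming comes from the description above.

```python
import re


def timestep_of(filename):
    """Extract the integer timestep t from a name like 'x0_999.png'."""
    match = re.fullmatch(r"x0_(\d+)\.png", filename)
    if match is None:
        raise ValueError(f"unexpected sample name: {filename}")
    return int(match.group(1))


def denoising_order(filenames):
    """Sort sample names from the noisiest step (largest t) down to t=0."""
    return sorted(filenames, key=timestep_of, reverse=True)


names = ["x0_0.png", "x0_10.png", "x0_999.png", "x0_1.png"]
ordered = denoising_order(names)
print(ordered[0], ordered[-1])  # x0_999.png x0_0.png
```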
@misc{zhang2023addingconditionalcontroltexttoimage,
      title={Adding Conditional Control to Text-to-Image Diffusion Models}, 
      author={Lvmin Zhang and Anyi Rao and Maneesh Agrawala},
      year={2023},
      eprint={2302.05543},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2302.05543}, 
}