18 changes: 18 additions & 0 deletions .gitignore
@@ -0,0 +1,18 @@
samples/
wandb/
outputs/
__pycache__/

scripts/animate_inter.py
scripts/gradio_app.py
models/Controlnet/*
models/DreamBooth_LoRA/*
models/DreamBooth_LoRA/Put*personalized*T2I*checkpoints*here.txt
models/Motion_Module/*
models/*
*.ipynb
*.safetensors
*.ckpt
.ossutil_checkpoint/
ossutil_output/
debugs/
154 changes: 137 additions & 17 deletions README.md
@@ -1,6 +1,6 @@
# AnimateDiff
# Controlled AnimateDiff (V2 is also available)

This repository is the official implementation of [AnimateDiff](https://arxiv.org/abs/2307.04725).
This repository is a <b>ControlNet extension</b> of the official implementation of [AnimateDiff](https://arxiv.org/abs/2307.04725).

**[AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning](https://arxiv.org/abs/2307.04725)**
</br>
@@ -11,48 +11,113 @@ Yaohui Wang,
Yu Qiao,
Dahua Lin,
Bo Dai

<p style="font-size: 0.8em; margin-top: -1em">*Corresponding Author</p>

[Arxiv Report](https://arxiv.org/abs/2307.04725) | [Project Page](https://animatediff.github.io/)
<!-- [Arxiv Report](https://arxiv.org/abs/2307.04725) | [Project Page](https://animatediff.github.io/) -->
[![arXiv](https://img.shields.io/badge/arXiv-2307.04725-b31b1b.svg)](https://arxiv.org/abs/2307.04725)
[![Project Page](https://img.shields.io/badge/Project-Website-green)](https://animatediff.github.io/)
[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/Masbfca/AnimateDiff)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/guoyww/AnimateDiff)

## Todo
- [x] Code Release
- [x] Arxiv Report
- [x] GPU Memory Optimization
- [ ] Gradio Interface
***WARNING! This version works as well as the official implementation, but is not compatible with it due to differing library versions.***

<table width="1200" class="center">
<tr>
<td><img src="__assets__/animations/control/original/dance_original_16_2.gif"></td>
<td><img src="__assets__/animations/control/softedge/dance_1girl.gif"></td>
<td><img src="__assets__/animations/control/canny/dance_1girl.gif"></td>
<td><img src="__assets__/animations/control/canny/dance_medival_portrait.gif"></td>
</tr>
</table>
<table width="1200" class="center">
<tr>
<td><img src="__assets__/animations/control/original/smiling_original_16_2.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_realistic_0.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_realistic_1.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_realistic_2.gif"></td>
</tr>
<tr>
<td><img src="__assets__/animations/control/depth/smiling_1girl.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_forbidden_castle.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_halo.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_medival.gif"></td>
</tr>
</table>
Test video sources: <a href="https://stable-diffusion-art.com/video-to-video/">dance</a> and <a href="https://mixkit.co/free-stock-video/girl-smiling-portrait-in-the-library-4756/">smiling</a>.

## Todo
- [x] Add Controlnet in the pipeline.
- [x] Add Controlnet in Gradio Demo.
- [X] Optimize code in attention processor style.

## Features
- Added Controlnet for Video to Video control.
- GPU memory: ~12-14 GB VRAM for inference without ControlNet, and ~15-17 GB VRAM with ControlNet.

- **[2023/09/10]** New Motion Module release! `mm_sd_v15_v2.ckpt` was trained at a larger resolution and batch size, and gains noticeable quality improvements. Check it out at [Google Drive](https://drive.google.com/drive/folders/1EqLC65eR1-W-sGD0Im7fkED6c8GkiNFI?usp=sharing) / [HuggingFace](https://huggingface.co/guoyww/animatediff) and use it with `configs/inference/inference-v2.yaml`. Example:
```
python -m scripts.animate --config configs/prompts/v2/5-RealisticVision.yaml
```
Here is a qualitative comparison between `mm_sd_v15.ckpt` (left) and `mm_sd_v15_v2.ckpt` (right):
<table class="center">
<tr>
<td><img src="__assets__/animations/compare/old_0.gif"></td>
<td><img src="__assets__/animations/compare/new_0.gif"></td>
<td><img src="__assets__/animations/compare/old_1.gif"></td>
<td><img src="__assets__/animations/compare/new_1.gif"></td>
<td><img src="__assets__/animations/compare/old_2.gif"></td>
<td><img src="__assets__/animations/compare/new_2.gif"></td>
<td><img src="__assets__/animations/compare/old_3.gif"></td>
<td><img src="__assets__/animations/compare/new_3.gif"></td>
</tr>
</table>
- GPU memory optimization: ~12 GB VRAM for inference

- User Interface: [Gradio](#gradio-demo), A1111 WebUI Extension [sd-webui-animatediff](https://github.com/continue-revolution/sd-webui-animatediff) (by [@continue-revolution](https://github.com/continue-revolution))
- Google Colab: [Colab](https://colab.research.google.com/github/camenduru/AnimateDiff-colab/blob/main/AnimateDiff_colab.ipynb) (by [@camenduru](https://github.com/camenduru))

## Common Issues
<details>
<summary>Installation</summary>

Please ensure [xformers](https://github.com/facebookresearch/xformers) is installed; it is used to reduce inference memory.
</details>


<details>
<summary>Various resolution or number of frames</summary>
Currently, we recommend generating animations with 16 frames at 512 resolution, matching our training settings. Note that other resolutions or frame counts may affect quality to some degree.
</details>


<details>
<summary>How to use it without any coding</summary>

1) Get LoRA models: train a LoRA model with [A1111](https://github.com/continue-revolution/sd-webui-animatediff) based on a collection of your own favorite images (e.g., tutorials [English](https://www.youtube.com/watch?v=mfaqqL5yOO4), [Japanese](https://www.youtube.com/watch?v=N1tXVR9lplM), [Chinese](https://www.bilibili.com/video/BV1fs4y1x7p2/)),
or download LoRA models from [Civitai](https://civitai.com/).

2) Animate LoRA models: use the Gradio interface or A1111
(e.g., tutorials [English](https://github.com/continue-revolution/sd-webui-animatediff), [Japanese](https://www.youtube.com/watch?v=zss3xbtvOWw), [Chinese](https://941ai.com/sd-animatediff-webui-1203.html))

3) Be creative together with other techniques, such as super resolution, frame interpolation, and music generation.
</details>


<details>
<summary>Animating a given image</summary>

We agree that animating a given image is an appealing feature, which we will try to support officially in the future. For now, you may enjoy other efforts such as [talesofai](https://github.com/talesofai/AnimateDiff).
</details>

<details>
<summary>Contributions from community</summary>
Contributions are always welcome!! We will create another branch which community could contribute to. As for the main branch, we would like to align it with the original technical report:)
Contributions are always welcome!! The <code>dev</code> branch is for community contributions. As for the main branch, we would like to align it with the original technical report :)
</details>



## Setup for Inference
## Setup for Inference

### Prepare Environment
~~Our approach takes around 60 GB of GPU memory for inference. An NVIDIA A100 is recommended.~~

***We updated our inference code with xformers and a sequential decoding trick. AnimateDiff now takes only ~12 GB VRAM for inference and runs on a single RTX 3090!***

```
git clone https://github.com/guoyww/AnimateDiff.git
@@ -71,7 +71,7 @@ git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 models/StableDif

bash download_bashscripts/0-MotionModule.sh
```
You may also directly download the motion module checkpoints from [Google Drive](https://drive.google.com/drive/folders/1EqLC65eR1-W-sGD0Im7fkED6c8GkiNFI?usp=sharing), then put them in `models/Motion_Module/` folder.
You may also directly download the motion module checkpoints from [Google Drive](https://drive.google.com/drive/folders/1EqLC65eR1-W-sGD0Im7fkED6c8GkiNFI?usp=sharing) / [HuggingFace](https://huggingface.co/guoyww/animatediff) / [CivitAI](https://civitai.com/models/108836), then put them in the `models/Motion_Module/` folder.

### Prepare Personalized T2I
Here we provide inference configs for 6 demo T2I models on CivitAI.
@@ -123,6 +188,59 @@ Then run the following commands:
```
python -m scripts.animate --config [path to the config file]
```
## Inference with ControlNet
The ControlNet approach uses a video as the content source. It takes the first `L` (usually 16) frames of the video.
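The frame-selection step can be sketched as follows (a minimal illustration of the `L` and `get_each` behavior; `select_frame_indices` is a hypothetical helper for illustration, not a function from this repo):

```python
def select_frame_indices(total_frames: int, length: int = 16, get_each: int = 1) -> list[int]:
    """Pick the first `length` frame indices, sampling every
    `get_each`-th frame from the source video (hypothetical sketch)."""
    return list(range(0, total_frames, get_each))[:length]

# A 100-frame clip with get_each=2 yields frames 0, 2, 4, ..., 30.
indices = select_frame_indices(100, length=16, get_each=2)
```

With `get_each: 2`, the control video therefore spans twice as much real time as the generated `L`-frame clip.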

Download controlnet models using script:
```bash
bash download_bashscripts/9-Controlnets.sh
```

Run examples:
```bash
python -m scripts.animate --config configs/prompts/1-ToonYou-Controlnet.yaml
python -m scripts.animate --config configs/prompts/2-Lyriel-Controlnet.yaml
python -m scripts.animate --config configs/prompts/3-RcnzCartoon-Controlnet.yaml
```

Add ControlNet to other configs (see the example in `1-ToonYou-Controlnet.yaml`):
```yaml
control:
video_path: "./videos/smiling.mp4"
  get_each: 2 # sample every 2nd frame from the video
controlnet_processor: "softedge" # softedge, canny, depth
controlnet_pipeline: "models/StableDiffusion/stable-diffusion-v1-5"
controlnet_processor_path: "models/Controlnet/control_v11p_sd15_softedge" # control_v11p_sd15_softedge, control_v11f1p_sd15_depth, control_v11p_sd15_canny
guess_mode: True
```
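A quick sanity check of a parsed `control` block might look like the sketch below; the required keys and defaults are assumptions for illustration, not the repo's actual validation logic:

```python
SUPPORTED_PROCESSORS = {"softedge", "canny", "depth"}

def validate_control_config(control: dict) -> dict:
    """Check a parsed `control` block and fill in assumed defaults (hypothetical helper)."""
    for key in ("video_path", "controlnet_processor", "controlnet_processor_path"):
        if key not in control:
            raise ValueError(f"control config is missing '{key}'")
    if control["controlnet_processor"] not in SUPPORTED_PROCESSORS:
        raise ValueError(f"unsupported processor: {control['controlnet_processor']}")
    # Assumed defaults, mirroring the example config above.
    control.setdefault("get_each", 1)
    control.setdefault("guess_mode", True)
    return control

cfg = validate_control_config({
    "video_path": "./videos/smiling.mp4",
    "controlnet_processor": "softedge",
    "controlnet_processor_path": "models/Controlnet/control_v11p_sd15_softedge",
})
```

Failing fast on a bad `controlnet_processor` value avoids a long model download before the error surfaces.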

## Steps for Training

### Dataset
Before training, download the video files and the `.csv` annotations of [WebVid10M](https://maxbain.com/webvid-dataset/) to your local machine.
Note that our example training script requires all videos to be saved in a single folder. You may change this by modifying `animatediff/data/dataset.py`.

### Configuration
After preparing the dataset, update the data paths below in the `.yaml` config files in the `configs/training/` folder:
```
train_data:
csv_path: [Replace with .csv Annotation File Path]
video_folder: [Replace with Video Folder Path]
sample_size: 256
```
Other training parameters (learning rate, epochs, validation settings, etc.) are also included in the config files.
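Joining the `.csv` annotations with the video folder can be sketched like this (the `videoid` and `name` column names are assumptions about the WebVid-style schema, and `load_annotations` is a hypothetical helper, not the repo's dataset code):

```python
import csv
import os

def load_annotations(csv_path: str, video_folder: str) -> list[dict]:
    """Map each annotation row to a video path and caption (hypothetical sketch)."""
    samples = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            samples.append({
                "video": os.path.join(video_folder, row["videoid"] + ".mp4"),
                "caption": row["name"],
            })
    return samples
```

This is why all videos must sit in a single folder: each row is resolved by simple filename join, with no recursive search.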

### Training
To train motion modules:
```
torchrun --nnodes=1 --nproc_per_node=1 train.py --config configs/training/training.yaml
```

To finetune the UNet's image layers:
```
torchrun --nnodes=1 --nproc_per_node=1 train.py --config configs/training/image_finetune.yaml
```


## Gradio Demo
We have created a Gradio demo to make AnimateDiff easier to use. To launch the demo, please run the following commands:
@@ -131,6 +249,8 @@ conda activate animatediff
python app.py
```
By default, the demo will run at `localhost:7860`.
Make sure imageio is installed with the ffmpeg backend: `pip install imageio[ffmpeg]`.

<br><img src="__assets__/figs/gradio.jpg" style="width: 50em; margin-top: 1em">

## Gallery
@@ -241,4 +361,4 @@ Pose Model:<a href="https://civitai.com/models/107295/or-holdingsign">Hold Sig
**Bo Dai**: [daibo@pjlab.org.cn](mailto:daibo@pjlab.org.cn)

## Acknowledgements
Codebase built upon [Tune-a-Video](https://github.com/showlab/Tune-A-Video).
Codebase built upon [Tune-a-Video](https://github.com/showlab/Tune-A-Video).
Empty file.
Binary file added __assets__/animations/compare/new_0.gif
Binary file added __assets__/animations/compare/new_1.gif
Binary file added __assets__/animations/compare/new_2.gif
Binary file added __assets__/animations/compare/new_3.gif
Binary file added __assets__/animations/compare/old_0.gif
Binary file added __assets__/animations/compare/old_1.gif
Binary file added __assets__/animations/compare/old_2.gif
Binary file added __assets__/animations/compare/old_3.gif