18 changes: 18 additions & 0 deletions .gitignore
@@ -0,0 +1,18 @@
samples/
wandb/
outputs/
__pycache__/

scripts/animate_inter.py
scripts/gradio_app.py
models/Controlnet/*
models/DreamBooth_LoRA/*
models/DreamBooth_LoRA/Put*personalized*T2I*checkpoints*here.txt
models/Motion_Module/*
models/*
*.ipynb
*.safetensors
*.ckpt
.ossutil_checkpoint/
ossutil_output/
debugs/
154 changes: 137 additions & 17 deletions README.md
@@ -1,6 +1,6 @@
# AnimateDiff
# Controlled AnimateDiff (V2 is also available)

This repository is the official implementation of [AnimateDiff](https://arxiv.org/abs/2307.04725).
This repository is a <b>ControlNet extension</b> of the official implementation of [AnimateDiff](https://arxiv.org/abs/2307.04725).

**[AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning](https://arxiv.org/abs/2307.04725)**
</br>
@@ -11,48 +11,113 @@ Yaohui Wang,
Yu Qiao,
Dahua Lin,
Bo Dai

<p style="font-size: 0.8em; margin-top: -1em">*Corresponding Author</p>

[Arxiv Report](https://arxiv.org/abs/2307.04725) | [Project Page](https://animatediff.github.io/)
<!-- [Arxiv Report](https://arxiv.org/abs/2307.04725) | [Project Page](https://animatediff.github.io/) -->
[![arXiv](https://img.shields.io/badge/arXiv-2307.04725-b31b1b.svg)](https://arxiv.org/abs/2307.04725)
[![Project Page](https://img.shields.io/badge/Project-Website-green)](https://animatediff.github.io/)
[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/Masbfca/AnimateDiff)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/guoyww/AnimateDiff)

## Todo
- [x] Code Release
- [x] Arxiv Report
- [x] GPU Memory Optimization
- [ ] Gradio Interface
***WARNING! This version works as well as the official implementation, but is not compatible with it due to differing library versions.***

<table width="1200" class="center">
<tr>
<td><img src="__assets__/animations/control/original/dance_original_16_2.gif"></td>
<td><img src="__assets__/animations/control/softedge/dance_1girl.gif"></td>
<td><img src="__assets__/animations/control/canny/dance_1girl.gif"></td>
<td><img src="__assets__/animations/control/canny/dance_medival_portrait.gif"></td>
</tr>
</table>
<table width="1200" class="center">
<tr>
<td><img src="__assets__/animations/control/original/smiling_original_16_2.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_realistic_0.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_realistic_1.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_realistic_2.gif"></td>
</tr>
<tr>
<td><img src="__assets__/animations/control/depth/smiling_1girl.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_forbidden_castle.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_halo.gif"></td>
<td><img src="__assets__/animations/control/depth/smiling_medival.gif"></td>
</tr>
</table>
Test video sources: <a href="https://stable-diffusion-art.com/video-to-video/">dance</a> and <a href="https://mixkit.co/free-stock-video/girl-smiling-portrait-in-the-library-4756/">smiling</a>.

## Todo
- [x] Add Controlnet in the pipeline.
- [x] Add Controlnet in Gradio Demo.
- [X] Optimize code in attention processor style.

## Features
- Added Controlnet for Video to Video control.
- GPU memory: ~12-14 GB VRAM for inference without ControlNet, and ~15-17 GB VRAM with ControlNet.

- **[2023/09/10]** New Motion Module release! `mm_sd_v15_v2.ckpt` was trained at a larger resolution and batch size, and gains noticeable quality improvements. Check it out at [Google Drive](https://drive.google.com/drive/folders/1EqLC65eR1-W-sGD0Im7fkED6c8GkiNFI?usp=sharing) / [HuggingFace](https://huggingface.co/guoyww/animatediff) and use it with `configs/inference/inference-v2.yaml`. Example:
```
python -m scripts.animate --config configs/prompts/v2/5-RealisticVision.yaml
```
Here is a qualitative comparison between `mm_sd_v15.ckpt` (left) and `mm_sd_v15_v2.ckpt` (right):
<table class="center">
<tr>
<td><img src="__assets__/animations/compare/old_0.gif"></td>
<td><img src="__assets__/animations/compare/new_0.gif"></td>
<td><img src="__assets__/animations/compare/old_1.gif"></td>
<td><img src="__assets__/animations/compare/new_1.gif"></td>
<td><img src="__assets__/animations/compare/old_2.gif"></td>
<td><img src="__assets__/animations/compare/new_2.gif"></td>
<td><img src="__assets__/animations/compare/old_3.gif"></td>
<td><img src="__assets__/animations/compare/new_3.gif"></td>
</tr>
</table>
- GPU memory optimization: ~12 GB VRAM for inference

- User Interface: [Gradio](#gradio-demo), A1111 WebUI Extension [sd-webui-animatediff](https://github.com/continue-revolution/sd-webui-animatediff) (by [@continue-revolution](https://github.com/continue-revolution))
- Google Colab: [Colab](https://colab.research.google.com/github/camenduru/AnimateDiff-colab/blob/main/AnimateDiff_colab.ipynb) (by [@camenduru](https://github.com/camenduru))

## Common Issues
<details>
<summary>Installation</summary>

Please ensure [xformers](https://github.com/facebookresearch/xformers) is installed; it is used to reduce inference memory.
</details>


<details>
<summary>Various resolution or number of frames</summary>
Currently, we recommend generating animations with 16 frames at 512 resolution, matching our training settings. Note that other resolutions or frame counts may affect quality to some degree.
</details>


<details>
<summary>How to use it without any coding</summary>

1) Get LoRA models: train a LoRA model with [A1111](https://github.com/continue-revolution/sd-webui-animatediff) based on a collection of your own favorite images (e.g., tutorials [English](https://www.youtube.com/watch?v=mfaqqL5yOO4), [Japanese](https://www.youtube.com/watch?v=N1tXVR9lplM), [Chinese](https://www.bilibili.com/video/BV1fs4y1x7p2/)),
or download LoRA models from [Civitai](https://civitai.com/).

2) Animate LoRA models: use the Gradio interface or A1111
(e.g., tutorials [English](https://github.com/continue-revolution/sd-webui-animatediff), [Japanese](https://www.youtube.com/watch?v=zss3xbtvOWw), [Chinese](https://941ai.com/sd-animatediff-webui-1203.html))

3) Be creative together with other techniques, such as super resolution, frame interpolation, and music generation.
</details>


<details>
<summary>Animating a given image</summary>

We agree that animating a given image is an appealing feature, which we will try to support officially in the future. For now, you may enjoy other efforts such as [talesofai](https://github.com/talesofai/AnimateDiff).
</details>

<details>
<summary>Contributions from community</summary>
Contributions are always welcome!! We will create another branch which community could contribute to. As for the main branch, we would like to align it with the original technical report:)
Contributions are always welcome!! The <code>dev</code> branch is for community contributions. As for the main branch, we would like to align it with the original technical report :)
</details>



## Setup for Inference
## Setup for Inference

### Prepare Environment
~~Our approach takes around 60 GB of GPU memory for inference. An NVIDIA A100 is recommended.~~

***We updated our inference code with xformers and a sequential decoding trick. AnimateDiff now takes only ~12 GB VRAM for inference and runs on a single RTX 3090!***

```
git clone https://github.com/guoyww/AnimateDiff.git
@@ -71,7 +71,7 @@ git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 models/StableDif

bash download_bashscripts/0-MotionModule.sh
```
You may also directly download the motion module checkpoints from [Google Drive](https://drive.google.com/drive/folders/1EqLC65eR1-W-sGD0Im7fkED6c8GkiNFI?usp=sharing), then put them in `models/Motion_Module/` folder.
You may also directly download the motion module checkpoints from [Google Drive](https://drive.google.com/drive/folders/1EqLC65eR1-W-sGD0Im7fkED6c8GkiNFI?usp=sharing) / [HuggingFace](https://huggingface.co/guoyww/animatediff) / [CivitAI](https://civitai.com/models/108836), then put them in the `models/Motion_Module/` folder.

### Prepare Personalized T2I
Here we provide inference configs for 6 demo T2I models on CivitAI.
@@ -123,6 +188,59 @@ Then run the following commands:
```
python -m scripts.animate --config [path to the config file]
```
## Inference with ControlNet
The ControlNet approach uses a video as the content source. It takes the first `L` (usually 16) frames of the video.
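The frame-selection step can be sketched as follows (a minimal illustration of the `L` and `get_each` behavior; `select_frame_indices` is a hypothetical helper for illustration, not a function from this repo):

```python
def select_frame_indices(total_frames: int, length: int = 16, get_each: int = 1) -> list[int]:
    """Pick the first `length` frame indices, sampling every
    `get_each`-th frame from the source video (hypothetical sketch)."""
    return list(range(0, total_frames, get_each))[:length]

# A 100-frame clip with get_each=2 yields frames 0, 2, 4, ..., 30.
indices = select_frame_indices(100, length=16, get_each=2)
```

With `get_each: 2`, the control video therefore spans twice as much real time as the generated `L`-frame clip.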

Download controlnet models using script:
```bash
bash download_bashscripts/9-Controlnets.sh
```

Run examples:
```bash
python -m scripts.animate --config configs/prompts/1-ToonYou-Controlnet.yaml
python -m scripts.animate --config configs/prompts/2-Lyriel-Controlnet.yaml
python -m scripts.animate --config configs/prompts/3-RcnzCartoon-Controlnet.yaml
```

Add ControlNet to other configs (see the example in `1-ToonYou-Controlnet.yaml`):
```yaml
control:
video_path: "./videos/smiling.mp4"
  get_each: 2 # sample every 2nd frame from the video
controlnet_processor: "softedge" # softedge, canny, depth
controlnet_pipeline: "models/StableDiffusion/stable-diffusion-v1-5"
controlnet_processor_path: "models/Controlnet/control_v11p_sd15_softedge" # control_v11p_sd15_softedge, control_v11f1p_sd15_depth, control_v11p_sd15_canny
guess_mode: True
```
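A quick sanity check of a parsed `control` block might look like the sketch below; the required keys and defaults are assumptions for illustration, not the repo's actual validation logic:

```python
SUPPORTED_PROCESSORS = {"softedge", "canny", "depth"}

def validate_control_config(control: dict) -> dict:
    """Check a parsed `control` block and fill in assumed defaults (hypothetical helper)."""
    for key in ("video_path", "controlnet_processor", "controlnet_processor_path"):
        if key not in control:
            raise ValueError(f"control config is missing '{key}'")
    if control["controlnet_processor"] not in SUPPORTED_PROCESSORS:
        raise ValueError(f"unsupported processor: {control['controlnet_processor']}")
    # Assumed defaults, mirroring the example config above.
    control.setdefault("get_each", 1)
    control.setdefault("guess_mode", True)
    return control

cfg = validate_control_config({
    "video_path": "./videos/smiling.mp4",
    "controlnet_processor": "softedge",
    "controlnet_processor_path": "models/Controlnet/control_v11p_sd15_softedge",
})
```

Failing fast on a bad `controlnet_processor` value avoids a long model download before the error surfaces.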

## Steps for Training

### Dataset
Before training, download the video files and the `.csv` annotations of [WebVid10M](https://maxbain.com/webvid-dataset/) to your local machine.
Note that our example training script requires all videos to be saved in a single folder. You may change this by modifying `animatediff/data/dataset.py`.

### Configuration
After preparing the dataset, update the data paths below in the `.yaml` config files in the `configs/training/` folder:
```
train_data:
csv_path: [Replace with .csv Annotation File Path]
video_folder: [Replace with Video Folder Path]
sample_size: 256
```
Other training parameters (learning rate, epochs, validation settings, etc.) are also included in the config files.
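Joining the `.csv` annotations with the video folder can be sketched like this (the `videoid` and `name` column names are assumptions about the WebVid-style schema, and `load_annotations` is a hypothetical helper, not the repo's dataset code):

```python
import csv
import os

def load_annotations(csv_path: str, video_folder: str) -> list[dict]:
    """Map each annotation row to a video path and caption (hypothetical sketch)."""
    samples = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            samples.append({
                "video": os.path.join(video_folder, row["videoid"] + ".mp4"),
                "caption": row["name"],
            })
    return samples
```

This is why all videos must sit in a single folder: each row is resolved by simple filename join, with no recursive search.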

### Training
To train motion modules:
```
torchrun --nnodes=1 --nproc_per_node=1 train.py --config configs/training/training.yaml
```

To finetune the UNet's image layers:
```
torchrun --nnodes=1 --nproc_per_node=1 train.py --config configs/training/image_finetune.yaml
```


## Gradio Demo
We have created a Gradio demo to make AnimateDiff easier to use. To launch the demo, please run the following commands:
@@ -131,6 +249,8 @@ conda activate animatediff
python app.py
```
By default, the demo will run at `localhost:7860`.
Make sure imageio is installed with the ffmpeg backend: `pip install imageio[ffmpeg]`.

<br><img src="__assets__/figs/gradio.jpg" style="width: 50em; margin-top: 1em">

## Gallery
@@ -241,4 +361,4 @@ Pose Model:<a href="https://civitai.com/models/107295/or-holdingsign">Hold Sig
**Bo Dai**: [daibo@pjlab.org.cn](mailto:daibo@pjlab.org.cn)

## Acknowledgements
Codebase built upon [Tune-a-Video](https://github.com/showlab/Tune-A-Video).
Codebase built upon [Tune-a-Video](https://github.com/showlab/Tune-A-Video).
Empty file.
Binary file added __assets__/animations/compare/new_0.gif
Binary file added __assets__/animations/compare/new_1.gif
Binary file added __assets__/animations/compare/new_2.gif
Binary file added __assets__/animations/compare/new_3.gif
Binary file added __assets__/animations/compare/old_0.gif
Binary file added __assets__/animations/compare/old_1.gif
Binary file added __assets__/animations/compare/old_2.gif
Binary file added __assets__/animations/compare/old_3.gif