Anime Character Consistency Engine — maintain character identity across long-form animation using a Director-Orchestrated multi-agent pipeline.
AnimeLoom coordinates specialized AI agents (character training, video generation, quality evaluation) to produce consistent anime sequences where characters look the same across every shot. Built to run on free/cheap GPU resources (Colab, Kaggle, GCP free credits) for under $20 total.
- Character Identity Preservation — LoRA fine-tuning (rank 16-32, fp16) locks in each character's visual identity via SDXL
- Studio-Quality Video Pipeline — SDXL keyframes + CogVideoX 1.5 animation + GFPGAN face restoration + Real-ESRGAN anime sharpening
- Full Body Anime Output — Head-to-toe character rendering in portrait orientation (512x768 → 480x720)
- Interactive Gradio Studio — Configure settings, preview keyframes, and generate video with a web UI directly in Colab
- Multi-Agent Pipeline — Director orchestrates Character, Animator, and Evaluator agents with a dependency-aware workflow graph
- Quality-Gated Output — shots scoring below 0.85 consistency are automatically regenerated (up to 3 attempts)
- Colab Survival Mode — keepalive every 4 min, auto-checkpoint every 5 min, Google Drive persistence, resume after disconnect
- Multiple Generation Backends — Wan2.2-Animate primary, PixVerse fallback, ControlNet pose conditioning
- Detection + Segmentation — GroundingDINO + SAM isolate characters; CLIP embeddings measure identity similarity
- REST API — FastAPI endpoints for character creation, shot generation, sequence processing, and job tracking
- Async Job Queue — Celery + Redis for background LoRA training and batch video generation
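The quality gate described above (regenerate shots scoring below 0.85, up to 3 attempts) can be sketched as a simple retry loop. This is an illustrative stand-in, not the project's actual implementation; `generate_shot` here is a hypothetical placeholder for the real pipeline call.

```python
import random

CONSISTENCY_THRESHOLD = 0.85  # shots scoring below this are regenerated
MAX_ATTEMPTS = 3

def generate_shot(shot_id: str) -> dict:
    """Hypothetical stand-in for the real pipeline call; returns a shot with a score."""
    return {"shot_id": shot_id, "score": random.uniform(0.7, 1.0)}

def generate_with_quality_gate(shot_id: str) -> dict:
    """Regenerate a shot until it clears the consistency gate or attempts run out."""
    best = None
    for _ in range(MAX_ATTEMPTS):
        shot = generate_shot(shot_id)
        if best is None or shot["score"] > best["score"]:
            best = shot  # keep the best attempt even if it never clears the gate
        if shot["score"] >= CONSISTENCY_THRESHOLD:
            return shot
    return best
```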
The Colab notebook (notebooks/AnimeLoom_Colab_Training.ipynb) runs a 4-phase pipeline:
Phase 1: SDXL + LoRA → Character-consistent keyframes (512x768 portrait)
Phase 2: CogVideoX 1.5 → Animate keyframes into motion clips (480x720, int8 quantized)
Phase 3: GFPGAN + Real-ESRGAN → Face restoration + anime frame sharpening
Phase 4: Cross-fade stitch → Blend clips into final video (mp4)
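Phase 4's cross-fade stitch amounts to a per-frame linear blend over the overlapping frames of two adjacent clips. A minimal sketch of that idea (illustrative only; the notebook's actual stitcher works on real frames via OpenCV):

```python
def crossfade_weights(overlap: int) -> list[tuple[float, float]]:
    """Per-frame (outgoing, incoming) blend weights for a linear cross-fade."""
    return [((overlap - i) / overlap, i / overlap) for i in range(1, overlap + 1)]

def blend_frame(a: list[float], b: list[float], wa: float, wb: float) -> list[float]:
    """Blend two frames, represented here as flat lists of pixel values."""
    return [wa * pa + wb * pb for pa, pb in zip(a, b)]
```

For an overlap of 4 frames, the outgoing clip fades from 0.75 to 0.0 while the incoming clip rises from 0.25 to 1.0.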
| Parameter | Default | Description |
|---|---|---|
| `IMAGE_WIDTH` | 512 | SDXL output width (portrait) |
| `IMAGE_HEIGHT` | 768 | SDXL output height (portrait) |
| `COGVID_STEPS` | 60 | CogVideoX inference steps |
| `COGVID_GUIDANCE` | 7.5 | CogVideoX guidance scale |
| `FPS` | 16 | Output video framerate |
| `FACE_RESTORE` | `True` | Enable GFPGAN + Real-ESRGAN post-processing |
| `DENOISING_STRENGTH` | 0.45 | Img2img strength for keyframe continuity |
- Full body framing: `"full body, head to toe, facing viewer, front view"`
- Studio look: `"anime screencap, studio quality, sharp lineart, vibrant colors"`
- Face stability: `"symmetrical eyes, stable eye shape, detailed face"`
- Motion stability: `"consistent pose, stable camera angle, slow deliberate blink"`
- Negative prompts: `"3d render, cgi, photorealistic, distorted eyes, blurry hair, mouth blur"`
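Putting the tips above together, a positive prompt typically combines the LoRA trigger word, the scene description, and these quality tags, while the negative prompt stays fixed. A hypothetical helper (names and composition order are assumptions, not the notebook's exact code):

```python
FULL_BODY = "full body, head to toe, facing viewer, front view"
STUDIO = "anime screencap, studio quality, sharp lineart, vibrant colors"
FACE = "symmetrical eyes, stable eye shape, detailed face"
NEGATIVE = "3d render, cgi, photorealistic, distorted eyes, blurry hair, mouth blur"

def build_prompts(scene: str, trigger: str) -> tuple[str, str]:
    """Assemble (positive, negative) SDXL prompts from a scene description
    and a character's LoRA trigger word."""
    positive = ", ".join([trigger, scene, FULL_BODY, STUDIO, FACE])
    return positive, NEGATIVE
```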
```
┌─────────────────────────────────────────────────────┐
│                   DirectorAgent                     │
│  ┌──────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ Script   │→ │ WorkflowGraph│→ │ Shot Executor │  │
│  │ Parser   │  │ (DAG)        │  │ + Checkpoint  │  │
│  └──────────┘  └──────────────┘  └───────┬───────┘  │
└──────────────────────────────────────────┼──────────┘
               ┌──────────────┬────────────┼──────────────┐
               ▼              ▼            ▼              ▼
       ┌──────────────┐ ┌───────────┐ ┌────────────┐ ┌────────────┐
       │ Character    │ │ Animator  │ │ Evaluator  │ │ Asset      │
       │ Agent        │ │ Agent     │ │ Agent      │ │ MemoryBank │
       │              │ │           │ │            │ │            │
       │ • LoRA Train │ │ • Wan2.2  │ │ • Identity │ │ • LoRAs    │
       │ • LoRA Mgmt  │ │ • PixVerse│ │ • Motion   │ │ • Embeds   │
       │ • Consistency│ │ • CtrlNet │ │ • Visual   │ │ • Scenes   │
       └──────────────┘ └───────────┘ └────────────┘ └────────────┘
```
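The Director's dependency-aware dispatch can be sketched as a topological walk over shots: a shot becomes ready only once everything it depends on has completed. This is an illustrative simplification; the real DAG logic lives in `WorkflowGraph`, and the `Shot`/`Director` classes below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Shot:
    shot_id: str
    depends_on: list = field(default_factory=list)

class Director:
    """Dispatches shots in dependency order (minimal sketch)."""
    def __init__(self):
        self.completed = set()

    def ready(self, shots):
        """Shots whose dependencies are all completed and which haven't run yet."""
        return [s for s in shots
                if s.shot_id not in self.completed
                and all(d in self.completed for d in s.depends_on)]

    def run(self, shots):
        """Execute all shots, returning the order in which they ran."""
        order = []
        while len(self.completed) < len(shots):
            batch = self.ready(shots)
            if not batch:
                raise RuntimeError("cycle in shot dependencies")
            for s in batch:  # a real executor could run this batch in parallel
                order.append(s.shot_id)
                self.completed.add(s.shot_id)
        return order
```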
```
animeloom/
├── director/
│   ├── agent.py              # Main orchestrator
│   ├── workflow.py           # Shot dependency graph (DAG)
│   └── memory_bank.py        # Persistent asset storage
├── agents/
│   ├── character/
│   │   ├── trainer.py        # LoRA fine-tuning (PEFT)
│   │   ├── lora_manager.py   # Adapter load/unload
│   │   └── consistency.py    # GroundingDINO + SAM + CLIP
│   ├── animator/
│   │   ├── wan_wrapper.py    # Wan2.2-Animate integration
│   │   ├── pixverse.py       # PixVerse fallback
│   │   └── controlnet.py     # OpenPose pose conditioning
│   └── evaluator/
│       ├── character_score.py # Identity consistency
│       ├── motion_score.py    # Motion fidelity
│       └── visual_score.py    # Frame quality
├── api/
│   ├── app.py                # FastAPI application
│   ├── routes/
│   │   ├── characters.py     # Character CRUD
│   │   └── generation.py     # Shot & sequence generation
│   └── schemas/
│       └── models.py         # Pydantic models
├── jobs/
│   ├── worker.py             # Celery async worker
│   └── tasks/
│       ├── training.py       # Background LoRA training
│       └── generation.py     # Background video generation
├── cloud/
│   ├── colab_survival.py     # Keep-alive + checkpointing
│   ├── kaggle_trainer.py     # Kaggle P100 training wrapper
│   └── gcp_setup.sh          # GCP T4 VM provisioning
├── notebooks/
│   └── AnimeLoom_Colab_Training.ipynb  # Full Colab pipeline
├── warehouse/                # Runtime asset storage
│   ├── models/               # Base model weights
│   ├── lora/                 # Character LoRA adapters
│   ├── datasets/             # Training data
│   ├── outputs/              # Generated videos
│   └── checkpoints/          # Resume checkpoints
├── scripts/
│   └── download_models.py    # Download required model weights
├── main.py                   # CLI entry point
├── setup.sh                  # One-command setup
├── requirements.txt          # Python dependencies
├── sample_script.txt         # Example story script
└── .env.example              # Environment config template
```
- Open `notebooks/AnimeLoom_Colab_Training.ipynb` in Google Colab
- Set runtime to A100 GPU (Runtime → Change runtime type → A100)
- Run cells in order:
- Cell 1 — Setup environment, install dependencies, mount Google Drive
- Cell 2 — Upload/download character reference images (10-30 images recommended)
- Cell 3 — Auto-caption images with BLIP
- Cell 4 — Train character LoRA (~15-20 min on A100)
- Cell 5 — Test LoRA with sample images
- Cell 9 — Generate short anime clip (SDXL + CogVideoX 1.5)
- Cell 10 — Generate long anime video (2+ minutes)
- Cell 11 — Launch Gradio Interactive Studio (web UI)
Cell 11 launches a web UI with:
- Character selection (auto-discovers trained LoRAs)
- SDXL and CogVideoX parameter sliders
- Editable keyframe and motion prompts
- Preview — generates 1 test keyframe + estimated stats before full run
- Generate — full 4-phase pipeline with progress bar
- Shareable public URL via `share=True`
```shell
git clone https://github.com/JoelJohnsonThomas/AnimeLoom.git
cd AnimeLoom
chmod +x setup.sh
./setup.sh
```

```shell
python main.py --test               # smoke test
python main.py --script script.txt  # process a story
python main.py --api                # start FastAPI server
python main.py --colab              # Colab survival mode
```

```python
from cloud.kaggle_trainer import KaggleTrainer

trainer = KaggleTrainer()
lora_path = trainer.train("Denji", ["/kaggle/input/charsheet/denji.png"], rank=16, max_steps=500)
trainer.export_lora(lora_path)
```

For studio-quality character output:
| Image Count | Quality Level | Notes |
|---|---|---|
| 10-15 | Good for prototyping | Basic identity, may drift on angles |
| 20-30 | Studio quality | Cover diverse angles, expressions, poses, lighting |
| 30+ | Diminishing returns | Only needed for very complex character designs |
Best practices:
- Use official anime screenshots (not fan art)
- Include front, 3/4, and side profile views
- Mix expressions: neutral, happy, serious, surprised
- Include full body and close-up shots
- Avoid heavily compressed or low-resolution images (512px+ on shortest side)
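The last rule (shortest side at least 512 px) is easy to enforce before training. A hypothetical pre-filter over image dimensions (the function names are illustrative, not part of the codebase):

```python
def usable_for_training(width: int, height: int, min_short_side: int = 512) -> bool:
    """Reject images whose shortest side falls below the recommended 512 px."""
    return min(width, height) >= min_short_side

def filter_dataset(dims):
    """dims: iterable of (name, width, height); keep only training-worthy images."""
    return [name for name, w, h in dims if usable_for_training(w, h)]
```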
AnimeLoom uses a simple text-based script format:
```
SCENE: Character introduction
CHAR: Denji
A young boy with blonde messy hair stands on a city street

SCENE: Walking scene
CHAR: Denji
POSE: walking_pose.mp4
Denji walks through the city, looking around

SCENE: Rooftop
CHAR: Denji
Denji stands on a rooftop at sunset, hair blowing in the wind
```
Directives:
- `SCENE:` or `SHOT:` — starts a new shot
- `CHAR:` — declares a character in the shot (comma-separated for multiple)
- `POSE:` — references a pose video for motion transfer
- Free text — scene description / generation prompt
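The directives above can be parsed with a simple line-by-line scanner. A minimal sketch (not the `DirectorAgent`'s actual parser; the returned dict shape is an assumption):

```python
def parse_script(text: str) -> list:
    """Parse the AnimeLoom script format into a list of shot dicts."""
    shots = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(("SCENE:", "SHOT:")):
            current = {"title": line.split(":", 1)[1].strip(),
                       "characters": [], "pose": None, "description": []}
            shots.append(current)
        elif current is None:
            continue  # ignore free text before the first SCENE:/SHOT:
        elif line.startswith("CHAR:"):
            current["characters"] += [c.strip() for c in line.split(":", 1)[1].split(",")]
        elif line.startswith("POSE:"):
            current["pose"] = line.split(":", 1)[1].strip()
        else:
            current["description"].append(line)
    for shot in shots:
        shot["description"] = " ".join(shot["description"])
    return shots
```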
| Method | Endpoint | Description |
|---|---|---|
| POST | `/character/create` | Upload character sheet, train LoRA |
| GET | `/character/list` | List all characters |
| GET | `/character/{id}` | Get character details |
| DELETE | `/character/{id}` | Delete a character |
| POST | `/generate/shot` | Generate single shot |
| POST | `/generate/sequence` | Generate multi-shot sequence |
| GET | `/job/{job_id}` | Check generation job status |
```shell
curl -X POST http://localhost:8080/character/create \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Denji",
    "description": "Young boy with blonde messy hair and sharp teeth",
    "image_paths": ["./images/denji_front.png", "./images/denji_side.png"]
  }'
```

```shell
curl -X POST http://localhost:8080/generate/sequence \
  -H "Content-Type: application/json" \
  -d '{
    "script": "SCENE: Intro\nCHAR: Denji\nDenji stands on a city street",
    "story_id": "chainsaw_ep01"
  }'
```

| Category | Tools |
|---|---|
| Image Generation | SDXL + PEFT LoRA |
| Video Generation | CogVideoX 1.5 (int8 quantized via optimum-quanto) |
| Face Restoration | GFPGAN v1.4 |
| Frame Sharpening | Real-ESRGAN (x4plus_anime_6B) |
| Detection | GroundingDINO + SAM + CLIP |
| Video Processing | OpenCV, ffmpeg |
| Web UI | Gradio |
| API | FastAPI, Uvicorn, Pydantic |
| Queue | Celery, Redis |
| Infra | Google Colab, Kaggle, GCP |
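Long-running jobs kicked off via `/generate/sequence` are tracked through `GET /job/{job_id}`. A small client-side helper for deciding when to stop polling might look like this (the `status` field and its values are assumptions about the response shape, not documented API behavior):

```python
import json

TERMINAL_STATES = {"completed", "failed"}  # assumed terminal job states

def should_keep_polling(response_body: str) -> bool:
    """Parse a /job/{job_id} JSON response and decide whether to poll again."""
    status = json.loads(response_body).get("status", "unknown")
    return status not in TERMINAL_STATES
```

In practice you would call this in a loop with a sleep between requests, stopping once it returns `False`.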
| Resource | Cost | What You Get |
|---|---|---|
| Google Colab Pro | $10/month | A100/V100 GPU, longer runtimes |
| Kaggle | Free | 30h/week P100 GPU |
| Google Cloud | Free $300 credits | ~850 hours T4 GPU |
| Total | < $20 | Full pipeline capability |
- Script Parsing — `DirectorAgent` parses your script into individual shots, extracting characters, descriptions, and pose references
- Dependency Planning — `WorkflowGraph` builds a DAG ensuring characters are trained before their shots are generated; independent shots can run in parallel
- Character Training — for each new character, a LoRA adapter is trained from reference images using PEFT (rank 32, ~1000 steps)
- Keyframe Generation — SDXL with the character LoRA generates consistent keyframes in portrait orientation (512x768); img2img with low denoising strength maintains continuity between frames
- Video Animation — CogVideoX 1.5 animates each keyframe into motion clips (49 frames each); int8 quantization keeps VRAM under control on an A100
- Post-Processing — GFPGAN restores facial details and Real-ESRGAN (anime model) sharpens all frames; cross-fade stitching blends clips together
- Quality Evaluation — generated shots are scored on character consistency (CLIP cosine similarity), motion fidelity (optical flow), and visual quality (sharpness, colour stability); shots scoring below 0.85 are regenerated
- Checkpointing — the full pipeline state is saved every 5 minutes, so after a Colab disconnect you resume exactly where you left off
- Assembly — all passing shots are concatenated via ffmpeg into the final video
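The character-consistency score in the evaluation step is a cosine similarity between CLIP embeddings of the reference character and the generated shot. The gating logic reduces to a few lines (illustrative; real embeddings come from the CLIP model, not the toy vectors used here):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def passes_identity_gate(ref_embed, shot_embed, threshold=0.85):
    """True if the shot's embedding is close enough to the character reference."""
    return cosine_similarity(ref_embed, shot_embed) >= threshold
```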
| Variable | Default | Description |
|---|---|---|
| `AI_CACHE_ROOT` | `./warehouse` | Root directory for all assets |
| `GOOGLE_DRIVE_MOUNT` | `/content/drive/MyDrive/AniLoom/warehouse` | Drive path for Colab persistence |
| `REDIS_URL` | `redis://localhost:6379/0` | Redis URL for Celery job queue |
| `PIXVERSE_API_KEY` | — | PixVerse API key (optional fallback) |
| `API_HOST` | `0.0.0.0` | FastAPI bind host |
| `API_PORT` | `8080` | FastAPI bind port |
- Python 3.9+
- CUDA-capable GPU (A100 recommended for CogVideoX 1.5, T4/P100/V100 for training only)
- ffmpeg (for video assembly)
- Redis (optional, for Celery job queue)
- ~20-25 GB VRAM for full video pipeline (CogVideoX 1.5)
Contributions welcome! Please open an issue or pull request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request