Welcome to the synctoon developer documentation! This guide provides comprehensive technical details for developers who want to understand, contribute to, or extend the synctoon codebase.
## Contents

- Architecture Overview
- Core Components
- Data Flow
- Development Setup
- Code Structure
- API Integration
- Asset Management
- Frame Generation Pipeline
- Testing
- Contributing Guidelines
- Troubleshooting
## Architecture Overview

Synctoon follows a modular pipeline architecture that processes text and audio inputs through several stages to produce animated video output.

```
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
│    Input    │    │  AI Analysis │    │    Frame    │    │    Video     │
│  Text/Audio │───▶│    & Sync    │───▶│  Generation │───▶│  Compilation │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────────┘
       │                  │                   │                  │
       ▼                  ▼                   ▼                  ▼
  Script.txt       Animation Cues        PNG Frames         Final.mp4
  Audio.mp3        Timing Data           Asset Composites
```
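In code terms, the stages above can be sketched as a minimal orchestrator. This is an illustrative sketch, not the actual API of `core.py`; the stage callables stand in for the real AI analysis, frame generation, and video compilation implementations:

```python
from dataclasses import dataclass

@dataclass
class PipelineResult:
    cues_path: str    # animation cues JSON from AI analysis
    frames_dir: str   # directory of rendered PNG frames
    video_path: str   # final compiled MP4

def run_pipeline(script_path, audio_path, analyze, render, compile_video):
    """Run the four stages in order, feeding each stage's output forward.

    `analyze`, `render`, and `compile_video` are stand-ins for the real
    stage implementations (AI analysis & sync, frame generation, video
    compilation).
    """
    cues_path = analyze(script_path, audio_path)        # Input -> AI Analysis & Sync
    frames_dir = render(cues_path)                      # -> Frame Generation
    video_path = compile_video(frames_dir, audio_path)  # -> Video Compilation
    return PipelineResult(cues_path, frames_dir, video_path)
```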
## Core Components

### `brain_requests/` (AI & Audio Processing)

**Purpose:** Handles AI-powered text analysis and audio processing.

- `prompts.py`: Contains AI prompts for text analysis
- `text_aligner.py`: Aligns processed text with timing data
- `speach_aligner.py`: Handles speech-to-text alignment
- `utils.py`: Utility functions for data processing
- `validater.py`: Validates AI responses and data integrity
```python
# Example AI prompt structure
ANIMATION_PROMPT = {
    "system": "Analyze text for animation cues",
    "user_template": "Extract head movements, emotions, and dialogue from: {text}",
    "response_format": "JSON with timestamps and animation data",
}
```

### `image_manager/` (Asset Management)

**Purpose:** Manages character assets and compositing.
`CharacterManager.py`: Main class for character asset handling.

```python
class CharacterManager:
    def __init__(self, character_path, metadata_path):
        self.character_path = character_path
        self.metadata_path = metadata_path
        self.assets = self.load_character_assets()
        self.metadata = self.load_metadata()

    def composite_frame(self, frame_data):
        # Layer assets: background → body → head → eyes → mouth
        pass

    def apply_emotion(self, emotion_type, intensity):
        # Dynamically select appropriate asset variations
        pass
```

### Frame Generator

**Purpose:** Creates individual animation frames.

Frame generation proceeds in five steps:
- Background Loading: Selects appropriate background based on scene context
- Character Positioning: Places characters according to metadata coordinates
- Asset Layering: Composites body parts in correct z-order
- Effect Application: Applies zoom, transitions, and visual effects
- Frame Export: Saves as PNG with sequential numbering
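The exported PNG sequence is then handed to `frame_to_video.py` for compilation. Invoking FFmpeg for that step might look like the following sketch; the frame pattern, frame rate, and output name are assumptions, not the script's actual arguments:

```python
import subprocess

def build_ffmpeg_command(frames_pattern, audio_path, output_path, fps=30):
    """Build an FFmpeg command that muxes a PNG sequence with an audio track."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),   # input frame rate of the PNG sequence
        "-i", frames_pattern,     # e.g. "frames/frame_%05d.png"
        "-i", audio_path,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",    # widest player compatibility
        "-shortest",              # stop at the shorter of video/audio
        output_path,
    ]

# subprocess.run(build_ffmpeg_command("frames/frame_%05d.png",
#                                     "audio.mp3", "final.mp4"), check=True)
```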
```python
from PIL import Image

def generate_frame(frame_index, animation_data):
    # 1. Create base canvas
    canvas = Image.new('RGBA', (1920, 1080), (0, 0, 0, 0))

    # 2. Add background
    background = load_background(animation_data['scene'])
    canvas.paste(background, (0, 0))

    # 3. Composite characters (paste with the alpha channel as mask
    #    so transparent regions don't overwrite the background)
    for character in animation_data['characters']:
        character_composite = create_character_composite(character)
        canvas.paste(character_composite, tuple(character['position']),
                     character_composite)

    # 4. Apply effects
    if animation_data.get('zoom'):
        canvas = apply_zoom(canvas, animation_data['zoom'])

    return canvas
```

## Data Flow

```
Text Script ──┐
              ├─► AI Analysis ──► Animation Cues (JSON)
Audio File ───┘                          │
                                         ▼
         Gentle Service ──► Transcription ──► Timing Alignment
                                         │
                                         ▼
                                Frame Instructions
```
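At 30 fps output (each frame lasting about 0.033 s, matching the frame instructions below), Gentle's word timings can be bucketed into frame index ranges. A sketch, assuming word dicts with `start`/`end` in seconds as Gentle's alignment returns:

```python
def words_to_frame_spans(words, fps=30):
    """Map aligned words to the (first, last) frame indices they cover.

    `words` is a list of dicts with `word`, `start`, and `end` keys,
    where `start`/`end` are offsets in seconds.
    """
    spans = []
    for word in words:
        first = int(word["start"] * fps)  # first frame the word is audible
        last = int(word["end"] * fps)     # last frame the word is audible
        spans.append({"word": word["word"], "frames": (first, last)})
    return spans
```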
### Frame Instruction Format

```json
{
  "frames": [
    {
      "timestamp": 0.0,
      "duration": 0.033,
      "characters": [
        {
          "character_id": "character_1",
          "position": [640, 360],
          "head_direction": "center",
          "eye_emotion": "happy",
          "mouth_shape": "A",
          "body_pose": "standing"
        }
      ],
      "background": "living-room-1",
      "camera": {
        "zoom": 1.0,
        "focus": [640, 360]
      }
    }
  ]
}
```

## Development Setup

### Prerequisites

- Python 3.8+
- Docker & Docker Compose
- FFmpeg (for video processing)
- Google AI Studio API Key
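A quick sanity check for these prerequisites can save debugging time later. This is a sketch; the `GOOGLE_API_KEY` variable name is an assumption, so match it to whatever your `.env` actually defines:

```python
import os
import shutil
import sys

def check_prerequisites():
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ required")
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    if not os.environ.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    return problems

if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"MISSING: {problem}")
```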
### Installation

```bash
# 1. Clone and set up
git clone https://github.com/Automate-Animation/synctoon.git
cd synctoon

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate   # Linux/Mac
# or
venv\Scripts\activate      # Windows

# 3. Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt   # Development dependencies

# 4. Set up environment variables
cp .env.example .env
# Edit .env with your API keys

# 5. Start services
cd Docker
docker-compose up -d

# 6. Run tests
pytest tests/
```

### Development Dependencies

```
# requirements-dev.txt
pytest>=7.0.0
black>=22.0.0
flake8>=4.0.0
mypy>=0.950
pre-commit>=2.15.0
```

## Code Structure

```
synctoon/
├── core/
│   ├── __init__.py
│   ├── core.py                   # Main orchestrator
│   ├── create_animation.py       # CLI entry point
│   ├── frame_generator.py        # Frame composition logic
│   ├── frame_to_video.py         # Video compilation
│   ├── check_models.py           # Model validation
│   ├── test.py                   # Integration tests
│   │
│   ├── brain_requests/           # AI & Audio Processing
│   │   ├── __init__.py
│   │   ├── prompts.py            # AI prompt templates
│   │   ├── text_aligner.py       # Text-audio alignment
│   │   ├── speach_aligner.py     # Speech processing
│   │   ├── utils.py              # Utility functions
│   │   └── validater.py          # Data validation
│   │
│   ├── image_manager/            # Asset Management
│   │   ├── __init__.py
│   │   └── CharacterManager.py   # Character asset handler
│   │
│   ├── images/                   # Asset Storage
│   │   ├── characters/           # Character assets
│   │   │   └── character_1/
│   │   │       ├── body/         # Body variations
│   │   │       ├── head/         # Head positions
│   │   │       ├── eyes/         # Eye emotions
│   │   │       ├── mouth/        # Mouth shapes
│   │   │       └── background/   # Scene backgrounds
│   │   └── metadata/
│   │       └── metadata.json     # Asset positioning data
│   │
│   └── utils/                    # Utility scripts
│       ├── add_phonemes.py       # Phoneme processing
│       ├── constants.py          # Global constants
│       ├── frame_info_generator.py
│       ├── mouth_image.json      # Mouth shape mappings
│       └── update_character_asset_name.py
│
├── example/story/                # Sample content
├── Docker/                       # Service containers
├── tests/                        # Test suite
└── docs/                         # Documentation
```
## API Integration

### Google AI Studio (Gemini)

```python
# core/brain_requests/utils.py
import google.generativeai as genai

class AIAnalyzer:
    def __init__(self, api_key):
        genai.configure(api_key=api_key)
        self.model = genai.GenerativeModel('gemini-pro')

    def analyze_text(self, text, prompt_template):
        """Analyze text for animation cues."""
        prompt = prompt_template.format(text=text)
        response = self.model.generate_content(prompt)
        return self.parse_response(response.text)
```

### Gentle Forced Aligner

```python
# core/brain_requests/speach_aligner.py
import requests

class GentleClient:
    def __init__(self, base_url="http://localhost:49153"):
        self.base_url = base_url

    def transcribe(self, audio_file, transcript):
        """Align audio with its transcript."""
        # Open the audio inside a context manager so the handle is
        # closed even if the request fails.
        with open(audio_file, 'rb') as audio:
            files = {
                'audio': audio,
                'transcript': transcript,
            }
            response = requests.post(f"{self.base_url}/transcriptions",
                                     files=files)
        return response.json()
```

## Asset Management

### Character Asset Structure

```
character_1/
├── body/
│   ├── standing/
│   │   ├── body1.png
│   │   └── body2.png
│   └── sitting/
│       └── body_sitting.png
├── head/
│   ├── center/
│   ├── left/
│   └── right/
├── eyes/
│   ├── happy/
│   │   ├── eyes_happy.png
│   │   └── eyes_happy_blink/
│   ├── sad/
│   └── neutral/
└── mouth/
    ├── A/          # Phoneme shapes
    ├── E/
    ├── I/
    └── closed/
```
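When compositing, each asset's `position`/`anchor` pair (see the metadata example below) must be converted into the top-left coordinate that PIL's `paste` expects. A sketch of that conversion, covering the anchor names that appear in `metadata.json`:

```python
def anchor_to_top_left(position, size, anchor):
    """Convert an anchored position to a top-left paste coordinate.

    position: (x, y) anchor point, size: (width, height),
    anchor: "center" or "bottom-center" as used in metadata.json.
    """
    x, y = position
    w, h = size
    if anchor == "center":
        return (x - w // 2, y - h // 2)
    if anchor == "bottom-center":
        return (x - w // 2, y - h)
    raise ValueError(f"Unknown anchor: {anchor}")
```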
### Asset Metadata (`metadata.json`)

```json
{
  "character_1": {
    "body": {
      "position": [640, 800],
      "size": [400, 600],
      "anchor": "bottom-center"
    },
    "head": {
      "position": [640, 300],
      "size": [200, 250],
      "anchor": "center"
    },
    "eyes": {
      "position": [640, 250],
      "size": [150, 50],
      "anchor": "center"
    },
    "mouth": {
      "position": [640, 320],
      "size": [80, 40],
      "anchor": "center"
    }
  }
}
```

## Frame Generation Pipeline

### 1. Scene Analysis

```python
def analyze_scene(text_segment, timestamp):
    """Extract scene information from text."""
    scene_data = {
        'location': extract_location(text_segment),
        'characters': extract_characters(text_segment),
        'emotions': extract_emotions(text_segment),
        'actions': extract_actions(text_segment),
    }
    return scene_data
```

### 2. Asset Selection

```python
def select_assets(character_data, emotion, phoneme):
    """Select appropriate assets for the current frame."""
    assets = {
        'body': f"body/{character_data['pose']}/body1.png",
        'head': f"head/{character_data['head_direction']}/head.png",
        'eyes': f"eyes/{emotion}/eyes_{emotion}.png",
        'mouth': f"mouth/{phoneme}/mouth_{phoneme}.png",
    }
    return assets
```

### 3. Frame Composition

```python
def composite_frame(background, character_assets, metadata):
    """Composite all assets into the final frame."""
    frame = load_background(background)
    # Layer order: background → body → head → eyes → mouth
    for asset_type in ['body', 'head', 'eyes', 'mouth']:
        asset = load_asset(character_assets[asset_type])
        position = metadata[asset_type]['position']
        frame = overlay_asset(frame, asset, position)
    return frame
```

## Testing

### Test Structure

```
tests/
├── unit/
│   ├── test_character_manager.py
│   ├── test_frame_generator.py
│   └── test_ai_analyzer.py
├── integration/
│   ├── test_full_pipeline.py
│   └── test_asset_loading.py
└── fixtures/
    ├── sample_audio.mp3
    ├── sample_script.txt
    └── test_assets/
```
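Integration tests like `test_asset_loading.py` presumably verify that a character directory is complete before the pipeline runs. A standalone sketch of such a check, with the directory names taken from the asset structure above:

```python
from pathlib import Path

REQUIRED_PARTS = ("body", "head", "eyes", "mouth")

def missing_asset_dirs(character_dir):
    """Return the required part directories missing under a character folder."""
    root = Path(character_dir)
    return [part for part in REQUIRED_PARTS if not (root / part).is_dir()]
```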
### Running Tests

```bash
# Run all tests
pytest

# Run specific test categories
pytest tests/unit/
pytest tests/integration/

# Run with coverage
pytest --cov=core tests/

# Run performance tests
pytest tests/performance/ --benchmark-only
```

### Writing Tests

```python
# tests/unit/test_character_manager.py
import pytest
from core.image_manager.CharacterManager import CharacterManager

class TestCharacterManager:
    def setup_method(self):
        self.manager = CharacterManager("tests/fixtures/test_assets")

    def test_load_character_assets(self):
        assets = self.manager.load_character_assets()
        assert 'character_1' in assets
        assert 'body' in assets['character_1']

    def test_composite_frame(self):
        frame_data = {
            'character_id': 'character_1',
            'emotion': 'happy',
            'phoneme': 'A',
        }
        frame = self.manager.composite_frame(frame_data)
        assert frame is not None
        assert frame.size == (1920, 1080)
```

## Contributing Guidelines

### Code Quality

```bash
# Format code
black core/ tests/

# Lint code
flake8 core/ tests/

# Type checking
mypy core/
```

### Workflow

```bash
# 1. Create feature branch
git checkout -b feature/your-feature-name

# 2. Make changes and commit
git add .
git commit -m "feat: add new animation feature"

# 3. Push and create PR
git push origin feature/your-feature-name
```

### Commit Message Conventions

```
feat: add new feature
fix: bug fix
docs: documentation changes
style: formatting changes
refactor: code refactoring
test: adding tests
chore: maintenance tasks
```
### PR Checklist

- Tests pass (`pytest`)
- Code is formatted (`black`)
- Code is linted (`flake8`)
- Type checking passes (`mypy`)
- Documentation is updated
- Change log is updated
## Troubleshooting

### Gentle Service Not Responding

```bash
# Check service status
docker-compose ps

# Restart service
docker-compose restart gentle

# Check logs
docker-compose logs gentle
```

### Missing Assets

```python
# Debug asset paths
import os

asset_path = "core/images/characters/character_1/body/body1.png"
print(f"Asset exists: {os.path.exists(asset_path)}")
```

### High Memory Usage

```python
# Optimize memory usage by processing frames in batches
import gc

def generate_frames_batched(frame_data, batch_size=50):
    for i in range(0, len(frame_data), batch_size):
        batch = frame_data[i:i + batch_size]
        process_batch(batch)
        gc.collect()  # Force garbage collection between batches
```

### Audio Alignment Issues

```python
# Verify audio alignment
def check_audio_alignment(gentle_output):
    for word in gentle_output['words']:
        if word['case'] != 'success':
            print(f"Alignment issue: {word}")
```

### Asset Caching

```python
from functools import lru_cache
from PIL import Image

@lru_cache(maxsize=128)
def load_asset_cached(asset_path):
    """Cache frequently used assets."""
    return Image.open(asset_path)
```

### Parallel Frame Generation

```python
from multiprocessing import Pool

def generate_frames_parallel(frame_data):
    with Pool(processes=4) as pool:
        frames = pool.map(generate_single_frame, frame_data)
    return frames
```

### Memory Profiling

```bash
# Install memory profiler
pip install memory-profiler

# Profile memory usage
python -m memory_profiler core/frame_generator.py
```

## Additional Resources

- Google Generative AI Documentation
- Gentle Forced Alignment
- PIL/Pillow Documentation
- FFmpeg Documentation
## Changelog

Current release:

- Initial release with basic animation pipeline
- Character asset management system
- AI-powered text analysis
- Audio synchronization

Planned:

- Web interface
- Enhanced character library
- Background generation system
- Performance improvements
Happy coding! 🚀
For questions or support, please open an issue on GitHub or contact the maintainers.