Pipeline for building vertical manga recap videos with SRT as the master timeline and cinematic audio generation.
Detailed technical guide: DETAILED_DOCUMENTATION.md.
- Input: manga source (pdf_path or webtoon URL) + optional subtitle source SRT (srt_path).
- Audio flow: SRT -> OpenAI dialogue analysis -> deterministic speech rendering -> ElevenLabs TTS per line -> ElevenLabs SFX -> ElevenLabs scene music -> cinematic FFmpeg mix.
- Video flow: panels are animated using SRT durations, then muxed with narration and subtitles.
- Provider failures are fatal (no local fallback for ElevenLabs SFX/Music/TTS).
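To make the "SRT as master timeline" idea concrete, here is a minimal sketch of turning SRT text into timed cues whose durations could drive panel animation. The `Cue` class and `parse_srt` function are hypothetical names for illustration and are not taken from the pipeline's code.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(content: str) -> List[Cue]:
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    cues: List[Cue] = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                cues.append(Cue(
                    index=len(cues) + 1,
                    start=_to_seconds(*g[:4]),
                    end=_to_seconds(*g[4:]),
                    text=" ".join(lines[i + 1:]).strip(),
                ))
                break
    return cues
```

Each cue's `duration` is the value a panel animation step would consume under this sketch; the real pipeline may group or split cues differently.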
manga_video_pipeline/
  app/
    main.py
    config.py
    routes/generate.py
    services/
    utils/
    models/
  outputs/
  tests/
  requirements.txt
- Python 3.10+
- FFmpeg and FFprobe in PATH
- Poppler (for PDF input)
- OpenAI API key (dialogue analysis)
- ElevenLabs API key (TTS + SFX + Music; optional STS)
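A quick preflight check for the system requirements above might look like the following sketch. It assumes Poppler can be detected through its `pdftoppm` binary; the pipeline itself may probe for Poppler differently, and `missing_tools` is a hypothetical helper.

```python
import shutil
import sys
from typing import List

REQUIRED_TOOLS = ("ffmpeg", "ffprobe")
# Assumption: pdftoppm (shipped with Poppler) stands in for "Poppler is installed".
PDF_TOOLS = ("pdftoppm",)


def missing_tools(need_pdf: bool = True, which=shutil.which) -> List[str]:
    """Return the names of required executables that are not found on PATH."""
    tools = REQUIRED_TOOLS + (PDF_TOOLS if need_pdf else ())
    return [name for name in tools if which(name) is None]


def python_ok(min_version=(3, 10)) -> bool:
    """Check the interpreter against the Python 3.10+ requirement."""
    return sys.version_info >= min_version


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.10+ is required")
    for tool in missing_tools():
        print(f"Missing from PATH: {tool}")
```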
Create .env in the repo root or under manga_video_pipeline/ (both are loaded).
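As a rough illustration of what "both are loaded" could mean, the sketch below merges the two .env files and reports which of the two API keys are still unset. The merge order (the file under manga_video_pipeline/ overriding the repo-root file, and real environment variables overriding both) is an assumption, not documented behavior, and all function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(repo_root: Path, environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Merge both .env locations, then the process environment."""
    merged: Dict[str, str] = {}
    for env_path in (repo_root / ".env", repo_root / "manga_video_pipeline" / ".env"):
        if env_path.is_file():
            merged.update(parse_env_text(env_path.read_text(encoding="utf-8")))
    merged.update(environ)  # assumption: real environment variables win
    return merged


def missing_required(settings: Mapping[str, str]) -> List[str]:
    """List required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]
```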
Required for real runs:
OPENAI_API_KEY=your_openai_key
ELEVENLABS_API_KEY=your_elevenlabs_key
LOG_LEVEL=INFO

Common optional knobs:
# Video
VIDEO_PLAYBACK_SPEED=1.0
RUN_KEEP_INTERMEDIATE_CLIPS=false
# SRT master timeline
SRT_REQUIRE_INPUT=true
# BGM in final mux
BGM_DEFAULT_PATH=/absolute/path/to/bed.mp3
BGM_VOLUME=0.14
# Cinematic narration mix
AUDIO_BED_IN_NARRATION=true
AUDIO_MIXER_DUCKING_ENABLED=true
AUDIO_MIXER_MUSIC_GAIN=0.16
AUDIO_MIXER_SFX_GAIN=0.7
AUDIO_MIXER_VOICE_GAIN=1.0
# ElevenLabs generation
ELEVENLABS_SFX_ENABLED=true
ELEVENLABS_MUSIC_ENABLED=true
ELEVENLABS_STS_ENABLED=false
# Per-speaker ElevenLabs voice IDs (JSON)
ELEVENLABS_VOICE_MAP_JSON=
# Pipeline
MAX_PAGES=80
MIN_VIDEO_DURATION_SEC=30
MAX_VIDEO_DURATION_SEC=180
ENABLE_SUBTITLES_DEFAULT=true
ENABLE_CACHE=true

pip install -r manga_video_pipeline/requirements.txt

From the manga_video_pipeline directory (so app resolves):
cd manga_video_pipeline
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000

Or from the repo root:
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000 --app-dir manga_video_pipeline

- Open http://127.0.0.1:8000/ops for the operations dashboard.
- Start jobs, browse runs, inspect job_report.json, preview video.mp4.
job_report.json, previewvideo.mp4. - Search local Webtoon catalog by genre/title and generate episode ranges sequentially.
- Refresh catalog index from crawler using the UI button.
srt_path is optional. If it is omitted, the subtitle timeline is auto-generated from OCR.
{
"pdf_path": "/absolute/path/to/manga.pdf",
"srt_path": "/absolute/path/to/subtitles.srt",
"subtitles": true
}

Success:
{
"status": "completed",
"video_path": "manga_video_pipeline/outputs/runs/<run_key>/final/video.mp4",
"audio_path": "manga_video_pipeline/outputs/runs/<run_key>/audio/narration.mp3",
"quality": "cinematic"
}

Failure:
{
"status": "failed",
"stage": "srt_loader",
"error_code": "SRT_REQUIRED",
"error": "..."
}

- script_cleaner has been removed from runtime; SRT is authoritative for timing/text.
- ElevenLabs SFX/Music/TTS failures are surfaced as provider errors and stop the run.
- meta/job_report.json includes srt_analysis and audio_event_timeline.
- Optional OCR fallback remains when srt_path is not provided.
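The request and response shapes shown above can be handled on the client side roughly as follows. This sketch only builds the JSON body and interprets a response string; it deliberately leaves out the HTTP call because the endpoint path is not stated here. `build_request`, `handle_response`, and `PipelineError` are hypothetical names.

```python
import json
from typing import Optional


def build_request(pdf_path: str, srt_path: Optional[str] = None, subtitles: bool = True) -> str:
    """Serialize the request body; srt_path is left out when not provided."""
    body = {"pdf_path": pdf_path, "subtitles": subtitles}
    if srt_path:
        body["srt_path"] = srt_path
    return json.dumps(body)


class PipelineError(RuntimeError):
    """Raised for a response whose status is not "completed"."""

    def __init__(self, stage, error_code, message):
        super().__init__(f"[{stage}] {error_code}: {message}")
        self.stage = stage
        self.error_code = error_code


def handle_response(raw: str) -> dict:
    """Return the output paths on success, raise PipelineError otherwise."""
    data = json.loads(raw)
    if data.get("status") == "completed":
        return {"video_path": data["video_path"], "audio_path": data["audio_path"]}
    raise PipelineError(data.get("stage"), data.get("error_code"), data.get("error"))
```

Keeping `stage` and `error_code` on the exception lets a caller distinguish, for example, an SRT_REQUIRED failure in srt_loader from a provider error without parsing the message text.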
New ops APIs:
- GET /ops/api/manga: searchable local catalog
- GET /ops/api/manga/{title_slug}/episodes: episodes for selected title
- GET /ops/api/manga/status: catalog refresh status
- POST /ops/api/manga/refresh: manual crawl refresh
- POST /ops/api/generate-range: sequential per-episode generation
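A small client for the read-only ops endpoints could be sketched like this. The base URL comes from the uvicorn commands in this README; that the endpoints return JSON is an assumption, and query parameters and POST bodies are omitted because they are not documented here. The helper names are hypothetical.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # host and port from the uvicorn commands


def ops_url(path: str, base_url: str = BASE_URL) -> str:
    """Join an absolute ops path onto the server base URL."""
    return urljoin(base_url, path)


def episodes_url(title_slug: str, base_url: str = BASE_URL) -> str:
    """Build the episodes URL, percent-encoding the slug as one path segment."""
    return ops_url(f"/ops/api/manga/{quote(title_slug, safe='')}/episodes", base_url)


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires the app to be running locally.
    print(get_json(ops_url("/ops/api/manga/status")))
```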
Startup behavior:
- If WEBTOON_CATALOG_ENABLED=true, the app performs a blocking catalog refresh at startup.
- The app starts serving requests only after the refresh is done.
cd manga_video_pipeline && python3 -m pytest tests/ -q

# Local catalog
WEBTOON_CATALOG_ENABLED=false
WEBTOON_CATALOG_FILE=webtoon_catalog.json
WEBTOON_CATALOG_GENRE_URLS_JSON=["https://www.webtoons.com/en/action", "https://www.webtoons.com/en/romance"]
WEBTOON_CATALOG_REQUEST_TIMEOUT_SEC=25
WEBTOON_CATALOG_MAX_TITLES_PER_GENRE=80
WEBTOON_CATALOG_MAX_EPISODES_PER_TITLE=250
# OpenAI load control
OPENAI_OCR_BATCH_SIZE=3
OPENAI_OCR_BATCH_PACE_SEC=0.35
OPENAI_DIALOGUE_BATCH_SIZE=3
OPENAI_DIALOGUE_BATCH_PACE_SEC=0.35
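One plausible reading of the batch size and pace knobs is "send items in groups of N and wait a fixed time between groups". The sketch below implements that reading; it is an assumption about how the settings are interpreted, and `paced_batches` and `ocr_batch_settings` are hypothetical helpers.

```python
import os
import time
from typing import Iterable, Iterator, List, Mapping, Tuple, TypeVar

T = TypeVar("T")


def ocr_batch_settings(environ: Mapping[str, str] = os.environ) -> Tuple[int, float]:
    """Read the OCR batch knobs, falling back to the values shown above."""
    size = int(environ.get("OPENAI_OCR_BATCH_SIZE", "3"))
    pace = float(environ.get("OPENAI_OCR_BATCH_PACE_SEC", "0.35"))
    return size, pace


def paced_batches(
    items: Iterable[T], batch_size: int, pace_sec: float, sleep=time.sleep
) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items, pausing pace_sec between batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    pending = list(items)
    for start in range(0, len(pending), batch_size):
        if start > 0:
            sleep(pace_sec)  # pause only between batches, not before the first
        yield pending[start:start + batch_size]
```

A caller would loop over `paced_batches(pages, *ocr_batch_settings())` and issue one request per batch; the same helper would apply to the dialogue knobs by reading the OPENAI_DIALOGUE_* variables instead.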