The runtime is now SRT-driven. Subtitles are the source of truth for timing and dialogue text.
Inputs:
- `pdf_path` (or webtoon URL), optional
- `srt_path` (if missing, the timeline is auto-generated from OCR)
Outputs:
- `final/video.mp4`
- `audio/narration.mp3`
- `meta/timeline.json`, `meta/audio_timeline.json`, `meta/job_report.json`
Pipeline stages, in order:
1. `preflight`
2. `pdf_loader` / `webtoon_loader`
3. `panel_extractor` (PDF only)
4. `srt_loader` (`load_srt_timeline`)
5. `narrator`:
   - `dialogue_analyzer.analyze_srt_timeline` (OpenAI)
   - `speech_renderer.render_speech` (deterministic)
   - `tts_elevenlabs.generate_tts` per SRT line
   - optional STS modulation (`elevenlabs_sts`)
   - `sfx_engine.materialize_sfx_for_line` (ElevenLabs Sound Effects)
   - `music_engine.resolve_music_bed` (ElevenLabs Music)
   - `audio_mixer.mix_audio` (ducking + limiter/normalization)
6. `timeline_builder` (`build_timeline_from_srt`)
7. `panel_animator`
8. `subtitle_generator`
9. `video_editor`
10. `quality_checker`
When WEBTOON_CATALOG_ENABLED=true, startup includes a blocking catalog refresh stage before the API begins serving traffic.
- `SrtTimelineLine`: `index`, `start_sec`, `end_sec`, `text`, `speaker`, `emotion`, `intensity`.
- `GenerateResponse` success includes:
  - `video_path`
  - `audio_path`
  - `quality` (cinematic)
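As an illustration of the `SrtTimelineLine` shape, here is a minimal sketch of parsing SRT text into timeline lines. The field types, the optional defaults for `speaker`, `emotion`, and `intensity`, and the function name `parse_srt_sketch` are assumptions made for this example; the real loader is `load_srt_timeline` and may behave differently.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SrtTimelineLine:
    index: int
    start_sec: float
    end_sec: float
    text: str
    # Assumed optional here: presumably filled in later by dialogue analysis.
    speaker: Optional[str] = None
    emotion: Optional[str] = None
    intensity: Optional[float] = None


# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (a dot is tolerated as separator).
_TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_sec(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt_sketch(content: str) -> List[SrtTimelineLine]:
    """Hypothetical helper: split SRT text into cue blocks and parse each one."""
    result: List[SrtTimelineLine] = []
    blocks = re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip())
    for block in blocks:
        rows = [row for row in block.split("\n") if row.strip()]
        if len(rows) < 2 or not rows[0].strip().isdigit():
            continue  # skip malformed cues instead of failing
        match = _TIME_RE.search(rows[1])
        if not match:
            continue
        g = match.groups()
        result.append(
            SrtTimelineLine(
                index=int(rows[0]),
                start_sec=_to_sec(*g[:4]),
                end_sec=_to_sec(*g[4:]),
                text=" ".join(row.strip() for row in rows[2:]),
            )
        )
    return result
```

Multi-line cue text is joined with a single space here; whether the pipeline preserves line breaks is not stated in this document.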
`EpisodeRangeGenerateRequest` and `EpisodeRangeGenerateResponse` drive sequential episode-range runs from the local catalog.
Catalog file is stored under outputs/meta/<WEBTOON_CATALOG_FILE>.
Supported operations:
- crawl/refresh catalog
- search by `query` and `genre`
- list episodes by `title_slug`
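The search and episode-listing operations can be pictured with the sketch below. The catalog's on-disk schema is not described here, so the entry layout (a list of dicts with `title`, `title_slug`, `genre`, and `episodes` keys) and both function names are assumptions for illustration only.

```python
from typing import Any, Dict, List, Optional

Entry = Dict[str, Any]


def search_catalog(
    entries: List[Entry],
    query: Optional[str] = None,
    genre: Optional[str] = None,
) -> List[Entry]:
    """Hypothetical search: case-insensitive title substring plus exact genre."""
    q = (query or "").strip().lower()
    g = (genre or "").strip().lower()
    matches = []
    for entry in entries:
        if q and q not in str(entry.get("title", "")).lower():
            continue
        if g and str(entry.get("genre", "")).lower() != g:
            continue
        matches.append(entry)
    return matches


def list_episodes(entries: List[Entry], title_slug: str) -> List[Any]:
    """Hypothetical lookup: return the episodes of the first matching slug."""
    for entry in entries:
        if entry.get("title_slug") == title_slug:
            return list(entry.get("episodes", []))
    return []
```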
Ops endpoints:
- `GET /ops/api/manga`
- `GET /ops/api/manga/{title_slug}/episodes`
- `GET /ops/api/manga/status`
- `POST /ops/api/manga/refresh`
- `POST /ops/api/generate-range`
Episode range runs are processed sequentially and return per-episode results.
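A rough sketch of a sequential range run with per-episode results follows. The function name, the result keys, and the `stop_on_error` switch are all hypothetical; this document does not say whether a failed episode aborts the remaining episodes, so the sketch supports both behaviours.

```python
from typing import Any, Callable, Dict, List


def run_episode_range(
    start: int,
    end: int,
    generate_one: Callable[[int], str],
    stop_on_error: bool = False,
) -> List[Dict[str, Any]]:
    """Process episodes start..end (inclusive) one at a time, in order."""
    results: List[Dict[str, Any]] = []
    for episode in range(start, end + 1):
        try:
            video_path = generate_one(episode)
            results.append(
                {"episode": episode, "success": True, "video_path": video_path}
            )
        except Exception as exc:  # record the failure in the per-episode result
            results.append(
                {"episode": episode, "success": False, "error": str(exc)}
            )
            if stop_on_error:
                break
    return results
```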
- Voice is primary.
- Music is scene-level and mixed under voice.
- SFX are event-driven (`hit`, `fall`, `fear`, `movement` mapping).
- Mixer applies sidechain ducking (music attenuates while voice is present).
- Final limiter avoids clipping.
- No local fallback for ElevenLabs generation.
- SFX/Music/TTS failures raise provider errors and stop the run.
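To make the ducking and limiting rules concrete, here is a deliberately simplified sketch on plain lists of float samples in the range -1.0 to 1.0. It is not the `audio_mixer.mix_audio` implementation: the function name, the parameter names and defaults, the per-sample gate (no attack/release smoothing), and the hard clip standing in for a real limiter are all assumptions.

```python
from typing import List


def mix_with_ducking(
    voice: List[float],
    music: List[float],
    duck_gain: float = 0.3,   # assumed music gain while voice is present
    threshold: float = 0.02,  # assumed level above which voice counts as present
    ceiling: float = 0.98,    # assumed output ceiling to avoid clipping
) -> List[float]:
    """Mix music under voice, attenuating music wherever voice is active."""
    length = max(len(voice), len(music))
    mixed: List[float] = []
    for i in range(length):
        v = voice[i] if i < len(voice) else 0.0
        m = music[i] if i < len(music) else 0.0
        # Sidechain idea: the voice level controls the music gain.
        gain = duck_gain if abs(v) > threshold else 1.0
        sample = v + m * gain
        # Hard clip as a stand-in for the final limiter stage.
        mixed.append(max(-ceiling, min(ceiling, sample)))
    return mixed
```

A production mixer would normally smooth the gain changes over time so the music does not pump audibly at word boundaries.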
- SRT: `SRT_REQUIRE_INPUT`, `SRT_MIN_LINE_DURATION_SEC`, `SRT_MAX_LINE_DURATION_SEC`
- OpenAI analysis: `OPENAI_DIALOGUE_ANALYSIS_ENABLED`, `OPENAI_DIALOGUE_ANALYSIS_MODEL`
- ElevenLabs TTS: `ELEVENLABS_TTS_MODEL`, `ELEVENLABS_OUTPUT_FORMAT`, `ELEVENLABS_VOICE_*`
- ElevenLabs SFX: `ELEVENLABS_SFX_ENABLED`, `ELEVENLABS_SFX_MODEL_ID`, `ELEVENLABS_SFX_PROMPT_INFLUENCE`
- ElevenLabs Music: `ELEVENLABS_MUSIC_ENABLED`, `ELEVENLABS_MUSIC_MODEL_ID`, `ELEVENLABS_MUSIC_FORCE_INSTRUMENTAL`
- Mixer: `AUDIO_MIXER_*` (voice/sfx/music gain + ducking + normalize)
- Optional STS: `ELEVENLABS_STS_*`
- Catalog: `WEBTOON_CATALOG_*`
- OpenAI pacing/batching: `OPENAI_OCR_BATCH_SIZE`, `OPENAI_OCR_BATCH_PACE_SEC`, `OPENAI_DIALOGUE_BATCH_SIZE`, `OPENAI_DIALOGUE_BATCH_PACE_SEC`
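The pacing/batching settings suggest a pattern like the one sketched below: work is split into fixed-size batches with a pause between them. The helper names, the default values passed to `env_int` and `env_float`, and the choice to skip the pause after the last batch are assumptions; only the environment variable names come from the list above.

```python
import os
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def env_int(name: str, default: int) -> int:
    """Read an integer setting, falling back to a caller-supplied default."""
    raw = os.environ.get(name)
    return int(raw) if raw not in (None, "") else default


def env_float(name: str, default: float) -> float:
    raw = os.environ.get(name)
    return float(raw) if raw not in (None, "") else default


def run_paced_batches(
    items: Sequence[T],
    handler: Callable[[List[T]], List[R]],
    batch_size: int,
    pace_sec: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call handler on each batch, sleeping pace_sec between batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [
        list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)
    ]
    results: List[R] = []
    for position, batch in enumerate(batches):
        results.extend(handler(batch))
        # Assumed: no pause is needed after the final batch.
        if position < len(batches) - 1 and pace_sec > 0:
            sleep(pace_sec)
    return results
```

A call for OCR might then look like `run_paced_batches(pages, ocr_batch, env_int("OPENAI_OCR_BATCH_SIZE", 4), env_float("OPENAI_OCR_BATCH_PACE_SEC", 1.0))`, where `pages`, `ocr_batch`, and the defaults `4` and `1.0` are placeholders.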
Byte caches are used for:
- `tts_audio`
- `elevenlabs_sfx`
- `elevenlabs_music`
- `elevenlabs_sts` (when enabled)
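One common way to build such a byte cache is to key each entry by a hash of the request parameters, as in the sketch below. The class name, the directory layout, and the SHA-256 keying are assumptions for illustration; the actual cache implementation is not described in this document.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict


class ByteCacheSketch:
    """Hypothetical on-disk cache: one file per unique parameter set."""

    def __init__(self, root: str, namespace: str) -> None:
        # namespace would be e.g. "tts_audio" or "elevenlabs_sfx"
        self.directory = Path(root) / namespace
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path_for(self, params: Dict[str, Any]) -> Path:
        # sort_keys makes the key independent of dict insertion order
        payload = json.dumps(params, sort_keys=True).encode("utf-8")
        return self.directory / (hashlib.sha256(payload).hexdigest() + ".bin")

    def get_or_create(
        self, params: Dict[str, Any], producer: Callable[[], bytes]
    ) -> bytes:
        path = self._path_for(params)
        if path.exists():
            return path.read_bytes()  # cache hit: skip the provider call
        data = producer()
        path.write_bytes(data)
        return data
```

With this shape, an identical TTS request (same text, voice, and model) would reuse the stored bytes instead of calling the provider again.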
INFO logs are emitted for:
- startup blocking catalog refresh begin/end + duration
- crawler progress (genre/title/episode counts and failures)
- sequential episode queue lifecycle
- SRT source selection (`provided` vs `auto_generated_from_ocr`)
- dialogue analyzer chunk progress
- OCR pacing intervals and batch progress
- audio pipeline milestones (line generation and mixer begin/end)
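As an example of one of these log points, the sketch below shows how SRT source selection might be decided and logged. The function name, the logger name, the log message wording, and the reading of `SRT_REQUIRE_INPUT` (reject the job when no SRT is supplied) are assumptions; only the two source labels come from this document.

```python
import logging
from typing import Optional

logger = logging.getLogger("srt_source_sketch")  # hypothetical logger name


def select_srt_source(srt_path: Optional[str], require_input: bool) -> str:
    """Return which SRT source applies and emit an INFO log for it."""
    if srt_path:
        source = "provided"
    elif require_input:
        # Assumed behaviour when SRT_REQUIRE_INPUT is enabled.
        raise ValueError("srt_path is required when SRT_REQUIRE_INPUT is enabled")
    else:
        source = "auto_generated_from_ocr"
    logger.info("SRT source selected: %s", source)
    return source
```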
cd manga_video_pipeline
python3 -m pytest tests/ -q