feat: auto-extract multi-language DVB subtitles into per-language files (#447) #2243
ujjwalr27 wants to merge 2 commits into CCExtractor:master
CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit d56a6be...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info.
CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit d56a6be...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info.
cfsmp3 left a comment
Deep Review Results
First off — this is a really well-done PR. The description is excellent, the repro instructions are clear, and the code is clean. This is the quality level we want from all contributors.
What works well
- Feature works correctly: The arte_multiaudio.ts sample now produces both `arte.srt` (teletext) and `arte_fra.srt` (French DVB) in a single pass with no flags needed.
- No repeating subtitles: The old attempt at this feature (PRs #1912/#2048/#2051/#2058) had bugs where subtitles repeated or timestamps started at zero. None of those bugs are present here.
- Content is byte-identical to master on all existing single-stream samples, so the decoding logic is correct.
- Cleanup fixes are good: The split encoder/decoder cleanup, the `memset` for `dec_ctx->prev`, and the `transcript_settings` deep-copy all fix real issues.
- Output with the `-o` flag works correctly.
- Tested across 12+ samples (CEA-608, DVB, DVR-MS, ASF, MP4, TS, MPG) with zero content regressions.
Issue found: filename regression on single-DVB-stream files
We ran all 25 CI test cases locally on both master and this PR. On 3 tests, the PR changes the output filename by adding a language suffix (_eng) even when there's only a single DVB stream:
| Test | Master filename | PR filename | Content |
|---|---|---|---|
| 1020459a86 --autoprogram --out=ttxt | output.out | output_eng.txt | Byte-identical |
| 85271be4d2 --autoprogram --out=srt --quant 0 | output.out | output_eng.srt | Byte-identical |
| 85271be4d2 --codec dvbsub --out=spupng | output.out + output.d/ | output_eng.xml + output_eng.d/ | Byte-identical (all 28 PNGs) |
The content is correct — only the filename changes. But this breaks backward compatibility for existing users/scripts that expect the original filename.
Fix: Only add the language suffix when the program has 2 or more DVB subtitle PIDs. Single-DVB-stream recordings should keep the original filename.
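The rule proposed in the fix can be sketched as follows. This is a minimal illustration, not the PR's code: `struct sub_stream`, `count_dvb_streams()` and `build_output_name()` are hypothetical names, and the real logic would presumably live near `update_encoder_list_cinfo()`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a per-PID capture record. */
struct sub_stream {
    int is_dvb_subtitle; /* 1 if this PID carries DVB subtitles */
    char lang[4];        /* ISO-639 code, NUL-terminated, e.g. "fra" */
};

/* Count the DVB subtitle PIDs found in one program. */
static int count_dvb_streams(const struct sub_stream *s, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i].is_dvb_subtitle)
            count++;
    return count;
}

/* Write "<base>_<lang>.<ext>" only when the program has two or more
 * DVB subtitle PIDs and the language is known; otherwise keep the
 * legacy "<base>.<ext>" name so existing scripts keep working. */
static void build_output_name(char *dst, size_t dst_len,
                              const char *base, const char *ext,
                              const struct sub_stream *stream,
                              int dvb_count)
{
    if (dvb_count >= 2 && stream->lang[0] != '\0')
        snprintf(dst, dst_len, "%s_%s.%s", base, stream->lang, ext);
    else
        snprintf(dst, dst_len, "%s.%s", base, ext);
}
```

With a single `eng` stream this keeps the unsuffixed name; with `deu` + `fra` each stream gets its own suffixed file.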
Also needed
- Add a CHANGES.TXT entry — this is a user-facing feature.
Everything else looks good. Once the filename issue is fixed, this is ready to merge.
[FEATURE] Auto-extract multi-language DVB subtitles into per-language files
Closes #447
In raising this pull request, I confirm the following (please check boxes):
Reason for this PR:
Sanity check:
Description
Implements #447 — when a DVB/TS recording contains multiple DVB subtitle streams, CCExtractor now automatically detects each stream and writes subtitles to separate files named by ISO-639 language code. No manual configuration or pre-inspection of the file is required.
Before:
After:
No new CLI flags. Fully automatic. Single-stream recordings are unaffected.
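For readers unfamiliar with where the per-stream language comes from: in a DVB transport stream, the PMT carries a subtitling descriptor (tag 0x59) whose payload is a sequence of 8-byte entries, each holding a 3-byte ISO-639 code, a subtitling type, and the composition and ancillary page IDs. The sketch below shows one way to read those entries into a 4-byte, NUL-terminated `lang` field. It is an illustration only; `struct dvb_sub_entry` and `parse_subtitling_descriptor()` are hypothetical names, not CCExtractor's actual parser.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one subtitle service listed in the PMT. */
struct dvb_sub_entry {
    char lang[4];                 /* ISO-639 code plus terminating NUL */
    uint8_t subtitling_type;
    uint16_t composition_page_id;
    uint16_t ancillary_page_id;
};

/* Parse the payload of a subtitling descriptor (the bytes after the
 * tag and length fields). Each entry is 8 bytes long. Returns the
 * number of entries written to out; trailing partial entries are
 * ignored. */
static size_t parse_subtitling_descriptor(const uint8_t *payload, size_t len,
                                          struct dvb_sub_entry *out,
                                          size_t max_out)
{
    size_t n = 0;
    for (size_t off = 0; off + 8 <= len && n < max_out; off += 8, n++) {
        const uint8_t *p = payload + off;
        memcpy(out[n].lang, p, 3);
        out[n].lang[3] = '\0';
        out[n].subtitling_type = p[3];
        out[n].composition_page_id = (uint16_t)((p[4] << 8) | p[5]);
        out[n].ancillary_page_id = (uint16_t)((p[6] << 8) | p[7]);
    }
    return n;
}
```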
Repro Instructions
Test 1: `arte_multiaudio.ts` (from issue #447)
Download: https://www.dropbox.com/s/5oaqnjgqq1cqzky/arte_multiaudio.ts?dl=0
The file contains:
- `deu`
- `deu` (no bitmap packets in this recording)
- `fra`
Before this PR (on `master`):
After this PR:
Also verified with `--codec dvbsub`:
Test 2: DVB-only file with two subtitle streams (`deu` + `fra`)
A recording with no teletext, only two DVB subtitle PIDs:
- `deu`
- `fra`
Both files are produced automatically in a single pass, with no flags or prior knowledge of how many subtitle streams exist.
Implementation
Files changed
| File | Change |
|---|---|
| `src/lib_ccx/ccx_demuxer.h` | `char lang[4]` to `cap_info` struct |
| `src/lib_ccx/ts_tables.c` | |
| `src/lib_ccx/ts_info.c` | `lang` in `update_capinfo()`; protect DVB streams from `ignore_other_stream()` |
| `src/lib_ccx/lib_ccx.c` | |
| `src/lib_ccx/general_loop.c` | |
| `src/rust/src/demuxer/common_types.rs` | `lang: [i8; 4]` to `CapInfo` |
| `src/rust/src/ctorust.rs` | `lang` in `FromCType<cap_info>` |
| `src/rust/src/common.rs` | `lang` in `CType<cap_info>` |
Key design decisions
Per-PID decoders in single-program mode
Each DVB subtitle PID has its own `DVBSubContext` with different `composition_id`/`ancillary_id` from the PMT. The existing single-decoder model was extended to always create a fresh decoder per DVB PID.
Language-tagged output filenames
`update_encoder_list_cinfo()` uses `cinfo->lang` to suffix the output filename, matching existing behaviour for multi-program mode.
Separate encoder/decoder cleanup
`dinit_libraries()` previously matched encoders by program number inside the decoder loop. With multiple DVB encoders sharing the same program number, this caused a double-free on exit. Fixed by splitting into two independent passes.
`dec_ctx->prev` zero-initialization
`dec_ctx->prev` was `malloc`'d but not `memset`; `free_decoder_context()` during cleanup freed garbage pointers. Fixed with `memset(prev, 0, sizeof(...))`.
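
The zero-initialization fix follows a common C pattern, sketched below under simplifying assumptions. `struct prev_state`, `alloc_prev()` and `free_prev()` are hypothetical stand-ins; the real `dec_ctx->prev` structure has different members.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a decoder's saved state. */
struct prev_state {
    char *private_data;
    char *subtitle_buf;
};

static struct prev_state *alloc_prev(void)
{
    struct prev_state *prev = malloc(sizeof(*prev));
    if (prev == NULL)
        return NULL;
    /* Without this memset the pointer members hold indeterminate
     * values, and passing them to free() later is undefined
     * behaviour. On mainstream platforms all-zero bytes are NULL. */
    memset(prev, 0, sizeof(*prev));
    return prev;
}

static void free_prev(struct prev_state **prev)
{
    if (prev == NULL || *prev == NULL)
        return;
    free((*prev)->private_data); /* free(NULL) is a no-op */
    free((*prev)->subtitle_buf);
    free(*prev);
    *prev = NULL; /* guards against a second free through this handle */
}
```

Using `calloc(1, sizeof(*prev))` in place of `malloc` plus `memset` would have the same effect.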