
fix: MKV subtitle track .(null) extension for KATE and unknown codec IDs #2250

Merged
cfsmp3 merged 3 commits into CCExtractor:master from DhanushVarma-2:fix/mkv-vobsub-null-extension
Apr 4, 2026

Conversation

@DhanushVarma-2
Contributor

Problem

Closes #972

MKV files with KATE subtitle tracks (or any unrecognized codec ID) produce output filenames like sample_eng.(null) instead of a proper extension.

Root Cause

matroska_track_text_subtitle_id_extensions[] in matroska.h has 7 entries for an 8-value enum (matroska_track_subtitle_codec_id). MATROSKA_TRACK_SUBTITLE_CODEC_ID_KATE sits at index 7, which is out of bounds, so reading it is undefined behavior that happens to yield NULL on most platforms. That NULL then flows into generate_filename_from_track(), where strlen(extension) is itself undefined on a NULL pointer and snprintf formats the pointer as (null), producing .(null) in the output filename.
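
The mismatch described above can be illustrated with a small stand-alone sketch. Everything named `demo_*` / `DEMO_*` is hypothetical and invented for this illustration; the enum members and the extension strings are placeholders, not the real contents of matroska.h. The sketch only compares the table length with the enum size and never performs the out-of-bounds read itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for matroska_track_subtitle_codec_id: eight values. */
enum demo_codec_id {
    DEMO_CODEC_A = 0,
    DEMO_CODEC_B,
    DEMO_CODEC_C,
    DEMO_CODEC_D,
    DEMO_CODEC_E,
    DEMO_CODEC_F,
    DEMO_CODEC_G,
    DEMO_CODEC_KATE, /* index 7 */
    DEMO_CODEC_COUNT /* 8: number of real codec IDs */
};

/* Hypothetical pre-fix table: only seven entries, so index 7 does not exist. */
static const char *const demo_extensions_short[] = {
    "a", "b", "c", "d", "e", "f", "g"
};

#define DEMO_ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 1 when codec_id is a valid index into the short table, else 0. */
static int demo_index_is_valid(int codec_id)
{
    return codec_id >= 0 &&
           (size_t)codec_id < DEMO_ARRAY_LEN(demo_extensions_short);
}

/* Number of enum values that have no matching table entry. */
static size_t demo_missing_entries(void)
{
    return (size_t)DEMO_CODEC_COUNT - DEMO_ARRAY_LEN(demo_extensions_short);
}
```

Under these assumptions, `demo_index_is_valid(DEMO_CODEC_KATE)` is 0 and `demo_missing_entries()` is 1. In C11, a `_Static_assert` comparing the table length with the enum count would be one possible way to turn this kind of drift into a compile-time error; that check is a suggestion here, not something the PR states it adds.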

Fix

  • Add "kate" at index 7 in matroska_track_text_subtitle_id_extensions[] so the array aligns with the enum
  • Add a NULL guard in generate_filename_from_track() as a safety fallback (.bin) for any future unknown codec IDs
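
A minimal sketch of the two changes listed above, under stated assumptions: all `demo_*` / `DEMO_*` names are hypothetical, the table entries other than "kate" are placeholders, and the function signature does not mirror the real generate_filename_from_track(). The bounds check is an extra precaution added for this illustration; the PR itself describes only the new array entry and the NULL guard.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the 8-value subtitle codec enum. */
enum demo_fixed_codec_id {
    DEMO_FIXED_A = 0,
    DEMO_FIXED_B,
    DEMO_FIXED_C,
    DEMO_FIXED_D,
    DEMO_FIXED_E,
    DEMO_FIXED_F,
    DEMO_FIXED_G,
    DEMO_FIXED_KATE, /* index 7 */
    DEMO_FIXED_COUNT
};

/* Hypothetical post-fix table: eight slots, with "kate" at index 7.
 * Slot 6 is left NULL on purpose so the fallback path can be exercised. */
static const char *const demo_fixed_extensions[DEMO_FIXED_COUNT] = {
    "a", "b", "c", "d", "e", "f", NULL, "kate"
};

/* Look up the extension; fall back to "bin" for NULL or out-of-range IDs. */
static const char *demo_extension_for(int codec_id)
{
    const char *ext = NULL;
    if (codec_id >= 0 && codec_id < DEMO_FIXED_COUNT)
        ext = demo_fixed_extensions[codec_id];
    return ext != NULL ? ext : "bin";
}

/* Build "<base>_<lang>.<ext>" into out; returns snprintf's result. */
static int demo_build_filename(char *out, size_t out_size,
                               const char *base, const char *lang,
                               int codec_id)
{
    return snprintf(out, out_size, "%s_%s.%s",
                    base, lang, demo_extension_for(codec_id));
}
```

With this sketch, calling `demo_build_filename(buf, sizeof buf, "sample", "eng", DEMO_FIXED_KATE)` writes `sample_eng.kate`, while a NULL table slot or an unknown ID yields `sample_eng.bin`. Resolving the fallback before any string function sees the pointer means neither strlen nor snprintf is ever handed NULL.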

Testing

Tested with MKV files containing KATE subtitle tracks — output filename now correctly uses .kate extension instead of .(null).

Dhanush Varma added 2 commits April 2, 2026 04:14
The matroska_track_text_subtitle_id_extensions array had 7 entries for
an 8-value enum, leaving MATROSKA_TRACK_SUBTITLE_CODEC_ID_KATE (index 7)
out of bounds. On most platforms this read NULL, which then caused
strlen(NULL) UB and snprintf to emit .(null) in the output filename.

Two fixes:
- Add "kate" at index 7 in the extensions array so KATE tracks
  produce correct .kate output filenames
- Add a NULL guard in generate_filename_from_track() so any future
  unknown codec ID safely falls back to .bin instead of crashing or
  producing .(null)

Fixes CCExtractor#972
@DhanushVarma-2 DhanushVarma-2 force-pushed the fix/mkv-vobsub-null-extension branch from 4493e5a to 4c1b7b7 Compare April 1, 2026 22:46
@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit d56a6be...:
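| Report Name | Tests Passed |
| --- | --- |
| Broken | 9/13 |
| CEA-708 | 1/14 |
| DVB | 3/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 20/27 |
| Hardsubx | 1/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 77/86 |
| Teletext | 20/21 |
| WTV | 13/13 |
| XDS | 31/34 |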

Your PR breaks these cases:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
  • ccextractor --autoprogram --out=srt --latin1 b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
  • ccextractor --out=spupng c83f765c66...
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit d56a6be...:
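| Report Name | Tests Passed |
| --- | --- |
| Broken | 9/13 |
| CEA-708 | 1/14 |
| DVB | 4/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 22/27 |
| Hardsubx | 1/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 81/86 |
| Teletext | 20/21 |
| WTV | 13/13 |
| XDS | 31/34 |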

Your PR breaks these cases:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
  • ccextractor --autoprogram --out=srt --latin1 b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@cfsmp3 cfsmp3 merged commit 395f9b3 into CCExtractor:master Apr 4, 2026
45 of 47 checks passed

Development

Successfully merging this pull request may close these issues.

[QUESTION] how is extracting subtitles from mkv or mp4 supported?

3 participants