[IMPROVEMENT] feat(mp4): add FFmpeg/libavformat backend for MP4 demuxing by gaurav02081 · Pull Request #2191 · CCExtractor/ccextractor

gaurav02081 · 2026-03-08T19:13:10Z

[IMPROVEMENT] feat(mp4): add FFmpeg/libavformat backend for MP4 demuxing

In raising this pull request, I confirm the following (please check boxes):

I have read and understood the contributors guide.
I have checked that another pull request for this purpose does not exist.
I have considered, and confirmed that this submission will be valuable to others.
I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
I give this submission freely, and claim no ownership to its content.
I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

I have never used CCExtractor.
I have used CCExtractor just a couple of times.
I absolutely love CCExtractor, but have not contributed previously.
I am an active contributor to CCExtractor.

Add optional FFmpeg-based MP4 parser as an alternative to GPAC

This PR introduces an alternative MP4 parsing backend using FFmpeg's libavformat, while keeping the existing GPAC-based implementation unchanged and as the default.

Motivation

In a previous discussion (GSoC meeting, 2 March) we talked about updating the GPAC dependency used for MP4 processing in CCExtractor. One suggestion was to explore whether there is a Debian-friendly alternative rather than only focusing on upgrading GPAC.

FFmpeg is already used in other parts of the codebase (for example in the demuxing/decoding integration and in the HardsubX module), so extending its use for MP4 parsing seemed like a reasonable option to explore.

Implementation

A new implementation (mp4_ffmpeg.c) was added which uses FFmpeg's libavformat to open and parse MP4 containers.

The general workflow is:

Open the MP4 container with avformat_open_input()
Discover streams using avformat_find_stream_info()
Read packets sequentially using av_read_frame()
Dispatch packets based on stream type

Video packets (H.264 / HEVC) are passed to the existing do_NAL() processing logic, while caption tracks (CEA-608 / CEA-708) and subtitle tracks (tx3g) continue to use the existing CCExtractor parsing functions.

One difference from the GPAC implementation is that FFmpeg reads packets sequentially across all streams, whereas the GPAC implementation reads samples per track. The downstream caption extraction pipeline remains unchanged.
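The sequential, cross-stream read order described above can be sketched without FFmpeg at all. The following is a minimal illustration, not the PR's code: `TrackKind`, `Packet`, and `dispatch` are hypothetical names, and the packet list stands in for what av_read_frame() would return in file order.

```rust
use std::collections::HashMap;

/// Track kinds named in the description above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum TrackKind {
    Video,
    Caption,
    Subtitle,
}

/// Hypothetical stand-in for a demuxed packet; only the stream index matters here.
struct Packet {
    stream_index: usize,
}

/// Walk packets in file order and count how many were routed to each kind
/// of handler. Packets for streams that were not classified are skipped.
fn dispatch(streams: &[TrackKind], packets: &[Packet]) -> HashMap<TrackKind, usize> {
    let mut routed = HashMap::new();
    for pkt in packets {
        if let Some(kind) = streams.get(pkt.stream_index) {
            *routed.entry(*kind).or_insert(0) += 1;
        }
    }
    routed
}
```

Because packets from all streams arrive interleaved, any per-track state (for example, a pending caption) has to be kept per stream index rather than assumed to belong to the track currently being read.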

For H.264 / HEVC streams, codec configuration data is obtained from the stream extradata (avcC / hvcC) in order to determine the NAL unit length prefix size and extract SPS/PPS before processing packets.
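As an illustration of what reading avcC extradata involves, here is a minimal sketch based on the standard AVCDecoderConfigurationRecord layout. `AvcConfig`, `read_sets`, and `parse_avcc` are hypothetical names and do not come from the PR; hvcC uses a different layout and is not covered.

```rust
/// Fields pulled from an avcC (AVCDecoderConfigurationRecord) blob.
#[derive(Debug, PartialEq)]
struct AvcConfig {
    nal_length_size: usize,
    sps: Vec<Vec<u8>>,
    pps: Vec<Vec<u8>>,
}

/// Read `count` parameter sets, each stored as a big-endian u16 length plus payload.
fn read_sets(data: &[u8], pos: &mut usize, count: usize) -> Option<Vec<Vec<u8>>> {
    let mut sets = Vec::with_capacity(count);
    for _ in 0..count {
        let len_bytes = data.get(*pos..*pos + 2)?;
        let len = u16::from_be_bytes([len_bytes[0], len_bytes[1]]) as usize;
        *pos += 2;
        sets.push(data.get(*pos..*pos + len)?.to_vec());
        *pos += len;
    }
    Some(sets)
}

/// Returns None when the record is truncated or has an unexpected version.
fn parse_avcc(extradata: &[u8]) -> Option<AvcConfig> {
    // Byte 0 is configurationVersion; bytes 1..=3 are profile, compatibility, level.
    if extradata.len() < 7 || extradata[0] != 1 {
        return None;
    }
    // The low two bits of byte 4 hold lengthSizeMinusOne.
    let nal_length_size = (extradata[4] & 0x03) as usize + 1;
    // The low five bits of byte 5 hold the SPS count.
    let sps_count = (extradata[5] & 0x1F) as usize;
    let mut pos = 6;
    let sps = read_sets(extradata, &mut pos, sps_count)?;
    let pps_count = *extradata.get(pos)? as usize;
    pos += 1;
    let pps = read_sets(extradata, &mut pos, pps_count)?;
    Some(AvcConfig { nal_length_size, sps, pps })
}
```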

Build configuration

This backend is optional and controlled through a compile-time flag:

-DUSE_FFMPEG_MP4=ON

Default build → uses GPAC (mp4.c)
FFmpeg build → uses the new implementation (mp4_ffmpeg.c)

The runtime behavior of CCExtractor remains unchanged — the difference only affects how the MP4 container is parsed internally.

Summary

This PR:

Adds an FFmpeg-based MP4 parser
Keeps GPAC as the default implementation
Introduces a compile-time option to switch between the two backends
Leaves the caption extraction pipeline unchanged

This provides a potential alternative MP4 backend using a widely available multimedia framework while preserving the existing behavior.

gaurav02081 · 2026-03-09T18:06:20Z

Update: Added a new CI job to build CCExtractor with the optional FFmpeg MP4 backend.

The workflow now performs two builds:

Default build using the GPAC implementation

FFmpeg build using -DUSE_FFMPEG_MP4=ON

Both builds run --version to verify the binaries execute correctly, and separate build directories are used to avoid CMake cache conflicts.

cfsmp3 · 2026-03-14T18:15:53Z

Thanks for the PR — the implementation is clean and well-structured, and the dedicated CI job is a nice touch.

However, we're actively moving the codebase from C to Rust, so we can't accept new C modules. We'd need the FFmpeg MP4 demuxer to be implemented in Rust. See #2170 for an example of how this can be done using rsmpeg (Rust FFmpeg bindings) with a thin C bridge for the callbacks into the existing C code.

If you'd like to rework this in Rust we'd be happy to review it. The overall approach (avformat_open_input → av_read_frame → dispatch by stream type) is sound, it's just the implementation language that needs to change.

gaurav02081 · 2026-03-15T21:01:54Z

The FFmpeg-based implementation is now written in Rust using rsmpeg and is activated via -DWITH_FFMPEG=ON. The GPAC path remains the default and is completely untouched when FFmpeg is not enabled.

Architecture

The implementation follows the same 3-layer pattern as PR #2170:

Layer 1: Rust Core (src/rust/src/demuxer/mp4.rs)

Opens MP4 via rsmpeg::AVFormatContextInput
Classifies streams: AVC/H.264, HEVC/H.265, CEA-608, CEA-708, tx3g
Parses SPS/PPS/VPS from codec extradata
Main av_read_frame loop dispatches packets to C bridge functions
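A simplified view of the stream classification step, assuming tracks are identified only by their sample-entry FourCC. The real demuxer also looks at FFmpeg codec IDs; the `TrackType` variants and the `classify` function below are illustrative names, not a copy of mp4.rs.

```rust
/// Track categories listed above, plus a catch-all for everything else.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum TrackType {
    Avc,
    Hevc,
    Cea608,
    Cea708,
    Tx3g,
    Unsupported,
}

/// Map an MP4 sample-entry FourCC to a track category.
fn classify(fourcc: &[u8; 4]) -> TrackType {
    match fourcc {
        b"avc1" | b"avc3" => TrackType::Avc,
        b"hvc1" | b"hev1" => TrackType::Hevc,
        b"c608" => TrackType::Cea608,
        b"c708" => TrackType::Cea708,
        b"tx3g" => TrackType::Tx3g,
        _ => TrackType::Unsupported,
    }
}
```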

Layer 2: FFI Exports (src/rust/src/mp4_ffmpeg_exports.rs)

ccxr_processmp4() — replaces processmp4()
ccxr_dumpchapters() — replaces dumpchapters()

Layer 3: C Bridge (src/lib_ccx/mp4_rust_bridge.c)

Flat FFI-safe wrappers around existing C processing functions
ccx_mp4_process_avc_sample() → NAL parsing via do_NAL()
ccx_mp4_process_hevc_sample() → NAL parsing + store_hdcc() flush
ccx_mp4_process_cc_packet() → CEA-608 via process608(), CEA-708 via ccdp_find_data() + ccxr_dtvcc_process_data()
ccx_mp4_process_tx3g_packet() → 3GPP timed text
No GPAC structs cross the FFI boundary — only primitive types
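The NAL parsing mentioned for the AVC and HEVC bridge functions starts by splitting each MP4 sample into length-prefixed NAL units. A minimal sketch of that split, written in Rust for illustration (the bridge itself is C); `split_nals` is a hypothetical helper.

```rust
/// Split one MP4 sample into NAL units using the big-endian length prefix
/// (1, 2 or 4 bytes wide). Returns None if the sample is truncated or the
/// prefix width is not one of the allowed sizes.
fn split_nals(sample: &[u8], length_size: usize) -> Option<Vec<&[u8]>> {
    if !matches!(length_size, 1 | 2 | 4) {
        return None;
    }
    let mut nals = Vec::new();
    let mut pos = 0;
    while pos < sample.len() {
        // Read the length prefix as a big-endian integer.
        let prefix = sample.get(pos..pos + length_size)?;
        let len = prefix.iter().fold(0usize, |acc, b| (acc << 8) | *b as usize);
        pos += length_size;
        // The NAL payload must fit entirely inside the sample.
        nals.push(sample.get(pos..pos + len)?);
        pos += len;
    }
    Some(nals)
}
```

Each returned slice would then be handed to the existing NAL processing code; the prefix width comes from the codec extradata.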

Build System

cmake -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON ../src

ENABLE_FFMPEG_MP4 compile flag gates all new code
Cargo feature enable_mp4_ffmpeg pulls in rsmpeg
Corrosion passes the feature to Cargo when WITH_FFMPEG is on
FFmpeg libs are added as INTERFACE_LINK_LIBRARIES of ccx_rust to ensure correct GNU ld link order
Bridge functions use --undefined linker flags (same pattern as existing decode_vbi/do_cb/store_hdcc)
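On the Rust side, a Cargo feature such as enable_mp4_ffmpeg is typically consumed through `cfg`. The snippet below is only a sketch of that mechanism; `mp4_backend` is a hypothetical function that does not exist in the PR.

```rust
/// Report which MP4 backend this build would use.
/// `enable_mp4_ffmpeg` is the Cargo feature named above; the function is hypothetical.
fn mp4_backend() -> &'static str {
    if cfg!(feature = "enable_mp4_ffmpeg") {
        "ffmpeg"
    } else {
        "gpac"
    }
}
```

When the feature is off, the `cfg!` expression is a compile-time `false`, so a default build reports the GPAC path.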

Files

New (5):

src/rust/src/demuxer/mp4.rs          Rust FFmpeg demuxer core
src/rust/src/mp4_ffmpeg_exports.rs   C-callable FFI exports
src/lib_ccx/mp4_rust_bridge.c        Thin C bridge
src/lib_ccx/mp4_rust_bridge.h        Bridge header
src/lib_ccx/ccx_gpac_types.h         Minimal GPAC-compatible types

Modified (13):

Cargo.toml, build.rs, lib.rs, demuxer/mod.rs, wrapper.h, rust/CMakeLists.txt, src/CMakeLists.txt, lib_ccx/CMakeLists.txt, ccx_mp4.h, ccextractor.c, build_linux.yml, build_mac.yml

Testing

GPAC-only build (cmake ../src): compiles and links, no regression
FFmpeg build (cmake -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON ../src): compiles and links
Runtime: ./ccextractor tests/samples/BBC1.mp4 produces identical output to the GPAC path
CI: Linux and macOS cmake_ffmpeg_mp4 jobs added

gaurav02081 · 2026-03-29T10:18:22Z

@cfsmp3 Hi,

This PR is failing in the Linux environment on the sample platform (SP), so I tested it and found that the Linux CI failures are pre-existing on master and were not introduced by this PR.

All MP4 tests (3/3) pass.

cfsmp3

Comprehensive Comparison Review (Apr 18)

We tested this PR alongside #2170 (DhanushVarma-2), which implements the same FFmpeg MP4 demuxer feature. Both were built with -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON and tested against all 36 MP4/MOV samples in our collection.

Key finding: both PRs produce byte-identical output

#2170 and #2191 use the same rsmpeg backend and produce exactly the same output on all 36 samples. Neither is functionally superior to the other.

Critical: caption extraction gaps vs GPAC

The FFmpeg path is NOT a drop-in replacement for GPAC. 7 samples lose captions:

Sample	GPAC	FFmpeg (#2170 and #2191)	Issue
`132d7df7e993.mov`	108KB	104B	CEA-608 in H.264 SEI — no separate subtitle track
`1974a299f050.mov`	127KB	104B	Same — captions embedded in video NALs
`99e5eaafdc55.mov`	164KB	104B	Same
`8849331ddae9.mp4`	48KB (2352 lines, clean)	6KB (227 lines, garbled `÷` chars)	c608 track exists but decoded as raw bytes, not through CEA-608 decoder
`b2771c84c2a3.mp4`	2.6KB (130 lines)	284B (11 lines)	Same c608 garbling
`5df914ce773d.mp4`	1KB	0B	c608 track present but entirely missed
`1f3e951d516b.mp4`	5KB	0B	dvdsub (bitmap) in MP4 — not supported

The remaining 28 samples produce identical output (including 0B on both).

Three bugs to fix

Bug 1 — CEA-608 in H.264 SEI (3 MOV samples): These files have no separate subtitle track — captions are embedded as SEI user data in the H.264 video stream NAL units. GPAC handles this via process_avc_sample() which parses NALs. The FFmpeg demuxer needs to do the same — scan AVC samples for SEI-embedded CEA-608 data.
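To make "scan AVC samples for SEI" concrete: the NAL unit type sits in the first header byte, and SEI has a fixed type value in both codecs. A minimal sketch with hypothetical helper names; finding the caption payload inside the SEI message is a further step not shown here.

```rust
/// H.264: nal_unit_type is the low 5 bits of the first header byte; SEI is type 6.
fn is_h264_sei(nal: &[u8]) -> bool {
    nal.first().map_or(false, |&b| b & 0x1F == 6)
}

/// HEVC: nal_unit_type is bits 1..=6 of the first header byte;
/// prefix and suffix SEI are types 39 and 40.
fn is_hevc_sei(nal: &[u8]) -> bool {
    nal.first().map_or(false, |&b| matches!((b >> 1) & 0x3F, 39 | 40))
}
```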

Bug 2 — c608 track garbled (2 samples): The c608 subtitle track exists and is found, but the data comes out as raw CC bytes (÷÷ ÷ ÷HI÷, ÷TH÷ER÷E.÷) instead of decoded text. The data needs to go through the CEA-608 character decoder, not be dumped as raw bytes.
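One small piece of what a CEA-608 decoder does before mapping bytes to characters is parity handling: each byte carries odd parity in bit 7. The sketch below shows only that step, under the assumption that a pair with bad parity is simply rejected; the helper names are hypothetical and this is not CCExtractor's decoder.

```rust
/// True when the byte has an odd number of set bits, as CEA-608 requires.
fn has_odd_parity(byte: u8) -> bool {
    byte.count_ones() % 2 == 1
}

/// Validate a CEA-608 byte pair and drop the parity bit from each byte.
/// Assumption for this sketch: a pair with bad parity is rejected outright.
fn strip_parity_pair(b1: u8, b2: u8) -> Option<(u8, u8)> {
    if has_odd_parity(b1) && has_odd_parity(b2) {
        Some((b1 & 0x7F, b2 & 0x7F))
    } else {
        None
    }
}
```

For example, the pair 0xC8 0x49 passes the parity check and yields the 7-bit codes for "H" and "I". Control codes, the non-ASCII positions of the 608 character set, and field routing still need the full decoder.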

Bug 3 — c608 track missed (1 sample): 5df914ce773d.mp4 has a c608 track visible to ffprobe but FFmpeg extracts nothing. Investigate why.

(dvdsub/bitmap in MP4 can be documented as a known limitation for now.)

PR-specific issues for #2191

Compared to #2170:

More verbose (1019 lines vs 437) — more code to maintain for identical functionality
12 commits with many linker fix iterations (vs 5 clean commits in #2170) — suggests the build integration was iterated rather than designed
Cargo.lock in diff — should not be committed, remove it
Still has ENABLE_HARDSUBX coupling — WITH_FFMPEG still defines ENABLE_HARDSUBX in lib_ccx/CMakeLists.txt. Our #2259 (merged) fixed this on master. You need to rebase and drop the coupling.
CI job is a plus — the cmake_ffmpeg_mp4 job in both Linux and Mac workflows is genuinely useful. #2170 doesn't have this.
CMake approach is cleaner — INTERFACE_LINK_LIBRARIES on the ccx_rust target is the right CMake idiom vs #2170's --no-as-needed hack
Bridge is much larger (306 lines vs 54) — worth investigating if the extra code adds value or is redundant

Samples tested

All 36 MP4/MOV/M4V files from ~/media_samples/completed/ and ~/media_samples/failed/, including files up to 4.5GB. No files were skipped.

Adds a cmake_ffmpeg_mp4 job to both build_linux.yml and build_mac.yml that configures CCExtractor with -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON and builds it, so the FFmpeg-based MP4 demuxer path is exercised on every PR. Linux job pulls the full set of ffmpeg/tesseract/leptonica dev packages (including libavdevice-dev); macOS job installs the corresponding Homebrew bottles.

Add the compile-time option to build CCExtractor's MP4 demuxer on top of libavformat: set -DENABLE_FFMPEG_MP4 and pull libswresample as a required dependency alongside libavformat/libavutil/libavcodec/libavfilter/libswscale when -DWITH_FFMPEG=ON.

Link-order handling

Corrosion places ccx_rust at the end of the ccextractor link line. On Linux (GNU ld), ccx_rust contains rsmpeg which references FFmpeg symbols like swr_get_out_samples, so the FFmpeg shared libs must appear after ccx_rust. Collect them into a separate EXTRA_FFMPEG_LIBS variable and attach them as INTERFACE_LINK_LIBRARIES on the ccx_rust target so CMake emits them right after ccx_rust. Make ccx's own link dependencies PRIVATE so the same libs don't propagate earlier and get deduplicated against the INTERFACE copy.

Force bridge symbols to be pulled from libccx

GNU ld only pulls object files from a static archive when they resolve currently-needed symbols. Bridge functions in libccx.a (ccx_mp4_process_nal_sample, ccx_mp4_process_cc_packet, etc.) aren't needed until libccx_rust.a is processed, but libccx.a precedes it on the command line. Use -Wl,--undefined=<symbol> on ccextractor for each bridge entry point so the linker pulls them early, the same pattern already used for decode_vbi/do_cb/store_hdcc.

Rust crate wiring

Add the optional rsmpeg dependency guarded behind the enable_mp4_ffmpeg feature, with platform-specific feature flags (link_system_ffmpeg on Linux + macOS, link_vcpkg_ffmpeg on Windows, all using the ffmpeg7 bindings). Extend bindgen's wrapper.h to expose the bridge headers so Rust can call back into ccx from mp4_rust_bridge.c.

Alternative MP4 demuxing backend built on rsmpeg (Rust FFmpeg bindings) that matches GPAC's caption output on every sample reviewed. Activated via -DWITH_FFMPEG=ON at compile time; default build keeps the GPAC path untouched.

Architecture

src/rust/src/demuxer/mp4.rs: opens the MP4 with rsmpeg, classifies tracks (AVC, HEVC, c608, c708, tx3g), and drives packet dispatch.
src/rust/src/mp4_ffmpeg_exports.rs: #[no_mangle] entry points (ccxr_processmp4, ccxr_dumpchapters) called from ccextractor.c.
src/lib_ccx/mp4_rust_bridge.c/.h: thin C shim around do_NAL, process608, process_cc_data, ccdp_find_data, store_hdcc, and encode_sub so the Rust side can feed decoded payloads into CCExtractor's existing CEA-608/708 pipeline.

Caption parity

GPAC-equivalent output on all six samples from the Apr 18 review:

132d7df7e993.mov	108 290 B	byte-identical
1974a299f050.mov	127 828 B	byte-identical
99e5eaafdc55.mov	164 099 B	byte-identical
8849331ddae9.mp4	48 485 B	identical size, content, caption count; uniform ~2 ms timing shift
b2771c84c2a3.mp4	2 607 B	byte-identical
5df914ce773d.mp4	1 164 B	byte-identical

SEI-embedded CEA-608 in H.264 video

The last caption finishing on the final sample was never encoded because the interleaved av_read_frame loop exited without an equivalent of GPAC's per-track encode_sub. Drain sub.got_output at EOF. Intentionally avoid calling process_hdcc there: the last IDR's slice_header already flushed the HD-CC buffer, and re-running process_hdcc re-emits partial post-IDR caption state as trailing garbage.

c608 payload handling

libavformat delivers c608/c708 samples in two shapes:

1) atom-wrapped raw 608 pairs: [u32 length][4cc cdat|cdt2|ccdp][payload]. Strip the 8-byte header to match GPAC's process_clcp.
2) bare cc_data triplets: [cc_info][b1][b2]. Detect via len % 3 == 0 and (payload[0] & 0xF8) == 0xF8.

For c608 tracks, extract each field-1/field-2 pair from the triplet, set dec_ctx->current_field so process608 picks the right decoder context, and call process608 directly. Routing through do_cb would hit its CCX_H264 guard (set by interleaved H.264 packets) and suppress the cb_field increments process608 relies on for caption-boundary timing, which merged short captions into their successors. For c708, keep ccdp_find_data + process_cc_data.

Bridge design

Single unified entry point ccx_mp4_process_nal_sample(..., int is_hevc, ...) replaces the separate AVC/HEVC helpers; their NAL iteration was ~90% identical. Uses utility.h's RB16/RB32 macros instead of hand-rolled byte-swap helpers. Handles HEVC's end-of-sample cc_data flush inline. Supported and unsupported tracks are documented in the mp4.rs module header.

Known limitation: dvdsub/bitmap subtitles in MP4 are not decoded (neither GPAC nor this backend handles them).
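The two payload shapes described in this commit message can be told apart with a few byte checks. The sketch below follows the stated rules (8-byte atom header with a cdat/cdt2/ccdp FourCC, or a length divisible by 3 with the top five bits of the first byte set). `CcPayload` and `classify_cc_payload` are hypothetical names, and both the order of the checks and ignoring the atom's length field are assumptions made for brevity.

```rust
/// The two payload shapes described above, plus a fallback.
#[derive(Debug, PartialEq)]
enum CcPayload<'a> {
    /// [u32 length][4cc][payload]: the body is what follows the 8-byte header.
    AtomWrapped { fourcc: [u8; 4], body: &'a [u8] },
    /// Bare [cc_info][b1][b2] triplets.
    Triplets(&'a [u8]),
    Unknown,
}

fn classify_cc_payload(data: &[u8]) -> CcPayload<'_> {
    // Assumption: the atom check runs first; the atom length field is not validated.
    if data.len() >= 8 {
        let fourcc = [data[4], data[5], data[6], data[7]];
        if matches!(&fourcc, b"cdat" | b"cdt2" | b"ccdp") {
            return CcPayload::AtomWrapped { fourcc, body: &data[8..] };
        }
    }
    // Triplet rule from the text: len % 3 == 0 and (payload[0] & 0xF8) == 0xF8.
    if !data.is_empty() && data.len() % 3 == 0 && (data[0] & 0xF8) == 0xF8 {
        return CcPayload::Triplets(data);
    }
    CcPayload::Unknown
}
```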

The format_rust CI job runs `cargo clippy --lib -- -D warnings` and rust-1.95.0 promoted a handful of lints that hadn't been tripped on master before this branch went through CI. Address all six:

collapsible_match (4 sites):
src/demuxer/demux.rs: fold the nested `if matches!(...)` inside the `Some(false) =>` arm into a match guard on the arm itself.
src/encoder/common.rs: three header-write arms (Ccd, Scc, Raw) each had a nested `if write_raw(...) == -1 { return -1; }`. Collapse each into a match guard; the body now just logs and returns.

unnecessary_cast (2 sites):
src/libccxr_exports/demuxer.rs: drop `as i64` from demux_ctx.get_filesize(); it already returns i64.
src/parser.rs: drop `as u64` from `t as u64` when writing to UTC_REFVALUE; t is already u64.

Also two lints in the new MP4 demuxer file from this branch:
if_same_then_else at src/demuxer/mp4.rs:198: the FOURCC_TX3G and AV_CODEC_ID_MOV_TEXT arms both produced TrackType::Tx3g; same for FOURCC_C608 and AV_CODEC_ID_EIA_608 → TrackType::Cea608. Fold each pair into a single `||` branch.
manual_is_multiple_of at src/demuxer/mp4.rs:399: `packet_count % 100 == 0` → `packet_count.is_multiple_of(100)`.

No behavior change. Caption parity against GPAC on all six mentor samples verified post-fix.
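The collapsible_match change described here amounts to moving a nested `if` into a match guard. A minimal before/after sketch; the `Format` enum and both functions are hypothetical stand-ins for the encoder code, with `write_result` taking the place of the value returned by the write call.

```rust
/// Hypothetical stand-in for the output formats named above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Format {
    Ccd,
    Scc,
    Raw,
    Other,
}

/// Nested form: the arm matches first, then the `if` decides whether to bail out.
fn header_status_nested(format: Format, write_result: i32) -> i32 {
    match format {
        Format::Ccd | Format::Scc | Format::Raw => {
            if write_result == -1 {
                return -1;
            }
            0
        }
        Format::Other => 0,
    }
}

/// Guarded form: the condition moves into the arm as a match guard.
fn header_status_guarded(format: Format, write_result: i32) -> i32 {
    match format {
        Format::Ccd | Format::Scc | Format::Raw if write_result == -1 => -1,
        _ => 0,
    }
}
```

One thing to keep in mind with this rewrite: when a guard fails, matching falls through to the later arms, so the two forms are equivalent only if the fall-through arm does what the nested arm did after the `if`.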

ccextractor-bot · 2026-04-19T00:38:40Z

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit c8932da...:

Report Name	Tests Passed
Broken	9/13
CEA-708	1/14
DVB	2/7
DVD	3/3
DVR-MS	2/2
General	22/27
Hardsubx	1/1
Hauppage	3/3
MP4	3/3
NoCC	10/10
Options	77/86
Teletext	20/21
WTV	13/13
XDS	31/34

Your PR breaks these cases:

ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
ccextractor --out=spupng c83f765c66...
ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

ccextractor --out=srt --latin1 --autoprogram 73d9313d64..., Last passed: Test 8738
ccextractor --out=ttxt --latin1 001dd8cdf7..., Last passed: Test 8738
ccextractor --out=srt --latin1 4d4e938ef6..., Last passed: Test 8738
ccextractor --service 1 --out=txt --no-bom --no-rollup ea83ff7bcb..., Last passed: Test 8738
ccextractor --service 1 --out=txt f17524b53f..., Last passed: Test 8738
ccextractor --service 1 --out=txt 80848c45f8..., Last passed: Test 8738
ccextractor --service 1 --out=txt --no-bom --no-rollup b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1[EUC-KR] --out=txt --no-rollup b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1 --out=srt da904de35d..., Last passed: Test 8738
ccextractor --service 1 --out=sami da904de35d..., Last passed: Test 8738
ccextractor --service 1 --out=ttxt da904de35d..., Last passed: Test 8926
ccextractor --service 1[EUC-KR] b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1[EUC-KR] --no-rollup b5d6aad89f..., Last passed: Test 8738
ccextractor --service all da904de35d..., Last passed: Test 8738
ccextractor --service all[EUC-KR] b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1,2[UTF-8],3[EUC-KR],54 --out=txt da904de35d..., Last passed: Test 8738
ccextractor --autoprogram --out=srt --latin1 f1422b8bfe..., Last passed: Test 9208
ccextractor --autoprogram --out=srt --latin1 d41b53b504..., Last passed: Test 8738
ccextractor --stdout --quiet --no-fontcolor 79a51f3500..., Last passed: Test 8738
ccextractor --stdout --quiet --no-fontcolor 767b546f96..., Last passed: Test 8738
ccextractor --service 1 c83f765c66..., Last passed: Test 8738
ccextractor --myth c83f765c66..., Last passed: Test 8738
ccextractor --in=raw fb79021542..., Last passed: Test 8738
ccextractor --mp4vidtrack 5df914ce77..., Last passed: Test 8738
ccextractor --xmltv=3 --out=null 96efd279cf..., Last passed: Test 8738
ccextractor --datapid 2310 --autoprogram --out=srt --latin1 e639e54550..., Last passed: Test 8738

Congratulations: Merging this PR would fix the following tests:

ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9..., Last passed: Never
ccextractor --autoprogram --out=srt --latin1 b22260d065..., Last passed: Never
ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

ccextractor-bot · 2026-04-19T01:02:33Z

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit c8932da...:

Report Name	Tests Passed
Broken	9/13
CEA-708	1/14
DVB	3/7
DVD	3/3
DVR-MS	2/2
General	22/27
Hardsubx	1/1
Hauppage	3/3
MP4	3/3
NoCC	10/10
Options	81/86
Teletext	20/21
WTV	13/13
XDS	31/34

Your PR breaks these cases:

ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
ccextractor --autoprogram --out=srt --latin1 b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

ccextractor --out=srt --latin1 --autoprogram 73d9313d64..., Last passed: Test 8611
ccextractor --out=ttxt --latin1 001dd8cdf7..., Last passed: Test 8611
ccextractor --out=srt --latin1 4d4e938ef6..., Last passed: Test 8611
ccextractor --service 1 --out=txt --no-bom --no-rollup ea83ff7bcb..., Last passed: Test 8611
ccextractor --service 1 --out=txt f17524b53f..., Last passed: Test 8611
ccextractor --service 1 --out=txt 80848c45f8..., Last passed: Test 8611
ccextractor --service 1 --out=txt --no-bom --no-rollup b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1[EUC-KR] --out=txt --no-rollup b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1 --out=srt da904de35d..., Last passed: Test 8611
ccextractor --service 1 --out=sami da904de35d..., Last passed: Test 8611
ccextractor --service 1 --out=ttxt da904de35d..., Last passed: Test 8943
ccextractor --service 1[EUC-KR] b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1[EUC-KR] --no-rollup b5d6aad89f..., Last passed: Test 8611
ccextractor --service all da904de35d..., Last passed: Test 8611
ccextractor --service all[EUC-KR] b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1,2[UTF-8],3[EUC-KR],54 --out=txt da904de35d..., Last passed: Test 8611
ccextractor --autoprogram --out=srt --latin1 f1422b8bfe..., Last passed: Test 9209
ccextractor --autoprogram --out=srt --latin1 d41b53b504..., Last passed: Test 8611
ccextractor --stdout --quiet --no-fontcolor 79a51f3500..., Last passed: Test 8611
ccextractor --stdout --quiet --no-fontcolor 767b546f96..., Last passed: Test 8611
ccextractor --service 1 c83f765c66..., Last passed: Test 8611
ccextractor --myth c83f765c66..., Last passed: Test 8611
ccextractor --in=raw fb79021542..., Last passed: Test 8611
ccextractor --mp4vidtrack 5df914ce77..., Last passed: Test 8611
ccextractor --xmltv=3 --out=null 96efd279cf..., Last passed: Test 8611
ccextractor --datapid 2310 --autoprogram --out=srt --latin1 e639e54550..., Last passed: Test 8611

Congratulations: Merging this PR would fix the following tests:

ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
ccextractor --out=spupng c83f765c66..., Last passed: Never
ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

gaurav02081 · 2026-04-19T11:04:34Z

@cfsmp3 Thanks for the detailed review.

I have addressed all three reported issues and the PR-specific points.

Bug fixes

Bug 1: CEA-608 in H.264 SEI (MOV samples)
Fixed by draining sub->got_output at EOF in the FFmpeg path.
All three MOV samples are now byte-identical to GPAC.

Bug 2: c608 garbled output (MP4 samples)
Two issues were handled:

Strip the ISO BMFF wrapper (cdat/cdt2/ccdp) before decoding
Handle the cc_data triplet format ([cc_info][b1][b2]) directly via process608 with the correct field routing

Results:

b2771c84c2a3.mp4 → byte-identical
8849331ddae9.mp4 → identical content and caption count (uniform ~2 ms timing offset noted below)

Bug 3: c608 track missed
Covered by the triplet handling → now byte-identical

dvdsub (bitmap subtitles)
Not supported (same as GPAC). Documented in the mp4.rs module doc comment.

PR cleanup (used git squash to consolidate the commits into 3 parts)

Commit history reduced 12 → 3 clean commits
Cargo.lock cleaned (libc dependency removed via std::fs)
ENABLE_HARDSUBX decoupled (rebased onto latest master)
CI job for FFmpeg MP4 build kept and passing (Linux + macOS)
INTERFACE_LINK_LIBRARIES approach retained

Bridge design note

I kept the FFmpeg bridge self-contained, calling only public CCExtractor APIs (do_NAL, process608, process_cc_data, etc.), without exposing GPAC internals.

An alternative approach (as in #2170) reduces bridge size by reusing mp4.c's static helpers, but that couples the FFmpeg path back to GPAC. I preferred keeping the two backends independent.

Known difference vs GPAC

One sample (8849331ddae9.mp4) has a uniform ~2 ms timestamp offset
Caption content, count, and ordering are identical
Rooted in the shared set_fts timing logic (not MP4-specific)

Fixing this would require changes in shared timing infrastructure, so I have left it as-is for now.

`sampleplatform.ccextractor.org` CI status

The two red test/9266 (Linux) and test/9267 (Windows) checks show the same failing categories as master (same ratios as recent merged PRs like #2266 / #2267). The bot comment on this PR also notes these failures are present on master.
MP4, the category this PR implements, passes on both sides. All GitHub Actions CI checks are green.

Verification

Tested locally against all 6 samples from the review
5/6 byte-identical, 1 with minor timing offset
No caption loss in any sample

Looking forward to your feedback.

cfsmp3 · 2026-04-19T17:09:31Z

Timestamp note

We investigated the ~2ms timing offset on 8849331ddae9.mp4. GPAC is correct, FFmpeg is wrong.

The MP4 container's stts box has DTS=113113 for the first subtitle sample (timescale 30000). GPAC reads this directly → 3771ms. FFmpeg reports PTS=113050 (63 ticks short) → 3769ms. The loss happens inside av_rescale_q() when FFmpeg rescales timestamps internally.

This affects all 492 subtitle timestamps uniformly (~2ms early). Content is identical. Full analysis documented in our repo at plans/FFMPEG_TIMESTAMP_ROUNDING.md. May be worth reporting upstream to FFmpeg.
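The arithmetic behind these numbers can be checked directly. The helper below is hypothetical and assumes the tick-to-millisecond conversion rounds up, which is an assumption chosen only because it reproduces the 3771 ms and 3769 ms figures quoted above; it is not a claim about how either library rounds.

```rust
/// Convert a timestamp in ticks to milliseconds, rounding up.
/// Rounding up is an assumption made so the quoted figures are reproduced.
fn ticks_to_ms_ceil(ticks: u64, timescale: u64) -> u64 {
    (ticks * 1000 + timescale - 1) / timescale
}

/// Millisecond gap between two tick values at the same timescale.
fn offset_ms(a_ticks: u64, b_ticks: u64, timescale: u64) -> u64 {
    ticks_to_ms_ceil(a_ticks, timescale) - ticks_to_ms_ceil(b_ticks, timescale)
}
```

With a timescale of 30000, a shortfall of 63 ticks is 63 / 30000 s, or 2.1 ms, which matches the uniform ~2 ms offset.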

Not a blocker for this PR — just documenting.

gaurav02081 · 2026-04-19T17:52:05Z

Thanks for the review. I will raise an issue for this with FFmpeg.

gaurav02081 force-pushed the gaurav-ffmpeg branch 3 times, most recently from c83ebd9 to 1d96d2c (March 9, 2026 17:49)

gaurav02081 force-pushed the gaurav-ffmpeg branch from d883224 to e1193d1 (March 15, 2026 17:57)

cfsmp3 mentioned this pull request Apr 18, 2026: feat: replace GPAC with FFmpeg for MP4 demuxing #2170 (Closed)

cfsmp3 requested changes Apr 18, 2026

gaurav02081 added 3 commits April 19, 2026 04:54

gaurav02081 force-pushed the gaurav-ffmpeg branch from 95ba7f4 to a05c8ee (April 18, 2026 23:44)

cfsmp3 merged commit e4443a7 into CCExtractor:master Apr 19, 2026
26 of 28 checks passed

cfsmp3 mentioned this pull request Apr 20, 2026: Fix/tx3g timestamp bug #2199 (Closed)

cfsmp3 merged 4 commits into CCExtractor:master from gaurav02081:gaurav-ffmpeg
