[IMPROVEMENT] feat(mp4): add FFmpeg/libavformat backend for MP4 demuxing#2191
[IMPROVEMENT] feat(mp4): add FFmpeg/libavformat backend for MP4 demuxing#2191cfsmp3 merged 4 commits intoCCExtractor:masterfrom
Conversation
c83ebd9 to
1d96d2c
Compare
|
Update: Added a new CI job to build CCExtractor with the optional FFmpeg MP4 backend. The workflow now performs two builds: Default build using the GPAC implementation FFmpeg build using -DUSE_FFMPEG_MP4=ON Both builds run --version to verify the binaries execute correctly, and separate build directories are used to avoid CMake cache conflicts. |
|
Thanks for the PR — the implementation is clean and well-structured, and the dedicated CI job is a nice touch. However, we're actively moving the codebase from C to Rust, so we can't accept new C modules. We'd need the FFmpeg MP4 demuxer to be implemented in Rust. See #2170 for an example of how this can be done using rsmpeg (Rust FFmpeg bindings) with a thin C bridge for the callbacks into the existing C code. If you'd like to rework this in Rust we'd be happy to review it. The overall approach (avformat_open_input → av_read_frame → dispatch by stream type) is sound, it's just the implementation language that needs to change. |
d883224 to
e1193d1
Compare
|
FFmpeg-based implementation written in Rust using rsmpeg, activated via -DWITH_FFMPEG=ON. The GPAC path Architecture The implementation follows the same 3-layer pattern as PR #2170: = Layer 1 — Rust Core (src/rust/src/demuxer/mp4.rs)
= Layer 2 — FFI Exports (src/rust/src/mp4_ffmpeg_exports.rs)
= Layer 3 — C Bridge (src/lib_ccx/mp4_rust_bridge.c)
Build System
Files New (5): Modified (13):``` |
|
@cfsmp3 Hi, this PR is failing in the Linux environment (in SP) so i tested it and found The CI Linux (sample platform) failures are pre-existing on master and not introduced by this PR. All MP4 tests (3/3) pass. |
cfsmp3
left a comment
There was a problem hiding this comment.
Comprehensive Comparison Review (Apr 18)
We tested this PR alongside #2170 (DhanushVarma-2), which implements the same FFmpeg MP4 demuxer feature. Both were built with -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON and tested against all 36 MP4/MOV samples in our collection.
Key finding: both PRs produce byte-identical output
#2170 and #2191 use the same rsmpeg backend and produce exactly the same output on all 36 samples. Neither is functionally superior to the other.
Critical: caption extraction gaps vs GPAC
The FFmpeg path is NOT a drop-in replacement for GPAC. 7 samples lose captions:
| Sample | GPAC | FFmpeg (#2170 and #2191) | Issue |
|---|---|---|---|
132d7df7e993.mov |
108KB | 104B | CEA-608 in H.264 SEI — no separate subtitle track |
1974a299f050.mov |
127KB | 104B | Same — captions embedded in video NALs |
99e5eaafdc55.mov |
164KB | 104B | Same |
8849331ddae9.mp4 |
48KB (2352 lines, clean) | 6KB (227 lines, garbled ÷ chars) |
c608 track exists but decoded as raw bytes, not through CEA-608 decoder |
b2771c84c2a3.mp4 |
2.6KB (130 lines) | 284B (11 lines) | Same c608 garbling |
5df914ce773d.mp4 |
1KB | 0B | c608 track present but entirely missed |
1f3e951d516b.mp4 |
5KB | 0B | dvdsub (bitmap) in MP4 — not supported |
One sample works BETTER with FFmpeg:
| ad9f9e03240e.m4v | 0B | 51KB | FFmpeg extracts what GPAC can't |
The remaining 28 samples produce identical output (including 0B on both).
Three bugs to fix
Bug 1 — CEA-608 in H.264 SEI (3 MOV samples): These files have no separate subtitle track — captions are embedded as SEI user data in the H.264 video stream NAL units. GPAC handles this via process_avc_sample() which parses NALs. The FFmpeg demuxer needs to do the same — scan AVC samples for SEI-embedded CEA-608 data.
Bug 2 — c608 track garbled (2 samples): The c608 subtitle track exists and is found, but the data comes out as raw CC bytes (÷÷ ÷ ÷HI÷, ÷TH÷ER÷E.÷) instead of decoded text. The data needs to go through the CEA-608 character decoder, not be dumped as raw bytes.
Bug 3 — c608 track missed (1 sample): 5df914ce773d.mp4 has a c608 track visible to ffprobe but FFmpeg extracts nothing. Investigate why.
(dvdsub/bitmap in MP4 can be documented as a known limitation for now.)
PR-specific issues for #2191
Compared to #2170:
- More verbose (1019 lines vs 437) — more code to maintain for identical functionality
- 12 commits with many linker fix iterations (vs 5 clean commits in #2170) — suggests the build integration was iterated rather than designed
Cargo.lockin diff — should not be committed, remove it- Still has
ENABLE_HARDSUBXcoupling —WITH_FFMPEGstill definesENABLE_HARDSUBXinlib_ccx/CMakeLists.txt. Our #2259 (merged) fixed this on master. You need to rebase and drop the coupling. - CI job is a plus — the
cmake_ffmpeg_mp4job in both Linux and Mac workflows is genuinely useful. #2170 doesn't have this. - CMake approach is cleaner —
INTERFACE_LINK_LIBRARIESon the ccx_rust target is the right CMake idiom vs #2170's--no-as-neededhack - Bridge is much larger (306 lines vs 54) — worth investigating if the extra code adds value or is redundant
Samples tested
All 36 MP4/MOV/M4V files from ~/media_samples/completed/ and ~/media_samples/failed/, including files up to 4.5GB. No files were skipped.
Adds a cmake_ffmpeg_mp4 job to both build_linux.yml and build_mac.yml that configures CCExtractor with -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON and builds it, so the FFmpeg-based MP4 demuxer path is exercised on every PR. Linux job pulls the full set of ffmpeg/tesseract/leptonica dev packages (including libavdevice-dev); macOS job installs the corresponding Homebrew bottles.
Add the compile-time option to build CCExtractor's MP4 demuxer on top of libavformat: set -DENABLE_FFMPEG_MP4 and pull libswresample as a required dependency alongside libavformat/libavutil/libavcodec/ libavfilter/libswscale when -DWITH_FFMPEG=ON. Link-order handling. Corrosion places ccx_rust at the end of the ccextractor link line. On Linux (GNU ld), ccx_rust contains rsmpeg which references FFmpeg symbols like swr_get_out_samples, so the FFmpeg shared libs must appear after ccx_rust. Collect them into a separate EXTRA_FFMPEG_LIBS variable and attach them as INTERFACE_LINK_LIBRARIES on the ccx_rust target so CMake emits them right after ccx_rust. Make ccx's own link dependencies PRIVATE so the same libs don't propagate earlier and get deduplicated against the INTERFACE copy. Force bridge symbols to be pulled from libccx. GNU ld only pulls object files from a static archive when they resolve currently-needed symbols. Bridge functions in libccx.a (ccx_mp4_process_nal_sample, ccx_mp4_process_cc_packet, etc.) aren't needed until libccx_rust.a is processed, but libccx.a precedes it on the command line. Use -Wl,--undefined=<symbol> on ccextractor for each bridge entry point so the linker pulls them early — same pattern already used for decode_vbi/do_cb/store_hdcc. Rust crate wiring. Add the optional rsmpeg dependency guarded behind the enable_mp4_ffmpeg feature, with platform-specific feature flags (link_system_ffmpeg on Linux + macOS, link_vcpkg_ffmpeg on Windows, all using the ffmpeg7 bindings). Extend bindgen's wrapper.h to expose the bridge headers so Rust can call back into ccx from mp4_rust_bridge.c.
Alternative MP4 demuxing backend built on rsmpeg (Rust FFmpeg bindings)
that matches GPAC's caption output on every sample reviewed. Activated
via -DWITH_FFMPEG=ON at compile time; default build keeps the GPAC
path untouched.
Architecture
- src/rust/src/demuxer/mp4.rs: opens the MP4 with rsmpeg, classifies
tracks (AVC, HEVC, c608, c708, tx3g), and drives packet dispatch.
- src/rust/src/mp4_ffmpeg_exports.rs: #[no_mangle] entry points
(ccxr_processmp4, ccxr_dumpchapters) called from ccextractor.c.
- src/lib_ccx/mp4_rust_bridge.c/.h: thin C shim around do_NAL,
process608, process_cc_data, ccdp_find_data, store_hdcc, and
encode_sub so the Rust side can feed decoded payloads into
CCExtractor's existing CEA-608/708 pipeline.
Caption parity
GPAC-equivalent output on all six samples from the Apr 18 review:
- 132d7df7e993.mov 108 290 B byte-identical
- 1974a299f050.mov 127 828 B byte-identical
- 99e5eaafdc55.mov 164 099 B byte-identical
- 8849331ddae9.mp4 48 485 B identical size, content, caption
count; uniform ~2 ms timing shift
- b2771c84c2a3.mp4 2 607 B byte-identical
- 5df914ce773d.mp4 1 164 B byte-identical
SEI-embedded CEA-608 in H.264 video
The last caption finishing on the final sample was never encoded
because the interleaved av_read_frame loop exited without an
equivalent of GPAC's per-track encode_sub. Drain sub.got_output at
EOF. Intentionally avoid calling process_hdcc there: the last IDR's
slice_header already flushed the HD-CC buffer, and re-running
process_hdcc re-emits partial post-IDR caption state as trailing
garbage.
c608 payload handling
libavformat delivers c608/c708 samples in two shapes:
1) atom-wrapped raw 608 pairs — [u32 length][4cc cdat|cdt2|ccdp]
[payload]. Strip the 8-byte header to match GPAC's process_clcp.
2) bare cc_data triplets — [cc_info][b1][b2]. Detect via len % 3 == 0
and (payload[0] & 0xF8) == 0xF8. For c608 tracks, extract each
field-1/field-2 pair from the triplet, set dec_ctx->current_field
so process608 picks the right decoder context, and call
process608 directly. Routing through do_cb would hit its
CCX_H264 guard (set by interleaved H.264 packets) and suppress
the cb_field increments process608 relies on for
caption-boundary timing, which merged short captions into their
successors. For c708, keep ccdp_find_data + process_cc_data.
Bridge design
Single unified entry point ccx_mp4_process_nal_sample(..., int
is_hevc, ...) replaces the separate AVC/HEVC helpers — their NAL
iteration was ~90% identical. Uses utility.h's RB16/RB32 macros
instead of hand-rolled byte-swap helpers. Handles HEVC's
end-of-sample cc_data flush inline.
Supported and unsupported tracks are documented in the mp4.rs
module header. Known limitation: dvdsub/bitmap subtitles in MP4
are not decoded (neither GPAC nor this backend handles them).
95ba7f4 to
a05c8ee
Compare
The format_rust CI job runs `cargo clippy --lib -- -D warnings` and
rust-1.95.0 promoted a handful of lints that hadn't been tripped on
master before this branch went through CI. Address all six:
collapsible_match (4 sites):
- src/demuxer/demux.rs: fold the nested `if matches!(...)` inside
the `Some(false) =>` arm into a match guard on the arm itself.
- src/encoder/common.rs: three header-write arms (Ccd, Scc, Raw)
each had a nested `if write_raw(...) == -1 { return -1; }`.
Collapse each into a match guard; the body now just logs and
returns.
unnecessary_cast (2 sites):
- src/libccxr_exports/demuxer.rs: drop `as i64` from
demux_ctx.get_filesize() — it already returns i64.
- src/parser.rs: drop `as u64` from `t as u64` when writing to
UTC_REFVALUE — t is already u64.
Also two lints in the new MP4 demuxer file from this branch:
- if_same_then_else at src/demuxer/mp4.rs:198 — the FOURCC_TX3G
and AV_CODEC_ID_MOV_TEXT arms both produced TrackType::Tx3g;
same for FOURCC_C608 and AV_CODEC_ID_EIA_608 → TrackType::Cea608.
Fold each pair into a single `||` branch.
- manual_is_multiple_of at src/demuxer/mp4.rs:399 —
`packet_count % 100 == 0` → `packet_count.is_multiple_of(100)`.
No behavior change. Caption parity against GPAC on all six mentor
samples verified post-fix.
CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit c8932da...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info. |
CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit c8932da...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info. |
|
@cfsmp3 Thanks for the detailed review. I have addressed all three reported issues and the PR-specific points. BUG fix== Bug 1 — CEA-608 in H.264 SEI (MOV samples)** == Bug 2 — c608 garbled output (MP4 samples)**
Results:
== Bug 3 — c608 track missed**
PR cleanup (used git squash to bifurcate the commits into 3 parts)
Bridge design noteI kept the FFmpeg bridge self-contained, calling only public CCExtractor APIs ( An alternative approach (as in #2170) reduces bridge size by reusing `mp4.c 'static helpers', but that couples the FFmpeg path back to GPAC. I preferred keeping the two backends independent. Known difference vs GPAC
Fixing this would require changes in shared timing infrastructure, so I have left it as-is for now.
|
Timestamp noteWe investigated the ~2ms timing offset on The MP4 container's This affects all 492 subtitle timestamps uniformly (~2ms early). Content is identical. Full analysis documented in our repo at Not a blocker for this PR — just documenting. |
|
Thanks for the review, will raise an issue for this in ffmpeg. |
[IMPROVEMENT] feat(mp4): add FFmpeg/libavformat backend for MP4 demuxing
In raising this pull request, I confirm the following (please check boxes):
My familiarity with the project is as follows (check one):
Add optional FFmpeg-based MP4 parser as an alternative to GPAC
This PR introduces an alternative MP4 parsing backend using FFmpeg's libavformat, while keeping the existing GPAC-based implementation unchanged and as the default.
Motivation
In a previous discussion (Gsoc meeting 2 MARCH) we talked about updating the GPAC dependency used for MP4 processing in CCExtractor. One suggestion was to explore whether there is a Debian-friendly alternative rather than only focusing on upgrading GPAC.
FFmpeg is already used in other parts of the codebase (for example in the demuxing/decoding integration and in the HardsubX module), so extending its use for MP4 parsing seemed like a reasonable option to explore.
Implementation
A new implementation (
mp4_ffmpeg.c) was added which uses FFmpeg's libavformat to open and parse MP4 containers.The general workflow is:
avformat_open_input()avformat_find_stream_info()av_read_frame()Video packets (H.264 / HEVC) are passed to the existing
do_NAL()processing logic, while caption tracks (CEA-608 / CEA-708) and subtitle tracks (tx3g) continue to use the existing CCExtractor parsing functions.One difference from the GPAC implementation is that FFmpeg reads packets sequentially across all streams, whereas the GPAC implementation reads samples per track. The downstream caption extraction pipeline remains unchanged.
For H.264 / HEVC streams, codec configuration data is obtained from the stream extradata (
avcC/hvcC) in order to determine the NAL unit length prefix size and extract SPS/PPS before processing packets.Build configuration
This backend is optional and controlled through a compile-time flag:
mp4.c)mp4_ffmpeg.c)The runtime behavior of CCExtractor remains unchanged — the difference only affects how the MP4 container is parsed internally.
Summary
This PR:
This provides a potential alternative MP4 backend using a widely available multimedia framework while preserving the existing behavior.