PDFBOX-6197: TTFSubsetter: add support for custom cmap subtables via addCustomCmapEntry() / addCustomCmap() #446

Open
sz5000 wants to merge 1 commit into apache:trunk from sz5000:ttf_subsetter_extension

@sz5000 sz5000 commented Apr 21, 2026

Summary

Extends TTFSubsetter with two new public methods that allow callers to inject custom cmap
subtables into the subset TTF. This enables correct re-subsetting of TrueType fonts that use
non-Unicode cmap encodings — in particular fonts produced by Ghostscript with TT_BIAS=0xF000,
where the Mac Roman cmap (platform 1, encoding 0) is the primary rendering cmap used by viewers.

Fixes PDFBOX-XXXXX.


Changes

TTFSubsetter.java

  • New field customCmapEntries — accumulates entries added via the new API
  • buildCmapTable() extended:
    • now produces output even when uniToGID is empty (i.e. when only addGlyphIds() was used)
    • writes one Format 4 subtable per distinct (platformId, platformEncodingId) pair
    • GIDs are translated to renumbered subset GIDs via the existing getNewGlyphId() mechanism
  • New private helpers buildFormat4Subtable() and buildFormat4SubtableNewGids() extracted
    from the previous monolithic buildCmapTable() implementation
  • New public method addCustomCmapEntry(platformId, platformEncodingId, charCode, gid)
  • New public method addCustomCmap(platformId, platformEncodingId, Map<Integer,Integer> codeToGid)
  • Javadoc added for addGlyphIds() (previously undocumented)
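
The Format 4 layout mentioned in the list above can be illustrated with a small standalone sketch. It is not the PR's buildFormat4Subtable() code: the class Format4Segments and its methods are hypothetical names, and the sketch only covers the simple case where every segment uses idDelta with idRangeOffset 0. It groups runs of consecutive character codes that map to consecutive (already renumbered) GIDs and appends the mandatory terminating 0xFFFF segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical illustration of cmap format 4 segmentation; not taken from the PR. */
public class Format4Segments
{
    /** One segment: codes startCode..endCode map to (code + idDelta) mod 65536. */
    public static final class Segment
    {
        public final int startCode;
        public final int endCode;
        public final int idDelta;

        public Segment(int startCode, int endCode, int idDelta)
        {
            this.startCode = startCode;
            this.endCode = endCode;
            this.idDelta = idDelta;
        }
    }

    /** Builds segments from a sorted charCode -> new GID map. */
    public static List<Segment> build(SortedMap<Integer, Integer> codeToNewGid)
    {
        List<Segment> segments = new ArrayList<>();
        int start = -1;
        int startGid = -1;
        int prevCode = -1;
        int prevGid = -1;
        for (Map.Entry<Integer, Integer> e : codeToNewGid.entrySet())
        {
            int code = e.getKey();
            int gid = e.getValue();
            if (code < 0 || code > 0xFFFE)
            {
                // 0xFFFF is reserved here for the terminating segment
                throw new IllegalArgumentException("code out of range: " + code);
            }
            boolean continuesRun = start >= 0 && code == prevCode + 1 && gid == prevGid + 1;
            if (!continuesRun)
            {
                if (start >= 0)
                {
                    segments.add(new Segment(start, prevCode, (startGid - start) & 0xFFFF));
                }
                start = code;
                startGid = gid;
            }
            prevCode = code;
            prevGid = gid;
        }
        if (start >= 0)
        {
            segments.add(new Segment(start, prevCode, (startGid - start) & 0xFFFF));
        }
        // terminating segment: (0xFFFF + 1) mod 65536 = 0, i.e. .notdef
        segments.add(new Segment(0xFFFF, 0xFFFF, 1));
        return segments;
    }

    /** Resolves a code the way a format 4 reader would (idRangeOffset = 0 case only). */
    public static int lookup(List<Segment> segments, int code)
    {
        for (Segment s : segments)
        {
            if (code <= s.endCode)
            {
                return code >= s.startCode ? (code + s.idDelta) & 0xFFFF : 0;
            }
        }
        return 0;
    }
}
```

The lookup() helper is included only to check that the segments round-trip; codes that fall into a gap between segments resolve to GID 0.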

Backwards Compatibility

Fully backwards-compatible. Callers that do not use the new API get identical behaviour to
the previous implementation.


Testing

Verified against PDFs produced by Ghostscript 10.x containing Thai and other non-Latin scripts
subsetted with TT_BIAS=0xF000. Before this change, re-subsetting such fonts produced a TTF
with a broken or missing Mac Roman cmap, causing viewers to render blank glyphs. After this
change, both the Mac Roman and Windows Symbol subtables are correctly preserved in the subset,
and glyph rendering is identical to the original.
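
As a rough illustration of the relationship between the two subtables, the sketch below derives Windows Symbol codes from one-byte codes by adding the bias. It assumes that a font subsetted with TT_BIAS=0xF000 exposes each one-byte code directly in the Mac Roman subtable and at 0xF000 + code in the Windows Symbol subtable; the PR description does not spell this out, and TtBiasSketch is a hypothetical helper, not part of PDFBox.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper; assumes Symbol codes are one-byte codes offset by 0xF000. */
public class TtBiasSketch
{
    public static final int TT_BIAS = 0xF000;

    /** Maps each one-byte code to its biased Windows Symbol (3,0) code, keeping the GID. */
    public static SortedMap<Integer, Integer> toSymbolCodes(Map<Integer, Integer> byteCodeToGid)
    {
        SortedMap<Integer, Integer> symbol = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : byteCodeToGid.entrySet())
        {
            int code = e.getKey();
            if (code < 0 || code > 0xFF)
            {
                throw new IllegalArgumentException("not a one-byte code: " + code);
            }
            symbol.put(TT_BIAS + code, e.getValue());
        }
        return symbol;
    }
}
```

With the API proposed in this PR, a caller would presumably pass the unbiased map to addCustomCmap(1, 0, ...) and the biased map to addCustomCmap(3, 0, ...).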

Two new public methods added: addCustomCmapEntry and addCustomCmap.

TTFSubsetter currently writes only a single Windows Unicode BMP cmap subtable (platform 3, encoding 1), and only when Unicode code points have been registered (for example via addAll()). Callers have no way to inject additional cmap subtables, such as a Mac Roman subtable (platform 1, encoding 0) or a Windows Symbol subtable (platform 3, encoding 0).
This limitation makes it impossible to correctly re-subset TrueType fonts that were originally subsetted by Ghostscript using its TT_BIAS=0xF000 strategy, where the font's Mac Roman cmap is the primary rendering cmap used by viewers.
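
To make the intended bookkeeping concrete, here is a minimal standalone model of what such an injection API might track. It is not the PR's TTFSubsetter code: CustomCmapSketch is a hypothetical class that only reuses the method names from this description, and two details are assumptions, namely that adding an entry also retains its GID in the subset and that a new GID is the position of the old GID among the retained glyphs.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical model of per-(platformId, platformEncodingId) cmap entries; not the PR's code. */
public class CustomCmapSketch
{
    // key = (platformId << 16) | platformEncodingId, value = charCode -> original GID
    private final SortedMap<Integer, SortedMap<Integer, Integer>> customCmapEntries = new TreeMap<>();
    // original GIDs retained in the subset; GID 0 (.notdef) is always kept
    private final SortedSet<Integer> glyphIds = new TreeSet<>();

    public CustomCmapSketch()
    {
        glyphIds.add(0);
    }

    public void addCustomCmapEntry(int platformId, int platformEncodingId, int charCode, int gid)
    {
        customCmapEntries
                .computeIfAbsent(key(platformId, platformEncodingId), k -> new TreeMap<>())
                .put(charCode, gid);
        // assumption: registering a mapping also keeps the glyph in the subset
        glyphIds.add(gid);
    }

    public void addCustomCmap(int platformId, int platformEncodingId, Map<Integer, Integer> codeToGid)
    {
        codeToGid.forEach((code, gid) -> addCustomCmapEntry(platformId, platformEncodingId, code, gid));
    }

    /** Assumed renumbering rule: new GID = number of retained GIDs smaller than the old one. */
    public int getNewGlyphId(int oldGid)
    {
        return glyphIds.headSet(oldGid).size();
    }

    /** Number of distinct (platformId, platformEncodingId) pairs, i.e. subtables to write. */
    public int subtableCount()
    {
        return customCmapEntries.size();
    }

    /** Returns charCode -> renumbered GID for one subtable (empty if none was registered). */
    public SortedMap<Integer, Integer> subtableWithNewGids(int platformId, int platformEncodingId)
    {
        SortedMap<Integer, Integer> result = new TreeMap<>();
        SortedMap<Integer, Integer> entries = customCmapEntries.get(key(platformId, platformEncodingId));
        if (entries != null)
        {
            for (Map.Entry<Integer, Integer> e : entries.entrySet())
            {
                result.put(e.getKey(), getNewGlyphId(e.getValue()));
            }
        }
        return result;
    }

    private static int key(int platformId, int platformEncodingId)
    {
        return (platformId << 16) | platformEncodingId;
    }
}
```

Registering the same glyph under (1, 0) and (3, 0) yields two subtables that both point at the same renumbered GID, which is the property the re-subsetting use case relies on.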
sz5000 changed the title from "TTFSubsetter: add support for custom cmap subtables via addCustomCmapEntry() / addCustomCmap()" to "PDFBOX-6197: TTFSubsetter: add support for custom cmap subtables via addCustomCmapEntry() / addCustomCmap()" on Apr 21, 2026