feat(export): native DOCX export via html-to-docx (opt-in)#7568
feat(export): native DOCX export via html-to-docx (opt-in)#7568JohnMcLear wants to merge 2 commits intoether:developfrom
Conversation
Review Summary by QodoNative DOCX export via html-to-docx with opt-in setting
WalkthroughsDescription• Add pure-JS in-process DOCX export via html-to-docx library • Gate feature with nativeDocxExport setting (default false) • Eliminates LibreOffice/soffice dependency for DOCX exports • Auto-fallback to LibreOffice if native converter fails Diagramflowchart LR
A["DOCX Export Request"] --> B{"nativeDocxExport<br/>enabled?"}
B -->|Yes| C["Render HTML"]
C --> D["html-to-docx<br/>converter"]
D --> E{"Conversion<br/>success?"}
E -->|Yes| F["Return DOCX buffer"]
E -->|No| G["Fall back to<br/>LibreOffice"]
B -->|No| G
G --> H["Return DOCX via<br/>soffice"]
F --> I["Response with<br/>correct content-type"]
H --> I
File Changes1. src/package.json
|
Code Review by Qodo
1. DOCX export still needs soffice
|
| if (type === 'docx' && settings.nativeDocxExport) { | ||
| try { | ||
| const htmlToDocx = require('html-to-docx'); | ||
| const docxBuffer = await htmlToDocx(html); | ||
| html = null; | ||
| res.contentType( | ||
| 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'); | ||
| res.send(docxBuffer); | ||
| return; | ||
| } catch (err) { | ||
| console.warn( | ||
| `native-docx export failed for pad "${padId}", falling back to ` + | ||
| `LibreOffice: ${(err as Error).message || err}`); | ||
| } |
There was a problem hiding this comment.
1. Docx export still needs soffice 📎 Requirement gap ≡ Correctness
The new DOCX export path is opt-in (nativeDocxExport defaults to false) and explicitly falls back to the existing LibreOffice/soffice path on error, so DOCX export is not fully free of a LibreOffice runtime dependency. This fails the requirement to support DOCX export without requiring LibreOffice for these formats.
Agent Prompt
## Issue description
Compliance requires DOCX export to work without a LibreOffice/`soffice` runtime dependency. The new native DOCX export is opt-in by default and explicitly falls back to the LibreOffice path on error, so LibreOffice is still required as a backstop for DOCX export.
## Issue Context
Current implementation uses `html-to-docx` when `nativeDocxExport` is enabled, but catches errors and falls through to LibreOffice. This violates the stated objective of having DOCX export not depend on LibreOffice.
## Fix Focus Areas
- src/node/handler/ExportHandler.ts[97-110]
- src/node/utils/Settings.ts[419-426]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| // Native DOCX path (issue #7538) — when `nativeDocxExport` is enabled, | ||
| // convert the HTML export into a Word document in-process with | ||
| // `html-to-docx` instead of shelling out to LibreOffice. Saves admins | ||
| // from having to install `soffice` and avoids per-export subprocess | ||
| // latency. On failure we fall through to the LibreOffice path below | ||
| // so the change is strictly additive (opt-in via setting, auto-fallback | ||
| // if the converter throws). | ||
| if (type === 'docx' && settings.nativeDocxExport) { | ||
| try { | ||
| const htmlToDocx = require('html-to-docx'); | ||
| const docxBuffer = await htmlToDocx(html); | ||
| html = null; | ||
| res.contentType( | ||
| 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'); | ||
| res.send(docxBuffer); | ||
| return; | ||
| } catch (err) { | ||
| console.warn( | ||
| `native-docx export failed for pad "${padId}", falling back to ` + | ||
| `LibreOffice: ${(err as Error).message || err}`); | ||
| } | ||
| } |
There was a problem hiding this comment.
2. Docx blocked without soffice 🐞 Bug ≡ Correctness
Even with settings.nativeDocxExport=true, DOCX exports are rejected and hidden when settings.soffice is null because exportAvailable() gates docx behind soffice in both the /export route and UI. This makes the new native DOCX branch in ExportHandler unreachable in the documented no-LibreOffice configuration.
Agent Prompt
## Issue description
Native DOCX export is implemented but is effectively unreachable in the intended “no soffice installed” configuration because the server route guard and client UI still treat `docx` as requiring LibreOffice.
## Issue Context
- Server-side guard blocks `docx` when `exportAvailable() === 'no'`.
- `exportAvailable()` currently only reflects `soffice` presence.
- Client UI removes the Word export link when `clientVars.exportAvailable === 'no'`.
- Docs say setting `SOFFICE` to `null` disables LibreOffice (typical for no-soffice deployments).
## Fix Focus Areas
- Update server export guard to allow `docx` when `settings.nativeDocxExport === true`, even if `soffice` is null:
- src/node/hooks/express/importexport.ts[27-48]
- Add a dedicated capability flag for “Word export available” (or “nativeDocxExport enabled”) into clientVars so the UI can show Word export even when other converter-based exports remain disabled:
- src/node/handler/PadMessageHandler.ts[1113-1118]
- src/static/js/pad_impexp.ts[147-166]
- Avoid incorrectly enabling PDF/ODT links when only native DOCX is available (introduce a new state or separate flags rather than reusing `exportAvailable`).
- src/node/utils/Settings.ts[700-709]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Addresses ether#7538. The current DOCX export path shells out to LibreOffice, which means every deployment that wants a Word download either installs soffice (~500 MB) or loses that export. This PR adds a pure-JS alternative: render the HTML via the existing exporthtml pipeline, then feed it to the `html-to-docx` library in-process to produce a valid .docx buffer — no soffice required, no subprocess spawn, no temp file dance for the DOCX case. Behavior: - `settings.nativeDocxExport` (default `false`) gates the new path so existing deployments see zero behavior change. - When enabled, `type === 'docx'` requests skip the LibreOffice branch, run `html-to-docx(html)`, and return the buffer with the `application/vnd.openxmlformats-officedocument.wordprocessingml.document` content-type. - If the native converter throws, the handler falls through to the existing LibreOffice path — so flipping the flag on is safe even on a mixed-installation where soffice is still present as a backstop. - Other export formats (pdf, odt, rtf, txt, html, etherpad) are unchanged. Files: - `src/package.json`: `html-to-docx` dep (pure JS, no binary reqs) - `src/node/handler/ExportHandler.ts`: new DOCX branch gated on the setting, with fall-through on error - `src/node/utils/Settings.ts`, `settings.json.template`, `settings.json.docker`, `doc/docker.md`: wire up the new setting + env var (`NATIVE_DOCX_EXPORT`) - `src/tests/backend/specs/export.ts`: two new tests — asserts the exported buffer is a valid ZIP (PK\x03\x04 signature) and the response carries the correct content-type — both with `settings.soffice = 'false'` to prove the path doesn't need soffice at all. Out of scope for this PR: - Native PDF export (would need a PDF rendering step — separate undertaking, and the issue acknowledges the `pdfkit`/puppeteer size trade-off). Closes ether#7538 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7e5a73c to
b98dfba
Compare
The upgrade-from-latest-release CI job installs deps from the previous
release's package.json (before this PR adds html-to-docx) and then
git-checkouts this branch's code without re-running pnpm install.
Under that one workflow the new test can't find the module and fails
on the LibreOffice fallback, masking that the native path actually
works in every normal install.
Guard the describe block with require.resolve('html-to-docx'); Mocha's
this.skip() on before cascades to the sibling its. Regular backend
tests (pnpm install against this branch's lockfile) still exercise it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
I feel like we can't drop pdf so need to have a conversation here... |
Summary
Addresses #7538. The current DOCX export path shells out to LibreOffice, which means every deployment that wants a Word download either installs
soffice(~500 MB of dep, plus subprocess latency on each export) or loses that format. This PR adds a pure-JS alternative: render the HTML via the existingexporthtmlpipeline, feed it tohtml-to-docxin-process, stream the buffer back. No soffice required, no subprocess spawn.Shape
settings.nativeDocxExport(defaultfalse) gates the new path, so existing deployments see zero behavior change.type === 'docx'requests skip the LibreOffice branch and return a fresh DOCX buffer withContent-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document.Files
src/package.jsonhtml-to-docx(pure JS, no binary reqs)src/node/handler/ExportHandler.tssrc/node/utils/Settings.ts,settings.json.template,settings.json.dockerNATIVE_DOCX_EXPORTenv vardoc/docker.mdsrc/tests/backend/specs/export.tsPK\x03\x04signature) and that the response carries the correct content-type — both withsettings.soffice = 'false'to prove the path doesn't touch sofficeOut of scope
pdfkit/ puppeteer tradeoff (~200 MB). That belongs in a separate PR.html-to-docxitself weighs in at a few MB and has no native deps.Test plan
pnpm run ts-checkclean locallyNATIVE_DOCX_EXPORT=truewith nosofficeinstalled; DOCX export still workshtml-to-docxto throw; verify the old LibreOffice path is tried as a fallbackCloses #7538
🤖 Generated with Claude Code