feat(export): native DOCX export via html-to-docx (opt-in) #7568

Open
JohnMcLear wants to merge 2 commits into ether:develop from JohnMcLear:feat/native-docx-export-7538

@JohnMcLear
Member

Summary

Addresses #7538. The current DOCX export path shells out to LibreOffice, which means every deployment that wants a Word download either installs soffice (~500 MB of dependencies, plus subprocess latency on each export) or loses that format. This PR adds a pure-JS alternative: render the HTML via the existing exporthtml pipeline, feed it to html-to-docx in-process, and send the resulting buffer back. No soffice required, no subprocess spawn.

Shape

  • settings.nativeDocxExport (default false) gates the new path, so existing deployments see zero behavior change.
  • When enabled, type === 'docx' requests skip the LibreOffice branch and return a fresh DOCX buffer with Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document.
  • If the native converter throws, the handler falls through to the existing LibreOffice path — so flipping this on is safe even on a mixed install where soffice is present as a backstop.
  • Other formats (pdf, odt, rtf, txt, html, etherpad) are untouched.

Files

| File | Change |
| --- | --- |
| src/package.json | add html-to-docx (pure JS, no binary reqs) |
| src/node/handler/ExportHandler.ts | new DOCX branch gated on the setting, fall-through on error |
| src/node/utils/Settings.ts, settings.json.template, settings.json.docker | wire up the flag + NATIVE_DOCX_EXPORT env var |
| doc/docker.md | env-var row |
| src/tests/backend/specs/export.ts | two new tests: assert the exported buffer is a valid ZIP (PK\x03\x04 signature) and that the response carries the correct content-type, both with settings.soffice = 'false' to prove the path doesn't touch soffice |
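
The ZIP-signature assertion mentioned for the new tests can be illustrated with a small standalone check. This is a minimal sketch, not the code from the PR's test file: `looksLikeZip` is a hypothetical helper name, and it only inspects the first four bytes, so it says nothing about whether the rest of the archive is a well-formed DOCX.

```typescript
// Local file header signature that a non-empty ZIP archive starts with: "PK\x03\x04".
const ZIP_LOCAL_HEADER = [0x50, 0x4b, 0x03, 0x04];

// Returns true when the first four bytes match the ZIP signature.
// Hypothetical helper for illustration; not taken from the PR's test file.
function looksLikeZip(bytes: Uint8Array): boolean {
  if (bytes.length < ZIP_LOCAL_HEADER.length) return false;
  return ZIP_LOCAL_HEADER.every((expected, i) => bytes[i] === expected);
}

// A DOCX file is a ZIP container, so its first bytes carry the same signature.
const docxLike = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
// The ASCII bytes of "<html" do not.
const htmlLike = new Uint8Array([0x3c, 0x68, 0x74, 0x6d, 0x6c]);

console.log(looksLikeZip(docxLike)); // true
console.log(looksLikeZip(htmlLike)); // false
```

A signature check of this kind catches the case where the handler returns HTML or an error page with a DOCX content-type, but a stricter test would also open the archive and look for the expected document parts.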

Out of scope

  • Native PDF export — the issue explicitly acknowledges the pdfkit / puppeteer tradeoff (~200 MB). That belongs in a separate PR.
  • The ~200 MB figure the issue mentions is about puppeteer; html-to-docx itself weighs in at a few MB and has no native deps.

Test plan

  • pnpm run ts-check clean locally
  • New backend tests validate the ZIP signature + content-type
  • CI green
  • Manual: set NATIVE_DOCX_EXPORT=true with no soffice installed; DOCX export still works
  • Manual: force html-to-docx to throw; verify the old LibreOffice path is tried as a fallback

Closes #7538

🤖 Generated with Claude Code

@qodo-free-for-open-source-projects

Review Summary by Qodo

Native DOCX export via html-to-docx with opt-in setting

✨ Enhancement

Walkthroughs

Description
• Add pure-JS in-process DOCX export via html-to-docx library
• Gate feature with nativeDocxExport setting (default false)
• Eliminates LibreOffice/soffice dependency for DOCX exports
• Auto-fallback to LibreOffice if native converter fails
Diagram
flowchart LR
  A["DOCX Export Request"] --> B{"nativeDocxExport<br/>enabled?"}
  B -->|Yes| C["Render HTML"]
  C --> D["html-to-docx<br/>converter"]
  D --> E{"Conversion<br/>success?"}
  E -->|Yes| F["Return DOCX buffer"]
  E -->|No| G["Fall back to<br/>LibreOffice"]
  B -->|No| G
  G --> H["Return DOCX via<br/>soffice"]
  F --> I["Response with<br/>correct content-type"]
  H --> I

File Changes

1. src/package.json Dependencies +1/-0

Add html-to-docx dependency

• Add html-to-docx dependency at version ^1.8.0
• Pure JavaScript library with no binary requirements

src/package.json


2. src/node/handler/ExportHandler.ts ✨ Enhancement +23/-0

Implement native DOCX export with fallback

• New DOCX export branch gated on settings.nativeDocxExport
• Converts HTML to DOCX buffer in-process using html-to-docx
• Sets correct application/vnd.openxmlformats-officedocument.wordprocessingml.document
 content-type
• Falls through to LibreOffice path on converter error with warning log

src/node/handler/ExportHandler.ts


3. src/node/utils/Settings.ts ⚙️ Configuration changes +9/-0

Add nativeDocxExport setting type and default

• Add nativeDocxExport: boolean field to SettingsType
• Initialize with default value false to preserve existing behavior
• Include JSDoc explaining opt-in nature and auto-fallback behavior

src/node/utils/Settings.ts


4. settings.json.template ⚙️ Configuration changes +10/-0

Add nativeDocxExport to template config

• Add nativeDocxExport configuration option with default false
• Include documentation explaining in-process conversion and fallback behavior

settings.json.template


5. settings.json.docker ⚙️ Configuration changes +6/-0

Add nativeDocxExport to Docker config

• Add nativeDocxExport with environment variable binding ${NATIVE_DOCX_EXPORT:false}
• Include comment explaining the feature and auto-fallback mechanism

settings.json.docker


6. doc/docker.md 📝 Documentation +1/-0

Document NATIVE_DOCX_EXPORT environment variable

• Document NATIVE_DOCX_EXPORT environment variable in configuration table
• Explain in-process conversion, soffice bypass, and auto-fallback behavior
• Set default value to false

doc/docker.md


7. src/tests/backend/specs/export.ts 🧪 Tests +39/-0

Add native DOCX export validation tests

• Import assert module for test assertions
• Backup and restore nativeDocxExport setting in test lifecycle
• Add new test suite for native DOCX export validating ZIP signature
• Test that exported buffer starts with ZIP header 0x504b0304 (PK signature)
• Test correct application/vnd.openxmlformats-officedocument.wordprocessingml.document
 content-type
• Both tests run with soffice: false to prove path doesn't require LibreOffice

src/tests/backend/specs/export.ts


8. pnpm-lock.yaml Dependencies +476/-0

Lock html-to-docx and transitive dependencies

• Lock file entries for html-to-docx@1.8.0 and all transitive dependencies
• Includes DOM parsing libraries (@oozcitak/dom, @oozcitak/util), ZIP handling (jszip), image
 processing, and XML building utilities
• No native binary dependencies added

pnpm-lock.yaml



@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects bot commented Apr 20, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (1)


Action required

1. DOCX export still needs soffice 📎 Requirement gap ≡ Correctness
Description
The new DOCX export path is opt-in (nativeDocxExport defaults to false) and explicitly falls
back to the existing LibreOffice/soffice path on error, so DOCX export is not fully free of a
LibreOffice runtime dependency. This fails the requirement to support DOCX export without requiring
LibreOffice for these formats.
Code

src/node/handler/ExportHandler.ts[R97-110]

+    if (type === 'docx' && settings.nativeDocxExport) {
+      try {
+        const htmlToDocx = require('html-to-docx');
+        const docxBuffer = await htmlToDocx(html);
+        html = null;
+        res.contentType(
+            'application/vnd.openxmlformats-officedocument.wordprocessingml.document');
+        res.send(docxBuffer);
+        return;
+      } catch (err) {
+        console.warn(
+            `native-docx export failed for pad "${padId}", falling back to ` +
+            `LibreOffice: ${(err as Error).message || err}`);
+      }
Evidence
PR Compliance ID 1 requires DOCX/PDF support using native/local tooling with no runtime dependency
on LibreOffice. The added DOCX branch is gated behind settings.nativeDocxExport and, if conversion
fails, logs a warning and falls through to the LibreOffice export path, meaning LibreOffice remains
a required backstop in the DOCX export flow.

Native DOCX/PDF import/export support without Abiword/LibreOffice dependency
src/node/handler/ExportHandler.ts[97-110]
src/node/utils/Settings.ts[419-426]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Compliance requires DOCX export to work without a LibreOffice/`soffice` runtime dependency. The new native DOCX export is opt-in by default and explicitly falls back to the LibreOffice path on error, so LibreOffice is still required as a backstop for DOCX export.

## Issue Context
Current implementation uses `html-to-docx` when `nativeDocxExport` is enabled, but catches errors and falls through to LibreOffice. This violates the stated objective of having DOCX export not depend on LibreOffice.

## Fix Focus Areas
- src/node/handler/ExportHandler.ts[97-110]
- src/node/utils/Settings.ts[419-426]
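
One way to read this finding is that the fallback should be conditional rather than unconditional. The sketch below is an assumption about how that could look, not the PR's implementation: `DocxSettings`, `FailureAction`, and `onNativeDocxFailure` are hypothetical names, and it assumes that `soffice === null` means LibreOffice is not configured.

```typescript
// Minimal settings shape for this sketch; the real SettingsType has many more fields.
interface DocxSettings {
  nativeDocxExport: boolean;
  soffice: string | null;
}

type FailureAction = 'fallback-to-soffice' | 'report-error';

// Decide what to do after the native converter throws.
// Assumption: `soffice === null` means LibreOffice is not configured, so there
// is nothing to fall back to and the error should surface to the client.
function onNativeDocxFailure(settings: DocxSettings): FailureAction {
  return settings.soffice != null ? 'fallback-to-soffice' : 'report-error';
}

console.log(onNativeDocxFailure({nativeDocxExport: true, soffice: '/usr/bin/soffice'}));
console.log(onNativeDocxFailure({nativeDocxExport: true, soffice: null}));
```

With a rule like this, LibreOffice stays an optional backstop on mixed installs, while a no-soffice deployment gets an explicit export error instead of falling through to a converter that is not there.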



2. DOCX blocked without soffice 🐞 Bug ≡ Correctness
Description
Even with settings.nativeDocxExport=true, DOCX exports are rejected and hidden when settings.soffice
is null because exportAvailable() gates docx behind soffice in both the /export route and UI. This
makes the new native DOCX branch in ExportHandler unreachable in the documented no-LibreOffice
configuration.
Code

src/node/handler/ExportHandler.ts[R90-111]

+    // Native DOCX path (issue #7538) — when `nativeDocxExport` is enabled,
+    // convert the HTML export into a Word document in-process with
+    // `html-to-docx` instead of shelling out to LibreOffice. Saves admins
+    // from having to install `soffice` and avoids per-export subprocess
+    // latency. On failure we fall through to the LibreOffice path below
+    // so the change is strictly additive (opt-in via setting, auto-fallback
+    // if the converter throws).
+    if (type === 'docx' && settings.nativeDocxExport) {
+      try {
+        const htmlToDocx = require('html-to-docx');
+        const docxBuffer = await htmlToDocx(html);
+        html = null;
+        res.contentType(
+            'application/vnd.openxmlformats-officedocument.wordprocessingml.document');
+        res.send(docxBuffer);
+        return;
+      } catch (err) {
+        console.warn(
+            `native-docx export failed for pad "${padId}", falling back to ` +
+            `LibreOffice: ${(err as Error).message || err}`);
+      }
+    }
Evidence
The PR adds a native DOCX branch, but requests for /export/docx are blocked earlier when LibreOffice
is disabled (soffice=null), and the UI removes the DOCX link under the same condition.
exportAvailable() only reflects soffice availability, so enabling nativeDocxExport alone won’t
expose or allow DOCX export.

src/node/handler/ExportHandler.ts[90-111]
src/node/hooks/express/importexport.ts[27-48]
src/static/js/pad_impexp.ts[147-166]
src/node/utils/Settings.ts[700-709]
doc/docker.md[190-194]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Native DOCX export is implemented but is effectively unreachable in the intended “no soffice installed” configuration because the server route guard and client UI still treat `docx` as requiring LibreOffice.

## Issue Context
- Server-side guard blocks `docx` when `exportAvailable() === 'no'`.
- `exportAvailable()` currently only reflects `soffice` presence.
- Client UI removes the Word export link when `clientVars.exportAvailable === 'no'`.
- Docs say setting `SOFFICE` to `null` disables LibreOffice (typical for no-soffice deployments).

## Fix Focus Areas
- Update server export guard to allow `docx` when `settings.nativeDocxExport === true`, even if `soffice` is null:
 - src/node/hooks/express/importexport.ts[27-48]
- Add a dedicated capability flag for “Word export available” (or “nativeDocxExport enabled”) into clientVars so the UI can show Word export even when other converter-based exports remain disabled:
 - src/node/handler/PadMessageHandler.ts[1113-1118]
 - src/static/js/pad_impexp.ts[147-166]
- Avoid incorrectly enabling PDF/ODT links when only native DOCX is available (introduce a new state or separate flags rather than reusing `exportAvailable`).
 - src/node/utils/Settings.ts[700-709]
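
The "separate flags" idea from the focus areas above can be sketched as a per-format capability map. This is an illustration under stated assumptions: `ExportSettings`, `ExportCapabilities`, `exportCapabilities`, and `isExportAllowed` are hypothetical names, and the real `exportAvailable()` and `clientVars` shapes are not reproduced here.

```typescript
// Only the two settings this sketch needs; names mirror the PR, the shape is simplified.
interface ExportSettings {
  soffice: string | null;
  nativeDocxExport: boolean;
}

// Hypothetical per-format capability flags, one boolean per converter-backed format.
interface ExportCapabilities {
  docx: boolean;
  pdf: boolean;
  odt: boolean;
}

function exportCapabilities(s: ExportSettings): ExportCapabilities {
  const hasSoffice = s.soffice != null;
  return {
    // DOCX is available through either converter.
    docx: hasSoffice || s.nativeDocxExport,
    // PDF and ODT still depend on LibreOffice in this PR.
    pdf: hasSoffice,
    odt: hasSoffice,
  };
}

// Route-guard style check: is the requested converter-backed format allowed?
function isExportAllowed(type: keyof ExportCapabilities, s: ExportSettings): boolean {
  return exportCapabilities(s)[type];
}

console.log(exportCapabilities({soffice: null, nativeDocxExport: true}));
```

Keeping the flags separate lets the server guard and the client UI enable the Word link without also exposing PDF or ODT links when only the native DOCX path exists.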




Remediation recommended

3. Native DOCX test bypass 🐞 Bug ☼ Reliability
Description
The new native DOCX tests set settings.soffice='false' (a non-null string), which prevents
exportAvailable() from returning 'no' and sidesteps the server-side DOCX export block. This can make
tests pass while a real deployment with soffice=null (as documented) still cannot export DOCX.
Code

src/tests/backend/specs/export.ts[R36-39]

+    before(function () {
+      settings.soffice = 'false';
+      settings.nativeDocxExport = true;
+    });
Evidence
The tests configure soffice with a non-null string, but the documented way to disable LibreOffice is
null. Additionally, Settings reload logic will null out invalid soffice paths, meaning the test
configuration doesn’t reflect real behavior; the server route guard blocks docx when
exportAvailable() is 'no'.

src/tests/backend/specs/export.ts[32-39]
doc/docker.md[190-194]
src/node/utils/Settings.ts[700-709]
src/node/utils/Settings.ts[1019-1030]
src/node/hooks/express/importexport.ts[37-48]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The native DOCX tests use `settings.soffice = 'false'`, which is non-null and therefore does not simulate a true “no soffice” deployment (`soffice: null`). This can let the tests pass even if the feature is broken for real deployments.

## Issue Context
- Docs describe disabling LibreOffice by setting `SOFFICE` to `null`.
- Server-side export route blocks docx when `exportAvailable() === 'no'`.

## Fix Focus Areas
- Update the native DOCX tests to simulate a real no-soffice deployment (`settings.soffice = null`) and assert DOCX export still succeeds when `nativeDocxExport = true`:
 - src/tests/backend/specs/export.ts[32-65]
- After fixing the route/UI gating (see other finding), add a regression assertion that `/export/docx` works with `soffice = null` and fails (or is blocked) appropriately when nativeDocxExport is false.
 - src/tests/backend/specs/export.ts[22-66]
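
The pitfall described in this finding comes down to a string versus null comparison. The sketch below uses a simplified stand-in for the availability check, assuming (as the review states) that only a null `soffice` disables converter-based exports; `exportAvailableSketch` is a hypothetical name and not the real `exportAvailable()`.

```typescript
// Simplified stand-in for the availability check described above.
// Assumption: only a null `soffice` setting disables converter-based exports.
function exportAvailableSketch(soffice: string | null): 'yes' | 'no' {
  return soffice == null ? 'no' : 'yes';
}

// The string 'false' is a non-null value, so the guard treats it as "configured".
console.log(exportAvailableSketch('false')); // "yes"

// Only a real null reproduces the documented no-soffice deployment.
console.log(exportAvailableSketch(null)); // "no"
```

A test that wants to prove the no-LibreOffice case therefore needs `settings.soffice = null`, not the string `'false'`.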



4. Unrestricted HTML-to-DOCX I/O 🐞 Bug ⛨ Security
Description
ExportHandler passes exported HTML directly into html-to-docx and buffers the entire DOCX in memory
for res.send(), and the dependency graph includes image-to-base64→node-fetch enabling outbound
network access from conversion code. Because HTML export can be plugin-modified, enabling
nativeDocxExport can allow untrusted pad/plugin output to trigger server-side requests and increase
memory pressure.
Code

src/node/handler/ExportHandler.ts[R99-105]

+        const htmlToDocx = require('html-to-docx');
+        const docxBuffer = await htmlToDocx(html);
+        html = null;
+        res.contentType(
+            'application/vnd.openxmlformats-officedocument.wordprocessingml.document');
+        res.send(docxBuffer);
+        return;
Evidence
The new code converts HTML in-process via html-to-docx and sends the resulting buffer. The lockfile
shows html-to-docx includes image-to-base64 (node-fetch), and ExportHtml provides a plugin hook that
can modify generated HTML, meaning untrusted plugin/pad output can influence the converter input and
potentially induce server-side I/O.

src/node/handler/ExportHandler.ts[97-105]
pnpm-lock.yaml[8709-8718]
pnpm-lock.yaml[8804-8807]
src/node/utils/ExportHtml.ts[321-337]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Native DOCX export converts plugin-modifiable HTML via `html-to-docx` with no constraints. The dependency tree includes `node-fetch`, so conversion code may perform outbound network access, and the handler buffers the full DOCX in memory.

## Issue Context
- ExportHandler calls `htmlToDocx(html)` and `res.send(docxBuffer)`.
- ExportHtml allows plugins to modify exported HTML.
- pnpm-lock indicates `html-to-docx` pulls in `image-to-base64` and `node-fetch`.

## Fix Focus Areas
- Investigate `html-to-docx` options to disable remote fetching / external resource resolution (or strip/deny `<img src>` and other fetchable URLs from HTML before conversion).
 - src/node/handler/ExportHandler.ts[97-105]
- Add guardrails: size limits for generated DOCX, timeouts/cancellation, and (if possible) run conversion in a constrained environment (worker/thread or sandbox) to reduce SSRF and DoS impact.
 - src/node/handler/ExportHandler.ts[97-111]
- Consider writing the buffer to a temp file and using `res.sendFile()` (or streaming) to reduce peak memory usage.
 - src/node/handler/ExportHandler.ts[99-105]
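
Two of the guardrails suggested above, dropping remote image references before conversion and capping the output size, can be sketched as plain functions. This is a rough illustration only: the regex is not a full HTML sanitizer, `stripRemoteImages`, `assertWithinLimit`, and the 10 MB `MAX_DOCX_BYTES` value are hypothetical, and whether html-to-docx actually fetches remote images is inferred by the review from the lockfile rather than confirmed here.

```typescript
// Remove <img> tags whose src points at a remote URL (http, https, or
// protocol-relative). Regex-based and intentionally conservative: a sketch,
// not a full HTML sanitizer. Inline data: URIs are left untouched.
function stripRemoteImages(html: string): string {
  return html.replace(/<img\b[^>]*\bsrc\s*=\s*["']?\s*(?:https?:)?\/\/[^>]*>/gi, '');
}

// Hypothetical upper bound for a generated DOCX; the value is an assumption.
const MAX_DOCX_BYTES = 10 * 1024 * 1024;

// Throws when the converted document is larger than the configured limit.
function assertWithinLimit(byteLength: number, limit: number = MAX_DOCX_BYTES): void {
  if (byteLength > limit) {
    throw new Error(`DOCX of ${byteLength} bytes exceeds limit of ${limit} bytes`);
  }
}

const input = '<p>a</p><img src="https://example.com/x.png"><img src="data:image/png;base64,AAAA">';
console.log(stripRemoteImages(input));
// <p>a</p><img src="data:image/png;base64,AAAA">
```

In a handler, the first function would run on the exported HTML before it reaches the converter and the second on the returned buffer's length before `res.send()`. Timeouts and process isolation, also raised in the review, are not covered by this sketch.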


Addresses ether#7538. The current DOCX export path shells out to LibreOffice,
which means every deployment that wants a Word download either installs
soffice (~500 MB) or loses that export. This PR adds a pure-JS
alternative: render the HTML via the existing exporthtml pipeline, then
feed it to the `html-to-docx` library in-process to produce a valid
.docx buffer — no soffice required, no subprocess spawn, no temp file
dance for the DOCX case.

Behavior:
- `settings.nativeDocxExport` (default `false`) gates the new path so
  existing deployments see zero behavior change.
- When enabled, `type === 'docx'` requests skip the LibreOffice branch,
  run `html-to-docx(html)`, and return the buffer with the
  `application/vnd.openxmlformats-officedocument.wordprocessingml.document`
  content-type.
- If the native converter throws, the handler falls through to the
  existing LibreOffice path — so flipping the flag on is safe even on a
  mixed-installation where soffice is still present as a backstop.
- Other export formats (pdf, odt, rtf, txt, html, etherpad) are
  unchanged.

Files:
- `src/package.json`: `html-to-docx` dep (pure JS, no binary reqs)
- `src/node/handler/ExportHandler.ts`: new DOCX branch gated on the
  setting, with fall-through on error
- `src/node/utils/Settings.ts`, `settings.json.template`,
  `settings.json.docker`, `doc/docker.md`: wire up the new setting +
  env var (`NATIVE_DOCX_EXPORT`)
- `src/tests/backend/specs/export.ts`: two new tests — asserts the
  exported buffer is a valid ZIP (PK\x03\x04 signature) and the
  response carries the correct content-type — both with
  `settings.soffice = 'false'` to prove the path doesn't need soffice
  at all.

Out of scope for this PR:
- Native PDF export (would need a PDF rendering step — separate
  undertaking, and the issue acknowledges the `pdfkit`/puppeteer size
  trade-off).

Closes ether#7538

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JohnMcLear force-pushed the feat/native-docx-export-7538 branch from 7e5a73c to b98dfba (April 20, 2026 08:44)

The upgrade-from-latest-release CI job installs deps from the previous
release's package.json (before this PR adds html-to-docx) and then
git-checkouts this branch's code without re-running pnpm install.
Under that one workflow the new test can't find the module and fails
on the LibreOffice fallback, masking that the native path actually
works in every normal install.

Guard the describe block with require.resolve('html-to-docx'); calling
Mocha's this.skip() in the before hook also skips the sibling it() tests.
Regular backend tests (pnpm install against this branch's lockfile)
still exercise the native path.
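
The guard described in this commit message can be sketched as a small helper. This is an illustration, not the code from the commit: `isModuleResolvable` is a hypothetical name, and the resolver is passed in as a parameter so the example does not depend on a particular module system.

```typescript
// Returns true when `resolve(id)` succeeds and false when it throws.
// `resolve` is injected so the sketch stays runtime-agnostic; in a Node
// CommonJS test file one would pass `(id) => require.resolve(id)`.
function isModuleResolvable(resolve: (id: string) => string, id: string): boolean {
  try {
    resolve(id);
    return true;
  } catch {
    return false;
  }
}

// Stand-in resolver that only knows one module, to show both outcomes.
const fakeResolve = (id: string): string => {
  if (id === 'html-to-docx') return '/node_modules/html-to-docx/index.js';
  throw new Error(`Cannot find module '${id}'`);
};

console.log(isModuleResolvable(fakeResolve, 'html-to-docx')); // true
console.log(isModuleResolvable(fakeResolve, 'not-installed')); // false
```

Inside the `before` hook of the describe block, a false result would lead to `this.skip()`, which is what keeps the upgrade-from-latest-release job from failing on a module it never installed.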

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JohnMcLear
Member Author

I feel like we can't drop pdf so need to have a conversation here...


Development

Successfully merging this pull request may close these issues.

Support docx/pdf import/export natively
