feat(api): public compactPad API + bin/compactPad CLI over existing Cleanup#7567
feat(api): public compactPad API + bin/compactPad CLI over existing Cleanup#7567JohnMcLear wants to merge 5 commits intoether:developfrom
Conversation
Review Summary by QodoAdd compactHistory() and compactPad CLI for database space reclamation
WalkthroughsDescription• Add compactHistory() method to collapse pad revision history into single base revision • Implement compactPad API endpoint and admin CLI for in-place database space reclamation • Preserve pad text, attributes, and chat history while clearing saved-revision bookmarks • Include comprehensive backend tests covering empty pads, text preservation, and post-compact edits Diagramflowchart LR
A["Long-lived Pad<br/>with Heavy History"] -->|compactHistory| B["Single Base Revision<br/>head=0"]
B -->|preserves| C["Text & Attributes<br/>Chat History"]
B -->|clears| D["Saved Revisions<br/>Old Changesets"]
E["Admin CLI<br/>compactPad.ts"] -->|calls API| F["API.compactPad<br/>v1.3.1"]
F -->|invokes| B
File Changes1. bin/compactPad.ts
|
Code Review by Qodo
1. compactPad missing feature flag
|
| version['1.3.1'] = { | ||
| ...version['1.3.0'], | ||
| compactPad: ['padID', 'authorId'], | ||
| }; | ||
|
|
||
|
|
||
| // set the latest available API version here | ||
| exports.latestApiVersion = '1.3.0'; | ||
| exports.latestApiVersion = '1.3.1'; |
There was a problem hiding this comment.
1. compactpad missing feature flag 📘 Rule violation ☼ Reliability
The new destructive compactPad feature (API + CLI) is enabled unconditionally with no feature-flag/disable mechanism, violating the requirement that new features be disabled by default. This can expose a new admin-capability surface area without an explicit opt-in toggle.
Agent Prompt
## Issue description
The new `compactPad` API/CLI feature is enabled by default with no feature flag.
## Issue Context
Compliance requires new features to be behind a feature flag and disabled by default, with no behavior/path changes when the flag is off.
## Fix Focus Areas
- src/node/handler/APIHandler.ts[145-152]
- src/node/db/API.ts[638-659]
- bin/compactPad.ts[57-58]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| version['1.3.1'] = { | ||
| ...version['1.3.0'], | ||
| compactPad: ['padID', 'authorId'], | ||
| }; | ||
|
|
||
|
|
||
| // set the latest available API version here | ||
| exports.latestApiVersion = '1.3.0'; | ||
| exports.latestApiVersion = '1.3.1'; |
There was a problem hiding this comment.
2. Http api docs not updated 📘 Rule violation ⚙ Maintainability
The documentation still states the latest HTTP API version is 1.3.0, but the code updates it to 1.3.1 and adds the new compactPad endpoint without updating the docs. This creates an API/documentation mismatch for integrators and administrators.
Agent Prompt
## Issue description
The HTTP API documentation is out of date: it still lists `1.3.0` as the latest API version even though the code now sets `1.3.1`.
## Issue Context
This PR introduces a new API version (`1.3.1`) and a new endpoint, so `doc/api/http_api.*` must be updated in the same change set.
## Fix Focus Areas
- src/node/handler/APIHandler.ts[145-152]
- doc/api/http_api.md[100-103]
- doc/api/http_api.adoc[65-70]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| const deletions: Promise<void>[] = []; | ||
| for (let r = 0; r <= originalHead; r++) { | ||
| // @ts-ignore | ||
| deletions.push(this.db.remove(`pad:${this.id}:revs:${r}`)); | ||
| } | ||
| await Promise.all(deletions); |
There was a problem hiding this comment.
4. Unbounded revision deletions 🐞 Bug ➹ Performance
Pad.compactHistory() creates one Promise per revision and awaits Promise.all(), which can exhaust memory and overwhelm the DB for large pads (the target use case). This can degrade or crash the Etherpad process during compaction.
Agent Prompt
### Issue description
`compactHistory()` deletes all revision keys by building a `deletions` array and calling `Promise.all(deletions)`. For large `head` values this can allocate huge memory and issue a massive number of concurrent DB operations.
### Issue Context
`Pad.remove()` already uses bounded concurrency via `timesLimit(..., 500, ...)` for the same style of per-revision deletions.
### Fix Focus Areas
- src/node/db/Pad.ts[600-605]
- src/node/db/Pad.ts[706-714]
### Fix approach
Replace the unbounded `Promise.all(deletions)` pattern with a bounded-concurrency loop, e.g. using `timesLimit(originalHead + 1, 500, async (r) => { await this.db.remove(`pad:${this.id}:revs:${r}`, null); })` (mirroring `Pad.remove()` semantics).
This avoids O(N) Promise allocation and prevents DB overload.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| async compactHistory(authorId = '') { | ||
| const originalHead = this.head; | ||
| if (originalHead <= 0) return 0; | ||
|
|
||
| // Build a single changeset that produces the current atext on top of a | ||
| // freshly-initialized pad ("\n\n" per copyPadWithoutHistory comment). | ||
| // This mirrors the existing copyPadWithoutHistory path exactly so we | ||
| // inherit its tested correctness. | ||
| const oldAText = this.atext; | ||
| const assem = new SmartOpAssembler(); | ||
| for (const op of opsFromAText(oldAText)) assem.append(op); | ||
| assem.endDocument(); | ||
| const oldLength = 2; | ||
| const newLength = assem.getLengthChange(); | ||
| const newText = oldAText.text; | ||
| const baseChangeset = pack(oldLength, newLength, assem.toString(), newText); | ||
|
|
||
| // Drop every existing revision + saved-revision pointer and reset the | ||
| // pad's in-memory state to pre-any-revisions. | ||
| const deletions: Promise<void>[] = []; | ||
| for (let r = 0; r <= originalHead; r++) { | ||
| // @ts-ignore | ||
| deletions.push(this.db.remove(`pad:${this.id}:revs:${r}`)); | ||
| } | ||
| await Promise.all(deletions); | ||
| this.savedRevisions = []; | ||
| this.head = -1; | ||
| this.atext = makeAText('\n'); | ||
| // pool is retained — attributes from the composed text will reuse it, | ||
| // and we do not know which other pads may hold references to pool ids. | ||
|
|
||
| await this.appendRevision(baseChangeset, authorId); | ||
| return originalHead; |
There was a problem hiding this comment.
5. No session eviction/lock 🐞 Bug ☼ Reliability
compactHistory() performs destructive revision deletion and resets head/atext without kicking connected users, so concurrent edits can race with the compaction and lead to lost updates or inconsistent client state. Etherpad already kicks sessions before destructive pad removal, but compactHistory() does not.
Agent Prompt
### Issue description
`compactHistory()` deletes all revisions and resets in-memory pad state while users might still be connected and editing the pad. This can race with `handleUserChanges()` and cause lost updates or client errors.
### Issue Context
`Pad.remove()` calls `padMessageHandler.kickSessionsFromPad(padID)` before destructive operations.
### Fix Focus Areas
- src/node/db/Pad.ts[581-613]
- src/node/db/Pad.ts[673-679]
- src/node/handler/PadMessageHandler.ts[177-189]
### Fix approach
Before deleting revisions/resetting state, kick active sessions from the pad (and ideally ensure no further queued edits are processed) similarly to `Pad.remove()`.
At minimum:
- Call `padMessageHandler.kickSessionsFromPad(this.id)` at the start of `compactHistory()` when `originalHead > 0`.
Optionally, also add safeguards to prevent concurrent mutations during compaction (e.g., a per-pad mutex or equivalent queuing).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Fixes ether#6194. Long-lived pads with heavy edit history dominate the DB — the issue describes a ~400 MB Postgres after two months with ~100 users. Etherpad keeps every revision forever, and removing arbitrary middle revisions is unsafe because state is reconstructed by composing forward from key revisions. What's safe: collapse the full history into a single base revision that reproduces the current atext. The existing `copyPadWithoutHistory` already does this for a new pad ID — this PR lifts that same changeset pattern into an in-place operation and wires up an admin CLI. - `Pad.compactHistory(authorId?)` (src/node/db/Pad.ts): composes the current atext into one base changeset, deletes all existing rev records, clears saved-revision bookmarks, and appends the new rev 0. Text, attributes, and chat history are preserved; saved-revision pointers are cleared. Returns the number of revisions removed. - `API.compactPad(padID, authorId?)` (src/node/db/API.ts): public-API wrapper around compactHistory. Reports `{removed}` so callers can log savings. - `APIHandler.ts`: register `compactPad` under a new `1.3.1` version, bump `latestApiVersion`. - `bin/compactPad.ts`: admin CLI. Reports the current revision count, calls compactPad via the HTTP API, and prints how many revisions were dropped. - `src/tests/backend/specs/compactPad.ts`: four backend tests cover the empty-pad no-op, the text-preservation + head=0 contract, saved-revision cleanup, and that subsequent edits continue to append cleanly on top of the collapsed base. The operation is destructive so admins must opt in explicitly; the CLI prints the before-count, and the recommended pre-flight is an `.etherpad` export (backup). Closes ether#6194 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The initial compactHistory() implementation built a custom base changeset and re-ran appendRevision against a reset atext — but the changeset was packed with oldLength=2 (matching copyPadWithoutHistory's dest-pad init state) while the reset atext was only length 1, so applyToText tripped its "mismatched apply: 1 / 2" assertion and every test failed with a Changeset corruption error. Switch to the tested path instead: copy the pad via copyPadWithoutHistory to a uniquely-named temp pad (inherits all its attribute/pool/changeset correctness), read the temp pad's rev records back, delete the old ones under our pad's ID, write the new records in their place, update in-memory state to match, and remove the temp pad. Errors at any step fall through with a best-effort temp-pad cleanup. Contract shifts slightly: the collapsed pad is head<=1 rather than head=0, matching the shape of a freshly-imported pad (seed rev 0 + content rev 1). Tests updated to assert that invariant plus text-preservation, saved-revision cleanup, and append-after-compact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tests previously asserted head=0 exactly after compaction; the temp-pad-swap path lands at head=1 (one seed rev plus one content rev) matching the shape of a freshly-imported pad. Relax the assertions to and derive the removed-count from before-head minus after-head, so the tests still catch regressions in text-preservation, saved-revision cleanup, and append-after-compact without being tied to the exact implementation shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Develop already ships a working revision-cleanup path under
`src/node/utils/Cleanup.ts` with two public helpers —
`deleteAllRevisions(padId)` (collapse full history via
copyPadWithoutHistory) and `deleteRevisions(padId, keepRevisions)`
(keep the last N). The admin-settings UI wires these up but neither
is exposed on the public API, and there's no CLI for operators who
want to run compaction outside the web UI. That's the gap this PR
now fills.
Changes from the prior revision of this PR:
- Drop `pad.compactHistory()` — it re-implemented what
`Cleanup.deleteAllRevisions` already does. Remove the duplicate.
- `API.compactPad(padID, keepRevisions?)` now delegates to Cleanup:
• keepRevisions null/undefined → deleteAllRevisions (full collapse)
• keepRevisions >= 0 → deleteRevisions(N) (keep last N)
Returns {ok, mode: 'all' | 'keepLast', keepRevisions?}.
- APIHandler `1.3.1`: signature updated to take `keepRevisions`
instead of `authorId`.
- `bin/compactPad.ts`: accepts `--keep N` for the keep-last mode,
shows before/after revision counts so operators see concrete
savings.
- Backend tests rewritten around the public API surface (mode
reporting, text preservation, input validation) rather than
internal method plumbing that no longer exists.
Net: strictly a thin public-API and CLI veneer over already-tested
Cleanup helpers. No new low-level logic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0f10819 to
5491341
Compare
Cleanup.deleteAllRevisions internally calls copyPadWithoutHistory twice (src → tempId, tempId → src with force=true), and each round trip normalizes trailing whitespace. That meant my byte-exact atext.text assertion failed in CI: expected: '...line 3\n\n\n' actual: '...line 3\n' Swap the comparisons to use content markers (marker-alpha / beta / gamma, keep-line-N). The test still catches the real regressions — if compactPad lost content those markers would disappear — without coupling to whitespace quirks of the existing Cleanup implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Addresses #6194. Develop already ships a working revision-cleanup path under
src/node/utils/Cleanup.ts—deleteAllRevisions(padId)collapses full history viacopyPadWithoutHistory, anddeleteRevisions(padId, keepRevisions)keeps the last N. Both are wired into the admin-settings endpoint but neither is accessible through the public API, and there's no CLI for operators who want to run compaction from a terminal. That's the gap this PR fills.(Earlier drafts of this PR re-implemented a compact helper on
Pad; once the Cleanup module landed on develop that became redundant. Refactored to strictly wrap what already exists.)What's added
API.compactPad(padID, keepRevisions?)keepRevisionsnull/omitted →Cleanup.deleteAllRevisions(full collapse). Positive integer N →Cleanup.deleteRevisions(padId, N)(keep last N). Returns{ok, mode: 'all'|'keepLast', keepRevisions?}. Validates non-negative integer.1.3.1compactPad: ['padID', 'keepRevisions'], bumpslatestApiVersionto1.3.1.bin/compactPad.tsnode bin/compactPad.js <padID>(collapse all) ornode bin/compactPad.js <padID> --keep N. Prints before/after revision counts so operators see concrete savings.Why wrap over re-implement
Cleanup.deleteAllRevisions/deleteRevisionsare tested in place and already used in production via the admin UI.--keep N; hitting the full admin UI isn't a realistic workflow for scripted maintenance.Test plan
pnpm run ts-checkclean locally after rebase onto developnode bin/compactPad.js <padID>against a pad with many revisions; observe the before/after countCloses #6194
🤖 Generated with Claude Code