Rewrite deployment guide with production operations guidance #697
dahlia wants to merge 22 commits into fedify-dev:main
…fundamentals
The existing deployment guide described only a thin set of platform-specific
setups and left out most of what a first-time Fedify operator needs to get
right before launch. This rewrites the opening of the guide around
Fedify-specific concerns:
- A framing intro that states what this guide is and isn't, and names the
two audiences it targets (Fedify developers who have never deployed, and
experienced operators new to Fedify).
- A runtime selection matrix that makes the operational trade-offs between
Node.js, Deno, Bun, and Cloudflare Workers explicit, including the
current Bun memory-leak caveat.
- A "Configuration fundamentals" section covering the three decisions every
Fedify operator has to make up front: the canonical origin (and when to
rely on x-forwarded-fetch instead), the persistent KV/MQ backend, and the
actor key lifecycle.
Later sections (traditional deployments, containers, worker separation,
serverless, security, observability, ActivityPub-specific operations, and a
checklist) will follow in subsequent commits.
Progresses fedify-dev#689
Assisted-by: Claude Code:claude-opus-4-7[1m]
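As a rough illustration of the canonical-origin decision described in this commit: when no fixed origin is pinned, the app has to trust the proxy's forwarding headers to learn its public URL. The sketch below is not Fedify's or x-forwarded-fetch's implementation. `effectiveUrl` is a hypothetical helper, and it assumes the reverse proxy sets `X-Forwarded-Proto` and `X-Forwarded-Host` and strips any client-supplied copies.

```typescript
// Hypothetical helper: derive the public URL of a request from proxy headers.
// Assumes a trusted reverse proxy overwrites both headers on every request.
function effectiveUrl(request: Request): URL {
  const url = new URL(request.url);
  const proto = request.headers.get("x-forwarded-proto");
  const host = request.headers.get("x-forwarded-host");

  // Only accept the two schemes a web server can be reached on.
  if (proto === "http" || proto === "https") {
    url.protocol = `${proto}:`;
  }
  if (host !== null && host !== "") {
    // Clear the upstream port first; otherwise the internal port
    // (e.g. 3000) would leak into the public URL.
    url.port = "";
    url.host = host;
  }
  return url;
}

const proxied = new Request("http://127.0.0.1:3000/users/alice", {
  headers: {
    "x-forwarded-proto": "https",
    "x-forwarded-host": "example.com",
  },
});
console.log(effectiveUrl(proxied).href);
```

If the proxy does not overwrite these headers, any client can choose the origin the app believes it is served from, which is why pinning a single canonical origin is the safer default for single-domain deployments.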
… managers
Expands the deployment guide with the operational layer that sits between a
running Fedify process and the internet. The material is intentionally
focused on the Fedify-specific details:
- A systemd service unit template with the hardening flags and the
`EnvironmentFile=` convention that keep database credentials and actor
private keys off the world-readable filesystem.
- nginx and Caddy reverse-proxy configurations sized for Fedify's payloads
(activity documents routinely exceed the 1 MiB body limit) and its
slower remote timeouts (inbox deliveries from struggling peers can take
well over 60 seconds). The notes call out the `Accept`/`Content-Type`
pass-through requirement, which is the most common silent cause of
broken federation behind misconfigured CDNs.
- A brief note on PM2 versus systemd explaining why systemd should be
the default on Linux.
Progresses fedify-dev#689
Assisted-by: Claude Code:claude-opus-4-7[1m]
Covers containerized Fedify deployments across the main orchestration
layers operators choose:
- Minimal Dockerfiles for both Node.js and Deno that run the process as
a non-root user and leave supervision to the orchestrator, with a note
about multi-arch builds for operators running on ARM VPSes.
- A Compose file that splits web and worker services off the same image
via NODE_TYPE, wires up Postgres and Redis with healthchecks, and
binds the app port to 127.0.0.1 so the canonical-origin invariant
isn't broken by direct upstream traffic.
- A short Kubernetes sketch that names only the Fedify-specific pieces
(two Deployments, HPA on queue depth, PVC choices) and defers mechanics
to upstream docs.
- A PaaS index (Fly.io, AWS ECS/EKS, Cloud Run, Render, Railway) with the
Fedify-relevant caveat for each rather than reimplementing each
vendor's quickstart.
Progresses fedify-dev#689
Assisted-by: Claude Code:claude-opus-4-7[1m]
Adds the section on splitting web and worker roles, which is the single
most important scaling pattern for busy Fedify servers and the one most
commonly missed by first-time operators. Highlights:
- The manuallyStartQueue/startQueue/NODE_TYPE trio, with the full example
in the Message queue chapter called out as the reference and only the
deployment-specific wiring (Compose, systemd, Kubernetes) described here.
- A warning against placing worker nodes behind a load balancer, which
silently breaks the enqueue/process split.
- An explicit call to audit the codebase for `immediate: true` before
launch, with the three reasons it's dangerous in production (blocks the
request, no retries, couples delivery to request lifetime).
- A reminder about sizing the connection pool for ParallelMessageQueue,
since the failure mode (stalled jobs, not errors) is easy to misread
as a slow queue.
Progresses fedify-dev#689
Assisted-by: Claude Code:claude-opus-4-7[1m]
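The web/worker split described in this commit can be sketched as a small role decision driven by `NODE_TYPE`. This is an illustration only: `planProcess` and `ProcessPlan` are hypothetical names, and treating an unset `NODE_TYPE` as "run both roles" is an assumption made here for single-node setups, not something the guide is known to prescribe.

```typescript
// Hypothetical sketch: decide what a single process should run.
type ProcessPlan = { serveHttp: boolean; startQueue: boolean };

function planProcess(nodeType: string | undefined): ProcessPlan {
  switch (nodeType) {
    case "web":
      // Web nodes accept HTTP traffic and only enqueue work.
      return { serveHttp: true, startQueue: false };
    case "worker":
      // Worker nodes process the queue and are not exposed to traffic.
      return { serveHttp: false, startQueue: true };
    case undefined:
    case "":
      // Assumption: with no role set, run both in one process.
      return { serveHttp: true, startQueue: true };
    default:
      // Fail fast on typos instead of silently running the wrong role.
      throw new Error(`Unknown NODE_TYPE: ${nodeType}`);
  }
}

console.log(planProcess("worker"));
```

In a real entry point, the `startQueue` branch is where a federation object created with `manuallyStartQueue: true` would have its `startQueue()` method called; the Message queue chapter remains the reference for that wiring.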
Replaces the earlier thin Cloudflare Workers and Deno Deploy sections with
expanded guidance shaped by what operators actually hit in production:
- Cloudflare Workers: the builder pattern, manual queue() export wiring,
Node.js compatibility flag, and the native-retry delegation through
MessageQueue.nativeRetrial. Adds operational notes that are absent from
upstream Cloudflare docs: storing credentials with `wrangler secret put`
rather than in `vars`, and the WAF skip rules needed to keep Cloudflare's
default Bot Protection from challenging fediverse traffic carrying
`application/activity+json` bodies. Remote servers don't solve CAPTCHAs,
so this silent-failure mode is one of the most common launch blockers on
Workers.
- Deno Deploy: calls out the EA/Classic split explicitly. Deno Deploy
Classic is deprecated and in maintenance mode; new deployments should
target Deno Deploy EA. Keeps the zero-infrastructure Deno KV example
and mentions DenoKvMessageQueue's native retry delegation.
Progresses fedify-dev#689
Assisted-by: Claude Code:claude-opus-4-7[1m]
Adds the security section, focused on the three threats that disproportionately
affect Fedify apps compared to other web applications:
- XSS through federated HTML. ActivityPub carries post content as HTML
from untrusted remote servers, and Fedify deliberately does not sanitize
it for you (because what's safe depends on the rendering context). The
section gives allowlist-based sanitizer examples for Node.js
(sanitize-html) and Deno (xss), warns against regex-based sanitizers,
recommends a CSP as defense in depth, and calls out the common
"signatures mean trust" mistake.
- SSRF through follow-on fetches. Documents the built-in protection in
Fedify's document loaders, warns explicitly against
allowPrivateAddress: true in production, and enumerates the common
application-code scenarios (avatar downloads, attachment fetching,
link previews, webhook delivery) where application code bypasses that
protection and has to defend itself with validatePublicUrl() or an
ssrfcheck-style guard. Also notes the redirect-following pitfall.
- Secret and key management. Separates instance-wide secrets (env/secret
manager) from per-actor key pairs (database rows), with concrete notes
for systemd, Docker, Kubernetes, and Cloudflare Workers.
A shorter "other practices" subsection collects the HTTPS requirement,
the skipSignatureVerification warning, inbox-level blocklists, and the
clock-sync reminder.
Progresses fedify-dev#689
Assisted-by: Claude Code:claude-opus-4-7[1m]
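To make the SSRF point in this commit concrete, here is a minimal sketch of the kind of pre-flight check application code needs before fetching a remote URL on its own (avatars, attachments, link previews). It is not `validatePublicUrl()` and does not replace it: both function names are hypothetical, the range list is deliberately incomplete, IPv6 literals are not handled, and nothing here resolves DNS, so a hostname that points at a private address still passes.

```typescript
// Hypothetical sketch: reject only the obviously unsafe URLs before a fetch.
// Assumption: this runs in addition to, not instead of, a real SSRF guard.

/** True if `host` is a dotted-quad IPv4 literal in a common non-public range. */
function isNonPublicIPv4(host: string): boolean {
  const parts = host.split(".");
  if (parts.length !== 4) return false;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return false;
  const [a = -1, b = -1] = octets;
  return (
    a === 0 || // "this network"
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 169 && b === 254) || // link-local, incl. cloud metadata
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

/** True if the URL should be refused without ever opening a connection. */
function rejectsBeforeFetch(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // unparsable input is never fetched
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return true;
  if (url.hostname === "localhost") return true;
  return isNonPublicIPv4(url.hostname);
}

console.log(rejectsBeforeFetch("http://192.168.1.10/avatar.png"));
```

The same check has to be repeated for every redirect hop, which is the redirect-following pitfall noted in the commit: validating only the first URL lets a public server redirect the fetch to an internal address.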
…rics
Production observability for Fedify is different from a typical web app
because federation failures are usually silent: a falling-behind outbox,
a remote server rejecting your signatures, or a crashed queue consumer
don't show up in basic HTTP monitoring, but the trust-cache damage they
cause is painful to recover from. The section points readers to the
existing Logging and OpenTelemetry chapters for the API details and only
adds the pieces that are deployment-specific:
- LogTape: structured logs to stderr, default log level, per-category
configuration, and the redaction-at-the-sink pattern for PII and
secrets.
- OpenTelemetry: OTLP exporter configuration, a pragmatic sampling
recommendation (1–10% head-based + tail-based error sampling rather
than 100%), the Deno --unstable-otel flag, and the Workers
Observability alternative on Cloudflare.
- Sentry via @logtape/sentry as the typical error-aggregation pattern,
so operators keep one logging surface instead of two instrumentation
stacks.
- Four metrics that correlate most directly with federation health
(queue depth, inbox processing latency, outbox delivery success rate,
remote 410/404 rate) and the usual process-level signals to include
alongside them.
Progresses fedify-dev#689
Assisted-by: Claude Code:claude-opus-4-7[1m]
Adds the section that separates this guide from any generic web-app
deployment document. These are the habits a Fedify operator has to form
that nothing else on the web prepares them for:
- Domain name permanence. Once actors have federated out, their URIs
(which embed the domain) are stored in remote databases. A domain
change breaks follows, thread continuity, and actor identity. The
`Move` activity helps but doesn't fully recover these. The guide
recommends choosing the final domain before launch and, when
uncertain, deploying to a stable subdomain from the start.
- Graceful shutdown and service retirement. Cold shutdown leaves
followers on remote instances still trying to deliver activities and
still holding cached signatures. The section lays out the
five-step retirement sequence (freeze writes, broadcast `Delete`
activities with `orderingKey`, switch actor dispatchers to return
`Tombstone` so Fedify serves 410 Gone, keep 410 online for weeks or
months, then take down DNS).
- Handling inbound failures. Notes on `permanentFailureStatusCodes`,
clock drift against the signature time window, the DNS-rebinding
protection boundary between Fedify's document loaders and
application-level fetches, and per-remote rate limiting.
Progresses fedify-dev#689
Assisted-by: Claude Code:claude-opus-4-7[1m]
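The role of a permanent-failure status list in the retirement and failure-handling notes above can be sketched as a simple classification of delivery responses. This is not Fedify's internal logic: `classifyDelivery` is a hypothetical function, and the default list `[404, 410]` is an assumption chosen for illustration that should be checked against the documentation for `permanentFailureStatusCodes`.

```typescript
// Hypothetical sketch: how a sender might interpret a remote inbox response.
type DeliveryOutcome = "delivered" | "permanent-failure" | "retry";

function classifyDelivery(
  status: number,
  // Assumption: 404 and 410 mean the remote actor or inbox is gone for good.
  permanentFailureStatusCodes: readonly number[] = [404, 410],
): DeliveryOutcome {
  if (status >= 200 && status < 300) return "delivered";
  if (permanentFailureStatusCodes.includes(status)) {
    // No point retrying; the sender can stop delivering to this recipient.
    return "permanent-failure";
  }
  // Everything else (timeouts mapped to 5xx, rate limits, etc.) is retried.
  return "retry";
}

console.log(classifyDelivery(410));
```

This is also why the retirement sequence keeps `410 Gone` online for weeks or months: remote servers need to actually observe the permanent status before they stop retrying.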
Closes out the deployment guide with a condensed pass through every pitfall
it flags, grouped by category (configuration, scaling, runtime, security,
observability, ActivityPub). The intent is not to be exhaustive (each item
points back to the section where its rationale lives) but to give a
first-time operator a single place to verify they haven't left anything
load-bearing unset before opening federation traffic.
Progresses fedify-dev#689
Assisted-by: Claude Code:claude-opus-4-7[1m]
The pre-launch checklist added in the deployment guide uses GitHub-flavored
task-list syntax (`- [ ] item`), but VitePress's stock markdown-it config
does not recognize it and renders the `[ ]` literally. Install and register
@hackmd/markdown-it-task-lists so the checklist actually shows as
checkboxes, matching how the same syntax already displays on the GitHub
README. Future docs that want to use task lists will work without further
configuration.
Assisted-by: Claude Code:claude-opus-4-7[1m]
The docs convention (CLAUDE.md, docs/README.md) reserves backticks for
code-level identifiers and italicizes file names and paths. Two spots in
the deployment guide slipped through as backticks:
- The *fedify@.service* systemd template file name.
- A reference to *.gitignore* used as a verb, which was also awkward
English; rephrased to "list .env in your .gitignore".
Assisted-by: Claude Code:claude-opus-4-7[1m]
The "Running the process" section previously showed only @hono/node-server
as the Node.js HTTP adapter. srvx (https://srvx.h3.dev/) uses the same
web-standard fetch-handler interface but works across Node.js, Deno, and
Bun without modification, making it a useful cross-runtime option.
Changes:
- Runtime selection section: mention srvx alongside @hono/node-server as
an adapter option for Node.js
- "Running the process": replace the single @hono/node-server code block
with a ::: code-group showing both @hono/node-server and srvx examples
- Add [srvx] reference-style link definition
- Add srvx ^0.11.15 to docs/package.json so twoslash type-checks the import
Closes fedify-dev#689 (partial)
Assisted-by: Claude Code:claude-opus-4-7[1m]
[`~MessageQueue.nativeRetrial`] was used as a reference-style link (both as
the link label in prose and as the definition key), but the ~ prefix is
markdown-it-jsr-ref extension syntax and must not appear in manually written
reference labels or definitions. Strip the ~ from all three occurrences
(two uses + one definition). Inline code spans like
`~MessageQueue.nativeRetrial` that rely on the jsr-ref auto-link extension
are unaffected.
Assisted-by: Claude Code:claude-opus-4-7[1m]
…; wrap HTTP status codes in backticks
- All **bold** emphasis converted to *italics* to match the convention used
throughout the rest of the manual
- Pre-launch checklist group labels (Configuration, Scaling, Runtime and
infrastructure, Security, Observability, ActivityPub) promoted from italic
prose to ### ATX headings so they appear in the page outline
- HTTP status codes (410, 404, 403, 410 Gone, 404 Not Found) wrapped in
backticks throughout prose for consistent code-style formatting
Assisted-by: Claude Code:claude-opus-4-7[1m]
…o proper nouns
Slash-separated terms in prose and table cells now use no surrounding
spaces (MySQL/MariaDB, Single-node/embedded, Docker Compose/Podman Compose,
AWS ECS/AWS EKS, Render/Railway). Docker Compose and Podman Compose are
added to .hongdown.toml proper_nouns so that hongdown's sentence-case
normalizer preserves the capital C in both product names.
Assisted-by: Claude Code:claude-opus-4-7[1m]
- "Dockerfile" in prose was missing italics in two sentences; fixed
- Kubernetes resource type names (ConfigMap, Secret, Ingress,
HorizontalPodAutoscaler, StatefulSet, PVC, Deployment, Service) were in
italics or plain text; converted to backtick code spans throughout the
Kubernetes section as they are API identifiers, not prose words
Assisted-by: Claude Code:claude-opus-4-7[1m]
Walkthrough
Added two proper-noun entries ("Docker Compose", "Podman Compose") to heading config, enabled Markdown task-list parsing in VitePress and added dev deps, and replaced the short deployment overview with a comprehensive production deployment runbook (~1.3k lines added).
Code Review
This pull request significantly expands the production deployment guide, providing detailed instructions for various runtimes, reverse proxies, containerization, and serverless platforms, alongside security and observability best practices. It also enables task list support in the documentation and updates the proper nouns configuration. Feedback was provided to improve the Dockerfile example by using corepack for pnpm installation and to ensure consistent API referencing for documentation links.
Codex Review: Didn't find any major issues. 👍
…tusCodes
Soften DNS-rebinding language: "defeats DNS-rebinding attacks" overstates
what validatePublicUrl() guarantees, since the actual fetch() call will
re-resolve DNS. Changed both occurrences to "helps mitigate
DNS-rebinding-based SSRF". Addresses Copilot review on line 1104.
Add ~FederationOptions. prefix to bare permanentFailureStatusCodes
reference at line 1324 so the jsr-ref plugin generates a cross-reference
link, consistent with the other two uses of the same option in the file.
Addresses gemini-code-assist review on line 1321.
Assisted-by: Claude Code:claude-opus-4-7[1m]
@codex review
/gemini review
Code Review
This pull request provides a comprehensive overhaul of the deployment documentation, adding detailed guidance on runtimes, reverse proxies, containerization, serverless platforms, and security practices like XSS and SSRF protection. It also introduces task list support to the documentation site and updates the proper nouns configuration. Feedback was provided regarding the sanitize-html code example, where the target attribute needs to be explicitly allowed to prevent it from being stripped during transformation.
Codex Review: Didn't find any major issues. Bravo.
Addresses Copilot review on line 204.
Assisted-by: Claude Code:claude-opus-4-7[1m]
transformTags sets target="_blank" on every link, but sanitize-html strips
attributes that are not explicitly listed in allowedAttributes. Adding
"target" to the a entry makes the example match its actual output.
Addresses Copilot review on line 1034 and gemini-code-assist review on
line 1025.
Assisted-by: Claude Code:claude-opus-4-7[1m]
@codex review
/gemini review
Code Review
This pull request significantly expands the production deployment documentation for Fedify, providing a comprehensive guide that covers runtime selection, configuration fundamentals, and various deployment strategies including traditional servers, containers, and serverless platforms. It also introduces detailed sections on security (XSS and SSRF protection), observability, and ActivityPub-specific operational concerns like domain permanence and graceful shutdown. Additionally, the VitePress configuration is updated to support task lists, and the srvx package is added as a dependency. I have no feedback to provide.
Codex Review: Didn't find any major issues. Another round soon, please!
Three incorrect uses of markdown-it-jsr-ref notation:
- FederationOptions.manuallyStartQueue: true: the ~A.b syntax cannot carry
a value suffix like `: true`, so the ~ prefix was removed
- getAuthenticatedDocumentLoader(): this is a standalone function, not a
Context method; ~Context.getAuthenticatedDocumentLoader() referred to a
nonexistent member, so the Context. prefix was dropped
- Context.getDocumentLoader(): ~ on A.b elides the A. in display, making it
hard for readers to know which class the method belongs to; keeping the
qualified form without ~ is clearer here
Relates to fedify-dev#689
Assisted-by: Claude Code:claude-sonnet-4-6[2m]
Pre-release has been published for this pull request.
The docs for this pull request have been published.
Closes #689.
Read it at https://0dd31e42.fedify.pages.dev/manual/deploy.
Background
docs/manual/deploy.md described how to wire up platform-specific packages but said nothing about choosing a runtime, hardening the app, keeping delivery healthy at scale, or shutting down a server without orphaning followers. Issue #689 identified these gaps and proposed splitting the content into a top-level Deployment nav category with separate pages for traditional deployments, Cloudflare Workers, Deno Deploy, security, reliability, and observability.
This PR rewrites docs/manual/deploy.md in place instead. Keeping it as a single page preserves the existing URL and sidebar entry, so external links keep working. The structural split remains a viable follow-up once the content has settled.
What's covered
The rewritten page starts with a runtime selection guide (Node.js, Deno, Bun with its memory-leak caveat, Cloudflare Workers), then covers configuration fundamentals that apply to every deployment target: pinning a canonical origin, `x-forwarded-fetch` for multi-domain setups, a KV/MQ backend selection table, and actor key lifecycle.
The bulk of the guide covers traditional and container deployments (systemd units, nginx and Caddy reverse-proxy configurations, Node.js and Deno Dockerfiles, Docker Compose/Podman Compose, Kubernetes), web/worker separation, and serverless platforms (Cloudflare Workers and Deno Deploy EA vs. Classic).
The final sections cover the topics the old page omitted entirely: security (HTML sanitization for federated content, SSRF coverage, key and secret management), observability (LogTape, OpenTelemetry, Sentry via `@logtape/sentry`, metrics to track and alert on), and ActivityPub-specific operations (domain permanence, graceful shutdown runbook with `Delete` broadcast and `Tombstone`/410 Gone, inbound failure handling). A pre-launch checklist closes the page.
Incidental changes
- Added `@hackmd/markdown-it-task-lists` to render GFM task-list checkboxes in VitePress.
- Added `srvx` to docs dev dependencies for twoslash type-checking.
- Added Docker Compose and Podman Compose to `proper_nouns`.