Proposal: RC docs-sync (daily poll + AI-drafted doc PRs) #388

Draft
enricobattocchi wants to merge 4 commits into main from rc-docs-sync-proposal

@enricobattocchi enricobattocchi commented Apr 24, 2026

Proposal — RC docs-sync

A daily workflow that polls Yoast product repos for new RC tags, runs a Claude agent against the developer-portal docs, and opens draft PRs when docs need updating. Initial scope: Yoast SEO (free) only; more products added once this settles.

This is a proposal for review — not ready to merge as-is. Even if merged, nothing fires until the TRACKING_ISSUE_WORDPRESS_SEO repo variable and ANTHROPIC_API_KEY secret are set, and the agent-invocation step in the workflow (currently a placeholder) is wired to a real Claude action.

What this PR adds

  • AGENT_MAP.md (repo root) — source of truth for the feature-area taxonomy: docs paths per area, per-product source-path globs, symbol namespaces, and the Product → source repo table (with human-readable display names). The agent loads this at runtime to triage an RC diff and route proposed changes into PRs.
  • .github/claude-agent/run.md — the agent orchestration prompt. Triage + authoring + PR-creation flow, style rules, authoring discipline, PR title format, and the machine-readable marker for the run-summary comment (which doubles as state for the next run). Also includes a coverage-gap self-reporting step so every RC run surfaces any public surface observed in the diff but not yet covered by AGENT_MAP.md.
  • .github/workflows/rc-docs-sync.yml — the daily workflow. schedule: 0 6 * * * + workflow_dispatch for manual backfill.
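
As a rough illustration of how the AGENT_MAP.md taxonomy could drive triage and coverage-gap reporting, here is a minimal Python sketch. The `AREAS` dict, its globs, and its namespaces are hypothetical stand-ins (only the area names appear in this PR), and the rule that a gap is reported when neither the path nor the symbol matches any area is one possible reading of the description, not the agent's confirmed behavior.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical, trimmed-down stand-in for what AGENT_MAP.md might encode.
# The area names come from this PR; the globs and namespaces are invented.
AREAS = {
    "llms-txt": {
        "source_paths": ["src/llms-txt/*"],
        "symbol_namespaces": ["wpseo_llmstxt_"],
    },
    "robots-txt": {
        "source_paths": ["src/robots-txt/*"],
        "symbol_namespaces": ["wpseo_robots_"],
    },
}

# Captures the hook name passed to apply_filters() / do_action().
HOOK_RE = re.compile(r"""(?:apply_filters|do_action)\(\s*['"]([A-Za-z0-9_]+)['"]""")


def areas_for_path(path):
    """Return every area whose source_paths globs match the changed path."""
    return [
        name
        for name, area in AREAS.items()
        if any(fnmatchcase(path, glob) for glob in area["source_paths"])
    ]


def areas_for_symbol(symbol):
    """Return every area whose symbol_namespaces prefix the symbol."""
    return [
        name
        for name, area in AREAS.items()
        if any(symbol.startswith(ns) for ns in area["symbol_namespaces"])
    ]


def coverage_gaps(path, added_lines):
    """Return hook names that look public but match no area by path or namespace."""
    if areas_for_path(path):
        return []  # the file itself is already mapped to an area
    hooks = [m.group(1) for line in added_lines for m in HOOK_RE.finditer(line)]
    return [hook for hook in hooks if not areas_for_symbol(hook)]
```

Anything returned by `coverage_gaps` would be a candidate for the "Coverage gaps observed" section of the run summary; it is informational and would not block the run.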

How it works (per daily run)

  1. For each opted-in product (just wordpress-seo right now), query the product repo's tags via the public GitHub API — anonymous, no token.
  2. Read the tracking issue's comments on this repo, looking for the most recent machine-readable marker:
    <!-- rc-docs-sync:v1 product=wordpress-seo rc_tag=<tag> -->
    
    That tag is the last processed RC. If no marker is found, seed one for the current latest RC and skip all earlier RCs (first-run safety).
  3. For each RC tag newer than the marker, in ascending version order (oldest first):
    • Shallow-clone Yoast/wordpress-seo at the RC tag (anonymous).
    • Compute full + noise-filtered diffs against the previous stable release.
    • Extract the product's readme.txt changelog entry.
    • Build a symbol index from the current docs/ tree.
    • Invoke the Claude agent with .github/claude-agent/run.md. The agent triages, opens one PR per affected feature area (branch rc-sync/<product>/<rc_tag>/<area>, title Yoast SEO <base-version> — docs(<area>): <title>), surfaces any AGENT_MAP.md coverage gaps it observed, and posts a summary comment back to the tracking issue. The comment's marker is the state for tomorrow's run.
  4. If the filtered diff is empty (only tests/translations/lockfiles changed), post a one-liner no-op comment and move on — no PR, no agent invocation.
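
The state handling in the steps above can be sketched in Python as follows. This is a sketch under assumptions, not the workflow's actual code: the tag shape (`26.3-RC1`, `26.1.1-RC1`) is inferred from the spikes listed in this PR, the noise patterns are hypothetical examples of "tests/translations/lockfiles", and all function names are invented for illustration.

```python
import re

MARKER_RE = re.compile(
    r"<!-- rc-docs-sync:v1 product=(?P<product>\S+) rc_tag=(?P<rc_tag>\S+) -->"
)
# Assumed RC tag shape, e.g. "26.3-RC1" or "26.1.1-RC1".
RC_TAG_RE = re.compile(r"^(\d+(?:\.\d+)*)-RC(\d+)$")


def last_processed(comment_bodies, product):
    """Return the rc_tag of the newest marker for `product`, or None.

    `comment_bodies` is assumed to be ordered oldest to newest.
    """
    for body in reversed(comment_bodies):
        for match in reversed(list(MARKER_RE.finditer(body))):
            if match.group("product") == product:
                return match.group("rc_tag")
    return None


def version_key(tag):
    """Sort key: numeric version padded to three parts, then the RC number."""
    match = RC_TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not an RC tag: {tag!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    parts += (0,) * (3 - len(parts))
    return parts, int(match.group(2))


def pending_rc_tags(all_tags, marker_tag):
    """Return RC tags newer than the marker, oldest first."""
    if marker_tag is None:
        # First run: the caller seeds a marker for the latest RC and stops.
        return []
    rc_tags = sorted((t for t in all_tags if RC_TAG_RE.match(t)), key=version_key)
    return [t for t in rc_tags if version_key(t) > version_key(marker_tag)]


def filtered_paths(changed_paths):
    """Drop noise paths; the prefixes and suffixes here are hypothetical."""
    noise_prefixes = ("tests/", "languages/")
    noise_suffixes = (".lock", "package-lock.json")
    return [
        p
        for p in changed_paths
        if not p.startswith(noise_prefixes) and not p.endswith(noise_suffixes)
    ]
```

Sorting on a numeric key rather than on the raw string keeps `26.2-RC10` after `26.2-RC2`. An empty result from `filtered_paths` corresponds to the no-op case in step 4.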

Why this architecture

  • No GitHub App, no PAT, no cross-repo secrets. Product repos are public → anonymous cloning; writes to this repo use the built-in GITHUB_TOKEN. The only external secret is ANTHROPIC_API_KEY.
  • Never writes to main. State lives in tracking-issue comments (not a committed state file), so the workflow is indifferent to main being protected.
  • Cloudflare Pages preview is the PR check. The agent doesn't re-run yarn build locally — broken Docusaurus builds fail the CF Pages deploy and surface on the PR.
  • No auto-merge. GITHUB_TOKEN can't approve PRs, and the workflow never calls gh pr merge. Human reviewer is the sole merge gate.

Validation

Three manual spikes were run against past RCs of Yoast/wordpress-seo:

| Spike | Case | Result |
| --- | --- | --- |
| A | Narrow positive — wpseo_llmstxt_link_description filter in 26.3-RC1 | 1 PR plan, area llms-txt, authoring semantically matched the ground-truth doc PR. |
| N | Negative — 26.1.1-RC1 bugfix hotfix | Correctly returned 0 PR plans with a clean "no doc changes needed" rationale. |
| C | Hard positive — Schema Aggregator in 27.1-RC1 (216 new source files, 11+ new filters, new REST + CLI surfaces) | 2 scoped PR plans: schema-aggregator (3 new docs + sidebars.js) and a bonus catch for robots-txt (Schemamap directive + wpseo_disable_robots_schemamap filter). The robots-txt update is something the team actually had to address in follow-up doc commits after the original schema-aggregator PR — the agent would have caught it at RC time. |

End-to-end dry-runs of the orchestration prompt on Spikes A and C produced the expected proposed-docs.patch, proposed-sidebars.patch, and run-summary.md artifacts (with correct marker, correct PR-title format, correct area placement).

What still needs to happen before activation

  1. Merge this PR.
  2. Create a tracking issue titled "RC docs-sync audit log — Yoast SEO" in this repo. Pin it. Note its number.
  3. Set repository variable TRACKING_ISSUE_WORDPRESS_SEO = that issue's number (Settings → Secrets and variables → Actions → Variables tab).
  4. Set repository secret ANTHROPIC_API_KEY (coordinated with DevOps; separate short request doc available).
  5. Wire the real agent invocation in the workflow. Right now the step is a placeholder echo "TODO: invoke agent ..." — needs to become a step using Anthropic's Claude Code Action (or equivalent) with the prompt from .github/claude-agent/run.md and the bundled inputs.
  6. Manually dispatch the workflow once against a recent past RC via workflow_dispatch with product=wordpress-seo and an explicit rc_tag to validate end-to-end before relying on the cron.
  7. Let the daily cron do its work thereafter.
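
A small preflight check could make the activation prerequisites explicit before the workflow does any work. The sketch below assumes the workflow exposes the repository variable and the secret to the job as environment variables under the same names; that mapping and the function names are assumptions, not something this PR specifies.

```python
# Names taken from the activation steps; exposing them as environment
# variables with identical names is an assumption of this sketch.
REQUIRED_SETTINGS = ("TRACKING_ISSUE_WORDPRESS_SEO", "ANTHROPIC_API_KEY")


def missing_settings(env):
    """Return the required settings that are unset or blank in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


def tracking_issue_number(env):
    """Parse the tracking issue number, rejecting non-numeric values."""
    raw = env.get("TRACKING_ISSUE_WORDPRESS_SEO", "").strip()
    if not raw.isdigit():
        raise ValueError(f"TRACKING_ISSUE_WORDPRESS_SEO must be an issue number, got {raw!r}")
    return int(raw)
```

In a workflow step this might be called as `missing_settings(os.environ)`, exiting early with a clear message when the list is non-empty.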

Known caveats / TODOs before production

  • The agent invocation in the workflow is currently a placeholder so the overall shape can be reviewed first. Needs a concrete uses: step.
  • Tool allowlist for the agent (scoped to Read, Grep, Glob, Edit, Write, Bash(git *), Bash(gh ...)) should be pinned once the Claude action is chosen.
  • PRODUCTS dict in the workflow currently lists only wordpress-seo. To add a product, add its slug, source repos, display name, and tracking-issue variable name here, then create its tracking issue and repository variable.
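
For reviewers picturing the config surface, one possible shape for a PRODUCTS entry, together with the branch and PR-title formats described earlier, is sketched below. The key names are hypothetical, and deriving `<base-version>` by stripping the `-RC<n>` suffix from the tag is an assumption of this sketch.

```python
# Hypothetical shape of the workflow's PRODUCTS dict; the key names are invented.
PRODUCTS = {
    "wordpress-seo": {
        "source_repos": ["Yoast/wordpress-seo"],
        "display_name": "Yoast SEO",
        "tracking_issue_var": "TRACKING_ISSUE_WORDPRESS_SEO",
    },
}


def base_version(rc_tag):
    """Strip the RC suffix, e.g. "26.3-RC1" becomes "26.3" (assumed rule)."""
    return rc_tag.rsplit("-RC", 1)[0]


def branch_name(product, rc_tag, area):
    """Branch format from the PR description: rc-sync/<product>/<rc_tag>/<area>."""
    return f"rc-sync/{product}/{rc_tag}/{area}"


def pr_title(product, rc_tag, area, title):
    """Title format from the PR description; \u2014 is the dash it uses."""
    name = PRODUCTS[product]["display_name"]
    return f"{name} {base_version(rc_tag)} \u2014 docs({area}): {title}"
```

With this shape, adding a product would be a single new dict entry plus the tracking issue and repository variable it points to.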

Rollout plan

  • V1 (this PR): wordpress-seo only.
  • V2: once 2–3 RCs look good, add wordpress-seo-premium and wordpress-seo-local.
  • V3+: the remaining documented products — wpseo-news, wpseo-video, wpseo-woocommerce, shopify-seo, duplicate-post. Each is a pure config addition (slug + repo + display name in AGENT_MAP.md, tracking issue created, repo variable set).

🤖 Planning and artifact drafting assisted by Claude.

Adds the plumbing for a daily GitHub Action that polls Yoast product repos
for new RC tags, runs a Claude agent against the developer-portal docs, and
opens draft PRs where doc updates are warranted.

Scope for this first phase: Yoast SEO (free) only. More products added
iteratively by extending AGENT_MAP.md and the PRODUCTS dict in the workflow.

Architecture:
- No GitHub App or PAT required; product repos are public so anonymous
  cloning works and all writes to this repo use GITHUB_TOKEN.
- Never writes to main; state lives in tracking-issue comments, identified
  by a machine-readable marker embedded in every run-summary comment.
- Cloudflare Pages preview deploy on PR is the per-PR validation.
- PRs are never auto-merged; branch protection's PR-review rule is the gate.

Validated through three manual spikes (narrow positive, negative hotfix,
multi-file new feature) plus end-to-end dry-runs of the orchestration prompt.

Activation requirements (handled post-merge, see PR body): create a tracking
issue, set TRACKING_ISSUE_WORDPRESS_SEO repo variable, set ANTHROPIC_API_KEY
secret (coordinated with devops).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented Apr 24, 2026

Deploying yoast-developer with Cloudflare Pages

Latest commit: 8766e1c
Status: ✅  Deploy successful!
Preview URL: https://181a9e67.yoast-developer.pages.dev
Branch Preview URL: https://rc-docs-sync-proposal.yoast-developer.pages.dev

enricobattocchi and others added 3 commits April 24, 2026 17:18
Two refinements in response to review:

- Agent now detects and reports "AGENT_MAP.md coverage gaps" during every RC
  run. A coverage gap is a hunk that looks like public surface (new
  apply_filters / do_action, new REST route, new top-level src/<subsystem>/
  file with public classes) whose path or symbol isn't covered by any area's
  source_paths or symbol_namespaces. Listed in the run-summary comment under
  a "Coverage gaps observed" section only when present. Informational; does
  not block the run. Turns every RC into a free audit of the map.

- Removed AI Brand Insights from AGENT_MAP.md's Product table and from the
  `ai` area's product list. Rationale: the developer portal currently has
  no feature-spec docs for it (only a changelog), so every docs-sync run on
  the product would reliably produce zero PRs. Keeping it in the map would
  waste compute and review attention on unambiguously no-op runs. Added a
  note in the `ai` area describing how to re-add it (product table entry
  plus source paths for ai-insights-api and ai-insights-frontend, plus a
  split-product rule) when/if feature docs land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop the "off-by-default" flag from the duplicate-post area. It has docs
in this repo (docs/duplicate-post/**) and an active release cadence in
Yoast/duplicate-post, so there's no reason to treat it differently from
any other documented product.

Simplify the agent's "never touch" rule to docs/development/** only —
the per-area docs_paths matching already ensures docs/duplicate-post/ is
only touched when PRODUCT=duplicate-post.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The product has no feature docs in this repo and will not gain any, so
per-product notes about how to re-integrate it later are noise. Replaced
the two AI-Brand-Insights-specific notes (Product table + ai area) with
a generic rule stating only products with feature docs belong in the
table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>