docs: add least-privilege deployment roles and deployment guide #46
scottschreckengaust wants to merge 22 commits into main
Add DEPLOYMENT_ROLES.md with least-privilege IAM policy for the CloudFormation execution role (IaCRole-ABCA), derived from analysis of all CDK constructs and handler code in the current single-stack architecture. Includes optional ECS statements when Fargate is enabled. Add DEPLOYMENT_GUIDE.md covering compute backend choices (AgentCore vs opt-in ECS Fargate via ComputeStrategy), scale-to-zero analysis, and complete AWS services inventory. Update COST_MODEL.md with scale-to-zero characteristics section, corrected baseline to ~$85-95/month, and updated references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from f0c077f to 9babf85
Append new references at the bottom instead of reordering the existing list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The original had COMPUTE.md listed twice intentionally — once for the network architecture section and once for compute billing. Restore this pattern instead of merging into one entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single entry with anchor link to the network architecture section instead of listing the same file twice. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use AWS-native IAM Access Analyzer policy generation instead of third-party tooling for iterative policy tightening. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sync-starlight.mjs script generates mirror files under docs/src/content/docs/ from source docs. These generated files were missing from prior commits, causing the CI mutation check to fail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The PR#46 build failed because Starlight mirror files under docs/src/content/docs/ were not regenerated after editing source docs. The pre-commit hooks had no step to catch this locally. - Add `docs-sync` pre-commit hook that auto-runs sync-starlight.mjs and stages the generated mirrors when docs sources change - Strengthen AGENTS.md boundary and common-mistakes sections to explicitly warn that CI rejects stale mirrors and name the exact command to regenerate them Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ODEL - Session timeout: 8 hours → 9 hours (matches task-orchestrator.ts:173) - Concurrency limit: 2 → 3 (matches task-orchestrator.ts:163 default) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents local plugin state from the remember and MCP plugins from being tracked in version control. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l notes On a fresh AWS account, `aws xray update-trace-segment-destination` fails with AccessDeniedException because X-Ray needs a CloudWatch Logs resource policy before it can write spans. Added the prerequisite `aws logs put-resource-policy` command to Quick Start Step 3. Also documented that `mise run build` requires AWS credentials with ec2:DescribeAvailabilityZones for CDK synthesis, and added common error table entries for the X-Ray, build credential, and non-TTY deploy issues. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ref to /deploy The /setup skill's Phase 3 only ran `aws xray update-trace-segment-destination` which fails with AccessDeniedException on fresh accounts. Added the prerequisite `aws logs put-resource-policy` command. Added a "Least-Privilege Deployment" section to the /deploy skill linking to DEPLOYMENT_ROLES.md with the re-bootstrap command for scoped execution policies. Updated CLAUDE.md to reference the abca-plugin and its available skills so Claude Code sessions discover the guided workflows without requiring --plugin-dir. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 903fa5f to 544cbf2
Replace the single monolithic IAM policy (which exceeded the 6,144-char IAM managed policy limit) with three validated policies: - IaCRole-ABCA-Infrastructure (CFN, IAM, VPC, DNS Firewall) - IaCRole-ABCA-Application (DDB, Lambda, APIGW, Cognito, WAF, EB, SM) - IaCRole-ABCA-Observability (Bedrock, CW, X-Ray, S3, ECR, KMS, SSM, STS) All three policies were validated against a live deployment in us-east-1 (create, update, task execution, and destroy). CloudTrail analysis found 36 additional actions beyond the initial code review, and 7 deployment iterations refined the policies. Key additions: - KMS (entirely missing from original) - lambda:InvokeFunction for AwsCustomResource - bedrock-agentcore:* (CFN handler uses internal action names) - Legacy CW Logs delivery actions for Route53 Resolver - Various Describe/List/Get actions for read-only CFN operations Updated the origin disclaimer, Resource-level permission constraints table, and ECS section to reference the Application policy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify in the ECS section that adding the ECS statement to IaCRole-ABCA-Application keeps the combined policy under the 6,144-character IAM managed policy limit (4,212 of 6,144 chars). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
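A rough way to watch the headroom mentioned in these commits is to measure a policy document before attaching it. The sketch below is illustrative only: `policy_size`, `headroom`, and `example_policy` are hypothetical names, and stripping all whitespace is an approximation of how IAM counts characters, not a guaranteed match for the service's own calculation.

```python
import json

# Limit cited in the commit messages above (characters per managed policy).
IAM_MANAGED_POLICY_LIMIT = 6144


def policy_size(policy: dict) -> int:
    """Approximate the counted size: serialized JSON with all whitespace removed."""
    compact = json.dumps(policy, separators=(",", ":"))
    return len("".join(compact.split()))


def headroom(policy: dict) -> int:
    """Characters left before the managed policy limit is reached."""
    return IAM_MANAGED_POLICY_LIMIT - policy_size(policy)


# Hypothetical one-statement policy, used only to exercise the helpers.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ExampleOnly",
            "Effect": "Allow",
            "Action": ["ecs:DescribeClusters"],
            "Resource": "*",
        }
    ],
}

print(policy_size(example_policy), headroom(example_policy))
```

Running a check like this on each of the three policies after every edit would flag a split that is drifting back toward the limit.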
Overall this is a high-quality, deployment-validated PR that replaces AdministratorAccess with scoped IAM policies, a significant security improvement. However, all three reviewers flagged issues …

Critical (must fix)

File: docs/design/DEPLOYMENT_ROLES.md (and Starlight mirror)
The SecretsManager statement lists `"*"` as a second resource alongside the scoped ARN, completely negating least-privilege. Actions like GetSecretValue, DeleteSecret, and PutSecretValue become …
Fix: Split GetRandomPassword (the only action that requires `"*"`) into its own statement.

File: docs/abca-plugin/skills/deploy/SKILL.md
The skill tells users to bootstrap with a single IaCRole-ABCA-Policy, but DEPLOYMENT_ROLES.md defines three policies (-Infrastructure, -Application, -Observability). Users following this skill will …
Fix: Update to reference all three --cloudformation-execution-policies flags, matching DEPLOYMENT_ROLES.md.

File: docs/src/content/docs/architecture/Cost-model.md
The Starlight mirror of COST_MODEL.md contains Deployment guide as a raw relative path because the sync script has no route mapping for it. This is a broken link on …
Fix: Either add DEPLOYMENT_GUIDE to explicitGuideRoutes in sync-starlight.mjs and create a mirror, or remove the link from content that gets mirrored.

High (strongly recommend fixing)

File: docs/design/DEPLOYMENT_ROLES.md, IAMRolesAndPolicies statement
Without an iam:PassedToService condition on PassRole and no policy restriction on AttachRolePolicy, the execution role could create a backgroundagent-dev-* role, attach AdministratorAccess to it, pass …
Fix: Add iam:PassedToService condition limiting to lambda.amazonaws.com, ecs-tasks.amazonaws.com, apigateway.amazonaws.com, logs.amazonaws.com, bedrock.amazonaws.com.

File: docs/design/DEPLOYMENT_ROLES.md, IAMRolesAndPolicies statement
Combined with iam:CreateServiceLinkedRole, the resource `arn:aws:iam::*:role/aws-service-role/*` allows creating service-linked roles for any AWS service.
Fix: Add an iam:AWSServiceName condition scoped to only the services ABCA actually uses.

The CDK bootstrap key alias (`alias/cdk-hnb659fds-*`) is deterministic. Consider adding a kms:ResourceAliases condition to scope this.

Fix: Scope to `arn:aws:logs:*:ACCOUNT_ID:log-group:aws/spans` and `arn:aws:logs:*:ACCOUNT_ID:log-group:aws/spans:*`.

The bootstrap command uses ACCOUNT, the trust policy uses ACCOUNT_ID, and the IAM ARNs use `*` for the account field. Should be unified with a single ACCOUNT_ID placeholder and a note at the top …

Medium (should fix)

Files: QUICK_START.md, setup/SKILL.md (and Starlight mirror)
`ACCOUNT_ID=$(aws sts get-caller-identity ...)` is set but never referenced. Either remove it or use it to scope the resource policy ARN.

These are actually two different limits (AgentCore service limit vs. orchestrator timeout), but this needs explicit clarification.

The PR claims ~$50/month for 7 endpoints across 2 AZs. AWS pricing: 7 x 2 AZs x $0.01/hr x 730 hrs = ~$102/month. This would push the total baseline to ~$135-145/month. (Pre-existing issue in …

All ARNs use `*` for region despite ABCA deploying to a single region. The "Iterative tightening" section mentions this but it could be a stronger recommendation.
…constraints table GetRandomPassword is an account-level API with no secret ARN, so it requires Resource:"*". Document this in the Resource-level permission constraints table alongside other services that require "*". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The skill referenced a non-existent IaCRole-ABCA-Policy. Update to the three actual policy names (Infrastructure, Application, Observability) matching DEPLOYMENT_ROLES.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add explicit route mapping, mirrorMarkdownFile call, and sidebar entry so the Deployment Guide renders on the docs site and cross-doc links from COST_MODEL.md resolve correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The
Fixed — updated to reference all three policies (
Fixed: added …
Isolate the account-level GetRandomPassword action (which requires Resource:*) from the scoped SecretsManager statement. With ECS the Application policy is still only ~4K of the 6,144-char IAM limit, leaving ~2K headroom for future services. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
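The split described in this commit can be pictured as two statements, shown here as Python dictionaries so the shape can be checked mechanically. This is a sketch under assumptions, not the text of DEPLOYMENT_ROLES.md: the `Sid` values, the exact action list, and the secret ARN pattern are illustrative, and `ACCOUNT_ID` is the placeholder the PR uses.

```python
# Hypothetical reconstruction of the two-statement layout; Sids and the ARN
# pattern are assumptions, only the action/resource pairing follows the PR.
secrets_statements = [
    {
        "Sid": "SecretsManagerScoped",
        "Effect": "Allow",
        "Action": [
            "secretsmanager:GetSecretValue",
            "secretsmanager:PutSecretValue",
            "secretsmanager:DeleteSecret",
        ],
        "Resource": "arn:aws:secretsmanager:*:ACCOUNT_ID:secret:backgroundagent-*",
    },
    {
        "Sid": "SecretsManagerRandomPassword",
        "Effect": "Allow",
        "Action": "secretsmanager:GetRandomPassword",
        "Resource": "*",
    },
]


def as_list(value):
    """IAM allows a string or a list for Action and Resource; normalize to a list."""
    return value if isinstance(value, list) else [value]


def wildcard_actions(statements):
    """Collect every action that is granted on Resource '*'."""
    found = set()
    for statement in statements:
        if "*" in as_list(statement["Resource"]):
            found.update(as_list(statement["Action"]))
    return found


print(wildcard_actions(secrets_statements))
```

With the split in place, the only action the helper reports against `"*"` is `secretsmanager:GetRandomPassword`, which is the outcome the review asked for.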
…otes Separate iam:PassRole into its own statement with iam:PassedToService condition limiting to the 7 services ABCA passes roles to. Add iterative tightening items for AttachRolePolicy (iam:PolicyARN) and CreateServiceLinkedRole (iam:AWSServiceName) conditions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
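A minimal sketch of a conditioned `iam:PassRole` statement follows. Several details are assumptions: the commit mentions 7 services, but only the five named in the review comment are listed here; the role ARN pattern is illustrative; `may_pass_role_to` is a hypothetical helper that only reads the condition block and does not emulate IAM policy evaluation.

```python
# Services taken from the review comment; the actual policy reportedly lists 7.
PASSED_TO_SERVICES = [
    "lambda.amazonaws.com",
    "ecs-tasks.amazonaws.com",
    "apigateway.amazonaws.com",
    "logs.amazonaws.com",
    "bedrock.amazonaws.com",
]

pass_role_statement = {
    "Sid": "IAMPassRole",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    # Assumed role naming pattern; ACCOUNT_ID is a placeholder to substitute.
    "Resource": "arn:aws:iam::ACCOUNT_ID:role/backgroundagent-*",
    "Condition": {
        "StringEquals": {"iam:PassedToService": PASSED_TO_SERVICES}
    },
}


def may_pass_role_to(statement: dict, service: str) -> bool:
    """Check whether a service principal appears in the PassedToService condition."""
    allowed = statement["Condition"]["StringEquals"]["iam:PassedToService"]
    return service in allowed


print(may_pass_role_to(pass_role_statement, "lambda.amazonaws.com"))
print(may_pass_role_to(pass_role_statement, "ec2.amazonaws.com"))
```

Keeping `iam:PassRole` in its own statement means the condition applies only to role passing and does not interfere with the other IAM actions the execution role needs.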
…ceholders - Scope X-Ray resource policy Resource from * to arn:aws:logs:*:ACCOUNT_ID:log-group:aws/spans in QUICK_START.md and setup SKILL.md (item 7) - Add KMS kms:ResourceAliases tightening recommendation (item 6) - Unify placeholder to ACCOUNT_ID everywhere with substitution note (item 8) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
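The scoped resource policy could be generated as shown below, so the account ID is substituted in one place. Treat this as a sketch: the `Resource` values follow the ARNs quoted in the review, while the `Sid`, the `xray.amazonaws.com` principal, and the `logs:PutLogEvents` action are assumptions about what the policy document contains. `spans_resource_policy` is a hypothetical helper.

```python
import json


def spans_resource_policy(account_id: str) -> str:
    """Build a CloudWatch Logs resource policy document scoped to the aws/spans log group."""
    log_group = f"arn:aws:logs:*:{account_id}:log-group:aws/spans"
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "XRaySpansAccess",  # assumed Sid
                "Effect": "Allow",
                "Principal": {"Service": "xray.amazonaws.com"},  # assumed principal
                "Action": "logs:PutLogEvents",  # assumed action
                # Scoped to the log group and its streams instead of "*".
                "Resource": [log_group, f"{log_group}:*"],
            }
        ],
    }
    return json.dumps(document)


print(spans_resource_policy("123456789012"))
```

The resulting JSON string is the kind of value passed to `aws logs put-resource-policy` via `--policy-document`, with the account ID taken from the `ACCOUNT_ID` variable that the Quick Start already captures.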
…timeouts VPC endpoint cost was ~$50/mo (1 AZ math), actual is ~$102/mo (7 endpoints x 2 AZs x $0.01/hr x 730 hrs). Update baseline totals from ~$85-95 to ~$140-150 in COST_MODEL.md and DEPLOYMENT_GUIDE.md. Clarify the two distinct timeout limits: AgentCore 8-hour service limit vs orchestrator 9-hour executionTimeout. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
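The corrected figure is plain arithmetic, reproduced here so the one-AZ and two-AZ numbers can be compared directly. The inputs are the ones quoted in the commit message; the function name is hypothetical, and the hourly rate is the value the reviewers used rather than a price looked up independently.

```python
# Inputs quoted in the commit message and review comment above.
ENDPOINTS = 7
AVAILABILITY_ZONES = 2
HOURLY_RATE_USD = 0.01
HOURS_PER_MONTH = 730


def monthly_endpoint_cost(endpoints=ENDPOINTS, azs=AVAILABILITY_ZONES,
                          rate=HOURLY_RATE_USD, hours=HOURS_PER_MONTH):
    """Hourly charge per endpoint per AZ, multiplied out over a month."""
    return endpoints * azs * rate * hours


two_az = round(monthly_endpoint_cost(), 2)
one_az = round(monthly_endpoint_cost(azs=1), 2)
print(two_az, one_az)
```

The one-AZ result (about $51) matches the earlier ~$50 estimate, which is why the original baseline was low by roughly the cost of the second AZ.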
Separated For
Added as Iterative tightening item #5 — recommending
Added as Iterative tightening item #6 — recommending
Fixed — scoped to
Unified to
Keeping as-is — the variable is now used by the scoped X-Ray resource policy (item 7 fix).
Clarified that these are two distinct limits:
Updated both DEPLOYMENT_GUIDE.md and COST_MODEL.md to distinguish the two explicitly.
Fixed — the
Agree this could be stronger. All policy ARNs use …
V1: Admin / No ECS - Create
⏳ In progress: CDK bootstrap complete. Pre-deploy setup: …
Note: The scoped X-Ray resource policy ( …

V1: Admin / No ECS - Create
✅ Passed: Stack …
Note: Had to delete an orphaned … Now running smoke test task (CODEOWNERS file creation).

V1: Admin / No ECS - Task
✅ Passed: Task …

V1: Admin / No ECS - Update
✅ Passed: Stack reached …

V1: Admin / No ECS - Destroy
⏳ In progress (retry): First delete attempt hit the known AgentCore ENI cleanup timing issue (security group + 2 subnets retained by …

V1: Admin / No ECS - Destroy
✅ Passed (with known ENI retry): Stack deleted. AgentCore-managed ENIs ( …
V1 Summary: All 4 lifecycle steps passed (Create ✅ → Task ✅ → Update ✅ → Destroy ✅).

V2: Admin / ECS - Create
✅ Passed: Stack reached …

V2: Admin / ECS - Task
✅ Passed: Task completed in 130s, $0.16.

V2: Admin / ECS - Update
✅ Passed: Stack reached …
Summary
- … us-east-1). The original single monolithic policy was replaced with a 3-way split to stay under the IAM managed policy 6,144-character limit:
  - `IaCRole-ABCA-Infrastructure`: CloudFormation, IAM, VPC, Route 53 Resolver DNS Firewall
  - `IaCRole-ABCA-Application`: DynamoDB, Lambda, API Gateway, Cognito, WAFv2, EventBridge, Secrets Manager (+ optional ECS)
  - `IaCRole-ABCA-Observability`: Bedrock AgentCore, Bedrock Guardrails, CloudWatch, X-Ray, S3, KMS, ECR, SSM, STS
- `update-trace-segment-destination` fails on fresh accounts without a CloudWatch Logs resource policy; added prerequisite `aws logs put-resource-policy` command
- `mise run build` fails without AWS credentials (CDK synth does AZ lookups); added note and common error entry
- `AWS_PROFILE` guidance for multi-profile users
- `/setup` skill Phase 3 was missing the `logs put-resource-policy` prerequisite (same X-Ray bug)
- `/deploy` skill had no least-privilege guidance; added section with re-bootstrap command and reference to `DEPLOYMENT_ROLES.md`
- `CLAUDE.md` had no reference to the plugin; added pointer so sessions discover guided workflows
- `docs/guides/DEPLOYMENT_GUIDE.md` covering architecture, scale-to-zero analysis (~$140-150/month idle), and complete AWS services inventory
- `docs/design/COST_MODEL.md` with corrected baseline, scale-to-zero section, and updated references
- `.gitignore` entries for Claude Code plugin artifacts (`.mcp.json`, `.remember/`)
- `docs-sync` pre-commit hook to auto-regenerate Starlight mirrors

Review feedback addressed
Changes made after code review:
- `Resource: "*"`: `GetRandomPassword` into own statement; all other actions scoped to `backgroundagent-*`
- `IaCRole-ABCA-Policy`
- `DEPLOYMENT_GUIDE.md` has no Starlight mirror
- `iam:PassRole` without conditions: `IAMPassRole` statement with `iam:PassedToService` condition (7 services); `AttachRolePolicy` restriction added as iterative tightening item
- `aws-service-role/*` allows any service-linked role: `iam:AWSServiceName` as iterative tightening item
- `kms:CreateGrant` on `Resource: "*"`: `kms:ResourceAliases` as iterative tightening item
- `Resource: "*"`: `arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans`
- `ACCOUNT_ID` with substitution note
- `ACCOUNT_ID` captured but never used
- `executionTimeout`

Deployment validation test matrix
Four configurations are tested across the full stack lifecycle (create → task → update → destroy):
- `AdministratorAccess` (default bootstrap)
- `AdministratorAccess`

Lifecycle steps per variation
Each variation runs through:
- `cdk deploy` from clean state (no existing stack). Validates all resource creation permissions.
- `cdk destroy` to tear down all resources. Validates deletion permissions.

Pre-test setup (once per account)
- … `--cloudformation-execution-policies` for V3/V4)
- … `aws logs put-resource-policy`)

Pass criteria
- `CREATE_COMPLETE` with no permission errors in CloudFormation events
- `COMPLETED` status with a PR URL in the output
- `UPDATE_COMPLETE`
- `DELETE_COMPLETE` (AgentCore ENI timing retries are acceptable)

Progress is reported as individual PR comments (one per lifecycle step per variation, ~16 total).
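The pass criteria above can be sketched as a small lookup, which also shows where the figure of roughly 16 comments comes from (four variations times four lifecycle steps). All names here are hypothetical, and the PR URL check is a deliberately crude assumption (it only looks for `https://` in the task output).

```python
# Expected terminal status per lifecycle step, as listed in the pass criteria.
EXPECTED_STATUS = {
    "create": "CREATE_COMPLETE",
    "task": "COMPLETED",
    "update": "UPDATE_COMPLETE",
    "destroy": "DELETE_COMPLETE",
}

VARIATIONS = ["V1", "V2", "V3", "V4"]


def step_passed(step: str, observed_status: str, output: str = "") -> bool:
    """Return True when a step reached its expected status.

    The task step additionally requires something that looks like a PR URL
    in its output; the 'https://' test is an illustrative stand-in.
    """
    if observed_status != EXPECTED_STATUS[step]:
        return False
    if step == "task":
        return "https://" in output
    return True


def expected_comment_count() -> int:
    """One progress comment per lifecycle step per variation."""
    return len(VARIATIONS) * len(EXPECTED_STATUS)


print(expected_comment_count())
print(step_passed("destroy", "DELETE_COMPLETE"))
```

A retry loop around the destroy step would sit outside this check, since the criteria treat AgentCore ENI timing retries as acceptable as long as the final status is `DELETE_COMPLETE`.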
Files changed
- `docs/design/DEPLOYMENT_ROLES.md`: … `IAMPassRole` with `PassedToService` condition; split SecretsManager; iterative tightening items for AttachRolePolicy, CreateServiceLinkedRole, KMS; unified `ACCOUNT_ID` placeholder
- `docs/guides/DEPLOYMENT_GUIDE.md`
- `docs/guides/QUICK_START.md`: … `aws/spans`; build credential note; `AWS_PROFILE` guidance; 4 new common errors
- `docs/design/COST_MODEL.md`
- `docs/abca-plugin/skills/setup/SKILL.md`
- `docs/abca-plugin/skills/deploy/SKILL.md`
- `docs/scripts/sync-starlight.mjs`: `DEPLOYMENT_GUIDE` route mapping and mirror
- `docs/astro.config.mjs`
- `CLAUDE.md`
- `AGENTS.md`
- `.gitignore`: `.mcp.json`, `.remember/`
- `.pre-commit-config.yaml`: `docs-sync` pre-commit hook

Test plan
- … `AdministratorAccess` on clean account: passed (initial validation)

🤖 Generated with Claude Code