
docs: add least-privilege deployment roles and deployment guide#46

Open
scottschreckengaust wants to merge 22 commits into main from least-privilege-deployment-documentation

Conversation

@scottschreckengaust
Contributor

@scottschreckengaust scottschreckengaust commented Apr 23, 2026

Summary

  • Validated least-privilege IAM policies for the CloudFormation execution role through end-to-end deployment testing on a clean AWS account (us-east-1). The original single monolithic policy was replaced with a 3-way split to stay under the IAM managed policy 6,144-character limit:
    • IaCRole-ABCA-Infrastructure — CloudFormation, IAM, VPC, Route 53 Resolver DNS Firewall
    • IaCRole-ABCA-Application — DynamoDB, Lambda, API Gateway, Cognito, WAFv2, EventBridge, Secrets Manager (+ optional ECS)
    • IaCRole-ABCA-Observability — Bedrock AgentCore, Bedrock Guardrails, CloudWatch, X-Ray, S3, KMS, ECR, SSM, STS
  • Fixed Quick Start deployment blockers found during end-to-end walkthrough:
    • X-Ray update-trace-segment-destination fails on fresh accounts without a CloudWatch Logs resource policy — added prerequisite aws logs put-resource-policy command
    • mise run build fails without AWS credentials (CDK synth does AZ lookups) — added note and common error entry
    • Added common error entries for non-TTY deploy approval and build credential issues
    • Added AWS_PROFILE guidance for multi-profile users
  • Fixed abca-plugin gaps discovered during deep review:
    • /setup skill Phase 3 was missing the logs put-resource-policy prerequisite (same X-Ray bug)
    • /deploy skill had no least-privilege guidance — added section with re-bootstrap command and reference to DEPLOYMENT_ROLES.md
    • CLAUDE.md had no reference to the plugin — added pointer so sessions discover guided workflows
  • Add docs/guides/DEPLOYMENT_GUIDE.md covering architecture, scale-to-zero analysis (~$140-150/month idle), and complete AWS services inventory
  • Update docs/design/COST_MODEL.md with corrected baseline, scale-to-zero section, and updated references
  • Add .gitignore entries for Claude Code plugin artifacts (.mcp.json, .remember/)
  • Add docs-sync pre-commit hook to auto-regenerate Starlight mirrors

Review feedback addressed

Changes made after code review:

| # | Issue | Resolution |
|---|-------|------------|
| 1 | SecretsManager Resource: "*" | Split GetRandomPassword into own statement; all other actions scoped to backgroundagent-* |
| 2 | Deploy SKILL.md references non-existent IaCRole-ABCA-Policy | Updated to reference all three policy names |
| 3 | DEPLOYMENT_GUIDE.md has no Starlight mirror | Added route mapping, mirror, and sidebar entry |
| 4 | iam:PassRole without conditions | Added IAMPassRole statement with iam:PassedToService condition (7 services); AttachRolePolicy restriction added as iterative tightening item |
| 5 | aws-service-role/* allows any service-linked role | Added iam:AWSServiceName as iterative tightening item |
| 6 | KMS kms:CreateGrant on Resource: "*" | Added kms:ResourceAliases as iterative tightening item |
| 7 | X-Ray resource policy grants Resource: "*" | Scoped to arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans |
| 8 | Inconsistent placeholder names | Unified to ACCOUNT_ID with substitution note |
| 9 | ACCOUNT_ID captured but never used | Now used by scoped X-Ray resource policy (item 7) |
| 10 | Session timeout "8 hours" vs "9 hours" | Clarified: 8h = AgentCore service limit, 9h = orchestrator executionTimeout |
| 11 | VPC endpoint cost underestimated | Corrected from ~$50/mo to ~$102/mo (7 × 2 AZs × $0.01/hr × 730 hrs); baseline updated to ~$140-150/mo |
| 12 | Region wildcards | Kept as iterative tightening item #3 (users deploy to different regions) |

Deployment validation test matrix

Four configurations are tested across the full stack lifecycle (create → task → update → destroy):

| # | IAM Policy | ECS Compute | Purpose |
|---|------------|-------------|---------|
| V1 | AdministratorAccess (default bootstrap) | Off (AgentCore only) | Baseline — confirms stack works with default permissions |
| V2 | AdministratorAccess | On (ECS Fargate) | Baseline with ECS — confirms ECS compute path works |
| V3 | Least-privilege (3-way split) | Off (AgentCore only) | Primary validation — confirms scoped policies are sufficient |
| V4 | Least-privilege (3-way split) | On (ECS Fargate) | Confirms ECS statement + scoped policies work together |

Lifecycle steps per variation

Each variation runs through:

  1. Create — Fresh cdk deploy from clean state (no existing stack). Validates all resource creation permissions.
  2. Task — Submit a coding task via CLI, wait for agent to complete and open a PR. Validates runtime permissions (Secrets Manager read, DynamoDB, AgentCore/ECS invocation).
  3. Update — Modify a Blueprint parameter and redeploy. Validates stack update permissions (resource modifications, not just creation).
  4. Destroy — cdk destroy to tear down all resources. Validates deletion permissions.

Pre-test setup (once per account)

  • CDK bootstrap (default for V1/V2, re-bootstrap with --cloudformation-execution-policies for V3/V4)
  • X-Ray resource policy (aws logs put-resource-policy)
  • GitHub PAT stored in Secrets Manager (post first deploy)
  • Cognito user created (post first deploy)

Pass criteria

  • Create: Stack reaches CREATE_COMPLETE with no permission errors in CloudFormation events
  • Task: Task reaches COMPLETED status with a PR URL in the output
  • Update: Stack reaches UPDATE_COMPLETE
  • Destroy: Stack reaches DELETE_COMPLETE (AgentCore ENI timing retries are acceptable)

Progress is reported as individual PR comments (one per lifecycle step per variation, ~16 total).
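
As a rough illustration of where the "~16 total" figure comes from, the matrix above can be enumerated as the product of two IAM policy choices, two ECS settings, and four lifecycle steps. This is a minimal sketch; the function and constant names are hypothetical and not part of the PR.

```python
from itertools import product

# Values taken from the test matrix in this PR description.
POLICIES = ["AdministratorAccess", "Least-privilege (3-way split)"]
ECS_ENABLED = [False, True]
STEPS = ["create", "task", "update", "destroy"]


def build_matrix():
    """Return (variation, policy, ecs_enabled, step) tuples in run order."""
    rows = []
    # product() yields (Admin, off), (Admin, on), (LP, off), (LP, on): V1..V4.
    for index, (policy, ecs) in enumerate(product(POLICIES, ECS_ENABLED), start=1):
        for step in STEPS:
            rows.append((f"V{index}", policy, ecs, step))
    return rows


if __name__ == "__main__":
    for row in build_matrix():
        print(row)
```

Each tuple corresponds to one expected progress comment, so four variations with four steps each give 16 entries.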

Files changed

| File | What Changed |
|------|--------------|
| docs/design/DEPLOYMENT_ROLES.md | 3-way policy split; IAMPassRole with PassedToService condition; split SecretsManager; iterative tightening items for AttachRolePolicy, CreateServiceLinkedRole, KMS; unified ACCOUNT_ID placeholder |
| docs/guides/DEPLOYMENT_GUIDE.md | New: architecture, scale-to-zero (~$140-150/mo), AWS services inventory; corrected VPC endpoint cost; clarified session timeouts |
| docs/guides/QUICK_START.md | X-Ray resource policy scoped to aws/spans; build credential note; AWS_PROFILE guidance; 4 new common errors |
| docs/design/COST_MODEL.md | Corrected VPC endpoint cost (~$102/mo); baseline updated to ~$140-150/mo; clarified session timeouts; scale-to-zero section |
| docs/abca-plugin/skills/setup/SKILL.md | Scoped X-Ray resource policy |
| docs/abca-plugin/skills/deploy/SKILL.md | Updated to 3-way policy split |
| docs/scripts/sync-starlight.mjs | Added DEPLOYMENT_GUIDE route mapping and mirror |
| docs/astro.config.mjs | Added Deployment Guide sidebar entry |
| CLAUDE.md | Added plugin reference |
| AGENTS.md | Strengthened docs-sync instructions |
| .gitignore | Added .mcp.json, .remember/ |
| .pre-commit-config.yaml | Added docs-sync pre-commit hook |

Test plan

  • Deploy with AdministratorAccess on clean account — passed (initial validation)
  • CloudTrail analysis of all CFN execution role actions — 36 gaps identified
  • Deploy with scoped 3-way policy split — passed after 7 iterations (initial validation)
  • Stack update with scoped policies — passed (initial validation)
  • Smoke test (task submission, PR creation) — passed (initial validation)
  • Stack destroy with scoped policies — passed with AgentCore ENI retry (initial validation)
  • ECS statement fits in Application policy under size limit — verified (4,110 / 6,144 chars with ECS + split SM)
  • Verify all markdown links resolve between new and existing docs
  • Starlight mirrors regenerated and astro check passed
  • V1: Admin / No ECS — create → task → update → destroy
  • V2: Admin / ECS — create → task → update → destroy
  • V3: Least-priv / No ECS — create → task → update → destroy
  • V4: Least-priv / ECS — create → task → update → destroy

🤖 Generated with Claude Code

@scottschreckengaust scottschreckengaust requested a review from a team as a code owner April 23, 2026 15:17
@scottschreckengaust scottschreckengaust changed the base branch from main to feat/fargate-agent-stack April 23, 2026 15:30
Add DEPLOYMENT_ROLES.md with least-privilege IAM policy for the
CloudFormation execution role (IaCRole-ABCA), derived from analysis
of all CDK constructs and handler code in the current single-stack
architecture. Includes optional ECS statements when Fargate is enabled.

Add DEPLOYMENT_GUIDE.md covering compute backend choices (AgentCore
vs opt-in ECS Fargate via ComputeStrategy), scale-to-zero analysis,
and complete AWS services inventory.

Update COST_MODEL.md with scale-to-zero characteristics section,
corrected baseline to ~$85-95/month, and updated references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust force-pushed the least-privilege-deployment-documentation branch from f0c077f to 9babf85 Compare April 23, 2026 15:42
@scottschreckengaust scottschreckengaust changed the base branch from feat/fargate-agent-stack to main April 23, 2026 15:42
scottschreckengaust and others added 4 commits April 23, 2026 15:46
Append new references at the bottom instead of reordering the
existing list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The original had COMPUTE.md listed twice intentionally — once for
the network architecture section and once for compute billing. Restore
this pattern instead of merging into one entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single entry with anchor link to the network architecture section
instead of listing the same file twice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use AWS-native IAM Access Analyzer policy generation instead of
third-party tooling for iterative policy tightening.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread docs/design/COST_MODEL.md Outdated
scottschreckengaust and others added 3 commits April 23, 2026 09:45
The sync-starlight.mjs script generates mirror files under
docs/src/content/docs/ from source docs. These generated files were
missing from prior commits, causing the CI mutation check to fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The PR#46 build failed because Starlight mirror files under
docs/src/content/docs/ were not regenerated after editing source docs.
The pre-commit hooks had no step to catch this locally.

- Add `docs-sync` pre-commit hook that auto-runs sync-starlight.mjs and
  stages the generated mirrors when docs sources change
- Strengthen AGENTS.md boundary and common-mistakes sections to
  explicitly warn that CI rejects stale mirrors and name the exact
  command to regenerate them

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust marked this pull request as draft April 23, 2026 17:08
scottschreckengaust and others added 4 commits April 23, 2026 17:16
…ODEL

- Session timeout: 8 hours → 9 hours (matches task-orchestrator.ts:173)
- Concurrency limit: 2 → 3 (matches task-orchestrator.ts:163 default)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents local plugin state from the remember and MCP plugins from
being tracked in version control.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l notes

On a fresh AWS account, `aws xray update-trace-segment-destination`
fails with AccessDeniedException because X-Ray needs a CloudWatch Logs
resource policy before it can write spans. Added the prerequisite
`aws logs put-resource-policy` command to Quick Start Step 3.

Also documented that `mise run build` requires AWS credentials with
ec2:DescribeAvailabilityZones for CDK synthesis, and added common error
table entries for the X-Ray, build credential, and non-TTY deploy issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ref to /deploy

The /setup skill's Phase 3 only ran `aws xray update-trace-segment-destination`
which fails with AccessDeniedException on fresh accounts. Added the prerequisite
`aws logs put-resource-policy` command.

Added a "Least-Privilege Deployment" section to the /deploy skill linking to
DEPLOYMENT_ROLES.md with the re-bootstrap command for scoped execution policies.

Updated CLAUDE.md to reference the abca-plugin and its available skills so
Claude Code sessions discover the guided workflows without requiring
--plugin-dir.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust force-pushed the least-privilege-deployment-documentation branch from 903fa5f to 544cbf2 Compare April 23, 2026 20:36
scottschreckengaust and others added 3 commits April 23, 2026 23:03
Replace the single monolithic IAM policy (which exceeded the 6,144-char
IAM managed policy limit) with three validated policies:
- IaCRole-ABCA-Infrastructure (CFN, IAM, VPC, DNS Firewall)
- IaCRole-ABCA-Application (DDB, Lambda, APIGW, Cognito, WAF, EB, SM)
- IaCRole-ABCA-Observability (Bedrock, CW, X-Ray, S3, ECR, KMS, SSM, STS)

All three policies were validated against a live deployment in us-east-1
(create, update, task execution, and destroy). CloudTrail analysis found
36 additional actions beyond the initial code review, and 7 deployment
iterations refined the policies. Key additions:
- KMS (entirely missing from original)
- lambda:InvokeFunction for AwsCustomResource
- bedrock-agentcore:* (CFN handler uses internal action names)
- Legacy CW Logs delivery actions for Route53 Resolver
- Various Describe/List/Get actions for read-only CFN operations

Updated the origin disclaimer, Resource-level permission constraints
table, and ECS section to reference the Application policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify in the ECS section that adding the ECS statement to
IaCRole-ABCA-Application keeps the combined policy under the
6,144-character IAM managed policy limit (4,212 of 6,144 chars).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust marked this pull request as ready for review April 24, 2026 19:29
@krokoko
Contributor

krokoko commented Apr 24, 2026

Overall this is a high-quality, deployment-validated PR that replaces AdministratorAccess with scoped IAM policies — a significant security improvement. However, all three reviewers flagged issues that should be addressed before merge.


Critical (must fix)

  1. SecretsManager Resource includes bare "*" — grants full account access to all secrets

File: docs/design/DEPLOYMENT_ROLES.md (and Starlight mirror)

The SecretsManager statement lists "*" as a second resource alongside the scoped ARN, completely negating least-privilege. Actions like GetSecretValue, DeleteSecret, and PutSecretValue become account-wide.

Fix: Split GetRandomPassword (the only action that requires "*") into its own statement:
{
  "Sid": "SecretsManager",
  "Action": ["secretsmanager:CreateSecret", "...all others except GetRandomPassword..."],
  "Resource": "arn:aws:secretsmanager:*:*:secret:backgroundagent-*"
},
{
  "Sid": "SecretsManagerAccountLevel",
  "Action": "secretsmanager:GetRandomPassword",
  "Resource": "*"
}
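
The proposed split can be checked mechanically. The sketch below is illustrative only: it assumes the scoped ARN pattern is arn:aws:secretsmanager:*:*:secret:backgroundagent-*, uses hypothetical helper names, adds an "Effect" key so the statements are complete, and measures size as the compact JSON length, which only approximates how IAM counts characters against the 6,144-character managed policy limit.

```python
import json

IAM_MANAGED_POLICY_LIMIT = 6144  # characters, as cited in this PR
ACCOUNT_LEVEL_ACTION = "secretsmanager:GetRandomPassword"
# Assumption: scoped ARN pattern inferred from the "backgroundagent-*" naming.
SCOPED_ARN = "arn:aws:secretsmanager:*:*:secret:backgroundagent-*"


def secrets_manager_statements(actions):
    """Split actions into a scoped statement and an account-level statement."""
    scoped = sorted(a for a in actions if a != ACCOUNT_LEVEL_ACTION)
    return [
        {"Sid": "SecretsManager", "Effect": "Allow",
         "Action": scoped, "Resource": SCOPED_ARN},
        {"Sid": "SecretsManagerAccountLevel", "Effect": "Allow",
         "Action": ACCOUNT_LEVEL_ACTION, "Resource": "*"},
    ]


def approximate_policy_size(statements):
    """Length of the compact JSON document (an approximation of IAM's count)."""
    document = {"Version": "2012-10-17", "Statement": statements}
    return len(json.dumps(document, separators=(",", ":")))


if __name__ == "__main__":
    stmts = secrets_manager_statements([
        "secretsmanager:CreateSecret",
        "secretsmanager:DeleteSecret",
        "secretsmanager:GetRandomPassword",
    ])
    print(approximate_policy_size(stmts) < IAM_MANAGED_POLICY_LIMIT)
```

Running the same size check against the full Application policy would show how much headroom remains before and after the split.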

  2. Deploy SKILL.md references non-existent policy name IaCRole-ABCA-Policy

File: docs/abca-plugin/skills/deploy/SKILL.md

The skill tells users to bootstrap with a single IaCRole-ABCA-Policy, but DEPLOYMENT_ROLES.md defines three policies (-Infrastructure, -Application, -Observability). Users following this skill will get a deployment failure.

Fix: Update to reference all three --cloudformation-execution-policies flags, matching DEPLOYMENT_ROLES.md.

  3. DEPLOYMENT_GUIDE.md has no Starlight mirror — broken link on docs site

File: docs/src/content/docs/architecture/Cost-model.md

The Starlight mirror of COST_MODEL.md contains the Deployment guide link as a raw relative path because the sync script has no route mapping for it. This is a broken link on the deployed Starlight site.

Fix: Either add DEPLOYMENT_GUIDE to explicitGuideRoutes in sync-starlight.mjs and create a mirror, or remove the link from content that gets mirrored.


High (strongly recommend fixing)

  4. iam:PassRole and iam:AttachRolePolicy without conditions — privilege escalation path

File: docs/design/DEPLOYMENT_ROLES.md, IAMRolesAndPolicies statement

Without an iam:PassedToService condition on PassRole and no policy restriction on AttachRolePolicy, the execution role could create a backgroundagent-dev-* role, attach AdministratorAccess to it, pass it to Lambda, and invoke it — full account compromise.

Fix: Add iam:PassedToService condition limiting to lambda.amazonaws.com, ecs-tasks.amazonaws.com, apigateway.amazonaws.com, logs.amazonaws.com, bedrock.amazonaws.com.

  5. aws-service-role/* allows creating any service-linked role

File: docs/design/DEPLOYMENT_ROLES.md, IAMRolesAndPolicies statement

Combined with iam:CreateServiceLinkedRole, the resource arn:aws:iam::*:role/aws-service-role/* allows creating service-linked roles for any AWS service.

Fix: Add an iam:AWSServiceName condition scoped to only the services ABCA actually uses.

  6. KMS kms:CreateGrant on Resource: "*" — can delegate key access across account

The CDK bootstrap key alias (alias/cdk-hnb659fds-*) is deterministic. Consider adding a kms:ResourceAliases condition to scope this.

  7. X-Ray resource policy in QUICK_START.md grants Resource: "*" to xray.amazonaws.com

Fix: Scope to arn:aws:logs:*:ACCOUNT_ID:log-group:aws/spans and arn:aws:logs:*:ACCOUNT_ID:log-group:aws/spans:*.

  8. Inconsistent placeholder names (ACCOUNT vs ACCOUNT_ID)

The bootstrap command uses ACCOUNT, the trust policy uses ACCOUNT_ID, and the IAM ARNs use * for the account field. Should be unified with a single ACCOUNT_ID placeholder and a note at the top explaining substitution.


Medium (should fix)

  9. ACCOUNT_ID variable captured but never used

Files: QUICK_START.md, setup/SKILL.md (and Starlight mirror)

ACCOUNT_ID=$(aws sts get-caller-identity ...) is set but never referenced. Either remove it or use it to scope the resource policy ARN.

  10. Session timeout inconsistency: "8 hours" vs "9 hours"
  • DEPLOYMENT_GUIDE.md says "8 hours (AgentCore session)"
  • COST_MODEL.md (updated in this PR) says "9 hours"
  • Code: executionTimeout: Duration.hours(9) at task-orchestrator.ts:173
  • Pre-existing COST_MODEL.md line 46 still says "8-hour max session timeout" (not updated)

These are actually two different limits (AgentCore service limit vs. orchestrator timeout), but this needs explicit clarification.

  11. VPC endpoint cost may be underestimated

The PR claims ~$50/month for 7 endpoints across 2 AZs. AWS pricing: 7 x 2 AZs x $0.01/hr x 730 hrs = ~$102/month. This would push the total baseline to ~$135-145/month. (Pre-existing issue in COST_MODEL.md, perpetuated in the new DEPLOYMENT_GUIDE.md.)
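
The reviewer's arithmetic is easy to reproduce. This is a small sketch using only the figures quoted in the comment (7 endpoints, $0.01 per endpoint-hour per AZ, 730 hours per month); the function name is hypothetical and data-processing charges are not modeled.

```python
HOURS_PER_MONTH = 730  # monthly hours used in the review comment


def vpc_endpoint_monthly_cost(endpoints, azs, hourly_rate_usd):
    """Fixed hourly charge for interface endpoints, rounded to cents."""
    return round(endpoints * azs * hourly_rate_usd * HOURS_PER_MONTH, 2)


if __name__ == "__main__":
    print(vpc_endpoint_monthly_cost(7, 2, 0.01))  # two AZs: 102.2
    print(vpc_endpoint_monthly_cost(7, 1, 0.01))  # one AZ: 51.1
```

The single-AZ result of about $51 lines up with the ~$50/month figure the review questions, which suggests the earlier estimate counted one AZ only.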

  12. Region wildcards throughout all policies

All ARNs use * for region despite ABCA deploying to a single region. The "Iterative tightening" section mentions this but it could be a stronger recommendation.

scottschreckengaust and others added 3 commits April 24, 2026 20:46
…constraints table

GetRandomPassword is an account-level API with no secret ARN, so it
requires Resource:"*". Document this in the Resource-level permission
constraints table alongside other services that require "*".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The skill referenced a non-existent IaCRole-ABCA-Policy. Update to
the three actual policy names (Infrastructure, Application, Observability)
matching DEPLOYMENT_ROLES.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add explicit route mapping, mirrorMarkdownFile call, and sidebar entry
so the Deployment Guide renders on the docs site and cross-doc links
from COST_MODEL.md resolve correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust
Contributor Author

1. SecretsManager Resource includes bare "*"

The "*" is intentional — secretsmanager:GetRandomPassword is an account-level API that does not support resource-level permissions (it generates a random string without referencing any secret). Splitting it into its own statement would push the Application policy over the IAM 6,144-character limit (the 3-way split was specifically sized to stay under this). Added an entry to the Resource-level permission constraints table documenting this, with a pointer to the Iterative tightening section for post-deployment refinement.

2. Deploy SKILL.md references non-existent policy name IaCRole-ABCA-Policy

Fixed — updated to reference all three policies (IaCRole-ABCA-Infrastructure, -Application, -Observability) with the three --cloudformation-execution-policies flags matching DEPLOYMENT_ROLES.md.

3. DEPLOYMENT_GUIDE.md has no Starlight mirror — broken link on docs site

Fixed — added DEPLOYMENT_GUIDE to explicitGuideRoutes in sync-starlight.mjs, added a mirrorMarkdownFile() call to generate the mirror at getting-started/Deployment-guide.md, and added a sidebar entry in astro.config.mjs. The COST_MODEL.md links now rewrite to /getting-started/deployment-guide instead of the raw relative path.

Isolate the account-level GetRandomPassword action (which requires
Resource:*) from the scoped SecretsManager statement. With ECS the
Application policy is still only ~4K of the 6,144-char IAM limit,
leaving ~2K headroom for future services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
scottschreckengaust and others added 3 commits April 24, 2026 21:30
…otes

Separate iam:PassRole into its own statement with iam:PassedToService
condition limiting to the 7 services ABCA passes roles to. Add
iterative tightening items for AttachRolePolicy (iam:PolicyARN) and
CreateServiceLinkedRole (iam:AWSServiceName) conditions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ceholders

- Scope X-Ray resource policy Resource from * to arn:aws:logs:*:ACCOUNT_ID:log-group:aws/spans
  in QUICK_START.md and setup SKILL.md (item 7)
- Add KMS kms:ResourceAliases tightening recommendation (item 6)
- Unify placeholder to ACCOUNT_ID everywhere with substitution note (item 8)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…timeouts

VPC endpoint cost was ~$50/mo (1 AZ math), actual is ~$102/mo
(7 endpoints x 2 AZs x $0.01/hr x 730 hrs). Update baseline totals
from ~$85-95 to ~$140-150 in COST_MODEL.md and DEPLOYMENT_GUIDE.md.

Clarify the two distinct timeout limits: AgentCore 8-hour service
limit vs orchestrator 9-hour executionTimeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@scottschreckengaust
Contributor Author

4. iam:PassRole and iam:AttachRolePolicy without conditions — privilege escalation path

Separated iam:PassRole into its own IAMPassRole statement with an iam:PassedToService condition scoped to the 7 services ABCA passes roles to: lambda.amazonaws.com, ecs-tasks.amazonaws.com, ecs.amazonaws.com, apigateway.amazonaws.com, logs.amazonaws.com, bedrock.amazonaws.com, events.amazonaws.com.

For iam:AttachRolePolicy: restricting with iam:PolicyARN requires enumerating every AWS managed policy CDK attaches (e.g., service-role/AWSLambdaBasicExecutionRole), which is brittle across CDK versions. Added this as Iterative tightening item #4 with guidance to enumerate from a synthesized template post-deployment.
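
For readers who want to see the shape of such a statement, here is a minimal sketch of an IAMPassRole statement with the seven services listed above. The role ARN pattern passed in the example is an assumption for illustration (the actual Resource in DEPLOYMENT_ROLES.md may differ), and the helper name is hypothetical.

```python
import json

# The seven services named in the reply above.
PASSED_TO_SERVICES = [
    "lambda.amazonaws.com",
    "ecs-tasks.amazonaws.com",
    "ecs.amazonaws.com",
    "apigateway.amazonaws.com",
    "logs.amazonaws.com",
    "bedrock.amazonaws.com",
    "events.amazonaws.com",
]


def iam_pass_role_statement(role_arn_pattern):
    """Build a PassRole statement limited to the listed AWS services."""
    return {
        "Sid": "IAMPassRole",
        "Effect": "Allow",
        "Action": "iam:PassRole",
        "Resource": role_arn_pattern,
        "Condition": {
            "StringEquals": {"iam:PassedToService": PASSED_TO_SERVICES}
        },
    }


if __name__ == "__main__":
    # Assumed role naming; substitute ACCOUNT_ID before use.
    stmt = iam_pass_role_statement("arn:aws:iam::ACCOUNT_ID:role/backgroundagent-*")
    print(json.dumps(stmt, indent=2))
```

With the condition in place, a role created by the execution role can only be passed to one of the listed services, which narrows the escalation path the review describes.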

5. aws-service-role/* allows creating any service-linked role

Added as Iterative tightening item #5 — recommending iam:AWSServiceName condition scoped after first deploy using CloudTrail to identify which service-linked roles were actually created.

6. KMS kms:CreateGrant on Resource: "*"

Added as Iterative tightening item #6 — recommending kms:ResourceAliases condition scoped to alias/cdk-hnb659fds-* (the deterministic CDK bootstrap key alias).
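
A tightened statement along these lines might look like the sketch below. It assumes the grant should apply only to keys carrying the CDK bootstrap alias; the Sid and function name are hypothetical. Because kms:ResourceAliases is a multi-valued condition key, the sketch uses the ForAnyValue set operator.

```python
import json


def kms_create_grant_statement(alias_pattern="alias/cdk-hnb659fds-*"):
    """Allow kms:CreateGrant only on keys that have a matching alias."""
    return {
        "Sid": "KMSCreateGrantScoped",  # hypothetical Sid
        "Effect": "Allow",
        "Action": "kms:CreateGrant",
        "Resource": "*",
        "Condition": {
            "ForAnyValue:StringLike": {"kms:ResourceAliases": alias_pattern}
        },
    }


if __name__ == "__main__":
    print(json.dumps(kms_create_grant_statement(), indent=2))
```

The Resource stays "*" while the condition does the scoping, so the statement keeps working if the bootstrap key is rotated or recreated under the same alias.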

7. X-Ray resource policy in QUICK_START.md grants Resource: "*"

Fixed — scoped to arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans and arn:aws:logs:*:${ACCOUNT_ID}:log-group:aws/spans:* in both QUICK_START.md and the setup SKILL.md. The ACCOUNT_ID variable (item 9) is now used by this resource policy.
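
To make the ACCOUNT_ID substitution concrete, the following sketch generates a resource policy document with the two scoped ARNs quoted above. The Sid, the function name, and the logs:PutLogEvents action are assumptions for illustration; the exact statement in QUICK_START.md is not shown in this thread.

```python
import json


def xray_spans_resource_policy(account_id):
    """Build a CloudWatch Logs resource policy scoped to the aws/spans log group."""
    base = f"arn:aws:logs:*:{account_id}:log-group:aws/spans"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "XRaySpansAccess",  # hypothetical Sid
                "Effect": "Allow",
                "Principal": {"Service": "xray.amazonaws.com"},
                "Action": "logs:PutLogEvents",  # assumed action
                "Resource": [base, f"{base}:*"],
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    print(json.dumps(xray_spans_resource_policy("123456789012")))
```

The printed JSON could then be supplied as the policy document argument of aws logs put-resource-policy.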

8. Inconsistent placeholder names (ACCOUNT vs ACCOUNT_ID)

Unified to ACCOUNT_ID throughout DEPLOYMENT_ROLES.md (bootstrap command, policy ARNs). Added a substitution note at the top of the "Using these policies" section explaining the two placeholders (ACCOUNT_ID and REGION).

9. ACCOUNT_ID variable captured but never used

Keeping as-is — the variable is now used by the scoped X-Ray resource policy (item 7 fix).

10. Session timeout inconsistency: "8 hours" vs "9 hours"

Clarified that these are two distinct limits:

  • 8 hours: AgentCore service limit (max session duration)
  • 9 hours: Orchestrator executionTimeout (Duration.hours(9) in task-orchestrator.ts:173)

Updated both DEPLOYMENT_GUIDE.md and COST_MODEL.md to distinguish the two explicitly.

11. VPC endpoint cost may be underestimated

Fixed — the ~$50/month figure was calculated for 1 AZ. Corrected to ~$102/month (7 × 2 AZs × $0.01/hr × 730 hrs) in both COST_MODEL.md and DEPLOYMENT_GUIDE.md. Updated the total baseline from ~$85–95/month to ~$140–150/month and adjusted the cost-at-scale table accordingly.

12. Region wildcards throughout all policies

Agree this could be stronger. All policy ARNs use * for region despite ABCA deploying to a single region. The Iterative tightening section already has item #3 recommending "aws:RequestedRegion" conditions. Recommend keeping this as a post-deployment tightening step rather than hardcoding a region in the docs — users deploy to different regions, and a hardcoded region in the policy template would cause confusion. The tightening guidance is clear about what to do.

@scottschreckengaust
Contributor Author

V1: Admin / No ECS — Create

In progress — CDK bootstrap complete, cdk deploy running (~10 min for 189 resources).

Pre-deploy setup:

  • ✅ X-Ray resource policy created and destination set
  • ✅ CDK bootstrapped with default AdministratorAccess
  • ✅ Full build passed (721 CDK tests, 70 CLI tests)

Note: The scoped X-Ray resource policy (aws/spans log group ARN) fails at update-trace-segment-destination time because the log group doesn't exist yet. Workaround: set with Resource: "*" first, then tighten after. Will document this in Quick Start.

@scottschreckengaust
Contributor Author

V1: Admin / No ECS — Create

Passed — Stack backgroundagent-dev reached CREATE_COMPLETE (189 resources).

Note: Had to delete an orphaned AWS::XRay::ResourcePolicy from a previous test deployment — this is an account-level resource that survives cdk destroy. Added to cleanup checklist.

Now running smoke test task (CODEOWNERS file creation).

@scottschreckengaust
Contributor Author

V1: Admin / No ECS — Task

Passed — Task 01KQ0VB0GSYS245NH2Q4ZGN41K completed in 124s, $0.14.

@scottschreckengaust
Contributor Author

V1: Admin / No ECS — Update

Passed — Stack reached UPDATE_COMPLETE. Added systemPromptOverrides to Blueprint and redeployed successfully.

@krokoko krokoko disabled auto-merge April 24, 2026 23:20
@scottschreckengaust
Contributor Author

V1: Admin / No ECS — Destroy

In progress (retry) — First delete attempt hit the known AgentCore ENI cleanup timing issue (security group + 2 subnets retained by ela-attach managed ENIs). Retrying delete — ENIs typically release within 10-30 min after runtime deletion. This is a known operational note, not a permissions issue.

@scottschreckengaust
Contributor Author

V1: Admin / No ECS — Destroy

Passed (with known ENI retry) — Stack deleted. AgentCore-managed ENIs (ela-attach) held references to the security group and subnets, requiring --retain-resources on the VPC, 2 subnets, and 1 SG. These orphaned resources will be cleaned up once ENIs release (~10-30 min). This is a known operational characteristic, not a permissions issue.

V1 Summary: All 4 lifecycle steps passed (Create ✅ → Task ✅ → Update ✅ → Destroy ✅).

@scottschreckengaust
Contributor Author

V2: Admin / ECS — Create

Passed — Stack reached CREATE_COMPLETE with ECS Fargate cluster, task definition, and Docker image asset. Submitting smoke test task now.

@scottschreckengaust
Contributor Author

V2: Admin / ECS — Task

Passed — Task completed in 130s, $0.16.

@scottschreckengaust
Contributor Author

V2: Admin / ECS — Update

Passed — Stack reached UPDATE_COMPLETE. Removed systemPromptOverrides from Blueprint and redeployed with ECS cluster intact.
