Add setup-aws.sh: idempotent AWS provisioning for the EKS MVP (#309)#317
Merged
Conversation
Provisions everything upstream of setup-experiment-helm.sh with create-if-absent semantics (issue #309): EKS verify-or-create (+ OIDC provider + aws-ebs-csi-driver addon), ECR repo + reference-image build/push, RDS Postgres in the cluster VPC (AWS-managed master password, re-read from Secrets Manager on re-run) or an operator --postgres-dsn, S3 bucket + IRSA role scoped to the chart's task-store-server ServiceAccount, and a generated Helm values file + the exact setup-experiment-helm.sh handoff invocation. --dry-run prints every mutating command verbatim; all state reads go through mockable probe_* functions (EDEN_SETUP_AWS_MOCK), driven by the offline test harness test-setup-aws.sh (86 checks across all-absent / all-present / partial / flag-validation fixtures). Bash-3.2-clean and shellcheck-clean. Chart README + helm.md prerequisites now point at the script, keeping the manual checklist as the substrate-agnostic fallback. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
P1: the EBS CSI addon probe accepted any addon status — a CREATE_FAILED/DEGRADED addon skip-converged as healthy. Now requires ACTIVE, waits on CREATING/UPDATING (aws eks wait addon-active), and fails loud on broken states. P2s: converge an interrupted 'eksctl create cluster' (CREATING → wait cluster-active, not the fatal branch); atomic tmp+rename values write so a crash can't leave a partial file that silently rotates secrets on the next run; --dry-run preview now redacts all secret material (<preserved>/<generated> markers; DSN userinfo masked); an existing same-named IAM policy is content-validated against the bucket ARN instead of silently adopted; the IRSA trust check is a semantic JSON comparison (single statement, federated principal, :sub AND :aud) instead of substring matching; the DB ingress probe now requires IsEgress=false + tcp + FromPort=ToPort=5432. Test harness: mocks updated for the strengthened probes; new cases for cluster-CREATING convergence and foreign-policy fail-loud; redaction assertions (97 checks, all passing under bash 3.2). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The existing-policy acceptance test now parses the document and requires Allow statements granting s3:GetObject + s3:PutObject on the bucket's objects and s3:ListBucket on the bucket (NotAction/NotResource statements excluded), instead of a substring scan that a Deny merely mentioning the ARN would have passed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
reference/scripts/setup-aws/setup-aws.sh: idempotent, create-if-absent provisioning of everything the EDEN Helm chart needs upstream ofsetup-experiment-helm.shon AWS (issue Idempotent AWS provisioning script for the EKS MVP (setup-aws.sh) #309, AWS MVP milestone). Replaces the chart README's manual AWS checklist with a one-pass script; the manual steps stay as the substrate-agnostic fallback.setup-experiment.sh/repo_initidempotency idiom): EKS verify-or-create (+ IAM OIDC provider + theaws-ebs-csi-driveraddon that PVC provisioning needs on modern EKS), ECR repo + reference-image build/push, RDS Postgres in the cluster VPC (AWS-managed master password, re-read from Secrets Manager on re-run) or an operator--postgres-dsn, S3 bucket + IRSA role scoped to exactly the chart's<fullname>-task-store-serverServiceAccount, and a generated 0600 Helm values file + the exactsetup-experiment-helm.shhandoff invocation.--dry-runprints every mutating command verbatim through a single gate and redacts all secret material from the preview. State reads go throughprobe_*functions overridable viaEDEN_SETUP_AWS_MOCK, which drives the offlinetest-setup-aws.sh(97 checks, bash-3.2-clean).Advances the AWS MVP milestone (the chart deploys on EKS as of 13a/13c/13d; this makes the upstream AWS provisioning a script instead of a manual checklist).
What this does NOT cover
test-setup-aws.sh(97 checks) and shellcheck on these scripts run only manually today; no CI job executes them. Tracked in CI job for shellcheck + bash test harnesses (setup-aws dry-run suite has no CI executor) #315.--dry-run+ the mock harness; the first real AWS run is operator-attended by design (provisioning costs money and touches a shared account). No live-AWS assertion in this PR.setup-gcp.shmirrors (noted in Idempotent AWS provisioning script for the EKS MVP (setup-aws.sh) #309's out-of-scope; not separately filed).Fresh-operator walkthrough
--help(usage lists required vs create-only-required flags); ran with a missing required flag → fails loud naming it (setup-aws.sh: --ecr-repo is required); ran a full fresh-account--dry-runagainst the mock — emits the 5 steps in dependency order, the values-file preview shows secrets as<generated>and the DSN userinfo as<redacted>, and the handoff prints the exactsetup-experiment-helm.sh --valuesinvocation with--namespace/--releasematching the IRSA trust subject. Passed cleanly.Test plan
shellcheckclean on both scripts./bin/bash3.2.57 (nomapfile/declare -A).bash reference/scripts/setup-aws/test-setup-aws.sh— all 97 checks pass (all-absent / all-present / partial-state / interrupted-create / foreign-policy / flag-validation fixtures).--dry-runend-to-end against the mock — correct command sequence + skip paths + secret redaction.npx markdownlint-cli2@0.14.0(CI-pinned) — 0 errors.docs/plans/review/setup-aws/impl/.Related issues
🤖 Generated with Claude Code