diff --git a/docs/integration/hcg-tier2-rollout-runbook.md b/docs/integration/hcg-tier2-rollout-runbook.md
index b68e95f..0626e4d 100644
--- a/docs/integration/hcg-tier2-rollout-runbook.md
+++ b/docs/integration/hcg-tier2-rollout-runbook.md
@@ -3,9 +3,9 @@
 # HCG tier-2 — rollout & rollback runbook
-**Version:** 0.4 (policy-deny smoke script landed, Phase E in-progress)
-**Date:** 2026-06-10 (rev. from 2026-06-09)
-**Status:** Phase E deliverables E1 (deploy spec) + E5 (rollback runbook) drafted; live gateway policy (`config/gateway-policy-boj.yaml`) promoted from the worked example (§1.5); `scripts/hcg-policy-smoke.sh` lands as the checked-in §1.5 operator pre-check (deny-path covers gateway-alone; `--with-backend` adds allow-path coverage). Owner-input markers (`!OWNER:`) remain to be filled before any traffic-shift action is taken.
+**Version:** 0.5 (smoke-script verb-canary expansion, Phase E in-progress)
+**Date:** 2026-06-13 (rev. from 2026-06-10)
+**Status:** Phase E deliverables E1 (deploy spec) + E5 (rollback runbook) drafted; live gateway policy (`config/gateway-policy-boj.yaml`) promoted from the worked example (§1.5); `scripts/hcg-policy-smoke.sh` lands as the checked-in §1.5 operator pre-check (deny-path covers gateway-alone; `--with-backend` adds allow-path coverage); §1.5 verb-canary block extended to cover OPTIONS, regex-route DELETE, and wrong-verb-on-listed-path so the operator pre-check fails closed against more verb-governance regression classes. Owner-input markers (`!OWNER:`) remain to be filled before any traffic-shift action is taken.
 **ADR:** [`docs/decisions/0004-adopt-http-capability-gateway.md`](../decisions/0004-adopt-http-capability-gateway.md)
 **Plan:** [`docs/integration/http-capability-gateway-plan.md`](http-capability-gateway-plan.md) (§ Phase E)
 **Contract:** [`docs/integration/http-capability-gateway-boj-contract.md`](http-capability-gateway-boj-contract.md)
@@ -88,7 +88,7 @@ These cannot be inferred from the code/contract; the owner must fill them before
 - [x] `container/gateway-deploy.k9.ncl` exists in the gateway repo (plan §E1) — http-capability-gateway#38 (2026-06-03). Five-level k9-svc pedigree (Snout / Scent / Leash / Gut / Muscle) modelled on `boj-server:container/deploy.k9.ncl`; per-environment `BACKEND_URL` (`http://127.0.0.1:7700` staging, `http://unix:/run/boj/gnosis.sock:/` production); trust source `"header"` staging → `"mtls"` production after §2.4 rehearsal; `max_unavailable = 0`; `failure_mode = "fail-closed"` matching the `[SEAMS] gateway-boj-gnosis` declaration.
 - [x] Gateway policy file in place: `config/gateway-policy-boj-example.yaml`, covering all BoJ surface routes (`/.well-known/boj-node-pubkey`, `/health`, `/menu`, `/cartridges`, `/cartridge/:name`, `/cartridge/:name/invoke`, `/cartridge/:name/sse`, plus any added since contract v1.0). Re-verified 2026-05-28 against `BojRest.Router`; the `POST /cartridge/:name/sse` route (router.ex line 130, wired since the SSE landing — ADR-0013 §6, STATE entry 2026-05-18) was the only drift since contract v1.0 and is now governed by the `cartridge-sse-post` rule alongside `cartridge-invoke-post` (boj-server#165).
 - [x] Live policy file (`config/gateway-policy-boj.yaml`) promoted from the example. Content-identical to the example at promotion time; future BoJ-surface evolution lands in the live file and the example remains as the worked-example artefact (Phase A A3). Both §2.1 staging and §3.1 production load the live file via `POLICY_PATH`.
-- [ ] Gateway has been smoke-tested in isolation with the policy, returning expected allow/deny on each route. Run `scripts/hcg-policy-smoke.sh --gateway-url ` against the gateway loaded with `config/gateway-policy-boj.yaml`; the script exercises a no-trust-header deny probe for every non-public route plus default-deny verb canaries (DELETE/PUT/PATCH on `/cartridges` and `/health`) and is fully gateway-internal — BoJ does **not** need to be reachable for this run. Once BoJ is up behind the gateway, re-run with `--with-backend` from a trusted-proxy IP (loopback by default) to also cover the allow path on authenticated/internal routes including the `POST /cartridge/:name/sse` authenticated/untrusted pair carried over from boj-server#165's test plan. Attach the script's PASS/FAIL summary to the cut-over ticket; a single FAIL is a stop-the-rollout condition (gateway loaded the policy but is not enforcing as declared, or BoJ is unreachable from the gateway, or the script is being run from a non-trusted-proxy IP and the trust header is being stripped).
+- [ ] Gateway has been smoke-tested in isolation with the policy, returning expected allow/deny on each route. Run `scripts/hcg-policy-smoke.sh --gateway-url ` against the gateway loaded with `config/gateway-policy-boj.yaml`; the script exercises a no-trust-header deny probe for every non-public route (25 in the live policy) plus six default-deny verb canaries — DELETE/PUT/PATCH on listed exact paths, OPTIONS on a listed path (no CORS-preflight bypass), DELETE on a regex-matched route (no per-verb regex regression), and GET on the POST-only `ssg-mcp-webhook` public route (the `{path, verb}` pairing must be enforced even when the path itself is in the policy) — and is fully gateway-internal — BoJ does **not** need to be reachable for this run. Once BoJ is up behind the gateway, re-run with `--with-backend` from a trusted-proxy IP (loopback by default) to also cover the allow path on authenticated/internal routes including the `POST /cartridge/:name/sse` authenticated/untrusted pair carried over from boj-server#165's test plan. Attach the script's PASS/FAIL summary to the cut-over ticket; a single FAIL is a stop-the-rollout condition (gateway loaded the policy but is not enforcing as declared, or BoJ is unreachable from the gateway, or the script is being run from a non-trusted-proxy IP and the trust header is being stripped).
 ---
diff --git a/scripts/hcg-policy-smoke.sh b/scripts/hcg-policy-smoke.sh
index 75f8f44..aa4e278 100755
--- a/scripts/hcg-policy-smoke.sh
+++ b/scripts/hcg-policy-smoke.sh
@@ -193,12 +193,36 @@ probe POST /coprocessor/select deny "internal:coprocessor-select-post"
 probe GET /sdp/status deny "internal:sdp-status-get"
 # Default-deny verb canaries — global_verbs is [GET, POST], so any
-# DELETE/PUT/PATCH on a known path must be denied via the no-match
-# (or unknown-method) path. Verifies the verb-governance core invariant
-# of ADR-0004.
-probe DELETE /cartridges deny "verb-canary:DELETE /cartridges"
-probe PUT /health deny "verb-canary:PUT /health"
-probe PATCH /cartridges deny "verb-canary:PATCH /cartridges"
+# DELETE/PUT/PATCH/OPTIONS on a known path must be denied via the
+# no-match (or unknown-method) path. Verifies the verb-governance core
+# invariant of ADR-0004.
+#
+# OPTIONS is named in the policy header's banned-verb list and gets its
+# own canary because a CORS preflight auto-responder in the gateway
+# would silently bypass policy.
+#
+# Regex-route verb canary (DELETE on cartridge-invoke-post) catches a
+# class of bug the exact-path canaries miss: a regression where the
+# regex matcher accepts the path under any verb instead of only the
+# verb its rule lists.
+#
+# Wrong-verb-on-listed-path canary (GET on the ssg-mcp webhook, which
+# only lists POST) verifies the {path, verb} pairing is enforced: the
+# path is in the policy as a public exception, but only for POST; GET
+# on the same path must default-deny because no rule covers it.
+#
+# HEAD is also banned by the policy header but is deliberately not
+# canaried here — curl with `-X HEAD` (vs `--head`) waits for a body
+# the server will not send, which interacts badly with `--max-time` in
+# this script. HEAD enforcement remains covered by the gateway's own
+# unit tests; the operator pre-check focuses on probes that survive
+# curl's method quirks.
+probe DELETE /cartridges deny "verb-canary:DELETE /cartridges"
+probe PUT /health deny "verb-canary:PUT /health"
+probe PATCH /cartridges deny "verb-canary:PATCH /cartridges"
+probe OPTIONS /cartridges deny "verb-canary:OPTIONS /cartridges (preflight must not bypass)"
+probe DELETE /cartridge/probe/invoke deny "verb-canary:DELETE on regex route (cartridge-invoke-post)"
+probe GET /cartridges/ssg-mcp/webhook deny "verb-canary:GET on POST-only public route (ssg-mcp-webhook-post)"
 if [ "$WITH_BACKEND" = "1" ]; then
   echo