Merged
8 changes: 4 additions & 4 deletions docs/integration/hcg-tier2-rollout-runbook.md
@@ -3,9 +3,9 @@

# HCG tier-2 — rollout & rollback runbook

**Version:** 0.4 (policy-deny smoke script landed, Phase E in-progress)
**Date:** 2026-06-10 (rev. from 2026-06-09)
**Status:** Phase E deliverables E1 (deploy spec) + E5 (rollback runbook) drafted; live gateway policy (`config/gateway-policy-boj.yaml`) promoted from the worked example (§1.5); `scripts/hcg-policy-smoke.sh` lands as the checked-in §1.5 operator pre-check (deny-path covers gateway-alone; `--with-backend` adds allow-path coverage). Owner-input markers (`!OWNER:`) remain to be filled before any traffic-shift action is taken.
**Version:** 0.5 (smoke-script verb-canary expansion, Phase E in-progress)
**Date:** 2026-06-13 (rev. from 2026-06-10)
**Status:** Phase E deliverables E1 (deploy spec) + E5 (rollback runbook) drafted; live gateway policy (`config/gateway-policy-boj.yaml`) promoted from the worked example (§1.5); `scripts/hcg-policy-smoke.sh` lands as the checked-in §1.5 operator pre-check (deny-path covers gateway-alone; `--with-backend` adds allow-path coverage); §1.5 verb-canary block extended to cover OPTIONS, regex-route DELETE, and wrong-verb-on-listed-path so the operator pre-check fails closed against more verb-governance regression classes. Owner-input markers (`!OWNER:`) remain to be filled before any traffic-shift action is taken.
**ADR:** [`docs/decisions/0004-adopt-http-capability-gateway.md`](../decisions/0004-adopt-http-capability-gateway.md)
**Plan:** [`docs/integration/http-capability-gateway-plan.md`](http-capability-gateway-plan.md) (§ Phase E)
**Contract:** [`docs/integration/http-capability-gateway-boj-contract.md`](http-capability-gateway-boj-contract.md)
@@ -88,7 +88,7 @@ These cannot be inferred from the code/contract; the owner must fill them before
- [x] `container/gateway-deploy.k9.ncl` exists in the gateway repo (plan §E1) — http-capability-gateway#38 (2026-06-03). Five-level k9-svc pedigree (Snout / Scent / Leash / Gut / Muscle) modelled on `boj-server:container/deploy.k9.ncl`; per-environment `BACKEND_URL` (`http://127.0.0.1:7700` staging, `http://unix:/run/boj/gnosis.sock:/` production); trust source `"header"` staging → `"mtls"` production after §2.4 rehearsal; `max_unavailable = 0`; `failure_mode = "fail-closed"` matching the `[SEAMS] gateway-boj-gnosis` declaration.
- [x] Gateway policy file in place: `config/gateway-policy-boj-example.yaml`, covering all BoJ surface routes (`/.well-known/boj-node-pubkey`, `/health`, `/menu`, `/cartridges`, `/cartridge/:name`, `/cartridge/:name/invoke`, `/cartridge/:name/sse`, plus any added since contract v1.0). Re-verified 2026-05-28 against `BojRest.Router`; the `POST /cartridge/:name/sse` route (router.ex line 130, wired since the SSE landing — ADR-0013 §6, STATE entry 2026-05-18) was the only drift since contract v1.0 and is now governed by the `cartridge-sse-post` rule alongside `cartridge-invoke-post` (boj-server#165).
- [x] Live policy file (`config/gateway-policy-boj.yaml`) promoted from the example. Content-identical to the example at promotion time; future BoJ-surface evolution lands in the live file and the example remains as the worked-example artefact (Phase A A3). Both §2.1 staging and §3.1 production load the live file via `POLICY_PATH`.
- [ ] Gateway has been smoke-tested in isolation with the policy, returning expected allow/deny on each route. Run `scripts/hcg-policy-smoke.sh --gateway-url <staging-gateway-url>` against the gateway loaded with `config/gateway-policy-boj.yaml`; the script exercises a no-trust-header deny probe for every non-public route plus default-deny verb canaries (DELETE/PUT/PATCH on `/cartridges` and `/health`) and is fully gateway-internal — BoJ does **not** need to be reachable for this run. Once BoJ is up behind the gateway, re-run with `--with-backend` from a trusted-proxy IP (loopback by default) to also cover the allow path on authenticated/internal routes including the `POST /cartridge/:name/sse` authenticated/untrusted pair carried over from boj-server#165's test plan. Attach the script's PASS/FAIL summary to the cut-over ticket; a single FAIL is a stop-the-rollout condition (gateway loaded the policy but is not enforcing as declared, or BoJ is unreachable from the gateway, or the script is being run from a non-trusted-proxy IP and the trust header is being stripped).
- [ ] Gateway has been smoke-tested in isolation with the policy, returning expected allow/deny on each route. Run `scripts/hcg-policy-smoke.sh --gateway-url <staging-gateway-url>` against the gateway loaded with `config/gateway-policy-boj.yaml`; the script exercises a no-trust-header deny probe for every non-public route (25 in the live policy) plus six default-deny verb canaries — DELETE/PUT/PATCH on listed exact paths, OPTIONS on a listed path (no CORS-preflight bypass), DELETE on a regex-matched route (no per-verb regex regression), and GET on the POST-only `ssg-mcp-webhook` public route (the `{path, verb}` pairing must be enforced even when the path itself is in the policy) — and is fully gateway-internal — BoJ does **not** need to be reachable for this run. Once BoJ is up behind the gateway, re-run with `--with-backend` from a trusted-proxy IP (loopback by default) to also cover the allow path on authenticated/internal routes including the `POST /cartridge/:name/sse` authenticated/untrusted pair carried over from boj-server#165's test plan. Attach the script's PASS/FAIL summary to the cut-over ticket; a single FAIL is a stop-the-rollout condition (gateway loaded the policy but is not enforcing as declared, or BoJ is unreachable from the gateway, or the script is being run from a non-trusted-proxy IP and the trust header is being stripped).

---
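
The pre-check in the checklist item above comes down to issuing one request per route and classifying the returned status. The body of `probe` is not part of this diff, so the following is only a minimal sketch of how such a helper could work. The `verdict` helper, the `GATEWAY_URL` default, the 5-second timeout, and the rule "deny means any 4xx, allow means any 2xx" are all assumptions made for illustration, not the behaviour of the checked-in `scripts/hcg-policy-smoke.sh`.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the checked-in scripts/hcg-policy-smoke.sh.
# Assumption: "deny" is any 4xx status and "allow" is any 2xx status.

GATEWAY_URL="${GATEWAY_URL:-http://127.0.0.1:8080}"  # assumed default, illustration only

# verdict EXPECT STATUS -> prints PASS or FAIL
verdict() {
  case "$1:$2" in
    deny:4??)  echo PASS ;;
    allow:2??) echo PASS ;;
    *)         echo FAIL ;;   # includes "000", which curl reports when no response arrived
  esac
}

# probe VERB PATH EXPECT LABEL -> prints one result line, returns non-zero on FAIL
probe() {
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 -X "$1" "$GATEWAY_URL$2")
  result=$(verdict "$3" "$status")
  printf '%s  %s (HTTP %s)\n' "$result" "$4" "$status"
  [ "$result" = "PASS" ]
}
```

Because `probe` returns non-zero on FAIL, a caller can count failures and exit non-zero at the end, which matches the runbook's rule that a single FAIL stops the rollout.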

36 changes: 30 additions & 6 deletions scripts/hcg-policy-smoke.sh
@@ -193,12 +193,36 @@ probe POST /coprocessor/select deny "internal:coprocessor-select-post"
probe GET /sdp/status deny "internal:sdp-status-get"

# Default-deny verb canaries — global_verbs is [GET, POST], so any
# DELETE/PUT/PATCH on a known path must be denied via the no-match
# (or unknown-method) path. Verifies the verb-governance core invariant
# of ADR-0004.
probe DELETE /cartridges deny "verb-canary:DELETE /cartridges"
probe PUT /health deny "verb-canary:PUT /health"
probe PATCH /cartridges deny "verb-canary:PATCH /cartridges"
# DELETE/PUT/PATCH/OPTIONS on a known path must be denied via the
# no-match (or unknown-method) path. Verifies the verb-governance core
# invariant of ADR-0004.
#
# OPTIONS is named in the policy header's banned-verb list and gets its
# own canary because a CORS preflight auto-responder in the gateway
# would silently bypass policy.
#
# Regex-route verb canary (DELETE on cartridge-invoke-post) catches a
# class of bug the exact-path canaries miss: a regression where the
# regex matcher accepts the path under any verb instead of only the
# verb its rule lists.
#
# Wrong-verb-on-listed-path canary (GET on the ssg-mcp webhook, which
# only lists POST) verifies the {path, verb} pairing is enforced: the
# path is in the policy as a public exception, but only for POST; GET
# on the same path must default-deny because no rule covers it.
#
# HEAD is also banned by the policy header but is deliberately not
# canaried here — curl with `-X HEAD` (vs `--head`) waits for a body
# the server will not send, which interacts badly with `--max-time` in
# this script. HEAD enforcement remains covered by the gateway's own
# unit tests; the operator pre-check focuses on probes that survive
# curl's method quirks.
probe DELETE /cartridges deny "verb-canary:DELETE /cartridges"
probe PUT /health deny "verb-canary:PUT /health"
probe PATCH /cartridges deny "verb-canary:PATCH /cartridges"
probe OPTIONS /cartridges deny "verb-canary:OPTIONS /cartridges (preflight must not bypass)"
probe DELETE /cartridge/probe/invoke deny "verb-canary:DELETE on regex route (cartridge-invoke-post)"
probe GET /cartridges/ssg-mcp/webhook deny "verb-canary:GET on POST-only public route (ssg-mcp-webhook-post)"
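
The HEAD comment above refers to curl's behaviour with `-X HEAD`. If a HEAD canary were wanted later, one possible route is curl's `--head` flag (same as `-I`), which sends a HEAD request and does not wait for a response body. The sketch below is a suggestion only: `head_status` is a hypothetical helper that does not exist in the script, and the 5-second timeout is an assumed value.

```shell
#!/bin/sh
# Hypothetical helper, not part of scripts/hcg-policy-smoke.sh.
# head_status URL -> prints the HTTP status code of a HEAD request
# ("000" if no response arrived within the timeout).
head_status() {
  # --head issues HEAD and stops after the headers, unlike -X HEAD,
  # which only renames the method and still expects a body.
  curl -s -o /dev/null -w '%{http_code}' --head --max-time 5 "$1"
}
```

A caller could compare the printed code against the expected deny status, for example `head_status "$GATEWAY_URL/cartridges"`, where `GATEWAY_URL` is whatever base URL the operator is probing.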

if [ "$WITH_BACKEND" = "1" ]; then
echo
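
To make the three canary classes in this hunk concrete, here is a toy model of `{path, verb}` default-deny matching. It is a sketch under stated assumptions: the `decide` function and its four rules are illustrative, it ignores trust levels entirely, and it is not the gateway's matcher or the contents of `config/gateway-policy-boj.yaml`.

```shell
#!/bin/sh
# Toy model only: an allow rule is a {verb, path} pair; anything else is denied.
# decide VERB PATH -> prints allow or deny
decide() {
  case "$1 $2" in
    "GET /health")                      echo allow ;;
    "GET /cartridges")                  echo allow ;;
    "POST /cartridges/ssg-mcp/webhook") echo allow ;;
    # Guard: reject multi-segment names before the wildcard rule below,
    # because a shell glob "*" would otherwise also match "/".
    "POST /cartridge/"*/*"/invoke")     echo deny ;;
    "POST /cartridge/"*"/invoke")       echo allow ;;
    *)                                  echo deny ;;   # default-deny
  esac
}
```

In this model, `decide OPTIONS /cartridges`, `decide DELETE /cartridge/probe/invoke`, and `decide GET /cartridges/ssg-mcp/webhook` all print `deny`, because the verb is matched together with the path rather than after it. A matcher that checked the path first and the verb separately is the kind of regression the new canaries are meant to catch.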