From c84f63cfa876f2f7ee1ab4875588269746005b32 Mon Sep 17 00:00:00 2001
From: "Jonathan D.A. Jewell" <6759885+hyperpolymath@users.noreply.github.com>
Date: Sat, 13 Jun 2026 06:11:51 +0000
Subject: [PATCH] =?UTF-8?q?feat(scripts):=20extend=20=C2=A71.5=20verb-cana?=
 =?UTF-8?q?ry=20coverage=20(Phase=20E)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Tightens `scripts/hcg-policy-smoke.sh` against three classes of
verb-governance regression the original three canaries (DELETE/PUT/PATCH
on `/cartridges` and `/health`) don't catch. Single-lane HCG tier-2
channel (`standards#91`); Phase E (`standards#100`) is the active phase.

Added probes:

1. **`OPTIONS /cartridges`** — `global_verbs: [GET, POST]` bans OPTIONS,
   but a CORS preflight auto-responder added later would silently bypass
   policy. The canary fails closed against that regression class.

2. **`DELETE /cartridge/probe/invoke`** — exercises the regex route
   `^/cartridge/[A-Za-z0-9_.-]+/invoke$` under a banned verb. The
   existing exact-path canaries don't catch a regex-matcher regression
   where the path is accepted under any verb instead of only the verb
   the rule lists.

3. **`GET /cartridges/ssg-mcp/webhook`** — the path is in the policy as
   a documented public exception, but only for POST. The canary verifies
   the `{path, verb}` pairing is enforced: GET on the same path must
   default-deny because no rule covers it.

Deliberate omission: HEAD. Curl with `-X HEAD` (vs `--head`) waits for a
body the server will not send, which interacts badly with the script's
`--max-time 10`. HEAD enforcement remains covered by the gateway's own
unit tests; the §1.5 operator pre-check focuses on probes that survive
curl's method quirks. The reasoning is captured inline in the script
comment so a future maintainer doesn't add it back as an oversight.

Runbook §1.5 description updated to reflect the expanded canary set;
version bump 0.4 → 0.5; status line records the extension.
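
The three probe classes above can be pictured with a toy decision
function. This is only a sketch under stated assumptions, not the
gateway's implementation: `decide` is a hypothetical helper, its rule
table holds just the two rules named in this message (the POST-only
webhook and the POST-only invoke regex), and trust levels are ignored.

```shell
#!/usr/bin/env bash
# Toy model of a {path, verb} default-deny decision.
# Assumption: NOT the gateway's code; two-rule subset, trust ignored.

decide() {
  local verb="$1" path="$2"
  local invoke_re='^/cartridge/[A-Za-z0-9_.-]+/invoke$'

  # global_verbs: [GET, POST]; any other verb is denied before matching.
  case "$verb" in
    GET|POST) ;;
    *) echo deny; return 0 ;;
  esac

  # Exact-path rule: the webhook is listed for POST only.
  if [ "$verb" = "POST" ] && [ "$path" = "/cartridges/ssg-mcp/webhook" ]; then
    echo allow; return 0
  fi

  # Regex rule: the invoke route is listed for POST only.
  if [ "$verb" = "POST" ] && [[ "$path" =~ $invoke_re ]]; then
    echo allow; return 0
  fi

  # No rule covers this {path, verb} pair: default-deny.
  echo deny
}

# The three new canaries, expressed against the toy model.
for canary in "OPTIONS /cartridges" \
              "DELETE /cartridge/probe/invoke" \
              "GET /cartridges/ssg-mcp/webhook"; do
  read -r verb path <<< "$canary"
  printf '%s %s -> %s\n' "$verb" "$path" "$(decide "$verb" "$path")"
done
```

In this model each canary lands on `deny` for a different reason: the
verb is outside `global_verbs`, the regex rule lists a different verb,
or the exact-path rule lists a different verb. Those are the three
regression classes the new probes are meant to separate.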

Verification:

- `bash -n scripts/hcg-policy-smoke.sh` — syntax check passes.
- Synthetic always-403 mock on :18443 — PASS=31 FAIL=0 (was 28), the
  three new canaries report PASS. Exits 0.
- `--help` and bad-args exit codes unchanged (64).

Out of scope:

- The `--with-backend` allow-path matrix is not extended here. The
  authenticated routes the script probes in allow mode are the ones
  actually wired in `BojRest.Router`; the additional policy entries
  (`graphql`, `sse`, `order`, `umoja/*`, etc.) are
  declared-not-yet-wired per contract §8 and would 404 from BoJ, which
  the `allow_or_upstream` pattern misdiagnoses as gateway-deny. They
  stay in the deny matrix until they are wired in BoJ.
- The `ssg-mcp-webhook-post` route is not added to the `--with-backend`
  allow probes for the same reason: the handler is in `openapi.yaml`
  but not yet in `router.ex`.
- The script does not parse `config/gateway-policy-boj.yaml` to derive
  the probe matrix; the matrix stays hand-maintained and the
  parity-with-policy property remains a manual maintenance discipline.
  PR #210's commitment ("the script doubles as a policy-completeness
  checklist") is preserved.
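
Why an always-403 mock is enough to exercise the deny matrix can be
sketched as follows. This is a hypothetical tally harness, not the real
script: `fetch_status` is a stand-in for the curl call, and treating
exactly 403 as "deny" is an assumption made for illustration. Only the
`probe VERB PATH deny LABEL` argument order is taken from the patch.

```shell
#!/usr/bin/env bash
# Toy tally harness; NOT the real scripts/hcg-policy-smoke.sh.
PASS=0
FAIL=0

# Assumption: stands in for the curl call and mimics the always-403 mock.
fetch_status() {
  echo 403
}

# Same argument order as the probe lines in the script.
probe() {
  local verb="$1" path="$2" expect="$3" label="$4" status
  status="$(fetch_status "$verb" "$path")"
  # Assumption: a deny expectation is satisfied by status 403 only.
  if [ "$expect" = "deny" ] && [ "$status" = "403" ]; then
    PASS=$((PASS + 1))
    echo "PASS $label"
  else
    FAIL=$((FAIL + 1))
    echo "FAIL $label (got $status)"
  fi
}

probe DELETE  /cartridges                 deny "verb-canary:DELETE /cartridges"
probe PUT     /health                     deny "verb-canary:PUT /health"
probe PATCH   /cartridges                 deny "verb-canary:PATCH /cartridges"
probe OPTIONS /cartridges                 deny "verb-canary:OPTIONS /cartridges"
probe DELETE  /cartridge/probe/invoke     deny "verb-canary:DELETE on regex route"
probe GET     /cartridges/ssg-mcp/webhook deny "verb-canary:GET on POST-only route"

echo "PASS=$PASS FAIL=$FAIL"
```

Because every probe in the deny matrix expects a denial, a mock that
denies everything makes all of them pass. That checks the script's
plumbing and tally, not the gateway's policy enforcement.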

Channel position:

```
standards#91 (parent, open)
├── #96 Phase A — closed
├── #97 Phase B — closed
├── #98 Phase C — closed
├── #99 Phase D — closed (joint-closed via boj-server#168)
└── #100 Phase E — IN PROGRESS
    ├── E5 runbook draft — boj-server#128 (landed)
    ├── E1 loopback prereqs — boj-server#130/#131/#132/#165/#173 (landed)
    ├── E1 deploy spec — http-capability-gateway#38 (landed)
    ├── E1 live policy promotion — boj-server#208 (landed)
    ├── §1.5 operator pre-check smoke — boj-server#210 (landed)
    ├── §1.5 verb-canary expansion — THIS PR
    ├── E1 .ctp signing — owner follow-up
    ├── E2 staging cut-over — owner follow-up
    ├── E3 telemetry verification — owner follow-up
    ├── E4 production rollout — owner follow-up
    └── §6.4 Trustfile flip + §6.5 joint-close — owner-only
```

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#100

Co-Authored-By: Claude Opus 4.7
---
 docs/integration/hcg-tier2-rollout-runbook.md |  8 ++---
 scripts/hcg-policy-smoke.sh                   | 36 +++++++++++++++----
 2 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/docs/integration/hcg-tier2-rollout-runbook.md b/docs/integration/hcg-tier2-rollout-runbook.md
index b68e95f..0626e4d 100644
--- a/docs/integration/hcg-tier2-rollout-runbook.md
+++ b/docs/integration/hcg-tier2-rollout-runbook.md
@@ -3,9 +3,9 @@
 
 # HCG tier-2 — rollout & rollback runbook
 
-**Version:** 0.4 (policy-deny smoke script landed, Phase E in-progress)
-**Date:** 2026-06-10 (rev. from 2026-06-09)
-**Status:** Phase E deliverables E1 (deploy spec) + E5 (rollback runbook) drafted; live gateway policy (`config/gateway-policy-boj.yaml`) promoted from the worked example (§1.5); `scripts/hcg-policy-smoke.sh` lands as the checked-in §1.5 operator pre-check (deny-path covers gateway-alone; `--with-backend` adds allow-path coverage). Owner-input markers (`!OWNER:`) remain to be filled before any traffic-shift action is taken.
+**Version:** 0.5 (smoke-script verb-canary expansion, Phase E in-progress)
+**Date:** 2026-06-13 (rev. from 2026-06-10)
+**Status:** Phase E deliverables E1 (deploy spec) + E5 (rollback runbook) drafted; live gateway policy (`config/gateway-policy-boj.yaml`) promoted from the worked example (§1.5); `scripts/hcg-policy-smoke.sh` lands as the checked-in §1.5 operator pre-check (deny-path covers gateway-alone; `--with-backend` adds allow-path coverage); §1.5 verb-canary block extended to cover OPTIONS, regex-route DELETE, and wrong-verb-on-listed-path so the operator pre-check fails closed against more verb-governance regression classes. Owner-input markers (`!OWNER:`) remain to be filled before any traffic-shift action is taken.
 **ADR:** [`docs/decisions/0004-adopt-http-capability-gateway.md`](../decisions/0004-adopt-http-capability-gateway.md)
 **Plan:** [`docs/integration/http-capability-gateway-plan.md`](http-capability-gateway-plan.md) (§ Phase E)
 **Contract:** [`docs/integration/http-capability-gateway-boj-contract.md`](http-capability-gateway-boj-contract.md)
@@ -88,7 +88,7 @@ These cannot be inferred from the code/contract; the owner must fill them before
 - [x] `container/gateway-deploy.k9.ncl` exists in the gateway repo (plan §E1) — http-capability-gateway#38 (2026-06-03). Five-level k9-svc pedigree (Snout / Scent / Leash / Gut / Muscle) modelled on `boj-server:container/deploy.k9.ncl`; per-environment `BACKEND_URL` (`http://127.0.0.1:7700` staging, `http://unix:/run/boj/gnosis.sock:/` production); trust source `"header"` staging → `"mtls"` production after §2.4 rehearsal; `max_unavailable = 0`; `failure_mode = "fail-closed"` matching the `[SEAMS] gateway-boj-gnosis` declaration.
 - [x] Gateway policy file in place: `config/gateway-policy-boj-example.yaml`, covering all BoJ surface routes (`/.well-known/boj-node-pubkey`, `/health`, `/menu`, `/cartridges`, `/cartridge/:name`, `/cartridge/:name/invoke`, `/cartridge/:name/sse`, plus any added since contract v1.0). Re-verified 2026-05-28 against `BojRest.Router`; the `POST /cartridge/:name/sse` route (router.ex line 130, wired since the SSE landing — ADR-0013 §6, STATE entry 2026-05-18) was the only drift since contract v1.0 and is now governed by the `cartridge-sse-post` rule alongside `cartridge-invoke-post` (boj-server#165).
 - [x] Live policy file (`config/gateway-policy-boj.yaml`) promoted from the example. Content-identical to the example at promotion time; future BoJ-surface evolution lands in the live file and the example remains as the worked-example artefact (Phase A A3). Both §2.1 staging and §3.1 production load the live file via `POLICY_PATH`.
-- [ ] Gateway has been smoke-tested in isolation with the policy, returning expected allow/deny on each route. Run `scripts/hcg-policy-smoke.sh --gateway-url ` against the gateway loaded with `config/gateway-policy-boj.yaml`; the script exercises a no-trust-header deny probe for every non-public route plus default-deny verb canaries (DELETE/PUT/PATCH on `/cartridges` and `/health`) and is fully gateway-internal — BoJ does **not** need to be reachable for this run. Once BoJ is up behind the gateway, re-run with `--with-backend` from a trusted-proxy IP (loopback by default) to also cover the allow path on authenticated/internal routes including the `POST /cartridge/:name/sse` authenticated/untrusted pair carried over from boj-server#165's test plan. Attach the script's PASS/FAIL summary to the cut-over ticket; a single FAIL is a stop-the-rollout condition (gateway loaded the policy but is not enforcing as declared, or BoJ is unreachable from the gateway, or the script is being run from a non-trusted-proxy IP and the trust header is being stripped).
+- [ ] Gateway has been smoke-tested in isolation with the policy, returning expected allow/deny on each route. Run `scripts/hcg-policy-smoke.sh --gateway-url ` against the gateway loaded with `config/gateway-policy-boj.yaml`; the script exercises a no-trust-header deny probe for every non-public route (25 in the live policy) plus six default-deny verb canaries — DELETE/PUT/PATCH on listed exact paths, OPTIONS on a listed path (no CORS-preflight bypass), DELETE on a regex-matched route (no per-verb regex regression), and GET on the POST-only `ssg-mcp-webhook` public route (the `{path, verb}` pairing must be enforced even when the path itself is in the policy) — and is fully gateway-internal — BoJ does **not** need to be reachable for this run. Once BoJ is up behind the gateway, re-run with `--with-backend` from a trusted-proxy IP (loopback by default) to also cover the allow path on authenticated/internal routes including the `POST /cartridge/:name/sse` authenticated/untrusted pair carried over from boj-server#165's test plan. Attach the script's PASS/FAIL summary to the cut-over ticket; a single FAIL is a stop-the-rollout condition (gateway loaded the policy but is not enforcing as declared, or BoJ is unreachable from the gateway, or the script is being run from a non-trusted-proxy IP and the trust header is being stripped).
 
 ---
 
diff --git a/scripts/hcg-policy-smoke.sh b/scripts/hcg-policy-smoke.sh
index 75f8f44..aa4e278 100755
--- a/scripts/hcg-policy-smoke.sh
+++ b/scripts/hcg-policy-smoke.sh
@@ -193,12 +193,36 @@ probe POST /coprocessor/select deny "internal:coprocessor-select-post"
 probe GET /sdp/status deny "internal:sdp-status-get"
 
 # Default-deny verb canaries — global_verbs is [GET, POST], so any
-# DELETE/PUT/PATCH on a known path must be denied via the no-match
-# (or unknown-method) path. Verifies the verb-governance core invariant
-# of ADR-0004.
-probe DELETE /cartridges deny "verb-canary:DELETE /cartridges"
-probe PUT /health deny "verb-canary:PUT /health"
-probe PATCH /cartridges deny "verb-canary:PATCH /cartridges"
+# DELETE/PUT/PATCH/OPTIONS on a known path must be denied via the
+# no-match (or unknown-method) path. Verifies the verb-governance core
+# invariant of ADR-0004.
+#
+# OPTIONS is named in the policy header's banned-verb list and gets its
+# own canary because a CORS preflight auto-responder in the gateway
+# would silently bypass policy.
+#
+# Regex-route verb canary (DELETE on cartridge-invoke-post) catches a
+# class of bug the exact-path canaries miss: a regression where the
+# regex matcher accepts the path under any verb instead of only the
+# verb its rule lists.
+#
+# Wrong-verb-on-listed-path canary (GET on the ssg-mcp webhook, which
+# only lists POST) verifies the {path, verb} pairing is enforced: the
+# path is in the policy as a public exception, but only for POST; GET
+# on the same path must default-deny because no rule covers it.
+#
+# HEAD is also banned by the policy header but is deliberately not
+# canaried here — curl with `-X HEAD` (vs `--head`) waits for a body
+# the server will not send, which interacts badly with `--max-time` in
+# this script. HEAD enforcement remains covered by the gateway's own
+# unit tests; the operator pre-check focuses on probes that survive
+# curl's method quirks.
+probe DELETE /cartridges deny "verb-canary:DELETE /cartridges"
+probe PUT /health deny "verb-canary:PUT /health"
+probe PATCH /cartridges deny "verb-canary:PATCH /cartridges"
+probe OPTIONS /cartridges deny "verb-canary:OPTIONS /cartridges (preflight must not bypass)"
+probe DELETE /cartridge/probe/invoke deny "verb-canary:DELETE on regex route (cartridge-invoke-post)"
+probe GET /cartridges/ssg-mcp/webhook deny "verb-canary:GET on POST-only public route (ssg-mcp-webhook-post)"
 
 if [ "$WITH_BACKEND" = "1" ]; then
   echo