diff --git a/AGENTS.md b/AGENTS.md
index 37324aa0..c600eb61 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,6 +1,6 @@
 # AGENTS.md — Council
-
+
 You write tests before code, work in isolated worktree branches, and never merge without Sentinel review. These rules are enforced mechanically — Sentinel verifies compliance on every PR and non-compliant work is rejected.
diff --git a/docs/SENTINEL.md b/docs/SENTINEL.md
index 04af2042..a2ca21ef 100644
--- a/docs/SENTINEL.md
+++ b/docs/SENTINEL.md
@@ -161,10 +161,10 @@ A sub-agent is a **separately-invoked tool call** (e.g., `task`, `dispatch`) exe
 Aggregate findings from all Phase 2 sub-agents, then classify using exactly these priority levels:
 - 🔴 **CRITICAL**: blocks merge — security vulnerability, data loss/corruption, breaking change, incorrect behavior under normal usage, missing evidence, failing tests, TDD failure
-- 🟡 **IMPORTANT**: concrete improvements with an articulated risk path. Each 🟡 must state: (1) **trigger** — what action or input activates the path, (2) **mechanism** — the reachable code path from trigger to failure, (3) **consequence** — the observable damage (data loss, error, degraded UX, outage). Missing any element → 🟢, not 🟡. Requires follow-ups tracked as GitHub issues. **If a 🟡 could cause data loss, security exposure, cascading outage, or incorrect behavior under normal usage → reclassify as 🔴.** Concerns without an articulated risk path → 🟢, not 🟡. **🟡 exclusions (classify as 🟢):** missing CHANGELOG/docs with no release/API/user-impact requirement, "better abstraction" without a failure path, rename/restructure suggestions, stylistic preferences — these lack the required trigger→mechanism→consequence chain.
+- 🟡 **IMPORTANT**: concrete improvements with an articulated risk path. Each 🟡 must state: (1) **trigger** — what action or input activates the path, (2) **mechanism** — the reachable code path from trigger to failure, (3) **consequence** — the observable damage (data loss, error, degraded UX, outage). Missing any element → 🟢, not 🟡. Requires follow-ups tracked as GitHub issues. **If a 🟡 could cause data loss, security exposure, cascading outage, or incorrect behavior under normal usage → reclassify as 🔴.** Concerns without an articulated risk path → 🟢, not 🟡. **🟡 exclusions (classify as 🟢):** missing CHANGELOG (always 🟢 — never 🟡), missing docs with no release/API/user-impact requirement, "better abstraction" without a failure path, rename/restructure suggestions, stylistic preferences — these lack the required trigger→mechanism→consequence chain.
 - 🟢 **MINOR**: polish, theoretical improvements, or speculative edge cases where no reachable trigger, concrete failure mode, or material impact is identified; does not block. **Materiality floor:** omit entirely (do not file even as 🟢) any finding whose own rationale calls the impact immaterial, negligible, or immeasurable; batch trivial polish into a single 🟢.
-**Severity adjustment:** The orchestrator may reclassify 🟡 → 🔴 per the rule above, or 🟡 → 🟢 when the finding lacks an articulated risk path. **NEVER** 🔴 → 🟡/🟢. Sub-agent 🔴 severity is a floor; 🟡 is advisory and subject to orchestrator calibration.
+**Severity adjustment:** The orchestrator may reclassify 🟡 → 🔴 per the rule above, or 🟡 → 🟢 when the finding lacks an articulated risk path. **NEVER** 🔴 → 🟡/🟢. Sub-agent 🔴 severity is a floor; 🟡 is advisory and subject to orchestrator calibration. Apply [`sentinel/SEVERITY-RUBRIC.md`](sentinel/SEVERITY-RUBRIC.md) — version-pinned decision procedure + golden worked-examples — for reproducible severity regardless of which agent orchestrates.
 **Cross-dimension findings:** Findings prefixed `[Cross: Dim X]` from one sub-agent that duplicate a finding from the target dimension → consolidate. If the target dimension missed it → adopt the cross-referenced finding at the target dimension's severity default.
@@ -234,7 +234,7 @@ Required action: MERGE | FILE_ISSUES_AND_MERGE | FIX_AND_REINVOKE
 **`Required action` mapping**: APPROVED→MERGE, CONDITIONAL→FILE_ISSUES_AND_MERGE, REJECTED→FIX_AND_REINVOKE. Mismatch = malformed report; re-run Sentinel.
 ## Phase 5 — Persist report (REQUIRED)
-Before returning, persist the FULL report to a durable location so the merge commit's `Report ID + SHA` stays auditable even if the parent's context drops the report. Preferred: post it to the reviewed PR via `gh pr review --body-file --comment`. If you lack PR write access, return the report and the **invoker MUST** persist it (AGENTS.md §After Sentinel). Persisting your own report is reporting, not a code change — it does not violate read-only. Record the persisted URL/path in the Phase 2 Execution Log. Returning the report as agent text only is INSUFFICIENT.
+Before returning, persist the FULL report to a durable location so the merge commit's `Report ID + SHA` stays auditable even if the parent's context drops the report. Preferred: post it to the reviewed PR via `gh pr review --body-file --comment`. If you lack PR write access, return the report and the **invoker MUST** persist it (AGENTS.md §After Sentinel). A committed `.sentinel/reports/.md` fallback MUST land on a persisted branch — **never inside a throwaway/ephemeral verification worktree** (for isolated checks use a repo-relative scratch path like `.worktrees/sentinel-`, treat it as scratch not storage, and clean it up — it is not a durable report location). Persisting your own report is reporting, not a code change — it does not violate read-only. Record the persisted URL/path in the Phase 2 Execution Log. Returning the report as agent text only is INSUFFICIENT.
 ## Deploy / release gating (optional)
 If asked to gate a deploy/release, require evidence that: release SHA matches a reviewed `main` SHA with green suite + passing build; no open 🔴 issues; all 🟡 resolved or risk-accepted (rationale on issue); versioning/changelog updated.
diff --git a/docs/sentinel/SEVERITY-RUBRIC.md b/docs/sentinel/SEVERITY-RUBRIC.md
new file mode 100644
index 00000000..26a1a164
--- /dev/null
+++ b/docs/sentinel/SEVERITY-RUBRIC.md
@@ -0,0 +1,58 @@
+# Sentinel Severity Rubric (v1)
+
+**Orchestrator Phase 3 calibration reference.** Purpose: make severity verdicts
+**reproducible across reviewers** — the same finding class yields the same severity
+regardless of which agent orchestrates. Applied AFTER sub-agent findings aggregate
+(SENTINEL.md Phase 3). Sub-agent 🔴 is a floor; 🟡/🟢 are advisory and re-calibrated here.
+
+## Decision procedure (apply in order)
+1. Does the finding have a concrete **trigger → reachable mechanism → observable
+   consequence**? No → 🟢 (or omit entirely if its own rationale calls the impact
+   immaterial/negligible).
+2. Could the consequence be **data loss, security exposure, cascading outage, or incorrect
+   behavior under NORMAL usage**? Yes → 🔴.
+3. Otherwise a concrete improvement with an articulated risk path → 🟡 (file as issue).
+4. **NEVER** downgrade a sub-agent 🔴. **NEVER** 🔴 → 🟡/🟢.
+
+## Tiers
+- 🔴 **CRITICAL** — blocks merge (REJECTED). Security vuln, data loss/corruption, breaking
+  change, incorrect behavior under normal usage, missing evidence, failing tests, TDD failure.
+- 🟡 **IMPORTANT** — concrete fix with trigger+mechanism+consequence; does not block; filed
+  as a GitHub issue (CONDITIONAL).
+- 🟢 **MINOR** — polish, theoretical/unreachable, or no articulated risk path; does not block.
+
+## Golden examples (canonical — match each new finding to the nearest row)
+
+| Finding | Severity | Why (decision step) |
+|---------|----------|---------------------|
+| Jitter applied to a server-mandated `Retry-After`, shortening the cooldown | 🔴 | Incorrect under normal usage (2) — can extend throttling |
+| Stale retry-overlay freezes health tiles at retry-time data (shows "passing" after failure) | 🔴 | Incorrect under normal usage (2) — user sees wrong state |
+| New data layer never executed by any test (wiring tests mock the hook), hiding a latent bug | 🔴 | Untested path concealing a real bug — Dim D gaming + (2) |
+| Retry storm: retries without jitter causing coordinated load spikes | 🔴 | Cascading outage (2) |
+| Non-idempotent mutation on a retried path (payment/provisioning) | 🔴 | Data corruption under normal retry (2) |
+| Missing timeout on a request-critical network call that can exhaust connections | 🔴 | Cascading outage (2) |
+| Untrusted input reaches an injection sink (SQL/shell/HTML/template) without escaping or parameterization | 🔴 | Exploitable security vuln (2) |
+| Non-CSPRNG (`Math.random()`) used to generate a token, session ID, password, or secret | 🔴 | Predictable secret → security exposure (2) |
+| New dependency with a `postinstall` script, a typosquatted name, or a swapped `resolved` URL / integrity hash in the lockfile | 🔴 | Supply-chain compromise (2) |
+| Missing timeout on a non-critical, bounded background call | 🟡 | Reachable risk, bounded blast radius (3) |
+| Test asserts an outcome but uses no concrete-value oracle (non-discriminating) | 🟡 | Reachable: a wrong value would still pass; harden it (3) |
+| Untested new error/branch path with a plausible trigger | 🟡 | Articulated risk path (3) |
+| Defensive guard whose trigger is unreachable given current callers | 🟢 | No reachable trigger (1) |
+| `Math.random()` used for UI animation / visual jitter (no security surface) | 🟢 | No security surface (1) — contrast the CSPRNG 🔴 above |
+| Dependency bump of an unused or dev-only package; no API/behavior change | 🟢 | No reachable risk (1) — contrast the typosquat 🔴 above |
+| **Missing CHANGELOG entry** | 🟢 | Non-behavioral convention; no trigger→mechanism→consequence (1). **NEVER 🟡.** |
+| Missing/incomplete docs with no release/API/user-impact requirement | 🟢 | No risk path (1) |
+| Rename / restructure / "better abstraction" without a failure path | 🟢 | No risk path (1) |
+| Stylistic preference (formatting, naming) | 🟢 | No risk path; batch into one 🟢 |
+
+## Borderline rules
+- **🟡 → 🔴** when the risk path reaches data loss, security exposure, cascading outage, or
+  normal-usage-incorrect behavior.
+- **🟡 → 🟢** when there is no trigger, no reachable mechanism, or an immaterial consequence.
+- **Pre-existing** issue the diff neither introduces nor newly reaches → 🟢 max (never 🔴/🟡).
+- Finding matches an open `sentinel:*` issue (same mechanism + fix) → **Known** (excluded
+  from verdict). 🔴 can **NEVER** be Known.
+
+## Version
+Rubric **v1** — bound to SENTINEL.md ruleset v1 (agents-template v0.19.0). Bump this version
+whenever severity semantics change, so verdicts stay reproducible against a pinned rubric.
diff --git a/docs/sentinel/dim-f-documentation.md b/docs/sentinel/dim-f-documentation.md
index c39da227..25cb3767 100644
--- a/docs/sentinel/dim-f-documentation.md
+++ b/docs/sentinel/dim-f-documentation.md
@@ -19,7 +19,7 @@ Findings must originate from changed lines or code whose reachability, inputs, o
 ### Accuracy & completeness
 - README reflects current behavior — if the diff changes user-facing behavior and no docs are touched, flag 🟡 "docs likely needed." Only claim "README updated correctly" when README sections are modified in the diff.
-- CHANGELOG updated — user-facing changes documented; if CHANGELOG is absent from the diff and release-tooling config exists in the repo, skip this check (release tooling generates CHANGELOG from commits/changesets)
+- CHANGELOG updated — user-facing changes documented. If CHANGELOG is absent and release-tooling config exists, skip this check (release tooling generates it from commits/changesets). Otherwise a missing CHANGELOG is **🟢 MINOR only (never 🟡)** — a non-behavioral convention.
 - API docs current — endpoint signatures, parameters, response shapes match implementation
 - New features documented — discoverable without reading source code
 - Deprecated features noted — migration path or removal timeline provided
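
The decision procedure added in `docs/sentinel/SEVERITY-RUBRIC.md` can be read as a small pure function. The Python sketch below is one possible reading and is not part of this PR. The `Finding` fields and the names `classify`, `verdict`, and `REQUIRED_ACTION` are hypothetical. The verdict aggregation (any 🔴 gives REJECTED, otherwise any 🟡 gives CONDITIONAL) is an inference from the Tiers section and the `Required action` mapping, not a rule the docs state in one place. The pre-existing-issue and Known-issue borderline rules are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    CRITICAL = "CRITICAL"    # red tier: blocks merge
    IMPORTANT = "IMPORTANT"  # yellow tier: filed as a GitHub issue
    MINOR = "MINOR"          # green tier: does not block


@dataclass
class Finding:
    # Hypothetical schema; the rubric does not define one.
    has_trigger: bool
    has_mechanism: bool
    has_consequence: bool
    # Data loss, security exposure, cascading outage, or
    # incorrect behavior under normal usage.
    severe: bool = False
    # The finding's own rationale calls the impact immaterial/negligible.
    immaterial: bool = False
    # Severity assigned by the sub-agent, if any.
    sub_agent: Optional[Severity] = None


def classify(f: Finding) -> Optional[Severity]:
    """Return the calibrated severity, or None when the finding is omitted."""
    # Step 4: a sub-agent CRITICAL is a floor and is never downgraded.
    if f.sub_agent is Severity.CRITICAL:
        return Severity.CRITICAL
    # Materiality floor: omit entirely, do not file even as MINOR.
    if f.immaterial:
        return None
    # Step 1: without a full trigger -> mechanism -> consequence chain, MINOR.
    if not (f.has_trigger and f.has_mechanism and f.has_consequence):
        return Severity.MINOR
    # Step 2: a severe consequence escalates to CRITICAL.
    if f.severe:
        return Severity.CRITICAL
    # Step 3: an articulated risk path that is not severe is IMPORTANT.
    return Severity.IMPORTANT


# Mapping quoted from SENTINEL.md ("Required action mapping").
REQUIRED_ACTION = {
    "APPROVED": "MERGE",
    "CONDITIONAL": "FILE_ISSUES_AND_MERGE",
    "REJECTED": "FIX_AND_REINVOKE",
}


def verdict(severities: List[Optional[Severity]]) -> str:
    # Assumed aggregation: the highest tier present decides the verdict.
    if Severity.CRITICAL in severities:
        return "REJECTED"
    if Severity.IMPORTANT in severities:
        return "CONDITIONAL"
    return "APPROVED"
```

Under this reading, a missing CHANGELOG entry modeled as `Finding(False, False, False)` classifies as `Severity.MINOR`, and a review containing only such findings maps to `REQUIRED_ACTION["APPROVED"]`, which is `MERGE`.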