2 changes: 1 addition & 1 deletion AGENTS.md
@@ -1,6 +1,6 @@
# AGENTS.md — Council

-<!-- agents-template v0.18.0 -->
+<!-- agents-template v0.19.0 -->

<role>You write tests before code, work in isolated worktree branches, and never merge without Sentinel review. These rules are enforced mechanically — Sentinel verifies compliance on every PR and non-compliant work is rejected.</role>

6 changes: 3 additions & 3 deletions docs/SENTINEL.md
@@ -161,10 +161,10 @@ A sub-agent is a **separately-invoked tool call** (e.g., `task`, `dispatch`) exe

Aggregate findings from all Phase 2 sub-agents, then classify using exactly these priority levels:
- 🔴 **CRITICAL**: blocks merge — security vulnerability, data loss/corruption, breaking change, incorrect behavior under normal usage, missing evidence, failing tests, TDD failure
-- 🟡 **IMPORTANT**: concrete improvements with an articulated risk path. Each 🟡 must state: (1) **trigger** — what action or input activates the path, (2) **mechanism** — the reachable code path from trigger to failure, (3) **consequence** — the observable damage (data loss, error, degraded UX, outage). Missing any element → 🟢, not 🟡. Requires follow-ups tracked as GitHub issues. **If a 🟡 could cause data loss, security exposure, cascading outage, or incorrect behavior under normal usage → reclassify as 🔴.** Concerns without an articulated risk path → 🟢, not 🟡. **🟡 exclusions (classify as 🟢):** missing CHANGELOG/docs with no release/API/user-impact requirement, "better abstraction" without a failure path, rename/restructure suggestions, stylistic preferences — these lack the required trigger→mechanism→consequence chain.
+- 🟡 **IMPORTANT**: concrete improvements with an articulated risk path. Each 🟡 must state: (1) **trigger** — what action or input activates the path, (2) **mechanism** — the reachable code path from trigger to failure, (3) **consequence** — the observable damage (data loss, error, degraded UX, outage). Missing any element → 🟢, not 🟡. Requires follow-ups tracked as GitHub issues. **If a 🟡 could cause data loss, security exposure, cascading outage, or incorrect behavior under normal usage → reclassify as 🔴.** Concerns without an articulated risk path → 🟢, not 🟡. **🟡 exclusions (classify as 🟢):** missing CHANGELOG (always 🟢 — never 🟡), missing docs with no release/API/user-impact requirement, "better abstraction" without a failure path, rename/restructure suggestions, stylistic preferences — these lack the required trigger→mechanism→consequence chain.
- 🟢 **MINOR**: polish, theoretical improvements, or speculative edge cases where no reachable trigger, concrete failure mode, or material impact is identified; does not block. **Materiality floor:** omit entirely (do not file even as 🟢) any finding whose own rationale calls the impact immaterial, negligible, or immeasurable; batch trivial polish into a single 🟢.

-**Severity adjustment:** The orchestrator may reclassify 🟡 → 🔴 per the rule above, or 🟡 → 🟢 when the finding lacks an articulated risk path. **NEVER** 🔴 → 🟡/🟢. Sub-agent 🔴 severity is a floor; 🟡 is advisory and subject to orchestrator calibration.
+**Severity adjustment:** The orchestrator may reclassify 🟡 → 🔴 per the rule above, or 🟡 → 🟢 when the finding lacks an articulated risk path. **NEVER** 🔴 → 🟡/🟢. Sub-agent 🔴 severity is a floor; 🟡 is advisory and subject to orchestrator calibration. Apply [`sentinel/SEVERITY-RUBRIC.md`](sentinel/SEVERITY-RUBRIC.md) — version-pinned decision procedure + golden worked-examples — for reproducible severity regardless of which agent orchestrates.

**Cross-dimension findings:** Findings prefixed `[Cross: Dim X]` from one sub-agent that duplicate a finding from the target dimension → consolidate. If the target dimension missed it → adopt the cross-referenced finding at the target dimension's severity default.

@@ -234,7 +234,7 @@ Required action: MERGE | FILE_ISSUES_AND_MERGE | FIX_AND_REINVOKE
**`Required action` mapping**: APPROVED→MERGE, CONDITIONAL→FILE_ISSUES_AND_MERGE, REJECTED→FIX_AND_REINVOKE. Mismatch = malformed report; re-run Sentinel.
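
The mapping above lends itself to a mechanical check. The following is a minimal sketch, assuming the verdict and the `Required action` value have already been parsed out of the report as plain strings; the names `REQUIRED_ACTION` and `report_is_well_formed` are hypothetical and not part of Sentinel.

```python
# Hypothetical lookup table mirroring the documented verdict -> action mapping.
REQUIRED_ACTION = {
    "APPROVED": "MERGE",
    "CONDITIONAL": "FILE_ISSUES_AND_MERGE",
    "REJECTED": "FIX_AND_REINVOKE",
}


def report_is_well_formed(verdict: str, required_action: str) -> bool:
    """Return True only when the action matches the one mapped to the verdict.

    An unknown verdict maps to None, so it is treated as malformed as well.
    """
    return REQUIRED_ACTION.get(verdict) == required_action
```

A `False` result would correspond to the "malformed report; re-run Sentinel" case.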

## Phase 5 — Persist report (REQUIRED)
-Before returning, persist the FULL report to a durable location so the merge commit's `Report ID + SHA` stays auditable even if the parent's context drops the report. Preferred: post it to the reviewed PR via `gh pr review <pr> --body-file <report> --comment`. If you lack PR write access, return the report and the **invoker MUST** persist it (AGENTS.md §After Sentinel). Persisting your own report is reporting, not a code change — it does not violate read-only. Record the persisted URL/path in the Phase 2 Execution Log. Returning the report as agent text only is INSUFFICIENT.
+Before returning, persist the FULL report to a durable location so the merge commit's `Report ID + SHA` stays auditable even if the parent's context drops the report. Preferred: post it to the reviewed PR via `gh pr review <pr> --body-file <report> --comment`. If you lack PR write access, return the report and the **invoker MUST** persist it (AGENTS.md §After Sentinel). A committed `.sentinel/reports/<id>.md` fallback MUST land on a persisted branch — **never inside a throwaway/ephemeral verification worktree** (for isolated checks use a repo-relative scratch path like `.worktrees/sentinel-<id>`, treat it as scratch not storage, and clean it up — it is not a durable report location). Persisting your own report is reporting, not a code change — it does not violate read-only. Record the persisted URL/path in the Phase 2 Execution Log. Returning the report as agent text only is INSUFFICIENT.

## Deploy / release gating (optional)
If asked to gate a deploy/release, require evidence that: release SHA matches a reviewed `main` SHA with green suite + passing build; no open 🔴 issues; all 🟡 resolved or risk-accepted (rationale on issue); versioning/changelog updated.
58 changes: 58 additions & 0 deletions docs/sentinel/SEVERITY-RUBRIC.md
@@ -0,0 +1,58 @@
# Sentinel Severity Rubric (v1)

**Orchestrator Phase 3 calibration reference.** Purpose: make severity verdicts
**reproducible across reviewers** — the same finding class yields the same severity
regardless of which agent orchestrates. Applied AFTER sub-agent findings aggregate
(SENTINEL.md Phase 3). Sub-agent 🔴 is a floor; 🟡/🟢 are advisory and re-calibrated here.

## Decision procedure (apply in order)
1. Does the finding have a concrete **trigger → reachable mechanism → observable
consequence**? No → 🟢 (or omit entirely if its own rationale calls the impact
immaterial/negligible).
2. Could the consequence be **data loss, security exposure, cascading outage, or incorrect
behavior under NORMAL usage**? Yes → 🔴.
3. Otherwise a concrete improvement with an articulated risk path → 🟡 (file as issue).
4. **NEVER** downgrade a sub-agent 🔴. **NEVER** 🔴 → 🟡/🟢.
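
The four steps above can be sketched as a small function. This is an illustrative sketch only: the rubric defines no data schema, so the `Finding` fields, the `Severity` enum, and `classify` are all hypothetical names. Checking the sub-agent 🔴 floor first is one way to honor step 4; it is an assumption about ordering, not something the rubric prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "🔴"
    IMPORTANT = "🟡"
    MINOR = "🟢"


@dataclass
class Finding:
    # Hypothetical fields; the rubric itself does not define a schema.
    has_trigger: bool
    has_reachable_mechanism: bool
    has_observable_consequence: bool
    # True when the consequence could be data loss, security exposure,
    # cascading outage, or incorrect behavior under normal usage.
    reaches_critical_impact: bool = False
    rationale_calls_impact_immaterial: bool = False
    sub_agent_severity: Optional[Severity] = None


def classify(f: Finding) -> Optional[Severity]:
    """Return a severity, or None when the finding should be omitted entirely."""
    # Step 4 (checked first by assumption): a sub-agent 🔴 is a floor.
    if f.sub_agent_severity is Severity.CRITICAL:
        return Severity.CRITICAL
    # Step 1: no complete trigger -> mechanism -> consequence chain.
    has_path = (
        f.has_trigger
        and f.has_reachable_mechanism
        and f.has_observable_consequence
    )
    if not has_path:
        return None if f.rationale_calls_impact_immaterial else Severity.MINOR
    # Step 2: the consequence reaches a critical impact class.
    if f.reaches_critical_impact:
        return Severity.CRITICAL
    # Step 3: concrete improvement with an articulated risk path.
    return Severity.IMPORTANT
```

For example, a finding with a full risk path but a bounded consequence, such as `Finding(True, True, True)`, comes out as 🟡, while the same finding with `reaches_critical_impact=True` comes out as 🔴.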

## Tiers
- 🔴 **CRITICAL** — blocks merge (REJECTED). Security vuln, data loss/corruption, breaking
change, incorrect behavior under normal usage, missing evidence, failing tests, TDD failure.
- 🟡 **IMPORTANT** — concrete fix with trigger+mechanism+consequence; does not block; filed
as a GitHub issue (CONDITIONAL).
- 🟢 **MINOR** — polish, theoretical/unreachable, or no articulated risk path; does not block.

## Golden examples (canonical — match each new finding to the nearest row)

| Finding | Severity | Why (decision step) |
|---------|----------|---------------------|
| Jitter applied to a server-mandated `Retry-After`, shortening the cooldown | 🔴 | Incorrect under normal usage (2) — can extend throttling |
| Stale retry-overlay freezes health tiles at retry-time data (shows "passing" after failure) | 🔴 | Incorrect under normal usage (2) — user sees wrong state |
| New data layer never executed by any test (wiring tests mock the hook), hiding a latent bug | 🔴 | Untested path concealing a real bug — Dim D gaming + (2) |
| Retry storm: retries without jitter causing coordinated load spikes | 🔴 | Cascading outage (2) |
| Non-idempotent mutation on a retried path (payment/provisioning) | 🔴 | Data corruption under normal retry (2) |
| Missing timeout on a request-critical network call that can exhaust connections | 🔴 | Cascading outage (2) |
| Untrusted input reaches an injection sink (SQL/shell/HTML/template) without escaping or parameterization | 🔴 | Exploitable security vuln (2) |
| Non-CSPRNG (`Math.random()`) used to generate a token, session ID, password, or secret | 🔴 | Predictable secret → security exposure (2) |
| New dependency with a `postinstall` script, a typosquatted name, or a swapped `resolved` URL / integrity hash in the lockfile | 🔴 | Supply-chain compromise (2) |
| Missing timeout on a non-critical, bounded background call | 🟡 | Reachable risk, bounded blast radius (3) |
| Test asserts an outcome but uses no concrete-value oracle (non-discriminating) | 🟡 | Reachable: a wrong value would still pass; harden it (3) |
| Untested new error/branch path with a plausible trigger | 🟡 | Articulated risk path (3) |
| Defensive guard whose trigger is unreachable given current callers | 🟢 | No reachable trigger (1) |
| `Math.random()` used for UI animation / visual jitter (no security surface) | 🟢 | No security surface (1) — contrast the CSPRNG 🔴 above |
| Dependency bump of an unused or dev-only package; no API/behavior change | 🟢 | No reachable risk (1) — contrast the typosquat 🔴 above |
| **Missing CHANGELOG entry** | 🟢 | Non-behavioral convention; no trigger→mechanism→consequence (1). **NEVER 🟡.** |
| Missing/incomplete docs with no release/API/user-impact requirement | 🟢 | No risk path (1) |
| Rename / restructure / "better abstraction" without a failure path | 🟢 | No risk path (1) |
| Stylistic preference (formatting, naming) | 🟢 | No risk path; batch into one 🟢 |

## Borderline rules
- **🟡 → 🔴** when the risk path reaches data loss, security exposure, cascading outage, or
normal-usage-incorrect behavior.
- **🟡 → 🟢** when there is no trigger, no reachable mechanism, or an immaterial consequence.
- **Pre-existing** issue the diff neither introduces nor newly reaches → 🟢 max (never 🔴/🟡).
- Finding matches an open `sentinel:*` issue (same mechanism + fix) → **Known** (excluded
from verdict). 🔴 can **NEVER** be Known.
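
As an illustration of the **Known** rule only, the sketch below partitions findings into those that count toward the verdict and those excluded as Known. It assumes each finding has been reduced to a `(severity, matches_open_issue)` pair; that shape and the name `split_known` are hypothetical. The pre-existing cap is deliberately not modeled here.

```python
from typing import List, Tuple

# Hypothetical shape: (severity emoji, whether an open sentinel:* issue
# with the same mechanism + fix already exists).
FindingPair = Tuple[str, bool]


def split_known(
    findings: List[FindingPair],
) -> Tuple[List[FindingPair], List[FindingPair]]:
    """Partition findings into (counted_in_verdict, known)."""
    counted: List[FindingPair] = []
    known: List[FindingPair] = []
    for severity, matches_open_issue in findings:
        # A 🔴 finding can never be Known, even if an open issue matches it.
        if matches_open_issue and severity != "🔴":
            known.append((severity, matches_open_issue))
        else:
            counted.append((severity, matches_open_issue))
    return counted, known
```

With this split, a 🟡 that matches an open issue drops out of the verdict, while a matching 🔴 still counts.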

## Version
Rubric **v1** — bound to SENTINEL.md ruleset v1 (agents-template v0.19.0). Bump this version
whenever severity semantics change, so verdicts stay reproducible against a pinned rubric.
2 changes: 1 addition & 1 deletion docs/sentinel/dim-f-documentation.md
@@ -19,7 +19,7 @@ Findings must originate from changed lines or code whose reachability, inputs, o

### Accuracy & completeness
- README reflects current behavior — if the diff changes user-facing behavior and no docs are touched, flag 🟡 "docs likely needed." Only claim "README updated correctly" when README sections are modified in the diff.
-- CHANGELOG updated — user-facing changes documented; if CHANGELOG is absent from the diff and release-tooling config exists in the repo, skip this check (release tooling generates CHANGELOG from commits/changesets)
+- CHANGELOG updated — user-facing changes documented. If CHANGELOG is absent and release-tooling config exists, skip this check (release tooling generates it from commits/changesets). Otherwise a missing CHANGELOG is **🟢 MINOR only (never 🟡)** — a non-behavioral convention.
- API docs current — endpoint signatures, parameters, response shapes match implementation
- New features documented — discoverable without reading source code
- Deprecated features noted — migration path or removal timeline provided