diff --git a/.gitignore b/.gitignore index ca203ce8..ab314b07 100644 --- a/.gitignore +++ b/.gitignore @@ -845,7 +845,6 @@ CLAUDE.md backend/CLAUDE.md frontend/CLAUDE.md docs/guides/DEVELOPER_DOCUMENTATION_STYLE_GUIDE.md -SESSION_LOG.md BACKLOG.md MONGODB_INDEX_MIGRATION_PLAN.md docs/engineering/navigation_mvp_plan.md diff --git a/BACKLOG.md b/BACKLOG.md index 7ecc6a89..99881815 100644 --- a/BACKLOG.md +++ b/BACKLOG.md @@ -3,7 +3,7 @@ > **Purpose**: Single source of truth for **pending** work in the OpenWatch Go rebuild (repo root). > Completed work is removed from this file; provenance lives in the commit history + `SESSION_LOG.md`. -**Last Updated**: 2026-06-15 +**Last Updated**: 2026-06-16 **Active Tree**: repo root (Go backend `cmd/`+`internal/`, React/TypeScript `frontend/`) **Frozen Tree**: the legacy Python/FastAPI backend was archived out of the repo on 2026-06-05 to `~/hanalyx/OWAR/openwatch-python/` (see CLAUDE.md) @@ -20,15 +20,6 @@ --- -## Packaging / Install - -| ID | Item | Priority | Status | Notes | -|----|------|----------|--------|-------| -| PKG-1 | RPM/DEB install does not provision identity keys — fresh install fails to boot | **P0** | Open | `systemctl enable --now openwatch.service` on a fresh rc7 RPM dies with `/etc/openwatch/keys/jwt_private.pem: no such file or directory`. Root cause: the app deliberately refuses to auto-generate signing keys in production (`cmd/openwatch/main.go:177-204` exits if `identity.jwt_private_key` / `credential_key_file` are missing — no ephemeral fallback), but neither the RPM `%post` (`packaging/rpm/openwatch.spec` — only `daemon-reload`) nor the DEB `postinst` creates `/etc/openwatch/keys/` or generates the keys. The shipped `openwatch.toml` has no `[identity]` section, so the `config.go:81-82` defaults apply: `/etc/openwatch/keys/jwt_private.pem` + `/etc/openwatch/keys/credential.key`. 
Fix: have `%post`/`postinst` idempotently (only if absent) create `/etc/openwatch/keys/` (0750 root:openwatch) and generate (a) RSA-2048 PEM JWT key (0640 root:openwatch) and (b) 32 raw-byte credential DEK (0600 openwatch:openwatch — `secretkey.LoadFromFile` rejects any group/other perm bits and any length != 32), same pattern as the demo TLS cert. Two keys, distinct rotation semantics: regenerating the credential DEK makes stored SSH creds + MFA secrets undecryptable, so generate-once + back-up; the JWT key only invalidates live sessions. Manual workaround documented for operators in the meantime | -| PKG-2 | Native package ships no Kensa rule corpus — scanning is dead out of the box | **P0** | Open | The Kensa *engine* is compiled into the binary (`github.com/Hanalyx/kensa v0.4.3` in `go.mod`, integration in `internal/kensa/`), but the ~508-rule YAML corpus is **not** bundled. `kensa.LoadRules` reads from disk at `DefaultPath = /usr/share/kensa/rules` (kensa `pkg/kensa/rules.go`); only `varsub/embedded/defaults.yml` (variable defaults) is `go:embed`ed, not the rules. `packaging/rpm/build-rpm.sh` copies only the binary + `openwatch.toml` + `openwatch.service`; `grep -i kensa packaging/` is empty. `cmd/openwatch/main.go:460-468` (C-16) states the design intent: production/air-gapped installs "rely on the signed kensa-rules package at the loader's default path" — but no such package is produced in this repo. Effect on a fresh install: server boots (rule loading is non-fatal — `main.go:473-496` warns and continues) but scans return no results, `/api/v1/rules` → 503, failed-rules titles fall back to bare rule ids, scan-variables surface disabled. `OPENWATCH_KENSA_RULES_DIR` is a dev-only override (warned loudly). Fix: produce + ship a `kensa-rules` package landing the corpus at `/usr/share/kensa/rules` (RPM `Requires:` / DEB `Depends:`), or vendor the corpus into the openwatch package. 
Blocks first-run usefulness | - ---- - ## Active Work — Host Detail | Item | Priority | Status | Notes | @@ -105,6 +96,7 @@ Gaps identified comparing `docs/KENSA_OPENWATCH_BOUNDARY.md` against current Ope | Item | Priority | Status | Notes | |------|----------|--------|-------| +| Wire SSH auth/sudo learning into discovery / intelligence / liveness | P2 | Not started | #566 landed the substrate (`internal/connprofile` + migration `0035`) and the dial-layer mechanism (`ssh.DialOptions.PreferAuth`/`ObservedAuth`), and wired the **scan** path (auth ordering + per-connection sudo-mode probe, recorded per host). The other three SSH paths — OS discovery, OS intelligence collector, liveness privilege probe — still re-probe each cycle instead of leading with the remembered choice. Their `SSHTransport.Dial` (`internal/intelligence/discovery`, shared by the collector) and the `internal/sshprivilege` dialer take no `hostID`, so thread it + a `connprofile` reader/recorder through. Mechanical — reuses everything from #566 | | Retention sweep for soft-deleted hosts | P3 | Not started | Soft-deleted hosts (`hosts.deleted_at` set by `internal/host/host.go:298 SoftDelete`) are retained **indefinitely** — no purge job exists anywhere, confirmed by a soft-deleted row from 2026-05-25 still present in the dev DB. The row stays for scan-history/audit integrity but is hidden from every query (`WHERE deleted_at IS NULL`). Add an optional retention sweep (a daemon-orchestration tick that hard-deletes `hosts WHERE deleted_at < now() - $window`, cascading scan history), with an operator-configurable window that defaults to disabled (keep forever). Low urgency — host volume is trivial — but closes unbounded soft-deleted-row growth and gives operators a real "forget this host" path | | `PATCH /api/v1/credentials/{id}` — in-place credential update | P2 | Deferred | Frontend uses replace-on-save (`` runs `POST → DELETE`). 
Real PATCH would close the orphan-credential failure mode | | `POST /api/v1/bulk/hosts/analyze-csv` + `import-with-mapping` | P2 | Deferred | Today the wizard runs CSV analysis client-side and submits row-by-row — no atomic semantics, no "update existing", no row caps | @@ -125,6 +117,18 @@ Gaps identified comparing `docs/KENSA_OPENWATCH_BOUNDARY.md` against current Ope --- +## Deferred Dependency Upgrades + +Dependabot major bumps closed (skipped) 2026-06-16, with the reason + revisit path. Dependabot re-raises each on its next version bump. + +| Dependency | Bump | Why deferred | +|------------|------|--------------| +| `@mui/material` | 7 → 9 | Two majors (Grid v2 API + theme/styling breaking changes). Real component migration; do as a dedicated PR (test empirically first) | +| `eslint` + `@eslint/js` + `eslint-plugin-react-hooks` | 9 → 10 / 5 → 7 | **Blocked upstream** — `typescript-eslint@8.61` / `eslint-plugin-react@7.37` peer-dep on eslint ≤9, so eslint 10 won't install (`ERESOLVE`). Revisit when the lint ecosystem supports eslint 10; land as one combined PR | +| `sigstore/cosign-installer` | 3.7 → 4.x | Installs cosign 3.0.5, which **breaks our offline key-based release signing** (`--tlog-upload` removed; default `--new-bundle-format` ignores `--output-signature`; `verify-blob` wants the rekor tlog). When done: pin `with: cosign-release: v2.6.1` (keep current signing) OR migrate to the bundle format + update `RELEASING.md`/`KEYS` verify steps | + +--- + ## CI / Quality | Item | Priority | Notes | diff --git a/SESSION_LOG.md b/SESSION_LOG.md new file mode 100644 index 00000000..02808847 --- /dev/null +++ b/SESSION_LOG.md @@ -0,0 +1,99 @@ +# SESSION_LOG.md — OpenWatch Go session handoff + +Append-only handoff log, most recent first. Each entry records what shipped, +what's next, and gotchas. Completed BACKLOG items are removed from BACKLOG.md +and their provenance lives here + in the commit history. 
+ +--- + +## 2026-06-16 — Opus 4.8 (1M context) + +**Done** (all merged to `main`): + +- **SSH full auth/sudo matrix + per-host learning (#566).** The compliance + scan can now escalate with a sudo password (`sudo -S -p ''` over the SSH + session stdin; it previously hardcoded `sudo -n`, so password-sudo hosts + were inventoried but not scanned). New `internal/connprofile` store + + migration `0035_host_connection_profile` remembers the last-good SSH auth + method + sudo mode per host. Shared dial layer gained + `DialOptions.PreferAuth` (lead with the known-good method, avoid a doomed + publickey attempt that trips fail2ban/MaxAuthTries) + `ObservedAuth`. + `AllowCredentialSudoPassword` flipped to **default-on** (kill-switch). A + 4-agent adversarial review caught + fixed a real gap: the scan path didn't + honor the kill-switch / auth-method gate (now `system-connection-profile` + C-11 / AC-07, gated via `sudoPasswordFor`). + +- **Packaging — fresh install + upgrade (#564, #569).** #564: the Kensa rule + corpus now ships as a separate `kensa-rules` package (noarch RPM / + `Architecture: all` DEB) that openwatch `Requires`/`Depends` on, and the + RPM `%post` / DEB `postinst` provision the identity keys (RSA-2048 JWT + + 32-byte DEK, generate-if-absent) — both were P0 fresh-install blockers + (PKG-1/PKG-2). #569: **one-command upgrade** — `dnf/apt update` runs + `openwatch migrate` automatically on upgrade with a `pg_dump` restore point + and a fail-safe (`openwatch-upgrade.sh`: stop → backup+migrate → start, or + leave stopped on failure). New `internal/dbbackup` (pg_dump via PG* env, + never argv), `openwatch migrate --status/--backup-dir`, a daily + backup-cleanup systemd timer, `/etc/openwatch/upgrade.conf`, the + `release-upgrade` spec, `docs/runbooks/UPGRADING.md`, a container + upgrade-test (`packaging/tests/upgrade-container-test.sh`) and an + `upgrade-smoke` CI job. 
PostgreSQL **engine** major-version upgrades are + deliberately out of scope (operator-supervised `pg_upgrade`). + +- **CI gate speedup (#567) + specter results untrack (#568).** Collapsed the + two full test passes (`make test-race` + a separate json run) into one + `go test -race -json`, and cached golangci-lint — the "Quality + security + gates" gate dropped from ~23 min to ~13 min. Untracked the stale + committed `.specter-results.json` (CI regenerates it; the committed copy + only drifted and produced misleading local `specter coverage` reports). + +- **Settings + cleanup (#561, #562, #563).** SMTP channel edit pre-fill + + self-hosted fonts for air-gap (#561); removed all demo/fixture data from + the frontend (#562); backlog cleanup + CI/regression follow-up items + (#563). + +- **Dependency triage (14 Dependabot PRs).** Merged 9 (form-data security, + react-hook-form, setup-go 5→6, github-script 7→9, **vite 6→8 / vitest 3→4**, + action-gh-release 2→3, **lucide 0→1**, **zod 3→4 + @hookform/resolvers + 3→5**) — each frontend major empirically verified (tsc + build + 286 tests) + before merge. Skipped 6 with documented reasons: @types/node 25 (we're on + Node 20), eslint 10 ×3 (blocked upstream — typescript-eslint/eslint-plugin-react + peer-dep on eslint ≤9), cosign-installer v4 (breaks signing), MUI 7→9 + (deferred migration). + +**Next:** + +- **SSH learning follow-up** — wire the `connprofile` memo into the + discovery / intelligence / liveness paths (their `SSHTransport.Dial` / + `sshprivilege` dialer need `hostID` threaded through). Substrate + dial + mechanism already landed in #566. +- **Deferred dependency migrations** — MUI 7→9 (Grid v2 + theme), eslint 10 + (when typescript-eslint/eslint-plugin-react support it), cosign-installer + v4 (pin `cosign-release: v2.6.1` OR migrate to the bundle format + update + RELEASING.md/KEYS verify steps). All closed; Dependabot re-raises on the + next version. 
+- Standing BACKLOG items: raise the specter gate to 100%, CI-speed + (per-package DB isolation, job split), regression-coverage gaps (live-host + SSH/sudo test, Playwright E2E, negative-path security ACs). + +**Notes / gotchas:** + +- `.gitignore` aggressively ignores broad patterns (`*.spec`, `*test*.sh`, + `*test*.md`, …). Run `git check-ignore -v ` before pushing any new + generically named file — it silently ate a `.spec` and two `*test*.sh` files + this session. +- RPM scriptlets run with a **restricted PATH** (`/sbin:/bin:/usr/sbin:/usr/bin`, + no `/usr/local/bin`) — relevant for shims/helpers they invoke. +- **cosign 3** breaks our offline key-based detached-signature signing + (`--tlog-upload` removed; default `--new-bundle-format` ignores + `--output-signature`; `verify-blob` wants the rekor tlog). Hence #539 skipped. +- **Kensa v0.5.0** adds `api.HostConfig.SudoPassword` (sudo-with-password + across check/remediate/rollback). We already match the mechanism in #566, + and we use our own `TransportFactory`, so no change needed when we bump the + kensa dep — the field is additive. +- **Test DB:** an isolated throwaway Postgres runs in docker on `:5433` + (`ow_test`). NEVER point tests at the real `openwatch_go_dev` (one earlier + session truncated real data that way). +- Branch protection requires **up-to-date branches**, so each merge + re-BEHINDs the other open PRs — `update-branch` + re-run CI per PR. + +---