Merged
1 change: 0 additions & 1 deletion .gitignore
@@ -845,7 +845,6 @@ CLAUDE.md
backend/CLAUDE.md
frontend/CLAUDE.md
docs/guides/DEVELOPER_DOCUMENTATION_STYLE_GUIDE.md
SESSION_LOG.md
BACKLOG.md
MONGODB_INDEX_MIGRATION_PLAN.md
docs/engineering/navigation_mvp_plan.md
24 changes: 14 additions & 10 deletions BACKLOG.md
@@ -3,7 +3,7 @@
> **Purpose**: Single source of truth for **pending** work in the OpenWatch Go rebuild (repo root).
> Completed work is removed from this file; provenance lives in the commit history + `SESSION_LOG.md`.

-**Last Updated**: 2026-06-15
+**Last Updated**: 2026-06-16
**Active Tree**: repo root (Go backend `cmd/`+`internal/`, React/TypeScript `frontend/`)
**Frozen Tree**: the legacy Python/FastAPI backend was archived out of the repo on 2026-06-05 to `~/hanalyx/OWAR/openwatch-python/` (see CLAUDE.md)

@@ -20,15 +20,6 @@

---

## Packaging / Install

| ID | Item | Priority | Status | Notes |
|----|------|----------|--------|-------|
| PKG-1 | RPM/DEB install does not provision identity keys — fresh install fails to boot | **P0** | Open | `systemctl enable --now openwatch.service` on a fresh rc7 RPM dies with `/etc/openwatch/keys/jwt_private.pem: no such file or directory`. Root cause: the app deliberately refuses to auto-generate signing keys in production (`cmd/openwatch/main.go:177-204` exits if `identity.jwt_private_key` / `credential_key_file` are missing — no ephemeral fallback), but neither the RPM `%post` (`packaging/rpm/openwatch.spec` — only `daemon-reload`) nor the DEB `postinst` creates `/etc/openwatch/keys/` or generates the keys. The shipped `openwatch.toml` has no `[identity]` section, so the `config.go:81-82` defaults apply: `/etc/openwatch/keys/jwt_private.pem` + `/etc/openwatch/keys/credential.key`. Fix: have `%post`/`postinst` idempotently (only if absent) create `/etc/openwatch/keys/` (0750 root:openwatch) and generate (a) RSA-2048 PEM JWT key (0640 root:openwatch) and (b) 32 raw-byte credential DEK (0600 openwatch:openwatch — `secretkey.LoadFromFile` rejects any group/other perm bits and any length != 32), same pattern as the demo TLS cert. Two keys, distinct rotation semantics: regenerating the credential DEK makes stored SSH creds + MFA secrets undecryptable, so generate-once + back-up; the JWT key only invalidates live sessions. Manual workaround documented for operators in the meantime |
| PKG-2 | Native package ships no Kensa rule corpus — scanning is dead out of the box | **P0** | Open | The Kensa *engine* is compiled into the binary (`github.com/Hanalyx/kensa v0.4.3` in `go.mod`, integration in `internal/kensa/`), but the ~508-rule YAML corpus is **not** bundled. `kensa.LoadRules` reads from disk at `DefaultPath = /usr/share/kensa/rules` (kensa `pkg/kensa/rules.go`); only `varsub/embedded/defaults.yml` (variable defaults) is `go:embed`ed, not the rules. `packaging/rpm/build-rpm.sh` copies only the binary + `openwatch.toml` + `openwatch.service`; `grep -i kensa packaging/` is empty. `cmd/openwatch/main.go:460-468` (C-16) states the design intent: production/air-gapped installs "rely on the signed kensa-rules package at the loader's default path" — but no such package is produced in this repo. Effect on a fresh install: server boots (rule loading is non-fatal — `main.go:473-496` warns and continues) but scans return no results, `/api/v1/rules` → 503, failed-rules titles fall back to bare rule ids, scan-variables surface disabled. `OPENWATCH_KENSA_RULES_DIR` is a dev-only override (warned loudly). Fix: produce + ship a `kensa-rules` package landing the corpus at `/usr/share/kensa/rules` (RPM `Requires:` / DEB `Depends:`), or vendor the corpus into the openwatch package. Blocks first-run usefulness |
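
The file checks that PKG-1 attributes to `secretkey.LoadFromFile` (no group/other permission bits, exactly 32 bytes) can be sketched as below. This is a minimal illustration under assumptions, not the project's code: `checkKeyFile` and `loadDEK` are hypothetical names, and the real loader's signature and error text are not shown in this diff.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// dekSize is the raw key length named in PKG-1.
const dekSize = 32

// checkKeyFile (hypothetical) applies the two rules from PKG-1: no
// group/other permission bits, and a length of exactly 32 bytes.
func checkKeyFile(mode fs.FileMode, size int64) error {
	if mode.Perm()&0o077 != 0 {
		return fmt.Errorf("mode %04o grants group/other access", uint32(mode.Perm()))
	}
	if size != dekSize {
		return fmt.Errorf("key is %d bytes, want %d", size, dekSize)
	}
	return nil
}

// loadDEK (hypothetical stand-in for secretkey.LoadFromFile) stats the
// file, validates it, and only then reads the key material.
func loadDEK(path string) ([]byte, error) {
	info, err := os.Stat(path)
	if err != nil {
		return nil, err
	}
	if err := checkKeyFile(info.Mode(), info.Size()); err != nil {
		return nil, fmt.Errorf("%s: %w", path, err)
	}
	return os.ReadFile(path)
}

func main() {
	dir, err := os.MkdirTemp("", "dek-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Zero bytes keep the demo short; a real key would come from crypto/rand.
	path := filepath.Join(dir, "credential.key")
	if err := os.WriteFile(path, make([]byte, dekSize), 0o600); err != nil {
		panic(err)
	}
	_, err = loadDEK(path)
	fmt.Println("mode 0600:", err)

	// Widening the mode to 0640 is enough for the loader to refuse the file.
	if err := os.Chmod(path, 0o640); err != nil {
		panic(err)
	}
	_, err = loadDEK(path)
	fmt.Println("mode 0640:", err)
}
```

This is why the proposed fix gives the credential DEK mode 0600 while the JWT key can stay at 0640: a DEK written with the JWT key's mode would fail the check at boot.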

---

## Active Work — Host Detail

| Item | Priority | Status | Notes |
@@ -105,6 +96,7 @@ Gaps identified comparing `docs/KENSA_OPENWATCH_BOUNDARY.md` against current Ope

| Item | Priority | Status | Notes |
|------|----------|--------|-------|
| Wire SSH auth/sudo learning into discovery / intelligence / liveness | P2 | Not started | #566 landed the substrate (`internal/connprofile` + migration `0035`) and the dial-layer mechanism (`ssh.DialOptions.PreferAuth`/`ObservedAuth`), and wired the **scan** path (auth ordering + per-connection sudo-mode probe, recorded per host). The other three SSH paths — OS discovery, OS intelligence collector, liveness privilege probe — still re-probe each cycle instead of leading with the remembered choice. Their `SSHTransport.Dial` (`internal/intelligence/discovery`, shared by the collector) and the `internal/sshprivilege` dialer take no `hostID`, so thread it + a `connprofile` reader/recorder through. Mechanical — reuses everything from #566 |
| Retention sweep for soft-deleted hosts | P3 | Not started | Soft-deleted hosts (`hosts.deleted_at` set by `internal/host/host.go:298 SoftDelete`) are retained **indefinitely** — no purge job exists anywhere, confirmed by a soft-deleted row from 2026-05-25 still present in the dev DB. The row stays for scan-history/audit integrity but is hidden from every query (`WHERE deleted_at IS NULL`). Add an optional retention sweep (a daemon-orchestration tick that hard-deletes `hosts WHERE deleted_at < now() - $window`, cascading scan history), with an operator-configurable window that defaults to disabled (keep forever). Low urgency — host volume is trivial — but closes unbounded soft-deleted-row growth and gives operators a real "forget this host" path |
| `PATCH /api/v1/credentials/{id}` — in-place credential update | P2 | Deferred | Frontend uses replace-on-save (`<ReplaceCredentialModal>` runs `POST → DELETE`). Real PATCH would close the orphan-credential failure mode |
| `POST /api/v1/bulk/hosts/analyze-csv` + `import-with-mapping` | P2 | Deferred | Today the wizard runs CSV analysis client-side and submits row-by-row — no atomic semantics, no "update existing", no row caps |
@@ -125,6 +117,18 @@ Gaps identified comparing `docs/KENSA_OPENWATCH_BOUNDARY.md` against current Ope

---

## Deferred Dependency Upgrades

Dependabot major bumps closed (skipped) 2026-06-16, with the reason + revisit path. Dependabot re-raises each on its next version bump.

| Dependency | Bump | Why deferred |
|------------|------|--------------|
| `@mui/material` | 7 → 9 | Two majors (Grid v2 API + theme/styling breaking changes). Real component migration; do as a dedicated PR (test empirically first) |
| `eslint` + `@eslint/js` + `eslint-plugin-react-hooks` | 9 → 10 / 5 → 7 | **Blocked upstream** — `typescript-eslint@8.61` / `eslint-plugin-react@7.37` peer-dep on eslint ≤9, so eslint 10 won't install (`ERESOLVE`). Revisit when the lint ecosystem supports eslint 10; land as one combined PR |
| `sigstore/cosign-installer` | 3.7 → 4.x | Installs cosign 3.0.5, which **breaks our offline key-based release signing** (`--tlog-upload` removed; default `--new-bundle-format` ignores `--output-signature`; `verify-blob` wants the rekor tlog). When done: pin `with: cosign-release: v2.6.1` (keep current signing) OR migrate to the bundle format + update `RELEASING.md`/`KEYS` verify steps |

---

## CI / Quality

| Item | Priority | Notes |
99 changes: 99 additions & 0 deletions SESSION_LOG.md
@@ -0,0 +1,99 @@
# SESSION_LOG.md — OpenWatch Go session handoff

Append-only handoff log, most recent first. Each entry records what shipped,
what's next, and gotchas. Completed BACKLOG items are removed from BACKLOG.md
and their provenance lives here + in the commit history.

---

## 2026-06-16 — Opus 4.8 (1M context)

**Done** (all merged to `main`):

- **SSH full auth/sudo matrix + per-host learning (#566).** The compliance
scan can now escalate with a sudo password (`sudo -S -p ''` over the SSH
session stdin; it previously hardcoded `sudo -n`, so password-sudo hosts
were inventoried but not scanned). New `internal/connprofile` store +
migration `0035_host_connection_profile` remembers the last-good SSH auth
method + sudo mode per host. Shared dial layer gained
`DialOptions.PreferAuth` (lead with the known-good method, avoid a doomed
publickey attempt that trips fail2ban/MaxAuthTries) + `ObservedAuth`.
`AllowCredentialSudoPassword` flipped to **default-on** (kill-switch). A
4-agent adversarial review caught + fixed a real gap: the scan path didn't
honor the kill-switch / auth-method gate (now `system-connection-profile`
C-11 / AC-07, gated via `sudoPasswordFor`).

- **Packaging — fresh install + upgrade (#564, #569).** #564: the Kensa rule
corpus now ships as a separate `kensa-rules` package (noarch RPM /
`Architecture: all` DEB) that openwatch `Requires`/`Depends` on, and the
RPM `%post` / DEB `postinst` provision the identity keys (RSA-2048 JWT +
32-byte DEK, generate-if-absent) — both were P0 fresh-install blockers
(PKG-1/PKG-2). #569: **one-command upgrade** — `dnf/apt update` runs
`openwatch migrate` automatically on upgrade with a `pg_dump` restore point
and a fail-safe (`openwatch-upgrade.sh`: stop → backup+migrate → start, or
leave stopped on failure). New `internal/dbbackup` (pg_dump via PG* env,
never argv), `openwatch migrate --status/--backup-dir`, a daily
backup-cleanup systemd timer, `/etc/openwatch/upgrade.conf`, the
`release-upgrade` spec, `docs/runbooks/UPGRADING.md`, a container
upgrade-test (`packaging/tests/upgrade-container-test.sh`) and an
`upgrade-smoke` CI job. PostgreSQL **engine** major-version upgrades are
deliberately out of scope (operator-supervised `pg_upgrade`).

- **CI gate speedup (#567) + specter results untrack (#568).** Collapsed the
two full test passes (`make test-race` + a separate json run) into one
`go test -race -json`, and cached golangci-lint — the "Quality + security
gates" gate dropped from ~23 min to ~13 min. Untracked the stale
committed `.specter-results.json` (CI regenerates it; the committed copy
only drifted and produced misleading local `specter coverage` reports).

- **Settings + cleanup (#561, #562, #563).** SMTP channel edit pre-fill +
self-hosted fonts for air-gap (#561); removed all demo/fixture data from
the frontend (#562); backlog cleanup + CI/regression follow-up items
(#563).

- **Dependency triage (14 Dependabot PRs).** Merged 9 (form-data security,
react-hook-form, setup-go 5→6, github-script 7→9, **vite 6→8 / vitest 3→4**,
action-gh-release 2→3, **lucide 0→1**, **zod 3→4 + @hookform/resolvers
3→5**) — each frontend major empirically verified (tsc + build + 286 tests)
before merge. Skipped 6 with documented reasons: @types/node 25 (we're on
Node 20), eslint 10 ×3 (blocked upstream — typescript-eslint/eslint-plugin-react
peer-dep on eslint ≤9), cosign-installer v4 (breaks signing), MUI 7→9
(deferred migration).
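
The "pg_dump via PG* env, never argv" rule from the #569 entry can be sketched as follows. This is an illustration under assumptions rather than the contents of `internal/dbbackup`: `connInfo`, `pgDumpCmd`, `leaksInArgv`, and the output path are hypothetical. The `PG*` names are the standard libpq environment variables, and `--format` / `--file` are standard `pg_dump` options.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// connInfo holds illustrative connection settings (hypothetical type).
type connInfo struct {
	Host, Port, User, Password, Database string
}

// pgDumpCmd builds a pg_dump invocation whose connection details travel in
// libpq PG* environment variables. Only the format and output path appear
// in argv, so the password never shows up in a process listing.
func pgDumpCmd(c connInfo, outPath string) *exec.Cmd {
	cmd := exec.Command("pg_dump", "--format=custom", "--file="+outPath)
	cmd.Env = append(os.Environ(),
		"PGHOST="+c.Host,
		"PGPORT="+c.Port,
		"PGUSER="+c.User,
		"PGPASSWORD="+c.Password,
		"PGDATABASE="+c.Database,
	)
	return cmd
}

// leaksInArgv reports whether any command-line argument contains secret.
func leaksInArgv(cmd *exec.Cmd, secret string) bool {
	for _, arg := range cmd.Args {
		if strings.Contains(arg, secret) {
			return true
		}
	}
	return false
}

func main() {
	c := connInfo{
		Host:     "127.0.0.1",
		Port:     "5432",
		User:     "openwatch",
		Password: "example-only",
		Database: "openwatch",
	}
	// The command is built but not run, so the sketch works without PostgreSQL.
	cmd := pgDumpCmd(c, "/var/backups/openwatch/pre-upgrade.dump")
	fmt.Println("argv:", cmd.Args)
	fmt.Println("password in argv:", leaksInArgv(cmd, c.Password))
}
```

Keeping credentials out of argv matters during an upgrade because the command line of a running process is visible to other local users, while its environment is not.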

**Next:**

- **SSH learning follow-up** — wire the `connprofile` memo into the
discovery / intelligence / liveness paths (their `SSHTransport.Dial` /
`sshprivilege` dialer need `hostID` threaded through). Substrate + dial
mechanism already landed in #566.
- **Deferred dependency migrations** — MUI 7→9 (Grid v2 + theme), eslint 10
(when typescript-eslint/eslint-plugin-react support it), cosign-installer
v4 (pin `cosign-release: v2.6.1` OR migrate to the bundle format + update
RELEASING.md/KEYS verify steps). All closed; Dependabot re-raises on the
next version.
- Standing BACKLOG items: raise the specter gate to 100%, CI-speed
(per-package DB isolation, job split), regression-coverage gaps (live-host
SSH/sudo test, Playwright E2E, negative-path security ACs).
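
The "lead with the remembered choice" behaviour that the follow-up would extend to discovery, intelligence, and liveness can be sketched as below. It is a simplified model under assumptions: `orderAuth` and `dialWithMemo` are hypothetical helpers, the map stands in for the `connprofile` store, and the real `ssh.DialOptions.PreferAuth` / `ObservedAuth` fields are not reproduced here.

```go
package main

import "fmt"

// orderAuth returns methods with the preferred one moved to the front.
// An empty or unknown preference leaves the order unchanged.
func orderAuth(methods []string, prefer string) []string {
	out := make([]string, 0, len(methods))
	for _, m := range methods {
		if m == prefer {
			out = append(out, m)
		}
	}
	for _, m := range methods {
		if m != prefer {
			out = append(out, m)
		}
	}
	return out
}

// dialWithMemo tries auth methods in preferred order and records the one
// that succeeded under hostID, so the next cycle leads with it instead of
// re-probing. try is a stand-in for a real SSH handshake.
func dialWithMemo(hostID string, methods []string, memo map[string]string, try func(method string) bool) (string, bool) {
	for _, m := range orderAuth(methods, memo[hostID]) {
		if try(m) {
			memo[hostID] = m
			return m, true
		}
	}
	return "", false
}

func main() {
	memo := map[string]string{}
	methods := []string{"publickey", "password"}
	attempts := 0
	// This simulated host only accepts password auth.
	try := func(method string) bool {
		attempts++
		return method == "password"
	}

	dialWithMemo("host-1", methods, memo, try)
	fmt.Println("first cycle attempts:", attempts) // prints 2

	attempts = 0
	dialWithMemo("host-1", methods, memo, try)
	fmt.Println("second cycle attempts:", attempts) // prints 1
}
```

The sketch also shows why the dialers need `hostID` threaded through: without a key to look up, there is nothing to order the methods by, and every cycle repeats the failed publickey attempt.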

**Notes / gotchas:**

- `.gitignore` aggressively ignores broad patterns (`*.spec`, `*test*.sh`,
`*test*.md`, …). Run `git check-ignore -v <path>` before pushing any new
generically-named file — it silently ate a `.spec` and two `*test*.sh`
this session.
- RPM scriptlets run with a **restricted PATH** (`/sbin:/bin:/usr/sbin:/usr/bin`,
no `/usr/local/bin`) — relevant for shims/helpers they invoke.
- **cosign 3** breaks our offline key-based detached-signature signing
(`--tlog-upload` removed; default `--new-bundle-format` ignores
`--output-signature`; `verify-blob` wants the rekor tlog). Hence #539 skipped.
- **Kensa v0.5.0** adds `api.HostConfig.SudoPassword` (sudo-with-password
across check/remediate/rollback). We already match the mechanism in #566,
and we use our own `TransportFactory`, so no change needed when we bump the
kensa dep — the field is additive.
- **Test DB:** an isolated throwaway Postgres runs in docker on `:5433`
(`ow_test`). NEVER point tests at the real `openwatch_go_dev` (one earlier
session truncated real data that way).
- Branch protection requires **up-to-date branches**, so each merge
re-BEHINDs the other open PRs — `update-branch` + re-run CI per PR.
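
One way to enforce the test-DB note mechanically is a guard that runs before any test touches the database. The sketch below is an assumption-laden illustration, not existing project code: `checkTestDSN` is a hypothetical helper, and it assumes `ow_test` is the database name and that tests receive a URL-style DSN.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// checkTestDSN (hypothetical) accepts only the throwaway test database
// described in the note: database ow_test on port 5433.
func checkTestDSN(dsn string) error {
	u, err := url.Parse(dsn)
	if err != nil {
		return fmt.Errorf("parse DSN: %w", err)
	}
	db := strings.TrimPrefix(u.Path, "/")
	if db != "ow_test" {
		return fmt.Errorf("refusing to run tests against database %q", db)
	}
	if u.Port() != "5433" {
		return fmt.Errorf("refusing to run tests against port %q", u.Port())
	}
	return nil
}

func main() {
	for _, dsn := range []string{
		"postgres://ow:pw@localhost:5433/ow_test?sslmode=disable",
		"postgres://ow:pw@localhost:5432/openwatch_go_dev?sslmode=disable",
	} {
		fmt.Println(dsn, "->", checkTestDSN(dsn))
	}
}
```

Calling a guard like this from a shared test setup function would turn the "NEVER" in the note into a hard failure instead of a convention.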

---