Self-improve, Cowork pilotability & 1.0.0 GA audit hardening by phuetz · Pull Request #42 · phuetz/code-buddy

phuetz · 2026-05-29T16:14:30Z

Merges tmp-self-improve-default into main — 218 commits. This branch accumulated a large body of feature work; the most recent 4 commits are a 1.0.0 GA audit + hardening pass.

Validation (branch tip)

npm run typecheck: 0 errors (noUncheckedIndexedAccess now ON)
npm run lint: 0 errors (2379 pre-existing warnings)
Full Vitest suite: 986 files, 29116 passed, 0 failed, 88 skipped

Audit & hardening (top 4 commits — added in this pass)

b329dd7a docs: audit status (AUDIT-2026-05-29.md)
c116cb77 enable noUncheckedIndexedAccess across the codebase: 3423→0 errors across 528 files, no blind `!` non-null assertions (audited against the diff), zero test regressions.
e1cc6c4c fix 39 pre-existing red tests (44 failed / 7 files → 0) + a real source bug: tool-handler.ts before-tool-call hook lacked the try/catch its pre-/post-bash siblings have.
916b1f1b 1.0.0 GA audit findings: secure-by-default network posture (CORS default→localhost, /ws Origin validation; DEFAULT_HOST kept 0.0.0.0 for the fleet mesh), confirmation check-order (a permission-mode deny can no longer be bypassed by CODEBUDDY_AUTO_CONFIRM / policy-allow), Cowork dep CVEs (ws/vite/tar), version sync rc.5→rc.8, ReDoS guard, peer-tool boot log, +more.
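The confirmation check order can be sketched as a small pure decision function. This is a minimal TypeScript sketch, not the code in confirmation-service. The `ModeDecision`/`PolicyDecision` shapes and the `decideConfirmation` name are hypothetical, and treating an explicit policy deny as final is an added assumption. The point it illustrates is ordering: the permission-mode deny is evaluated before any short-circuit can fire.

```typescript
// Hypothetical decision shapes; the real confirmation-service types are not shown in this PR.
type ModeDecision = 'allow' | 'deny' | 'ask';
type PolicyDecision = 'allow' | 'deny' | 'none';

interface ConfirmationInput {
  modeDecision: ModeDecision;     // verdict of the active permission mode (e.g. plan denies writes)
  policyDecision: PolicyDecision; // verdict of a PolicyEngine-style rule lookup
  autoConfirm: boolean;           // e.g. derived from CODEBUDDY_AUTO_CONFIRM
}

type Outcome = 'allowed' | 'denied' | 'prompt';

export function decideConfirmation(input: ConfirmationInput): Outcome {
  // 1. A permission-mode deny is checked first and is final:
  //    neither auto-confirm nor a policy allow can override it.
  if (input.modeDecision === 'deny') return 'denied';
  // 2. Assumption: an explicit policy deny also wins over auto-confirm.
  if (input.policyDecision === 'deny') return 'denied';
  // 3. Only now may the short-circuits apply.
  if (input.policyDecision === 'allow' || input.modeDecision === 'allow') return 'allowed';
  if (input.autoConfirm) return 'allowed';
  // 4. Otherwise fall back to asking the user.
  return 'prompt';
}
```

Keeping the function pure makes the regression case (mode deny + policy allow + auto-confirm) a one-line unit test.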

Bulk of the branch (214 earlier commits — pre-existing feature work)

Self-improve loop (D1/D2/D3), Cowork pilotability (slash bridges, panels, Fleet command center), companion/mediapipe cockpit, hermes, consent-gated browser_operator, universal-control pilot (WinForms/WPF/Avalonia), real computer-use QA harnesses, SSRF guard hardening, progress lib, etc.

Reviewer notes / open items

The self-improve path (CODEBUDDY_AUTO_CONFIRM / self_improvement capability) is this branch's core feature; I audited + hardened the permission-mode bypass, but a dedicated security review of the feature is worth a pass before GA.
Deferred audit items (see AUDIT-2026-05-29.md): turn-diff sync I/O (needs profiling), god-file decomposition (agentic-coding-runner 8441 LOC), RAG bounded eviction (product decision), exactOptionalPropertyTypes (v1.1), Phase-4 coverage spikes (MCP/voice/i18n/a11y).
Pre-existing runtime/scratch churn (.codebuddy/, .omx/, scratch/) is intentionally NOT included.

🤖 Generated with Claude Code

…work Make Code Buddy's engine commands and Hermes-parity surfaces drivable from the Cowork GUI instead of only the CLI (audit roadmap S0-S8). All routing goes through Cowork-native surfaces (never the headless CLI handlers) to avoid the cross-realm singleton trap.
- S0 headless slash: core executeHeadlessSlashToken seam + default-deny allowlist; ChatView routes /-commands to real engine output, ui_effects or an honest "not yet pilotable" denial instead of a placeholder token toast.
- S1 multi-agent: /swarm, /parallel, /agents, /fleet route to the native orchestrator/launcher/Fleet panels and show live in SubAgentPanel.
- S2 browser operator: stream browser tool actions as browser.action events into a live BrowserOperatorOverlay with a STOP control.
- S3 user model: "Infer from session" triggers runUserDialecticInference; proposed observations are reviewed/accepted in UserModelPanel.
- S4 plan mode: /plan enters read-only plan permission mode.
- S6 mobile supervision: MobileSupervisionPanel manages the pairing code and follow-up approval queue over the embedded server loopback routes, supervision-only (approval never dispatches work).
- S7 compaction lineage: record a guarded fork run at the compaction boundary (no-op without an active observability run, never throws).
- S8 long-tail: /lessons and /team open their Cowork panels.
~70 dedicated tests added; full Cowork suite (1414) green; both typechecks clean; core and Cowork bundles build.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…Phase 2) Critical: an e2e smoke against the running app revealed that 7 Cowork IPC bridges were DEAD in-app: `register*IpcHandlers(bridge)` ran at module top-level and captured a `null` bridge before the async boot assigned it, so every call returned the not-initialized fallback. This meant the slash routing shipped in 5cf77c8 ("pilot from Cowork") never actually worked in the running app; unit tests masked it by instantiating bridges directly.
- Fix: command/orchestrator/subAgent/team/mention/skillMd/knowledge IPC now take a getter `() => bridge` resolved lazily per call (matching the newer `() => projectManager` handlers). Proven by a new Playwright e2e (`cowork/e2e/slash-commands-smoke.spec.ts`): /team, /fleet, /lessons open their panels and the bridges are reachable post-boot.
- This resurrects multi-agent orchestration, sub-agents, team, mentions, skills and knowledge in Cowork (they existed but were dead due to the bug).
Pilotability (axis B):
- Subcommands route to their cockpit (/agents, /fleet, /team); /companion, /track → panels; /config, /workflow, /permissions, /hooks, /theme… → Settings tabs; /search, /persona, /sessions, /remember → generic open_panel; /history, /log, /workspace, /diff → headless allowlist. ~40 commands now pilotable.
- New IdentityPanel (C3): edit SOUL.md/USER.md/AGENTS.md via core IdentityManager (`identityFiles.*` IPC), nav entry, /identity route.
- docs/cowork-pilotability-matrix.md: full disposition of all 135 builtin slash commands + CLI groups, which defines the falsifiable "completely pilotable" bar.
Solidity: fixed 2 red tests (missing executeHooks mock). Full Cowork suite 1425/1425, e2e 7/7 in-app, root agent 88/89, typechecks + builds green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
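A minimal sketch of the captured-`null` bug and the getter fix. It assumes a plain handler map in place of Electron IPC; `handlers`, `CommandBridge`, and both `register*` functions are hypothetical stand-ins, not the Cowork code.

```typescript
// Minimal stand-ins for an IPC registry and a bridge (hypothetical, not the Electron API).
type Handler = () => string;
const handlers = new Map<string, Handler>();

interface CommandBridge {
  run(): string;
}

let bridge: CommandBridge | null = null; // assigned later by the async boot

// Buggy shape: the bridge value is captured once, at registration time.
function registerEagerHandlers(b: CommandBridge | null): void {
  handlers.set('command.eager', () => (b ? b.run() : 'not initialized'));
}

// Fixed shape: a getter is passed and resolved on every call.
function registerLazyHandlers(getBridge: () => CommandBridge | null): void {
  handlers.set('command.lazy', () => {
    const b = getBridge();
    return b ? b.run() : 'not initialized';
  });
}

// Module top-level registration runs before boot has assigned `bridge`.
registerEagerHandlers(bridge);
registerLazyHandlers(() => bridge);

// Simulated async boot completing afterwards.
bridge = { run: () => 'ok' };

console.log(handlers.get('command.eager')?.()); // "not initialized"
console.log(handlers.get('command.lazy')?.());  // "ok"
```

The getter costs one function call per invocation and removes any dependency on module evaluation order relative to boot, which is exactly what unit tests that build bridges directly cannot catch.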

Surfaces paired device nodes (SSH/ADB/local) in Cowork via a new read-only `deviceNodes.list` IPC over the core DeviceNodeManager. Pairing/removal stay on the CLI (`buddy device`); secrets (pairing token, key path) are redacted before crossing to the renderer. New DevicePanel + nav entry. e2e proves the panel opens and the IPC is reachable in the running app. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds an `engine_action` slash ui_effect that triggers real side-effecting engine ops via existing IPC: /undo → checkpoint.undo, /redo → checkpoint.redo. Routes /test → test-runner panel and /think → reasoning viewer via the generic open_panel opener. Bridge + dispatcher unit tests; full Cowork suite 1429/1429. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When a session goes terminal, automatically run the (already-shipped, verified) user-model dialectic inference over its transcript — guarded to fire at most once per session and only for substantial conversations (≥6 user turns), so the extra LLM call is bounded. Fire-and-forget; reuses the verified `userModel.runInference` IPC, which only PROPOSES review-gated observations (never writes) and no-ops without a configured provider. The auto-trigger guard (`shouldAutoInferUserModel`) is unit-tested; the inference itself is the same path exercised by the manual "Infer from session" button (S3). Converts the user-model loop from manual-only to autonomous. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
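The auto-trigger guard (at most once per session, at least 6 user turns) can be sketched as follows. `shouldAutoInfer`, the `Message` shape, and the in-memory `Set` are hypothetical; the real `shouldAutoInferUserModel` may track sessions differently.

```typescript
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

const MIN_USER_TURNS = 6; // threshold stated in the commit message
const inferredSessions = new Set<string>(); // assumption: in-memory, per process

// Returns true (and claims the session) only for the first qualifying call per session.
export function shouldAutoInfer(sessionId: string, transcript: readonly Message[]): boolean {
  if (inferredSessions.has(sessionId)) return false; // at most once per session
  const userTurns = transcript.filter((m) => m.role === 'user').length;
  if (userTurns < MIN_USER_TURNS) return false; // skip short conversations, bounding LLM cost
  inferredSessions.add(sessionId);
  return true;
}
```

Claiming the session inside the guard means a second terminal event for the same session cannot start a duplicate fire-and-forget inference.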

Closes the autonomous half of the learning loop, mirroring D1. New core `proposeLessonsFromSession(history, workDir, client?)` (src/agent/lesson-auto-proposer.ts) analyses a session transcript with the LLM and PROPOSES reusable procedural lessons into the candidate queue. It is strictly review-gated (propose() only enqueues PENDING; approval stays human-only) and no-ops without a configured provider. Exposed via `lessonCandidate.proposeFromSession` IPC and auto-triggered from Cowork's session-end hook (guarded: once per session, ≥6 user turns; fire-and-forget; shares the D1 substantial-session gate). Core parsing/propose path unit-tested (5 tests, injected client, no provider needed); the auto-trigger guard is the unit-tested D1 gate. Root + Cowork typechecks green; Cowork suite 1433/1433; core builds (dist emits the module the IPC loads). The live LLM call is provider-gated like D1.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…sessions (D3) The agent can now propose a Browser Operator session for live-web goals that web_search/web_fetch cannot satisfy (interaction, login-gated, multi-step). The tool builds an InternetScoutPlan + BrowserOperatorSessionDraft purely (no network, no browser launch) and returns it as a reviewable proposal: action log, consent scopes, stop control, proof export.
Safety by design:
- consent.required is true for local / interactive / login-gated plans; an isolated public-read plan is correctly ungated.
- consent.granted is always false in a proposal; execution stays operator-driven via the existing consent-gated executor/draft surfaces.
- fleetSafe:false, makesNetworkRequests:false.
Cowork consumption: the tool name matches isBrowserOperatorTool() (starts with "browser_"), so the engine runner emits a live browser.action event and the BrowserOperatorOverlay renders the proposed session rather than a silent JSON blob. Locked by a cowork browser-action test.
Wiring mirrors lead_scout exactly: registry ITool (execute) + AGENT_TOOLS def (LLM-facing schema) + metadata (RAG). 12 unit tests + overlay-consumption guard.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
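The consent rules listed under "Safety by design" reduce to a small pure builder. This is a sketch under assumptions: `ScoutPlan`, `BrowserProposal`, `buildProposal`, and `isBrowserToolName` are hypothetical and do not reproduce the real InternetScoutPlan / BrowserOperatorSessionDraft types or isBrowserOperatorTool().

```typescript
// Hypothetical plan shape; only the gating rule comes from the commit message.
interface ScoutPlan {
  target: 'local' | 'public';
  interactive: boolean;
  loginGated: boolean;
}

interface BrowserProposal {
  toolName: string;
  consent: { required: boolean; granted: false };
  fleetSafe: false;
  makesNetworkRequests: false;
}

// Same prefix rule the commit describes for routing to the overlay.
export function isBrowserToolName(name: string): boolean {
  return name.startsWith('browser_');
}

export function buildProposal(toolName: string, plan: ScoutPlan): BrowserProposal {
  // Consent is required for local, interactive or login-gated plans;
  // an isolated public read-only plan stays ungated.
  const required = plan.target === 'local' || plan.interactive || plan.loginGated;
  return {
    toolName,
    consent: { required, granted: false }, // a proposal never grants consent
    fleetSafe: false,
    makesNetworkRequests: false, // building the proposal is pure: no network, no browser launch
  };
}
```

Typing `granted`, `fleetSafe`, and `makesNetworkRequests` as the literal `false` lets the compiler reject any code path that tries to return a pre-approved proposal.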

…trix (axis-B) Reconcile the pilotability matrix against the actual routing (it had drifted from code) and close the genuinely-routable backlog, measured from the renderer:
Routed (faithful: the target performs/shows the thing):
- batch -> run_orchestrator (sibling of /swarm; actually decomposes + runs).
- pairing -> device panel (C3 cockpit; setShowDevicePanel verified).
- plugins, plugin -> Settings plugins tab (SettingsPlugins / TabId 'plugins' verified; proven open_settings effect).
Reclassified after verifying the surface (no misdirection):
- vulns/secrets-scan/security-review/guardian RUN a scan/review and produce output; Settings does not perform it -> 🔴, run via the agent in chat (SecurityReview/CodeGuardian auto-delegate). (/security + /policy DO route: they are config/dashboard.) A test locks in the non-misrouting.
- yolo/autonomy: SettingsPermissionRules has no autonomy/YOLO control -> 🔴.
Honest 🟡 backlog (surface EXISTS in the renderer but gated behind LOCAL component state, so a slash ui_effect can't reach it without lifting state to the store): checkpoints/restore/timeline (CheckpointPanel), voice/speak/tts (VoiceChatOverlay, local Titlebar state), export/save (ExportDialog, local Sidebar state), knowledge-graph (graph view is new). None is env/security-gated.
Also documented: the docked ContextPanel (files/git/memory/knowledge/agents/mcp tabs) is always on-screen during a session, so those domains are reachable now; driving a specific tab via slash needs activeTab lifted to the store.
Matrix measured from source: 45 ui_effect-routed + 11 headless + 3 special + prompt-forward = 🟢; small named 🟡 (local-state surfaces); rest 🔴 with true reasons; gated list (D4, secrets exec, research/flow live, browser-operator exec) not fabricated. Full cowork suite 1437/1437.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…, e2e-verified Two named 🟡 backlog items had real renderer surfaces gated behind LOCAL component state (not store flags). Closed both with the intended DOM-event bridge pattern (additive; the existing UI triggers are untouched):
- /voice, /speak, /tts -> dispatch `cowork:open-voice-chat` (the event the Titlebar VoiceOverlayButton already listens for) -> voice-chat overlay opens.
- /export, /save -> dispatch `cowork:open-export` with the active session id; a new Sidebar listener opens the existing ExportDialog for that session.
The dispatcher fires the events from the open_panel ui_effect (voice via PANEL_OPENERS, export ctx-aware since it needs the active session id). Added an `export-dialog` testid.
Verified at every layer: bridge routing (unit), dispatcher event dispatch + detail (unit), and, crucially, the FULL chain in real Electron (e2e slash-commands-smoke: /export opens ExportDialog, /voice opens the overlay). This is the listener wiring that unit mocks would mask; e2e proves it live. 10/10 e2e, 1442/1442 unit, typecheck clean.
Matrix updated: 50 ui_effect-routed. Also reconciled checkpoints/restore/timeline to 🟢: they are already on-screen in the docked Context panel (Checkpoints section) plus undo/redo engine_actions, not a missing surface. 🟡 now shrinks to knowledge-graph (a genuine new view) + headless-info candidates. None is env/security-gated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
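The DOM-event bridge can be sketched with a plain `EventTarget`. Only the two event names come from the commit; the listeners, the `state` object, `OpenExportEvent`, and `openPanel` are hypothetical. In the renderer the target would be `window` and the session id would likely travel in a `CustomEvent` `detail`; an `Event` subclass is used here so the sketch runs unchanged in Node.

```typescript
// Event names from the commit message; everything else is an illustrative stand-in.
const OPEN_VOICE_CHAT = 'cowork:open-voice-chat';
const OPEN_EXPORT = 'cowork:open-export';

// Carries the active session id (the renderer would use CustomEvent's `detail`).
class OpenExportEvent extends Event {
  constructor(readonly sessionId: string) {
    super(OPEN_EXPORT);
  }
}

// In the renderer this would be `window`; a bare EventTarget keeps the sketch runnable in Node.
const bus = new EventTarget();

// Component-local state that the store cannot reach directly.
const state = { voiceOpen: false, exportSessionId: null as string | null };

// Existing component listeners (stand-ins for the Titlebar button and the Sidebar).
bus.addEventListener(OPEN_VOICE_CHAT, () => {
  state.voiceOpen = true;
});
bus.addEventListener(OPEN_EXPORT, (e) => {
  state.exportSessionId = (e as OpenExportEvent).sessionId;
});

// Dispatcher side: an open_panel ui_effect fires an event instead of touching component state.
function openPanel(panel: 'voice' | 'export', activeSessionId: string): void {
  if (panel === 'voice') bus.dispatchEvent(new Event(OPEN_VOICE_CHAT));
  else bus.dispatchEvent(new OpenExportEvent(activeSessionId));
}

openPanel('voice', 's1');
openPanel('export', 's1');
```

`dispatchEvent` runs listeners synchronously, so the state is already updated when `openPanel` returns; the e2e run is still needed because unit tests can mock the listener away.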

…+ verified deferrals Acting on the headless-info backlog by verifying each candidate's core handler rather than deferring. Added to COWORK_HEADLESS_ALLOW (all verified read-only):
- /quota -> handleQuota() formats the rate-limit display: pure read, no cwd, no mutation.
- /export-formats -> handleExportFormats() returns a static text block.
- /export-list -> handleExportList() reads ~/.codebuddy/exports (home-based, cwd-independent).
Verified and deliberately NOT allowlisted, with true reasons:
- bug, coverage: handleBug/handleCoverage read process.cwd() (the Electron dir in Cowork, not the active project) -> they'd scan the wrong path.
- telemetry: handleTelemetry is an opt-in/opt-out toggle (mutates).
- knowledge-graph: has NO headless handler in EnhancedCommandHandler (TUI-only via React) -> genuine new-view feature, not a routing/allowlist gap.
Matrix: 14 headless-allowlisted. The axis-B backlog is now a SINGLE item: knowledge-graph, a new visualization view that deserves its own design. Everything else is 🟢 (routed / docked / allowlisted) or 🔴 with a true reason; the gated list (D4, secrets exec, research/flow live, browser-operator exec) stays documented, not fabricated. 30/30 bridge, typecheck clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
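A default-deny allowlist lookup in its simplest form. The set holds only the three commands added in this commit (the full list of 14 is not shown), and `HEADLESS_ALLOW`, `SlashResult`, and `routeHeadless` are hypothetical names rather than the real COWORK_HEADLESS_ALLOW wiring.

```typescript
// Only the three entries named in the commit; the real allowlist has more.
const HEADLESS_ALLOW: ReadonlySet<string> = new Set(['quota', 'export-formats', 'export-list']);

type SlashResult =
  | { kind: 'headless'; command: string }
  | { kind: 'denied'; command: string; reason: string };

export function routeHeadless(token: string): SlashResult {
  // Strip the leading slash and any arguments: "/telemetry off" -> "telemetry".
  const command = token.replace(/^\//, '').split(/\s+/)[0] ?? '';
  if (HEADLESS_ALLOW.has(command)) return { kind: 'headless', command };
  // Default-deny: anything not explicitly allowlisted gets an honest denial.
  return { kind: 'denied', command, reason: 'not yet pilotable from Cowork' };
}
```

Because the fallback is a denial, adding a mutating or cwd-dependent handler to the core never makes it reachable from Cowork by accident; it must be reviewed and listed explicitly.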

…xis-B item, e2e-verified The knowledge-graph view already existed (LessonsVaultGraph, the hierarchical, searchable lessons-vault view) but was gated behind Fleet-Command-Center-local state, unreachable by slash. Closed it with the established store-flag lift:
- Lifted `showLessonsGraph` from FCC-local useState to the store (showLessonsGraph + setShowLessonsGraph); the FCC "browse" button still toggles the same flag.
- /knowledge-graph -> open_panel('knowledge_graph') -> dispatcher opens the Fleet Command Center + sets showLessonsGraph (the graph renders inside FCC).
Verified at every layer: bridge routing (unit), dispatcher sets both flags (unit), and the FULL chain in real Electron (e2e: /knowledge-graph -> lessons-vault-graph visible). 11/11 e2e, 1446/1446 unit, typecheck clean.
This empties the constructible axis-B backlog: every builtin slash command now has a 🟢 disposition (ui_effect / headless / special / prompt-forward / docked panel) or a 🔴 with a true reason. Matrix Acceptance section updated to "met". The only remaining work is the gated list (D4 gateway, secrets vault execution, research/flow live, browser-operator execution), which needs live resources or a security design; it is documented, not fabricated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

"channels" was misclassified as a new feature to defer, but a read-only channel-STATUS view is the same proven DevicePanel pattern: a window onto an existing core subsystem, verifiable without any provider/send.
- New `channels.status` IPC -> core `getChannelManager().getStatus()` (read-only). The free-form `info` blob (may carry tokens/ids) is dropped; only type/connected/authenticated/lastActivity/error cross to the renderer.
- New ChannelsPanel (read-only list, refresh) + store flag showChannelsPanel + ShellNavigation entry + preload api + App render. Configuring/sending stays on the CLI + cron delivery layer.
Verified end-to-end in real Electron (e2e: panel opens + channels.status IPC reachable). 12/12 e2e, 1446/1446 unit, typecheck clean. Local code: no push-gate or security surface (read-only, secrets redacted).
Matrix updated: the read-only-window-onto-existing-core category (device, channels) is now exhausted. Remaining CLI groups verified: groups = access-control config (security boundary -> gated), autonomous-code runs surface via the run/audit log, gitnexus is an MCP tool via the marketplace. research/flow stay new-feature (provider-gated value); secrets/D4 stay security-gated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

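The channels redaction above is an explicit allowlist projection: copy the named safe fields, never spread the raw object. A sketch assuming a hypothetical `RawChannelStatus` shape; only the five kept field names and the dropped `info` blob come from the commit message.

```typescript
// Hypothetical raw shape; only the field names kept below come from the commit message.
interface RawChannelStatus {
  type: string;
  connected: boolean;
  authenticated: boolean;
  lastActivity?: string;
  error?: string;
  info?: Record<string, unknown>; // free-form, may carry tokens or ids
}

interface SafeChannelStatus {
  type: string;
  connected: boolean;
  authenticated: boolean;
  lastActivity?: string;
  error?: string;
}

// Fields are copied by name, so anything added to the raw shape later never leaks by default.
export function toRendererStatus(raw: RawChannelStatus): SafeChannelStatus {
  return {
    type: raw.type,
    connected: raw.connected,
    authenticated: raw.authenticated,
    lastActivity: raw.lastActivity,
    error: raw.error,
  };
}
```

A `{ ...raw, info: undefined }` style would achieve the same today but would silently forward any secret-bearing field introduced later.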
Implements the actionable, validated items from the 2026-05-29 multi-agent audit (see AUDIT-2026-05-29.md). Lint + typecheck clean; all touched modules pass targeted tests (263). No new test failures introduced (39 pre-existing failures in 6 untouched files remain, flagged for separate triage).
Security / GA gates:
- server: default CORS to localhost (function form), add WS verifyClient on /ws allowing no-Origin clients, warn on non-loopback bind; keep 0.0.0.0 default so the fleet mesh is unaffected (new src/server/origin-check.ts)
- confirmation-service: a permission-mode deny (e.g. plan) can no longer be bypassed by CODEBUDDY_AUTO_CONFIRM or a PolicyEngine 'allow' short-circuit (+ regression test); bound setLargeChangeThreshold to [1,10000]
- permission-modes: loud warning when switching to bypassPermissions
- declarative-rules: cap glob pattern length (ReDoS guard)
- peer-tool-bridge: ERROR log at boot when workspace root unset
- deps: cowork ws ^8.20.1, override vite ^7.3.3 + tar ^7.5.15 + ws
Fixes / quality:
- sidebar test: mock theme-context so useTheme works under the lightweight ink render mock (component was correct; 8/8 now pass)
- memory adapters: replace 5 `as any` with typed response shapes
- swe-agent-adapter: validate llmCall/executeTool are functions
- gemini provider: wrap chat/chatStream in the shared circuit breaker (opt-in via ChatOptions.circuitBreaker, off by default) + parse rate-limit headers (best-effort)
- model-tools: add o4* entry
- codex-oauth: use logger instead of console.error
- ci: rebuild better-sqlite3 so the DB test suite runs
- version: root package.json 1.0.0-rc.5 -> rc.8
- docs: align tests/README coverage targets with the enforced 70%
- gitignore: ignore *.traineddata, desktop bridge .exe, scratch artifacts
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
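A loopback-only origin check consistent with the server bullet (localhost by default, no-Origin clients allowed). This is a sketch, not the contents of src/server/origin-check.ts; `isAllowedOrigin` and the host set are assumptions.

```typescript
// Hosts treated as loopback; WHATWG URL keeps the brackets on IPv6 hostnames.
const LOOPBACK_HOSTS = new Set(['localhost', '127.0.0.1', '[::1]']);

export function isAllowedOrigin(origin: string | undefined): boolean {
  // Non-browser clients (CLI, fleet peers) usually send no Origin header; allow them.
  if (origin === undefined || origin === '') return true;
  try {
    const url = new URL(origin);
    const httpLike = url.protocol === 'http:' || url.protocol === 'https:';
    // Exact hostname match, so "localhost.evil.example" is rejected.
    return httpLike && LOOPBACK_HOSTS.has(url.hostname);
  } catch {
    return false; // malformed Origin header
  }
}
```

With the `ws` library, a check like this would typically run inside the `verifyClient` callback against `info.origin`, and a `cors` origin function can call it the same way; both hookups are assumptions about how the project wires it. Allowing a missing Origin is what keeps the 0.0.0.0 fleet mesh working, since browsers always send one on cross-site WebSocket upgrades.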

Full Vitest suite is now green (986 files, 29116 passed, 0 failed; previously 44 failed / 7 files). No assertions weakened, no test gutted.
Root causes:
- Most failures: test mocks for HooksManager.executeHooks resolved to `undefined`, but the real contract returns HookResult[] which tool-handler.ts iterates with for..of -> "not iterable" failed every tool dispatch. Fixed the mocks to resolve to [] (codebuddy-agent, agent-core, agent-repair-integration).
- ocr-tool: stale mock vs the newer 4-engine OCR cascade (real WASM IO via tesseract.js caused 20s timeouts; stale fallback-message assertion). Mocked tesseract.js + fixed event ordering + updated the message; no real OCR engine needed (child_process/VFS already mocked).
- browser-watchdog / browser-operator-consent: mock-shape fixes.
Source bug (real, not a test issue):
- src/agent/tool-handler.ts: the before-tool-call lifecycle hook call was not wrapped in try/catch, unlike its pre-bash/post-bash siblings, so a throwing hook failed the whole tool call instead of degrading gracefully. Wrapped it (log + continue). Validated by agent-core "should handle hook execution failures gracefully".
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
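The source-bug fix (log + continue when a hook throws) can be sketched as below. `HookResult`, `ExecuteHooks`, the `blocked` flag, the `'before-tool-call'` event string, and the `log` array are hypothetical stand-ins for the real HooksManager contract and logger.

```typescript
// Hypothetical shapes; the real HooksManager / HookResult types are not shown here.
interface HookResult {
  blocked?: boolean;
  message?: string;
}
type ExecuteHooks = (event: string, payload: unknown) => Promise<HookResult[]>;

const log: string[] = []; // stand-in for the project logger

// Returns false only when a hook explicitly blocks; a throwing hook never fails the tool call.
export async function runBeforeToolCall(executeHooks: ExecuteHooks, payload: unknown): Promise<boolean> {
  try {
    const results = await executeHooks('before-tool-call', payload);
    // The contract is HookResult[]; a mock resolving to `undefined` would throw "not iterable" here.
    for (const r of results) {
      if (r.blocked) return false;
    }
  } catch (err) {
    // Degrade gracefully: log and let the tool call continue.
    log.push(`before-tool-call hook failed: ${String(err)}`);
  }
  return true;
}
```

Note the interaction with the test fix: once the call is wrapped, a mock returning `undefined` is logged rather than failing the dispatch, which is why the mocks were also corrected to resolve to `[]`.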

Enables the strict `noUncheckedIndexedAccess` compiler flag, previously deferred with a "~200 locations" TODO; the real count was 3423 errors across 528 files. All migrated to safe null-handling.
- 3423 -> 0 type errors; flag flipped on in tsconfig.json.
- ZERO non-null assertions (`!`) added (audited against the actual diff): fixes use optional chaining, nullish-coalescing defaults, explicit guards, or restructuring, never `arr[i]!`, which would add no safety and defeat the migration.
- Behavior preserved: full Vitest suite green (986 files, 29116 passed, 0 failed), identical to the pre-migration baseline, i.e. zero regressions. typecheck 0 errors, lint 0 errors.
Executed via a 528-agent migration workflow (one file per agent, hard "no blind !" rule), then iterate-until-zero on the residual cascade (only 1 cross-file residual remained, fixed manually in src/input/text-to-speech.ts).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
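The four fix styles listed above, shown on small hypothetical helpers (none of them come from the codebase). Under `noUncheckedIndexedAccess`, `values[0]` has type `string | undefined` and `counts[key]` has type `number | undefined`, so each helper handles the missing case explicitly instead of asserting it away with `!`.

```typescript
// 1. Nullish-coalescing default.
export function firstOr(values: readonly string[], fallback: string): string {
  return values[0] ?? fallback;
}

// 2. Explicit guard instead of `counts[key]!`.
export function increment(counts: Record<string, number>, key: string): number {
  const current = counts[key];
  const next = current === undefined ? 1 : current + 1;
  counts[key] = next;
  return next;
}

// 3. Restructuring: iterate values instead of indexing by position.
export function sum(values: readonly number[]): number {
  let total = 0;
  for (const v of values) total += v; // `v` is `number`, never `number | undefined`
  return total;
}

// 4. Optional chaining through possibly-missing elements.
export function firstWordLength(lines: readonly string[]): number {
  return lines[0]?.split(' ')[0]?.length ?? 0;
}
```

Each helper compiles with or without the flag, which is what lets a migration like this land with behavior unchanged.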

- 2.6 noUncheckedIndexedAccess: marked DONE (pass 3). Records the real scale (3423 errors / 528 files, not ~200), the 528-agent migration, the no-blind-! audit, and the green full suite.
- 39 pre-existing red tests: marked DONE (0 failures). Notes the common executeHooks mock cause + the real tool-handler before-tool-call bug fix.
- Header updated: work is committed + pushed (916b1f1 / e1cc6c4 / c116cb7).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Recap of the 1.0.0 GA audit-hardening work (4 head commits): network secure-by-default, confirmation-control ordering, security guard-rails, deps/version sync, red-suite repair, noUncheckedIndexedAccess migration, Gemini parity + repo hygiene. Notes that PR #42 also bundles the broader branch history (Cowork pilotability, self-improve loop). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`import * as fs from 'fs-extra'` only exposes fs-extra's own helpers (pathExists/ensureDir) under ESM on Node — Node's CJS lexer does not detect the node-fs methods (existsSync, writeFile, readFile, ...), so they are `undefined` on the namespace and throw at runtime. This was silently breaking three shipped features (workspace semantic indexing, /plan persistence, submit_plan) while the unit tests passed because they mock fs-extra. Switch to the default import (`import fs from 'fs-extra'`), whose default export carries every method. Found via a fresh-clone smoke test; verified by rebuild (tsc exit 0) + E2E run ("Workspace indexing complete" instead of "fs.existsSync is not a function", 0 error lines). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Simulates a brand-new user installing from source on Node 24: clean clone -> npm install -> build -> --version/--help/doctor -> real E2E task via the ChatGPT login (gpt-5.5, $0 flat-fee). Install/build/startup path is green and fast (0.2s cold start). Surfaces 8 frictions, almost all in delivery rather than code: F1 (npm publishes 0.4.0, 3 months stale), F6 (the fs-extra runtime bug — now fixed), F3 (transitive prod vulns), F7 (model routing falls back to gpt-4o->gpt-5.2), F8 (cost shows $0.02 despite flat-fee), F2/F4/F5 (stale badges + cosmetics). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ee cost Smoke-test findings F4/F7/F8:
- F4: program.name was "codebuddy" while the binaries are `buddy`/`code-buddy`; `buddy --help` now shows the right usage line.
- F7: a secondary call inheriting the global `gpt-4o` config default reached the Codex backend, got rejected, and noisily fell back to gpt-5.2 (3 models in one run). Proactively remap OpenAI-API-only slugs to the known-good Codex model in provider-chatgpt-responses (logged at debug, not warn), which skips the failed round-trip. E2E: gpt-4o WARN 1 -> 0.
- F8: cost showed $0.02 despite the flat-fee ChatGPT plan because getCurrentModel() can lag the actual Codex model. Add CodeBuddyClient.isSubscriptionAuth() (provider flag) and zero the displayed turn cost on that path. E2E: cost -> $0.0000. The call is optional-chained so partial client mocks in tests fall through cleanly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
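F7 and F8 both reduce to small pure functions. In this sketch, which slugs count as "OpenAI-API-only" is an assumption (the commit only names `gpt-4o`), the Codex target is passed in rather than hard-coded, and `remapForCodex`, `ClientLike`, and `displayedTurnCost` are hypothetical names.

```typescript
// Assumed set of OpenAI-API-only slugs; the commit only gives gpt-4o as an example.
const API_ONLY_SLUGS = new Set(['gpt-4o']);

// F7: swap an API-only slug for the known-good Codex model before the request is sent.
export function remapForCodex(model: string, codexDefault: string): string {
  return API_ONLY_SLUGS.has(model) ? codexDefault : model;
}

// F8: flat-fee subscription logins display a zero turn cost regardless of the price estimate.
interface ClientLike {
  isSubscriptionAuth?: () => boolean;
}

export function displayedTurnCost(estimatedUsd: number, client: ClientLike): number {
  // Optional call: partial test mocks without the method fall through to the estimate.
  return client.isSubscriptionAuth?.() ? 0 : estimatedUsd;
}
```

Keying F8 on the auth mode instead of the model name avoids the lag the commit describes, where `getCurrentModel()` can report a different model than the one Codex actually served.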

Exercises the real fs-extra (no mock): asserts the default import exposes the node-fs methods, that the three previously-broken source files use the default import, and that submit_plan actually writes its plan file (would silently no-op under the old `import * as fs` form). 3 tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…isories (F3) A fresh install reported 67 prod advisories (12 high). `npm audit fix` (no --force) brings that to 53 / 5 high by updating the lockfile within existing semver ranges (e.g. picomatch 2.3.1 -> 2.3.2 for micromatch while keeping 4.x for tinyglobby — a blanket override would have broken the 4.x consumers). Bump the direct axios floor to ^1.16.1; the widened axios header type required a string coercion in image-tool. The 5 residual highs need breaking major upgrades (stagehand -> langchain/langsmith, OTel sdk) or have no upstream fix (xlsx, optional) — tracked in SECURITY.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…oke-test (F1/F2) F1: the front-page `npm install -g` path installs 0.4.0 (3 months stale) — reorder README Quick Start and getting-started Installation to lead with from-source and warn that the npm release lags during rc (real fix is `npm publish` rc.8, owner action). F2: stale badges (27,334 tests / 85% coverage) -> 29K+ / >=70%. Update SMOKE-TEST-2026-05-29.md verdict table: F2/F4/F6/F7/F8 fixed, F1 mitigated, F3 partial (audit fix + tracked residuals), F5 deferred. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… pass) A real code-writing task (write slugify.test.js) via the ChatGPT login produced correct, runnable code (node --test 4/4 pass), $0.0000, 0 model-routing warnings. Confirms the base is usable for real tasks; remaining work is `npm publish` (owner), deferred GLib cosmetic (F5), or Phase-2 refactor debt for external hand-off. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ster (Phase 2) Circular-dependency cleanup (audit Phase 2.1), verified 10 -> 8 cycles:
- runner <-> checkpoint-manager: checkpoint-manager imported 3 interfaces with a value `import`. Since check:circular runs madge with skipTypeImports, switching to `import type` erases the edge: one line, zero runtime impact.
- runner -> task-decomposer -> edit-proposal-producer -> runner: edit-proposal-producer imported the pure path helpers (normalizeGitPath, isPathAllowedByContract, resolveRepoPath) as values from the 8.4K-LOC runner. Extract them to a new dependency-free `agentic-coding-paths.ts`; runner re-exports them for back-compat. This also starts decomposing the god file (audit Phase 2.2).
Verified: typecheck 0, `npm run check:circular` 10 -> 8, and tests/agent/autonomous 146/146 pass (incl. the path-traversal security suite that exercises the moved guards). The remaining agentic cycle (runner <-> verification-loop) is a genuine value cycle needing a larger extraction, left for a dedicated pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
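Why `import type` removes a cycle from the count: a type-only edge is erased at compile time, so a checker that skips type imports never walks it. A toy depth-first cycle finder illustrating that idea; it is not madge's algorithm, and `Graph`, `Edge`, and `findCycles` are hypothetical.

```typescript
// Edges marked `typeOnly` model `import type`, which is erased at compile time.
interface Edge {
  to: string;
  typeOnly: boolean;
}
type Graph = Record<string, Edge[]>;

// Reports one cycle per back edge found by DFS, ignoring type-only edges
// (similar in spirit to running madge with skipTypeImports).
export function findCycles(graph: Graph): string[][] {
  const cycles: string[][] = [];
  const state = new Map<string, 'visiting' | 'done'>();
  const stack: string[] = [];

  const visit = (node: string): void => {
    state.set(node, 'visiting');
    stack.push(node);
    for (const edge of graph[node] ?? []) {
      if (edge.typeOnly) continue;
      const s = state.get(edge.to);
      if (s === 'visiting') {
        // Back edge: the cycle is the stack suffix from `edge.to`, closed by `edge.to`.
        cycles.push([...stack.slice(stack.indexOf(edge.to)), edge.to]);
      } else if (s === undefined) {
        visit(edge.to);
      }
    }
    stack.pop();
    state.set(node, 'done');
  };

  for (const node of Object.keys(graph)) {
    if (!state.has(node)) visit(node);
  }
  return cycles;
}
```

Modeled this way, the first bullet is a single edge flipping `typeOnly` to `true`, while the second bullet needs a new leaf module because the helpers are runtime values.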

Records the 2 cycles broken this session (10 -> 8) and a detailed, handoff-ready plan for cycle 3 (runner <-> verification-loop), which is deliberately deferred: it requires relocating a security-critical fs.writeFile secret-redaction monkey-patch that is import-style sensitive and not directly unit-tested, so it needs a dedicated, fully-gated pass rather than a budget-constrained one. Also maps the remaining value cycles (incl. the 7-module monster) for the hand-off. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…first (Phase 2 cycle 3) Extracts edit application/preview + verification execution + the fs.writeFile secret-redaction monkey-patch out of the 8.4K-LOC runner into a cohesive agentic-coding-edits.ts. verification-loop now imports those functions from the new module and `import type`s the runner types, so the cycle is gone (check:circular 8 -> 7). The agentic-coding cluster is now fully cycle-free. Done test-first because the move relocates a SECURITY mechanism: the patch redacts secrets on string fs.writeFile EXCEPT during declared edits (isApplyingEdits), and it is import-style sensitive (must stay on the node:fs/promises DEFAULT export singleton). Added agentic-coding-redaction.test.ts pinning both behaviours; it was written and made to pass against the OLD code first, then re-run green against the extracted module. Verified: typecheck 0, tests/agent/autonomous 148/148 (incl. the redaction gate + path-traversal security suite), autonomous-code-command 60/60, check:circular 8 -> 7. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
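The redaction rule the new test pins (redact string writes except while declared edits are applied) can be sketched as a wrapper. The real mechanism is a monkey-patch on the `node:fs/promises` default export; this sketch wraps an injected write function instead so it stays self-contained. `withRedaction`, `applyDeclaredEdits`, and the `sk-` key pattern are hypothetical.

```typescript
type WriteFn = (path: string, data: string | Uint8Array) => Promise<void>;

// Hypothetical secret shape; the real redaction patterns are not shown in this PR.
const SECRET_PATTERN = /\b(sk-[A-Za-z0-9]{16,})\b/g;

let isApplyingEdits = false;

export function withRedaction(write: WriteFn): WriteFn {
  return async (path, data) => {
    // Declared edits are written verbatim; other string writes get secrets masked.
    const out =
      !isApplyingEdits && typeof data === 'string' ? data.replace(SECRET_PATTERN, '[REDACTED]') : data;
    await write(path, out);
  };
}

export async function applyDeclaredEdits(run: () => Promise<void>): Promise<void> {
  isApplyingEdits = true;
  try {
    await run();
  } finally {
    isApplyingEdits = false; // always restore, even if an edit fails
  }
}
```

A single module-level flag is not safe for concurrent async writes (an unrelated write during an edit would skip redaction), which is one reason a move like this deserves the test-first treatment described above.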

The branch audit found three fixable value cycles and a circular-dependency gate that still trusted stale allowlist entries. Extracted leaf modules for message guards, local memory, and peer method registration so providers, adapters, and bridges no longer import their registries back into themselves. Runtime metadata and generated scratch results are removed from the tracked branch surface, while ignore rules now cover those files so future audit noise stays local.
Constraint: Four Phase 2 runtime cycles still require larger injection or registry redesigns.
Rejected: Break CodeBuddyAgent/fleet/heartbeat cycles in this pass | too much boot/runtime risk for a branch hygiene audit.
Confidence: high
Scope-risk: moderate
Directive: Keep KNOWN_CYCLES exact; stale accepted cycles should fail the gate instead of hiding regressions.
Tested: npm run typecheck; npm run check:circular; npx eslint scripts/check-circular-deps.ts; npm run lint; npm test -- --run tests/unit/client.test.ts tests/unit/codebuddy-client.test.ts tests/context/transcript-repair.test.ts tests/memory/memory-provider.test.ts tests/server/peer-rpc.test.ts tests/server/peer-chat-bridge.test.ts tests/fleet/peer-chat-stream.test.ts tests/fleet/peer-tool-bridge.test.ts tests/server/peer-tool-bridge.test.ts
Not-tested: full npm test; live Cowork/Electron runtime; live fleet websocket mesh
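The "keep KNOWN_CYCLES exact" directive means the gate fails in both directions: a new cycle fails it, and so does an accepted cycle that no longer occurs. A sketch assuming cycles are compared after rotating to the lexicographically smallest module; `canonical`, `checkCycles`, and `GateResult` are hypothetical, not the contents of scripts/check-circular-deps.ts.

```typescript
// Canonical form: drop the repeated closing node, rotate so the smallest module comes first.
function canonical(cycle: readonly string[]): string {
  const nodes = cycle[0] === cycle[cycle.length - 1] ? cycle.slice(0, -1) : [...cycle];
  if (nodes.length === 0) return '';
  let start = 0;
  for (let i = 1; i < nodes.length; i++) {
    if ((nodes[i] ?? '') < (nodes[start] ?? '')) start = i;
  }
  return [...nodes.slice(start), ...nodes.slice(0, start)].join(' -> ');
}

export interface GateResult {
  newCycles: string[];
  staleAccepted: string[];
  ok: boolean;
}

export function checkCycles(detected: string[][], known: string[][]): GateResult {
  const found = new Set(detected.map(canonical));
  const accepted = new Set(known.map(canonical));
  const newCycles = [...found].filter((c) => !accepted.has(c));
  // A known cycle that no longer occurs is stale: fail so the allowlist stays exact.
  const staleAccepted = [...accepted].filter((c) => !found.has(c));
  return { newCycles, staleAccepted, ok: newCycles.length === 0 && staleAccepted.length === 0 };
}
```

Rotation (but not reversal) is normalized because import cycles are directed: a -> b -> a reported from either start point is the same cycle.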

phuetz and others added 29 commits May 29, 2026 01:16

feat(cowork): route /switch to the model picker

6e62a13

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

phuetz merged commit 4ba0591 into main May 29, 2026
1 of 10 checks passed

phuetz merged 29 commits into main from tmp-self-improve-default