feat(notifications): broadcast hardening + informativeness (combines #39 + #40)#41
Merged
Merged
Conversation
Closes the wedge incident class. One stuck IPC client (slow TUI render, crashed MCP bridge, kernel socket buffer congestion) can no longer stall the daemon's broadcast fan-out for every other client. internal/ipc/server.go — replace the synchronous Broadcast with a per-Conn 64-slot buffered send queue and a dedicated send goroutine per conn. Send is non-blocking: if the queue overflows the offending conn is scheduled for async close and ErrSendOverflow surfaces in the broadcast log, but other conns keep receiving. Broadcast also marshals the wire frame once and shares the bytes across the per-conn queues instead of re-serializing per client. Close is sync.Once-guarded so the read and write halves can race to it safely. internal/daemon/event.go — enforce a 4 KiB cap on PaneEvent.Message and 128-byte cap per Data value at the IPC boundary (toPaneEventPayload). Truncation keeps the tail (most recent / terminal-visible content) and sets Data["truncated"]="1" so consumers can flag the event. The earlier opencode-splash wedge produced a > 1 KiB box-drawing excerpt that flooded the broadcast loop; this is the size-side defense. Tests: - TestBroadcast_SlowConnDoesNotBlockFastConn — 200 broadcasts with one conn refusing to read; assert the Broadcast loop returns within 1 s and the fast conn drains. - TestBroadcast_ContinuesAfterSlowConnDisconnects — after the slow conn is torn down, new broadcasts still reach the surviving conn. - TestToPaneEventPayload_* — truncation marker placement, tail preservation, under-cap pass-through, nil-Data no-allocation path. All green under -race.
Apply all Critical / High / Medium / Low items surfaced by the code
review.
Source changes:
- internal/ipc/server.go
* (C1) Broadcast now clones the marshaled frame via slices.Clone
before fan-out. Today no callsite mutates it, but a future
sync.Pool of bytes.Buffer would silently corrupt frames still
being read by per-conn sendLoops; the clone makes that contract
bullet-proof.
* (L4 + M3) Overflow path is CAS-guarded — the warning log fires
exactly once per slow conn and only one Close goroutine spawns,
instead of N redundant log lines and N redundant goroutines while
the conn is being reaped.
* (H1 belt-and-suspenders) sendLoop sets a 30 s SetWriteDeadline per
frame so a kernel-buffer wedge that doesn't error has a
deterministic cleanup ceiling.
* (M4) log.Printf migrated to logger.Warn / logger.Error so the
project's leveled logger controls visibility.
* (M2) Document why Send and sendFrame both check closed/overflow —
outer check is the marshal-cost fast path, inner check is the
race-safe gate next to the channel send.
* (H2) Close() doc comment clarifies pending frames are intentionally
discarded.
* (L6) Guard comment on the conns snapshot copy in Broadcast warns
against future "optimizations" that would reintroduce the wedge.
* (M6) Server.Stop comment clarifies Daemon shutdown does not rely
on a final IPC broadcast (durability lives in the on-disk
workspace.json path).
* New: Server.ConnCount() for test-friendly connect/disconnect sync.
- internal/daemon/event.go
* (H4) Reserved Data key renamed to "_quil_truncated" — the new
_quil_ namespace prevents accidental collision with caller-supplied
keys (including a literal "truncated" key that emitters may use).
* (H3) Compile-time invariant block asserts both caps strictly exceed
len(truncationMarker); a future contributor lowering either cap
below the marker length will fail to build instead of panic at
runtime.
* (Rules #1) const block reflowed; gofmt-clean.
* (L1) Marker length explained (… is 3-byte UTF-8 + "[truncated]" =
14 bytes, not 12).
* (L2) Truncation strategy doc comment names the tail-keep rationale
and points future emitters at TruncationStrategy if they need head.
* (L3) TODO comment on sync.Pool trade-off (defer until profiling
demands it; clone-then-pool is the correct sequence).
Test changes:
- internal/ipc/conn_internal_test.go (NEW, package ipc white-box)
* TestSendLoop_ExitsOnWriteError — uses net.Pipe to force a write
error mid-frame; asserts no goroutine leak (QA #1 finding).
* TestConn_CloseIdempotent — 16 concurrent Close calls, none panic,
closed flag set exactly once.
* TestConn_SendFrameAfterCloseShortCircuits — sendFrame and Send
after Close return ErrSendOverflow without touching the socket.
- internal/ipc/broadcast_resilience_test.go
* (M5) Replaced time.Sleep-after-connect with waitForConnCount
polling on the new ConnCount() accessor. No more CI flake risk
from racing the connect-registration window.
* (Rules #4) Added t.Parallel() to both tests.
* (L7) TestBroadcast_MarshalErrorLogsAndReturns covers the
previously-untested marshal-error path.
- internal/daemon/event_sizecap_test.go
* Updated existing 4 tests for the renamed _quil_truncated key.
* Added t.Parallel() to all 7 tests.
* (QA #2) TestToPaneEventPayload_ExactCapBoundary verifies the > cap
condition rejects an off-by-one refactor to >=.
* (L8) TestToPaneEventPayload_BothOverCapSetsFlagOnce — when both
Message and Data value exceed cap, the flag is set exactly once.
* NEW TestToPaneEventPayload_ReservedKeyDoesNotClobberCaller proves
the _quil_ namespace fix: a caller's literal "truncated" key
survives untouched alongside the daemon's _quil_truncated flag.
CLAUDE.md: `internal/ipc/` bullet expanded with the resilience design
(sendCh, sendLoop, CAS overflow, ConnCount, size caps) per
documentation-maintenance rule.
NewClient audit (cmd/quil/main.go, mcp.go, version_gate.go): all
callsites either defer Close or explicitly close. SetWriteDeadline in
sendLoop is the runtime safety net for future regressions.
All green under -race.
Make the notification sidebar genuinely informative and close gaps in the MCP API. Notification events now carry the last few stripped output lines that triggered them, so the sidebar can render context (4-line cards: separator + name/time + title + excerpt preview) and MCP agents no longer need a follow-up read_pane_output per event. Per-pane mute (Alt+M, persisted in workspace.json) drops events at the source for chatty processes like `npm test --watch`. Active-pane output_idle events are suppressed TUI-side as redundant. The MCP surface gains dismiss_notifications (ack from the agent side) and watch_notifications since_timestamp (closes the race between kicking off a task and registering the watcher). Defaults bumped to 200 max events. Dead notification_handlers code path now emits a deprecation warning at plugin load instead of failing silently; docs corrected.
Address three signal-quality issues spotted while testing the sidebar: 1. ANSI sequence fragments leaked into excerpts. The trailing 4 KiB slice of the ring buffer could begin mid-CSI — the `\x1b[` ended up in the discarded prefix but parameters like `2;30;30;30m` or `;18H` survived into the window and ansi.Strip rendered them as plain text. New `trimToNewlineSafe` helper advances the slice start to the next newline before stripping, applied in both `paneOutputExcerpt` and `analyzeIdleTitle`. 2. Output_idle events whose excerpt collapses to a single shell prompt rune (%, $, >, ❯, #, ➜, λ, ») are now suppressed when the default "Output idle" title fired. Shells idling at a prompt are not a state change worth notifying — the user can see the prompt by looking at the pane. Plugin idle handlers (Claude "Needs your approval" etc.) still fire because their non-default title means the regex saw something meaningful. 3. eventQueue.Push aggregates repeat (PaneID, Title) events: the new event reuses the prior event's ID, bumps Data["count"], and replaces it at the front of the queue. The TUI sidebar's AddEvent updates the matching card in place and bubbles it to position 0. The title line now renders a ×N badge when count > 1. Replaces N near-duplicate cards saying the same thing with one bumping card.
Field test surfaced two issues my first round of follow-ups missed: 1. ANSI fragments STILL leaked in claude-code panes (`2;30;30;30m| Thinking: ...`, `0mConfig/Build files:- ...`). Root cause: the newline-only seek in trimToNewlineSafe fell through on TUIs that emit one screen paint with no newline inside the trailing 4 KiB window. The bounded scan now also accepts an ESC byte (0x1b) as a clean boundary — ansi.Strip handles a full sequence starting at ESC, even if the next newline is far away. Scan is bounded to 512 bytes so it stays cheap. 2. Output-idle events with `%` excerpts kept firing — both panes had their cards collapsed to `×2` even though my isPromptOnlyExcerpt covers the bare `%` case. The screenshot shows what the sidebar renders (first non-empty line), not what the daemon's idle checker sees (the full 5-line excerpt). Add a debug-level log line at every idle decision (pane id, type, title, suppress verdict, excerpt prefix) so we can see why the check returns false on real-world shells. With the dev build's ldflags-forced debug level, this surfaces in the next test run's quild.log.
Field diagnosis with debug logs confirmed the real shape of the bug. The terminal-pane idle excerpts logged in quild.log look like: "% [padding] \r \r\rArtjoms_Stukans@EPCHZURW03..." That's: prompt rune, padding spaces, CR-return-to-start, more CRs, then OSC 0 window-title content that leaks through ansi.Strip (the strip state machine appears to bail on an embedded CR inside the OSC payload). The sidebar then renders just "%" because that's the first non-empty line — but the *actual* terminal-visible content is the hostname leak, not the prompt that was immediately overwritten by the CRs. Two coordinated changes: 1. lastNLines now applies per-line CR-reset semantics — splitting a line at the last `\r` and keeping only the trailing segment. This matches what the terminal would actually display, so excerpts no longer capture text the user can never see (the pre-CR prompt) and miss the text they DO see (the post-CR window title). 2. isPromptOnlyExcerpt is now suffix-based and recognises hostname-like patterns. A line is "prompt-like" when, after trimming, it ends with a recognised prompt rune (covers `%`, `user@host % `, `~/repo $ `, etc) OR matches a user@host pattern (the OSC 0 leak signature). Long lines (> 200 chars) are presumed to be command output regardless of trailing characters — real prompts are short. The two changes interact: after CR-reset the excerpt's visible content is the hostname leak; the new prompt classifier recognises that pattern and suppresses correctly. `ls` output still survives suppression because its line endings don't match prompt patterns. Verified against the field-logged excerpts.
The combined PR's data.excerpt field (set by withExcerpt for MCP consumers) is typically a multi-line stripped excerpt of 200-500 bytes. The original 128-byte Data value cap from the broadcast-hardening PR was tuned for short metadata (exit_code, tool name) and would silently truncate the excerpt to a 1-2 line preview. Raise the per-Data-value cap to 1 KiB. The Message field still has the 4 KiB cap for the full excerpt, but agents reading data.excerpt now also get the realistic full content. Per-event ceiling becomes 4 KiB + N × 1 KiB; realistic events have 2-4 keys so total stays well under 10 KiB. Updated TestToPaneEventPayload_DataValueOverCapTruncates to use 4 KiB input (guaranteed to exceed any reasonable future tuning); the exact- boundary test already pins the cap edge precisely.
This was referenced Jun 8, 2026
High (3):
- H1 isPromptLikeLine over-suppression. "build complete: 100%" used to
classify as prompt-like because the suffix rune match accepted any
trailing prompt char. Now the rune must EITHER be the bare line OR be
preceded by whitespace — so "% " matches, "user@host %" matches, but
"100%" and "x$" do not. Two new test rows pin the regression.
- H2 idle-checker cooldown leak. checkIdlePanes was setting
LastIdleEventAt = now BEFORE the prompt-only suppression decision, so
a suppressed event still consumed the 30 s cooldown — a single
spurious suppression caused 30 s of silence even if real activity
followed. On suppression we now roll back LastIdleEventAt to the zero
value; IdleNotified stays true so the next tick doesn't re-evaluate
the same idle state (flushPaneOutput resets it on the next PTY byte).
- H3 sidebar cursor drift on aggregation. NotificationCenter.AddEvent
on a same-ID match moves the card to position 0; the cursor stayed
at its old index, silently jumping the user's selection onto a
different card. Now we snapshot the cursor's event ID BEFORE the
move-to-front and re-locate the cursor onto the same logical event
afterward. Fresh prepends still leave the cursor at index 0 (legacy
"cursor 0 = newest" semantics) — only the aggregation path chases.
Medium (4):
- CLAUDE.md L86: MCP tool count was "17"; now "18" including
dismiss_notifications. mcp_tools.go implementation count was "15";
now "18". The MCP tool list reflects all three notification additions
(excerpt-carrying get_notifications, since_timestamp on watch_*,
dismiss_notifications).
- docs/configuration.md: full [notification] section added documenting
sidebar_width and max_events. Bindable actions table gains
mute_pane, notification_toggle, notification_focus, go_back,
notes_toggle rows — all of which were missing.
- daemon.go idle-decision debug log was emitting up to 200 bytes of
raw excerpt content. Per observability-and-logging.md "Never log
user-provided content" rule, even debug must not echo PTY bytes
(panes routinely contain API keys, passwords from REPLs, .env on
cat). Replaced with structural metadata: excerpt_bytes +
excerpt_lines. Sufficient to diagnose suppression decisions without
the secret-leak risk.
- internal/tui/dialog.go shortcuts list: added Alt+M mute, Alt+N
notification toggle, F3 sidebar focus, Alt+Backspace pane history,
Alt+E notes — all previously undiscoverable from F1.
- plugin/registry.go [[notification_handlers]] deprecation warning was
per-load — a MsgReloadPlugins call would re-fire the warning for
every stale plugin every time. Now one-shot per plugin name per
daemon lifetime via a sync.Map sentinel. Daemon restart re-enables.
Low (6):
- Replaced developer's real OS username (Artjoms_Stukans) and
corporate-shaped hostname (EPCHZURW03) in 3 test fixtures with
synthetic user_name@host01 — the regex behavior is identical.
- Added t.Parallel() to all 7 net-new test files in this PR (44 test
functions). The IPC tests were already parallel; daemon and tui
tests were the inconsistency.
Coverage gaps (3):
- TestEventQueue_Push_CorruptCountResetsToBaseline pins the
strconv.Atoi error branch in the aggregation count parser (corrupt
Data["count"] resets to baseline of 1 instead of panicking).
- TestIsPromptLikeLine_EmptyString_DirectCall directly invokes
isPromptLikeLine("") (previously only reached transitively through
isPromptOnlyExcerpt's blank-line filter).
- TestLoadPluginTOML_NotificationHandlersDeprecated drops a legacy
TOML fixture with [[notification_handlers]] in a t.TempDir() and
asserts the load succeeds + NotificationHandlers slice still
populates (back-compat preserved even after the deprecation).
- TestNotificationCenter_AddEvent_CursorFollowsLogicalEventOnAggregation
is the H3 invariant pin: a cursor on a non-aggregated event must
follow that event's new index after move-to-front of a different
event.
All green under go test ./... and go test -race ./...
Daemon-side plumbing for the upcoming Claude / OpenCode hook-driven
notifications. No user-facing behavior change yet — the Claude .sh /
opencode .js producers don't emit to the spool until Phase C — but the
ingest pipeline is fully testable in isolation.
New package internal/hookevents/:
types.go — Payload wire schema (v=1, ts_ms, seq, pane_id, src,
hook_event, session_id, transcript_path, cwd, title, sev, data).
Severity constants (info/warning/error), source constants
(claude/opencode), hook-side caps (200 B title, 128 B data value,
2 KiB total). Payload.Validate enforces required fields + bounded
enums; sentinel errors via errors.Is.
spool.go — Spool reads $QUIL_HOME/events/<paneID>.jsonl files. Init
truncates stale files on daemon start (no replay). Tick polls every
file, remembers per-file byte offset, skips trailing partial lines
(the documented append-vs-read race), drops malformed / oversize
lines with a warn log. Cleanup unlinks a pane's spool file on
DestroyPane.
ingest.go — Ingester sits between Spool/IPC and emit. Per-pane
sliding-window rate limiter (100 events / 2s; on trip emits one
"internal.event_storm" diagnostic and drops further events from
that pane for 10 s). Per-(paneID, hook_event) 50 ms debounce
coalescer; last payload wins with data["coalesced"] = burst count.
now() is overridable so tests don't depend on wall clock.
Daemon wiring (internal/daemon/daemon.go):
- New Daemon.hookSpool + Daemon.hookIngester fields. Init in Start()
after EnsureScripts. Watcher goroutine launched alongside
idleChecker, polls 200 ms, same shutdown discipline (FlushAll
drains pending coalesce buffers before exit).
- emitHookEvent translates Payload → PaneEvent, sets
Pane.HookHealthy + Pane.LastHookEventAt under PluginMu, routes
through the existing emitEvent (mute filter, aggregation,
broadcast).
- checkIdlePanes' shouldFire condition gains
!(pane.HookHealthy && now-LastHookEventAt < 30 s) so the legacy
idle excerpt steps aside for healthy hook panes. The 30 s grace
period lets legacy idle reactivate as a fallback during long quiet
turns — and panes whose hook plugin fails to load never trip
HookHealthy at all, so they keep the legacy notification surface.
- Both DestroyPane code paths (handleDestroyPane and
handleDestroyPaneReq) call Spool.Cleanup before
session.DestroyPane so the watcher's next tick can't race with the
destroy.
internal/daemon/session.go — Pane gains LastHookEventAt time.Time +
HookHealthy bool, guarded by the existing PluginMu.
internal/config/config.go — new EventsDir() helper returning
$QUIL_HOME/events; mirrors ClaudeHookDir / SessionsDir.
Tests (15 new in hookevents/):
- Payload validation: happy path + 6 error rows + empty severity
tolerance.
- Spool: appended lines read, offset survives, partial trailing line
skipped and consumed after completion, malformed lines dropped,
Init truncates, Cleanup unlinks, non-jsonl files ignored, oversize
line dropped.
- Ingester: burst coalesce to 1 with last-wins + count metadata,
different events / panes stay distinct, FlushAll drains, rate limit
trips at 100/2s with storm diagnostic emitted, limiter recovers
after penalty (uses overridable clock).
All green under go test ./... and -race.
Claude (internal/claudehook/):
- BuildSettingsJSON now registers Quil's hook command under 12 events
(SessionStart + SessionEnd + UserPromptSubmit + Notification +
PermissionRequest + Stop + PreCompact + PostCompact + SubagentStart +
SubagentStop + TaskCreated + TaskCompleted) instead of SessionStart
alone. The script branches on hook_event_name from stdin.
- scripts/quil-session-hook.sh and .ps1 rewritten as multi-event
routers. SessionStart keeps the original session-id-file behaviour
(used by Quil's restore path); the other 11 events compose a v1
hookevents.Payload and append one JSONL line to
$QUIL_HOME/events/<paneID>.jsonl.
Title/severity mapping per the plan:
SessionEnd → "Session ended" info
UserPromptSubmit → "Working on: <preview>" info
Notification → pass-through warning
PermissionRequest → "Needs approval: <tool>" warning
Stop → "Reply ready" warning
PreCompact → "Compacting context" info
PostCompact → "Compaction complete" info
SubagentStart → "Spawned: <agent_type>" info
SubagentStop → "<agent_type> done" info
TaskCreated → "Task: <content>" info
TaskCompleted → "✓ <content>" info
Wire caps enforced at the hook side (title ≤ 200 chars, data values
≤ 128 chars, total ≤ 2 KiB) so the daemon's outer cap is purely a
backstop. Title and previews are truncated with "…" suffix.
- New TestBuildSettingsJSON_RegistersAllForwardedEvents pins the
12-event registration so a future refactor that drops an entry from
forwardedHookEvents fails fast.
OpenCode (internal/opencodehook/scripts/quil-session-tracker.js):
- Plugin keeps the original session-id rotation tracking (recording
to opencode-<paneID>.id for the resume path).
- Generic `event` handler additionally forwards a filtered tier to
the spool:
session.idle → "Reply ready" warning
session.error → "Session error: <type>" error
session.compacted → "Compaction complete" info
session.status (retry) → "Retrying API (attempt N)" warning
file.edited → "Edited N files" info
(batched 1s window, single line per burst)
- Typed permission.ask hook → "Needs approval: <tool>" warning.
- Typed experimental.session.compacting → "Compacting context" info.
- Per-pane in-process token bucket (20 events/s sustained, 50 burst).
Drops with a single warn-log when exhausted; recovers as the bucket
refills. Independent of the daemon-side limiter (which lives in the
Ingester) so two layers of defense protect the IPC fan-out.
- Wire caps enforced before JSON.stringify: title ≤ 200 bytes UTF-8,
each data value ≤ 128 bytes UTF-8, total line ≤ 2 KiB; oversize
lines are dropped with a hook.log breadcrumb.
- Per-pane seq counter so the daemon's coalescer has a tiebreaker
when two events land in the same millisecond.
No daemon changes — the Phase B infrastructure consumes whatever the
hooks produce.
All green under go test ./...
User-facing surface for the hook events tier per the plan's Phase D
section. No new code paths in the daemon — just plumbing the config
value to the hook scripts so users can opt into verbose mode or turn
hook events off entirely without editing the embedded scripts.
internal/config/config.go:
- New HookNotificationsConfig{Claude, OpenCode string} nested under
NotificationConfig.Hooks (TOML key `notification.hooks`). Default
values "default" set in Default() so a fresh install gets the v1
tier from Phase C automatically.
internal/daemon/daemon.go:
- claudeHookSpawnPrep and opencodeSpawnPrep gain a hookMode parameter
threaded from d.cfg.Notification.Hooks.Claude /
d.cfg.Notification.Hooks.OpenCode. Both pass QUIL_HOOK_MODE=<mode>
in the PTY env (alongside the existing QUIL_PANE_ID / QUIL_HOME).
Empty mode resolves to "default" so a user with no [notification]
block still gets the v1 tier.
- Tests updated: TestClaudeHookSpawnPrep / TestOpencodeSpawnPrep now
assert the QUIL_HOOK_MODE env var is present and equal to
"default" (the tests' fixed input mode).
docs/configuration.md:
- Full [notification.hooks] subsection documenting `claude` and
`opencode` knobs with the three tier values (`default` / `verbose`
/ `off`) and what each forwards. Hook events JSONL spool path
mentioned so users can verify the daemon is consuming them.
- Default config block adds the [notification.hooks] entries.
.claude/CLAUDE.md:
- New `internal/hookevents/` bullet explaining the wire schema,
Spool / Ingester / rate limiter / coalescer responsibilities, the
HookHealthy fallback, and the JSONL spool lifecycle.
- `internal/claudehook/` and `internal/opencodehook/` bullets updated
to mention the multi-event extension, the QUIL_HOOK_MODE env, and
the per-pane rate budget on the OpenCode side.
Phases B + C + D complete. The full hook-driven notification surface
is in place — Claude PermissionRequest and Stop, OpenCode permission.ask
and session.idle, et al — and runs through the daemon's coalesce + rate
limit + aggregation pipeline before the IPC fan-out.
All green under go test ./... and -race.
All Critical / High / Medium / Low items from the final review, plus
the test-coverage gaps QA flagged. Every fix has a one-liner in the
review notes for traceability.
## Critical (1)
C1 Ingester pending timer leak past pane destroy
internal/hookevents/ingest.go — new Ingester.Cancel(paneID) stops
every AfterFunc timer keyed by paneID and removes the per-pane rate
bookkeeping. Wired into both handleDestroyPane / handleDestroyPaneReq
paths in daemon.go so a destroyed pane cannot fire one final stale
event ~50 ms later via the coalescer's natural window close.
## High (4)
H1 Path traversal via Spool.Cleanup(paneID)
daemon.go — both destroy IPC handlers now validate payload.PaneID
with isValidHexID before any FS-touching call. Defense-in-depth:
spool.go safePaneID rejects /, \, .., NUL etc and Cleanup uses
filepath.Clean + HasPrefix to ensure the resolved path stays under
the spool dir. New tests TestSafePaneID and
TestSpool_Cleanup_RejectsTraversalPaneID cover the guard.
H1 (CR) FlushAll shutdown race
ingest.go — Ingester.closed flag set by FlushAll; subsequent Submit
is a no-op. FlushAll now Stops every pending timer before draining
so AfterFunc cannot fire after FlushAll returns. New tests
TestIngester_Submit_AfterFlushAll_IsNoOp and
TestIngester_FlushAll_StopsPendingTimers pin the invariants.
H2 QUIL_HOOK_MODE was a dead knob
All three hook script producers (sh, ps1, js) now read $QUIL_HOOK_MODE
and short-circuit the spool emission when mode == "off". Daemon-side
backstop in emitHookEvent drops events when the resolved
d.cfg.Notification.Hooks.<Source> is "off" — covers the case where
the script-side gate didn't fire (older script on disk, env stripped).
Storm diagnostics always pass.
H3 First hook event being storm flips HookHealthy=true
daemon.go — emitHookEvent only sets pane.HookHealthy / LastHookEventAt
when p.HookEvent != hookevents.EventStorm. The rate limiter's own
self-emitted storm payload would otherwise mark a pane "hook-healthy"
exactly when real events have stopped flowing, silencing the legacy
idle excerpt fallback during the 10 s penalty window.
## Medium (10)
Sec M1 Cross-pane spoofing via payload.PaneID
spool.go parseAndValidate now drops payloads whose pane_id field does
not match the filename-derived paneID. A compromised plugin in pane A
can no longer write events attributed to pane B.
Sec M2 Unbounded spool growth
spool.go new rotation: when readPaneFile observes size == offset
(fully drained) AND size >= 16 MiB, truncate the file and reset the
offset. Rotation runs only on idle ticks so it never collides with
in-flight reads.
CR M2 io.ReadAll OOM on multi-MB single line
spool.go switches from io.ReadAll + bytes.Split to bufio.Reader.
ReadBytes('\n') with per-line size enforcement. A 100 MB single line
from a misbehaving producer no longer allocates into memory; it's
detected at MaxTotalBytes and dropped after advancing past it.
CR M4 PowerShell JsonEscape backslash + C0 control bytes
quil-session-hook.ps1 — backslash replacement now produces exactly
two chars (the prior pattern produced four backslashes per input
backslash, breaking every Windows path). Added a [regex]::Replace
pass that converts [\x00-\x1f] to \uXXXX so ESC and other ANSI
control bytes from Claude tool output don't break the JSON line.
CR M5 OpenCode JS clock-jump-backward
quil-session-tracker.js — consumeToken clamps elapsed to ≥ 0 and
resets the refill anchor on backward jumps. Without this, an NTP
correction would freeze the rate limiter until the clock caught up.
CR M6 OpenCode JS UTF-8 mid-codepoint truncation
quil-session-tracker.js — truncate walks backward over UTF-8
continuation bytes (0x80-0xBF) so the cut lands on a codepoint
boundary. The trailing "…" replaces clean truncated content instead
of polluting it with U+FFFD.
CR M7 Unbounded timestamp on hook payload
daemon.go emitHookEvent — clamps p.TsMs to within ±1 hour of the
daemon's clock; events with skewed timestamps fall back to time.Now.
Rules #2 + #3 + #5 Observability + error wrapping
spool.go — parsePayloads warns are sampled (1 in N per pane) to
prevent log flooding from a misbehaving producer. Unmarshal errors
log only the byte size, never err.Error() which could fragment user
prompt content into quild.log. Spool.Init returns wrapped errors
with package context.
Rules #4 docs/features.md updates
features.md Notification center section now describes hook-driven
events, the tier knob (default / verbose / off), the flow diagram,
and the Alt+M mute keybinding.
## Low + polish
L1 forwardedHookEvents duplicate detection
claudehook_test.go — TestForwardedHookEvents_NoDuplicates catches a
silently-deduped slice entry at build time.
L2 FlushAll doc comment
ingest.go — doc comment now correctly notes it's called from
hookEventsWatcher, not Daemon.Stop.
L3 emitHookEvent Data map double-allocation
daemon.go — single allocation with len(p.Data)+2 capacity.
L5 Dead awk truncate() function
quil-session-hook.sh removed; replaced inline truncation guidance.
json_escape simplified to a portable tr+sed pipeline (no GNU sed
`:a;N;$!ba` slurp pattern; works on busybox / macOS sed).
Sec L2 PowerShell C0 control bytes
Same fix as CR M4 above.
EventStorm constant
ingest.go exports the literal "internal.event_storm" string so both
producer (stormPayload) and consumer (emitHookEvent) reference the
same constant.
## Test additions (7 new test functions)
- TestSafePaneID (table-driven, 9 cases)
- TestSpool_Cleanup_RejectsTraversalPaneID
- TestIngester_Cancel_DropsPendingForPane
- TestIngester_Cancel_DoesNotAffectOtherPanes
- TestIngester_Cancel_Idempotent
- TestIngester_Submit_AfterFlushAll_IsNoOp
- TestIngester_FlushAll_StopsPendingTimers
- TestForwardedHookEvents_NoDuplicates
All green under go test ./... and go test -race ./...
Three log lines so the watcher's state is visible without modifying
production behavior:
- INFO at watcher start with the spool dir, so a misconfigured
QUIL_HOME / EventsDir mismatch is obvious from line one of the
log.
- DEBUG per non-empty Tick with the payload count, so a watcher
that runs but never finds events is distinguishable from one
that runs and finds them but the ingester drops them.
- DEBUG per successful emit with pane/src/hook_event/title, so a
full-pipeline trace is one grep away.
Investigating a reported case where Claude hook events landed in
the spool ($QUIL_HOME/events/<paneID>.jsonl) but Pane.HookHealthy
never flipped to true. The new daemon will surface where the
pipeline stalls — directory mismatch, parse drop, or downstream
emit issue.
No behavior change for non-hook code paths.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Combines PR #39 (notifications informativeness) and PR #40 (broadcast hardening) into a single PR with clean linear history. Broadcast wedge defense lands first as the foundation; notification UX work builds on top.
Closes #39, closes #40.
What's in here
Phase A — broadcast hardening (foundation)
Broadcastis non-blockingslices.Clonedefensive copydata.excerpt)_quil_truncatednamespace flag with compile-time invariant guardSetWriteDeadlinebelt-and-suspendersServer.ConnCount()test accessorlogger.Warn/logger.Errorfor new code (project leveled logger)Notifications informativeness
Message+data.excerpt); sidebar renders excerpt under titleAlt+M, persisted in workspace.json)(PaneID, Title)aggregation with×Nbadge; AddEvent updates in place + moves to front\r-reset cleanup inlastNLines(handles OSC 0 window-title leak)%,$,❯, hostname patterns)notification.max_eventsraised 50 → 200[[notification_handlers]]emits one-shot deprecation warning on plugin loadMCP surface
dismiss_notificationstool — agents can ack eventswatch_notificationsgainssince_timestamp— closes the race between agent action and watcher registrationdata.excerptwith full multi-line content (fits within 1 KiB Data cap)Commit list (8 commits, in foundation-first order)
54125ebfix(ipc): non-blocking broadcast + 4 KiB per-event size cap8b177affix(ipc): address review findings on PR fix(ipc): non-blocking broadcast + 4 KiB per-event size cap #40af85f36feat(notifications): excerpts, mute, MCP dismiss + since_timestamp8711c14fix(notifications): ANSI boundary, prompt-only suppress, aggregation01d2d70fix(notifications): seek ESC byte in ANSI trim, log idle decisions5340412fix(notifications): CR-reset cleanup + OSC hostname leak as prompt2e38d1afix(daemon): raise Data value cap to 1 KiB for multi-line excerptsTest plan
go vet ./...cleango test ./...— all 17 packages passgo test -race ./...— all packages pass under race detector8b177afAlt+Mmute, verify sidebar excerpts render under titles, verify×Nbadges aggregate, kick off slow MCP client and confirm daemon stays responsiveWhat's left for PR #41
~/.claude/plans/velvety-bubbling-twilight.md): the JSONL spool watcher, Claude/OpenCode hook script extensions registering the opinionated event tier, config knobs. Independent of this PR.