Skip to content

Release 0.17.6 — workflow-capture reliability hardening#112

Merged
ronaldeddings merged 1 commit into
mainfrom
release/0.17.6
Jun 18, 2026
Merged

Release 0.17.6 — workflow-capture reliability hardening#112
ronaldeddings merged 1 commit into
mainfrom
release/0.17.6

Conversation

@ronaldeddings

@ronaldeddings ronaldeddings commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Makes macOS + browser workflow capture (the monitor surface) production-grade across speech recognition, screen recording, AX/scroll event volume, and the task envelope. Bumps all surfaces to 0.17.6.

Speech

  • Ship the Speech Recognition usage string so macOS can present the TCC prompt; guard speech calls so a missing string can't crash on requestAuthorization.
  • New interceptor macos trust speech [--prompt] surfaces the Speech Recognition dialog (momentary activation-policy upgrade) and deep-links Settings; trust check now reports speech state.
  • Rewritten live recognition: actor-neutral authorization, on-device gated on support, ~55s task restart with a short audio ring buffer, and an always-on offline .caf fallback so a transcript exists even when live ASR is denied. New --speech-mode auto|live|offline|off. Spin-up is async — never blocks monitor start.

Screen recording

  • Fix SCStream frame geometry (pixel scale + scalesToFit + capture resolution).
  • Continuous video via SCRecordingOutputscreen.mp4 (--video [fps], --video-max-mb).
  • Frame budget / disk cap / dedup (--frame-budget, --frame-disk-cap, --frame-pixel-scale).

Event volume

  • Coalesce scroll + AX value/layout (--scroll-coalesce-ms, --ax-coalesce-ms).
  • Async batched event writer off the main run loop + bounded queue with backpressure shedding.
  • --all-apps scope hygiene (skip system chrome by default; --include-system-apps, --exclude-app). Built-in pause-before-stop.

Task envelope

  • macOS monitor stop --task now snapshots + regenerates + grades locally.
  • sid recovery + post-attach verification (no more empty split-brain envelopes).
  • Implement monitor repair --task; correct the recommended-fix strings that named a non-existent command. Auto-finalize on stop.

Lifecycle

  • Decouple start/stop acks from heavy setup; raise the monitor RPC deadline; flush the event writer on SIGINT/SIGTERM.

Verification

  • TS typecheck (all configs), swift build, and the full test suite green (484 pass / 0 fail; +2 new tests: dual-source single-envelope, repair recovery).
  • Built, signed, notarized, stapled, and installed locally; verified the speech-permission self-service prompt, full-resolution frames, continuous video, and live+offline audio all capture under one task envelope.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added monitor repair command to recover incomplete or disconnected monitor tasks
    • Added speech recognition support (offline/live modes) during workflow capture
    • Added capture configuration options: video FPS, frame budgeting, pixel scaling, scroll/accessibility event coalescing, system app filtering
    • Improved event handling with backpressure shedding for high-rate capture events
  • Bug Fixes

    • Enhanced session recovery and verification for macOS monitor start/stop
    • Increased timeout for monitor operations to prevent split-brain behavior
  • Chores

    • Version bump: 0.17.6

Make macOS + browser workflow capture (monitor) production-grade across speech
recognition, screen recording, AX/scroll event volume, and the task envelope.

Speech
- Ship the Speech Recognition usage string so macOS can present the TCC prompt;
  guard speech calls so a missing string can't crash on requestAuthorization.
- New `interceptor macos trust speech [--prompt]` surfaces the Speech
  Recognition dialog (momentary activation-policy upgrade) and deep-links
  Settings; `trust check` now reports speech state.
- Rewrite live recognition: actor-neutral authorization, on-device gated on
  support, ~55s task restart with a short audio ring buffer, and an always-on
  offline .caf fallback so a transcript exists even when live ASR is denied.
  `--speech-mode auto|live|offline|off`. Spin-up is async (never blocks start).

Screen recording
- Fix SCStream frame geometry (pixel scale + scalesToFit + capture resolution).
- Continuous video via SCRecordingOutput -> screen.mp4 (`--video [fps]`,
  `--video-max-mb`). Frame budget / disk cap / dedup (`--frame-budget`,
  `--frame-disk-cap`, `--frame-pixel-scale`).

Event volume
- Coalesce scroll + AX value/layout (`--scroll-coalesce-ms`, `--ax-coalesce-ms`).
- Async batched event writer off the main run loop + bounded queue with
  backpressure shedding.
- `--all-apps` scope hygiene (skip system chrome by default;
  `--include-system-apps`, `--exclude-app`). Built-in pause-before-stop.

Task envelope
- macOS `monitor stop --task` now snapshots + regenerates + grades locally.
- sid recovery + post-attach verification (no more empty split-brain envelopes).
- Implement `monitor repair --task`; correct the recommended-fix strings that
  named a non-existent command. Auto-finalize on stop.

Lifecycle
- Decouple start/stop acks from heavy setup; raise the monitor RPC deadline;
  flush the event writer on SIGINT/SIGTERM.

Bump all surfaces to 0.17.6. Tests +2 (dual-source single-envelope, repair
recovery); suite 484/0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Adds a non-blocking MonitorEventWriter with backpressure, scroll/AX event coalescing, continuous video recording via SCRecordingOutput, multi-mode speech recognition with offline CAF fallback, frame deduplication and budget caps, Speech Recognition TCC integration, and a repairMonitorTask workflow for recovering split-brain sessions, all surfaced through new CLI capture knobs and subcommands.

Changes

Monitor Capture Infrastructure and Repair Flow

Layer / File(s) Summary
Non-blocking MonitorEventWriter with backpressure
interceptor-bridge/Sources/Platform.swift, interceptor-bridge/Sources/main.swift
Replaces synchronous tee writes with a serial-queue MonitorEventWriter that caches FileHandles, enforces a highWaterMark in-flight limit, returns a Bool backpressure signal from appendMonitorEvent, and flushes/closes on SIGINT/SIGTERM.
MonitorRuntime extended state
interceptor-bridge/Sources/Domains/MonitorRuntime.swift
Adds stored properties for continuous SCRecordingOutput artifacts, Speech ring-buffer/restart/offline-CAF state, and per-session knobs: speechMode, framePixelScale, frameBudget, disk caps, coalescing intervals, app filters, and backpressureNotified.
Scroll and AX event coalescing
interceptor-bridge/Sources/Domains/MonitorInputBridge.swift, interceptor-bridge/Sources/Domains/MonitorAxBridge.swift
MonitorInputBridge gains scrollCoalesceMs with accumulateScroll/flushScroll batching wheel deltas per window. MonitorAxBridge gains setCoalesceMs and emitMaybeCoalesced/flushCoalesced that debounce kAXValueChangedNotification and kAXLayoutChangedNotification; detachAll flushes pending events first.
Frame budgets, video recording, rewritten speech recognition
interceptor-bridge/Sources/Domains/MonitorDomain.swift
startFrameCapture uses pixel scaling, branches on video vs still-frame FPS, wires SCRecordingOutput, and emits video_start/video_stop. MonitorCaptureOutput gains frameBudget/diskCapBytes limits, per-frame hash deduplication, and frame_budget_reached emission. Speech is rewritten for offline/live/auto modes with a CAF tee, ring-buffer task restarts, and speech_audio_done.
Session start/stop with backpressure shedding and flush ordering
interceptor-bridge/Sources/Domains/MonitorDomain.swift
startSession parses new knobs, persists runtime config including AX coalescing, and applies isExcludedApp on scope-all attach. stopSession pauses-then-flushes before ending, records mon_stop before removing runtime, and flushes/closes the writer handle before file rotation. recordEvent implements noisyEvents shedding with a one-time capture_backpressure marker.
Speech Recognition TCC and build scripts
interceptor-bridge/Sources/Domains/TrustDomain.swift, scripts/build-bridge.sh
TrustDomain conditionally imports Speech, adds checkSpeech(prompt:) with activation-policy upgrade/revert, surfaces a Speech Recognition permissions entry with Settings deep-link, extends the walkthrough pane sequence, and reports speechRecognition in trust results. Build script emits NSSpeechRecognitionUsageDescription in Info.plist.
repairMonitorTask, findLatestSessionSidForTask, and quality extensions
shared/monitor-tasks.ts
Adds findLatestSessionSidForTask (scans MONITOR_SESSIONS_DIR), transcribeTaskSpeechAudio (detects unprocessed CAF files), and repairMonitorTask (recovers unregistered sessions, snapshots, transcribes, regenerates, generates quality report). generateMonitorTaskCaptureQuality gains speech_permission_denied and frame_geometry_degenerate findings. recommendedFix strings updated to reference interceptor monitor repair.
Tests for multi-source envelope and split-brain repair
test/monitor-tasks.test.ts
Adds two tests: browser+macOS sessions sharing a single task envelope with merged timeline; an unregistered macOS session scenario asserting repairMonitorTask attaches it and returns a usable quality report.
CLI macos.ts: SID recovery, attach verification, stop flow, capture knobs
cli/commands/macos.ts
Adds SID fallback via findLatestSessionSidForTask; post-attach meta verification with markMonitorTaskSourceAttachFailed on mismatch; dedicated macos_monitor stop --task flow with transcript/quality output; speech subcommand detection for trust prompt logic; new monitor capture knob parsing (speechMode, video fps/max-mb, frame budget/scale, coalescing, app filters) wired into the outgoing action.
CLI monitor repair subcommand and timeout
cli/commands/monitor.ts, cli/index.ts, cli/transport.ts
monitor.ts adds interceptor monitor repair subcommand and monitor task repair operation, calls synthesizeMonitorTaskTranscript on stop, and updates help text. index.ts adds repair to MONITOR_LOCAL_SUBCOMMANDS. transport.ts adds a 60s timeout override for macos_monitor.
Version bumps to 0.17.6
package.json, extension/manifest.json, extension/dist-mv2/manifest.json
Updates version from 0.17.3 to 0.17.6 across package and extension manifests.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as CLI (macos.ts / monitor.ts)
  participant MonitorDomain as MonitorDomain (Swift)
  participant MonitorEventWriter as MonitorEventWriter (Swift)
  participant Bridges as Input/AX Bridges (Swift)
  participant SpeechCapture as Speech/Frame Capture (Swift)

  rect rgba(30, 100, 200, 0.5)
    note over CLI,MonitorDomain: Session Start
    CLI->>MonitorDomain: macos_monitor start (speechMode, video fps, coalescing, excludeApps...)
    MonitorDomain->>MonitorDomain: parseKnobs → persist onto MonitorRuntime
    MonitorDomain->>Bridges: start(scrollCoalesceMs, axCoalesceMs)
    MonitorDomain->>SpeechCapture: startSpeechRecognition(speechMode: offline/live/auto)
    MonitorDomain->>SpeechCapture: startFrameCapture(videoFps, frameBudget, diskCap)
    MonitorDomain->>MonitorEventWriter: enqueue mon_start + video_start
  end

  rect rgba(200, 100, 30, 0.5)
    note over Bridges,MonitorEventWriter: High-rate event coalescing
    Bridges->>Bridges: emitMaybeCoalesced / accumulateScroll (timer window)
    Bridges->>MonitorDomain: recordEvent(coalesced payload)
    MonitorDomain->>MonitorEventWriter: enqueue → Bool (backpressure)
    MonitorEventWriter-->>MonitorDomain: highWater → shed noisyEvents, emit capture_backpressure
  end

  rect rgba(30, 160, 80, 0.5)
    note over CLI,MonitorEventWriter: Session Stop
    CLI->>MonitorDomain: macos_monitor stop --task <id>
    MonitorDomain->>MonitorDomain: s.paused = true
    MonitorDomain->>MonitorEventWriter: flush()
    MonitorDomain->>SpeechCapture: stopSpeechRecognition → finalize CAF, emit speech_audio_done
    MonitorDomain->>SpeechCapture: stopFrameCapture → emit video_stop
    MonitorDomain->>MonitorEventWriter: close(sessionEventsPath) → rotate file
    CLI->>CLI: synthesizeTranscript + generateCaptureQuality → print
  end
Loading
sequenceDiagram
  participant CLI as CLI (monitor.ts)
  participant repairFn as repairMonitorTask (shared/monitor-tasks.ts)
  participant SessionDir as MONITOR_SESSIONS_DIR
  participant TaskMeta as Task Metadata

  CLI->>repairFn: repairMonitorTask(taskId, {snapshotSources, regenerate})
  repairFn->>SessionDir: findLatestSessionSidForTask(taskId)
  SessionDir-->>repairFn: unregistered SID
  repairFn->>TaskMeta: attachMonitorTaskSource(sid)
  repairFn->>repairFn: snapshotMonitorTaskSources()
  repairFn->>repairFn: transcribeTaskSpeechAudio(taskId)
  repairFn->>repairFn: synthesizeMonitorTaskTranscript(taskId)
  repairFn->>repairFn: generateMonitorTaskCaptureQuality(taskId)
  repairFn-->>CLI: { manifest, report, attached: [sid] }
  CLI-->>CLI: print JSON or status+quality
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Hacker-Valley-Media/Interceptor#79: Introduced the multi-session monitor plumbing in Platform.appendMonitorEvent and MonitorRuntime that this PR directly extends with the async MonitorEventWriter and backpressure.
  • Hacker-Valley-Media/Interceptor#95: Established the task-scoped monitor session envelope and session SID attach flow in shared/monitor-tasks.ts and CLI that this PR builds the repairMonitorTask workflow on top of.

Poem

🐇 Hoppity-hop through the capture queue,
No more blocking writes — async it flew!
Coalesced the scrolls, debounced the AX rain,
Repaired split-brain sessions with hardly a strain.
Speech offline, video rolling, budgets in check —
The bunny kept data safe, frame after spec! 🎬

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Release 0.17.6 — workflow-capture reliability hardening' accurately and specifically describes the main focus of this changeset: a production release that hardens the workflow capture feature across multiple reliability dimensions (speech recognition, screen recording, event volume management, task envelope handling).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch release/0.17.6

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ronaldeddings ronaldeddings merged commit 9b8c4a6 into main Jun 18, 2026
1 of 2 checks passed
@ronaldeddings ronaldeddings deleted the release/0.17.6 branch June 18, 2026 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant