feat: voice dictation — Slack/Discord audio → text, supervised with the daemon (Parakeet-MLX / Canary-Qwen)#1
Open
kwliang1 wants to merge 56 commits into
kwliang1 pushed a commit that referenced this pull request Jun 28, 2026
Sam's suggestion #1: the cached `status: 'live' | 'dead'` field can go stale if a session dies outside daemon control. Replace with `deadAt?: number` (records when death was detected) and `isAlive()` helper that checks tmux directly. Dead code `isSessionDead()` removed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4bbdf4d to 994ac68
Moves all spawn prompt construction (spawn, fork, handoff, resurrect) out of session-lifecycle.ts into dedicated prompt builders. Sharpens the set_description instruction: "Lead with the domain if one is clear. 5 words max. Rewrite it whenever your focus shifts." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…INSTRUCTION
- Move prompts.ts into daemon/prompts/session.ts alongside existing protocol prompts (build-critic, review-critic, design-*, etc.)
- Export DESCRIPTION_INSTRUCTION for protocol prompts to compose in
- Honest commit scope: this is refactor + behavioral change (10→5 words, domain-leading, lower rewrite threshold)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor: extract session prompts to daemon/prompts.ts
Adds `hydra` CLI for programmatic session management over the Unix socket. Commands: spawn, list, status, kill, health, clear-key.
- Idempotency keys prevent duplicate spawns (survives daemon restarts)
- Initiator tracked as structured field on SpawnOpts and SessionInfo
- @mentions allowFrom users in CLI-spawned threads for auto-join
- Kill race fixed: capture key before kill, overwrite after death handler
- DEFAULT_SESSION_CHANNEL updated to active server
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fetch helpers now return null on API failure (distinct from [] for success-with-no-items). pollPr() only advances lastCheckedAt when all three comment fetches succeed, so a failed poll cycle retries the same time window instead of permanently skipping comments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
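A minimal sketch of the rule this commit describes, assuming nothing about the real `pollPr()` beyond the message above: `fetchOrNull` and `nextCheckedAt` are hypothetical names used only for illustration. A failed fetch yields `null`, an empty success yields `[]`, and the time window only moves forward when every fetch succeeded.

```typescript
// Hypothetical helper: wraps any loader so a failure becomes null, never [].
async function fetchOrNull<T>(load: () => Promise<T[]>): Promise<T[] | null> {
  try {
    return await load();
  } catch {
    return null; // API failure, distinct from [] (success with no items)
  }
}

// Decide where the next poll window starts.
// The window advances only when every fetch in this cycle succeeded;
// otherwise the same window is retried on the next cycle.
function nextCheckedAt(
  previous: number,
  results: ReadonlyArray<unknown[] | null>,
  now: number,
): number {
  return results.every((r) => r !== null) ? now : previous;
}
```

Because a retried window can return comments that an earlier partial cycle already saw, a real poller would also need some form of deduplication; the commit message does not say how that is handled.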
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: hydra CLI — programmatic daemon interface
Re-implements PRs sf8193#39 and sf8193#42 which were merged to `live` (not `main`) and lost in the 2026-06-28 live rebuild.
- readAccessFile() now spreads parsed over defaults, so new Access fields no longer silently drop
- defaultListen on Access and GroupPolicy types
- resolveListenState cascade: thread listenOverride → group defaultListen → global defaultListen → false
- listen/unlisten commands persist listenOverride to ThreadMetadata so respawned sessions inherit the thread's listen preference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
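The cascade and the defaults spread can be sketched as follows. The type shapes are assumptions for illustration (only `defaultListen`, `listenOverride`, and `allowFrom` are named in this PR), and `withDefaults` is a hypothetical stand-in for what `readAccessFile()` does after parsing.

```typescript
// Assumed minimal shapes; the real types carry more fields.
interface Access { allowFrom: string[]; defaultListen?: boolean }
interface GroupPolicy { defaultListen?: boolean }
interface ThreadMetadata { listenOverride?: boolean }

// thread listenOverride -> group defaultListen -> global defaultListen -> false
function resolveListenState(
  thread: ThreadMetadata | undefined,
  group: GroupPolicy | undefined,
  access: Access,
): boolean {
  return thread?.listenOverride ?? group?.defaultListen ?? access.defaultListen ?? false;
}

const ACCESS_DEFAULTS: Access = { allowFrom: [], defaultListen: false };

// Spreading parsed over defaults means a field missing from the file
// falls back to its default instead of being dropped.
function withDefaults(parsed: Partial<Access>): Access {
  return { ...ACCESS_DEFAULTS, ...parsed };
}
```

Using `??` rather than `||` matters here: an explicit `listenOverride: false` on a thread must win over a group or global `true`.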
msg is not in doSpawnSession's scope — chatId is the correct parameter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ThrottledQueue previously swallowed errors without retry — if ch.setName() threw for any reason, the visual update was permanently lost. Now re-enqueues on failure (up to 3 attempts), preserving original priority and coalescing with any newer value for the same key. Documents the empirically measured Discord shared-scope rate limit on thread renames: under burst conditions, ~2 rapid renames trigger 429 + retry-after ~600s (x-ratelimit-scope: shared). Per-channel vs global scoping unconfirmed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
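A rough sketch of the retry-and-coalesce behavior described above, with the throttling and async timing left out so the logic stays visible. `RetryingQueue` and its methods are hypothetical names, not the real `ThrottledQueue` API, and `apply` is synchronous here where the real update call is not.

```typescript
interface Entry<V> { value: V; priority: number; attempts: number }

class RetryingQueue<V> {
  private readonly pending = new Map<string, Entry<V>>();
  private readonly maxAttempts: number;

  constructor(maxAttempts = 3) {
    this.maxAttempts = maxAttempts;
  }

  // Coalesce by key: the newest value wins, the higher priority is kept.
  enqueue(key: string, value: V, priority = 0): void {
    const existing = this.pending.get(key);
    const kept = existing === undefined ? priority : Math.max(priority, existing.priority);
    this.pending.set(key, { value, priority: kept, attempts: 0 });
  }

  // Apply the highest-priority entry once. Returns false when the queue is empty.
  step(apply: (key: string, value: V) => void): boolean {
    let best: [string, Entry<V>] | undefined;
    for (const item of this.pending) {
      if (best === undefined || item[1].priority > best[1].priority) best = item;
    }
    if (best === undefined) return false;
    const [key, entry] = best;
    this.pending.delete(key);
    try {
      apply(key, entry.value);
    } catch {
      const attempts = entry.attempts + 1;
      const newer = this.pending.get(key);
      if (newer !== undefined) {
        // A newer value arrived while applying: keep it, preserve the priority.
        newer.priority = Math.max(newer.priority, entry.priority);
      } else if (attempts < this.maxAttempts) {
        this.pending.set(key, { ...entry, attempts }); // re-enqueue for retry
      }
    }
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

With the default of three attempts, an update that always fails is tried three times and then dropped, which bounds the damage when the rate limit stays closed for the whole retry budget.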
feat(discord): connectivity-aware resilience via gateway health contract
Discord enforces a shared-scope rate limit on thread renames (~2 per burst window). Mid-protocol turn transitions consumed the budget before completion could land. Thread renames now fire only on outer state changes (spawn, protocol start/end, kill, cancel). Mid-protocol progress moves to thread-visible text — review uses a single live-edited status message (gateway.edit, 5/2s rate limit), build and design embed badges in existing status messages. Design badges use formatPhaseBadge() (single source of truth). A 3-round review goes from 8+ renames to 2 — completion always lands immediately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ment, per-platform logs, byte revival, safe restart preflight
Typing `! <message>` in a thread sends Escape to the tmux session (interrupting current work), then delivers the message normally.
- Uses Bun.spawn array form (no shell, no injection surface)
- Adds initiator field to SpawnOpts for CLI compatibility
- Resolves type errors from sf8193#53 visual system refactor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: ThrottledQueue retry on failure + rate limit docs
fix: restore defaultListen + persist listen across respawns
Publishes a live session overview to the bot's Home tab using Block Kit. Auto-updates on session changes (debounced) and periodically every 5 min. Shows status, thread links, description, and age for each live session. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement onReaction in SlackGateway so the existing hocho handler in the daemon router works for Slack, not just Discord. Handles thread parents by deleting children first, and reacts with ⚠️ when threads contain undeletable messages from other users. Includes bot self-reaction guard for parity with Discord. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts the thread-parent delete logic that fetched and deleted all children before retrying the parent. Single-message delete only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Track lastReplyId on sessions from both inbound messages (router) and outbound bot replies (bridge-dispatch). Dashboard and CLI list use gateway.getMessageUrl to build thread-scoped deep links that open Slack to the latest message in the thread panel. Includes debounced persist (2s coalesce), deleted-message cleanup, startup backfill via Slack's latest_reply, and shutdown flush. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add spawn input to Home tab (text field with Enter dispatch)
- Show PR watch links as context blocks under each session
- Auto-unwatch PRs on merge/close during poll cycle
- Backfill PR titles from GitHub on daemon startup
- Auth checks on home:spawn and app_home_opened (allowFrom gate)
- Deduplicate PR API call in pollPr (pass prData to fetchCheckStatus)
- Fix lastReplyId semantic split: standardize on outbound reply ID
- Escape mrkdwn injection in session descriptions
- Cap block count at 31 to stay under Slack's 100-block view limit
- Input max_length: 500 with handler-side truncation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each daemon now writes daemon-{platform}.json alongside the legacy
daemon.json during bridge sync. The bridge checks the platform-keyed
file first, eliminating the last-writer-wins race when two daemons
share a plugin cache. CHAT_PLATFORM is now propagated to spawned
sessions and the Slack byte so bridges can resolve the correct file.
Fully backwards compatible — old bridges fall through to daemon.json,
old daemons still write daemon.json which new bridges accept as
legacy fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: platform-keyed daemon config for dual-daemon operation
Add daemon+byte lifecycle commands to the hydra CLI. Platform is always required (no default).
- `hydra up <platform>`: validate byte script exists, check for running tmux sessions and orphaned claude processes (prevents gotcha sf8193#32 ping-pong), start daemon, wait for socket, start byte
- `hydra down <platform>`: stop byte via stop-byte.sh (orphan cleanup), stop daemon, remove stale socket + PID file
- `hydra restart <platform>`: restart daemon only (picks up code changes)
No hardcoded platform enum; uses filesystem-based validation (does start-{platform}-byte.sh exist?). New platforms work with zero CLI changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: hydra up/down/restart — CLI lifecycle management
Replaces start-byte-v2.sh (discord) and start-slack-byte.sh with a
single platform-agnostic start-byte.sh. The v2 daemon+bridge
architecture is now the only architecture — the version suffix was a
migration artifact. Old names preserved as thin deprecation wrappers
(inject CHAT_PLATFORM, exec start-byte.sh, print notice to stderr).
Key changes beyond dedup:
- Shared env preamble (env-setup.sh) — PATH, .env sourcing, STATE_DIR
in one place, sourced by every script (including preflight.sh)
- Strict mode (set -euo pipefail) on all executable scripts
- Progressive CHAT_PLATFORM enforcement — refuses to default when
multiple platform state dirs exist (N-platform aware)
- Auth tokens read from file, not interpolated into command strings
- macOS assertion makes the platform contract explicit
- Unified log paths: ~/hydra-${CHAT_PLATFORM}-{daemon,byte}.log
- Consistent #!/bin/bash shebangs across all scripts
- Script architecture documented in README (layering, conventions)
- Deprecation wrappers for backwards compat (shell history, worktrees)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor: shared preamble, unified byte script, strict mode
…adAt
Four fixes to the crash detection category:
1. Remove duplicate inline death check in bridge-server.ts socket.on('end')
— checkSessionDeath() already covers this case via setTimeout. The inline
copy fired in parallel, producing double death notices.
2. Health poll (daemon.ts) changed from OR to AND — only flag as crashed
when BOTH tmux is dead AND bridge is disconnected. Bridge-only
disconnects are handled by the bridge-server disconnect handler (3s
delay + tmux check). The OR condition false-positived on temporary
bridge drops and newly spawned sessions.
3. Health poll now sets info.deadAt + calls registry.persist() +
threadRegistry update + refreshSessionVisual(), fully consistent
with checkSessionDeath().
4. 60s spawn grace period — skip crash detection for sessions younger
than SPAWN_GRACE_MS (bridge needs time to connect after spawn).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
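Fixes 2 and 4 above reduce to one predicate, sketched here under assumptions: `SessionProbe` and `shouldFlagCrashed` are hypothetical names, and skipping sessions that already have `deadAt` set is my assumption rather than something the commit states. Only `SPAWN_GRACE_MS` (60s) and the AND condition come from the commit message.

```typescript
const SPAWN_GRACE_MS = 60_000; // 60s, per the commit message

// Hypothetical snapshot of what the health poll knows about one session.
interface SessionProbe {
  spawnedAt: number;        // epoch ms
  tmuxAlive: boolean;
  bridgeConnected: boolean;
  deadAt?: number;          // set once death has been recorded
}

function shouldFlagCrashed(s: SessionProbe, now: number): boolean {
  if (s.deadAt !== undefined) return false;              // assumption: already handled
  if (now - s.spawnedAt < SPAWN_GRACE_MS) return false;  // bridge may still be connecting
  // AND, not OR: a bridge-only drop is left to the bridge-server disconnect handler.
  return !s.tmuxAlive && !s.bridgeConnected;
}
```

Under the earlier OR condition, the second and third cases in a table like "young session, bridge not yet connected" and "old session, bridge briefly dropped" would both have been flagged; with AND plus the grace period neither is.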
fix: 60s grace period on crash detection for spawned sessions
DEFAULT_SESSION_CHANNEL was hardcoded to a Discord channel ID as fallback. This is deployment-specific config that belongs in .env, not source code. Now required from .env — daemon warns on startup if missing. Also fixes load order: export moved after .env sourcing so .env values are actually read. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: DEFAULT_SESSION_CHANNEL must read after .env sourcing
Absorb all 10 shell scripts (start-daemon, start-byte, stop-byte, restart-daemon, watchdog, preflight, env-setup, compile-check, kill-orphan-bytes) into the TypeScript CLI as cli/helpers.ts and cli/lifecycle.ts. The CLI entry point (cli/hydra.ts) is now a slim router that delegates to typed lifecycle commands. New commands: hydra watchdog <platform>, hydra preflight <platform>. Shell scripts retained for backward compat until production validated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hydra install <platform> generates a launchd plist with correct paths for the current user, loads it, creates the state dir, and runs preflight. hydra uninstall removes it. Simplifies new user setup to: bun install → create .env → hydra install → hydra up. README rewritten to use CLI commands instead of shell scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hydra install now accepts --cwd and --config-dir flags so users don't need env vars. All executable shell scripts now print a deprecation warning pointing to the CLI equivalent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
If git worktree remove --force fails (e.g. corrupted .git file), fall back to rm -rf but only when the path is inside a .worktrees/ directory to prevent accidental deletion of real repos. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
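The safety check for the `rm -rf` fallback could look roughly like this. `isInsideWorktreesDir` is a hypothetical name, the sketch assumes POSIX-style absolute paths, and rejecting unnormalized paths outright is my own conservative choice, not something the commit specifies.

```typescript
// Returns true only for paths strictly below a ".worktrees" directory.
function isInsideWorktreesDir(absPath: string): boolean {
  const parts = absPath.split("/").filter((p) => p.length > 0);
  // Refuse anything with "." or ".." segments rather than trying to resolve them.
  if (parts.includes("..") || parts.includes(".")) return false;
  const idx = parts.indexOf(".worktrees");
  // Must have at least one segment after ".worktrees": never delete the directory itself.
  return idx !== -1 && idx < parts.length - 1;
}
```

A caller would run the forced `git worktree remove` first and consult this check only in the failure path, so the recursive delete stays unreachable for ordinary repository paths.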
Splits that land inside a ``` code fence now close the fence at the chunk boundary and reopen it (with the language tag) at the start of the next chunk, so multi-part messages render correctly in Slack and Discord. The new 'markdown' mode is the default; 'length' and 'newline' modes are preserved for back-compat. Split-point preference: paragraph break outside fence > line break outside fence > line break inside fence > space > hard limit. Best-effort avoidance of mid-table-row splits. Reserves 4 chars of headroom for the fence closer so chunks never exceed the stated limit. Progress guard prevents infinite loops when fence overhead >= cut size. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat(daemon): add markdown-aware message chunking mode
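The fence close-and-reopen idea can be illustrated with a much simpler, line-based chunker. This is a sketch, not the daemon's implementation: `chunkMarkdown` is a hypothetical name, it only ever splits at line breaks (no paragraph, table, or space preferences), and it assumes no single line is longer than the limit minus the reserve.

```typescript
const FENCE = "`".repeat(3);             // built at runtime so the sketch can sit inside a fence
const CLOSER_RESERVE = FENCE.length + 1; // closing fence plus its newline: 4 chars of headroom

function chunkMarkdown(text: string, limit: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let length = 0;                       // length of current.join("\n")
  let hasContent = false;               // false while current holds only a reopened fence
  let openFence: string | null = null;  // opening fence line (with language tag) while inside one

  const flush = (): void => {
    if (openFence !== null) current.push(FENCE);       // close the fence at the boundary
    chunks.push(current.join("\n"));
    current = openFence !== null ? [openFence] : [];   // reopen it, language tag included
    length = openFence !== null ? openFence.length : 0;
    hasContent = false;
  };

  for (const line of text.split("\n")) {
    const isFenceLine = line.trimStart().startsWith(FENCE);
    const closesFence = openFence !== null && isFenceLine;
    const reserve = closesFence ? 0 : CLOSER_RESERVE;
    const separator = current.length > 0 ? 1 : 0;
    // Progress guard: never flush a chunk that holds nothing but a reopened fence.
    if (hasContent && length + separator + line.length + reserve > limit) flush();
    length += (current.length > 0 ? 1 : 0) + line.length;
    current.push(line);
    hasContent = true;
    if (isFenceLine) openFence = closesFence ? null : line.trim();
  }
  if (hasContent) chunks.push(current.join("\n"));
  return chunks;
}
```

Reserving the closer's length before every non-closing line is what keeps a chunk from exceeding the limit once the closing fence is appended at flush time.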
Transcribe inbound audio attachments (Discord voice notes, Slack audio clips) to text so users can dictate prompts to Claude alongside text and images. Claude has no native audio input, so the daemon transcribes first and merges the result into the message as a [voice transcript] block; the original audio file stays in downloaded_files.
- daemon/transcription.ts: audio detection, transcript merging (pure, unit-tested), and an HTTP client to a self-hosted STT sidecar. Failures are logged and skipped, so dictation never blocks message delivery.
- daemon/router.ts: hook transcription into buildNotificationPayload after attachment download.
- transcribe-server/: self-hosted sidecar serving NVIDIA Canary-Qwen 2.5B via NeMo (top of the Open ASR leaderboard for English accuracy), plus a GPU-free mock_server.py for testing the wiring locally.
- Off by default; enable with HYDRA_TRANSCRIBE_ENABLED=1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
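The "pure, unit-tested" merge step might look something like the sketch below. The exact block layout is an assumption: the PR only shows the `[voice transcript]` marker, so the separator and the function name `mergeTranscripts` are illustrative.

```typescript
// Append one "[voice transcript] ..." block per non-empty transcript.
// Pure: no I/O, same input always gives the same output.
function mergeTranscripts(body: string, transcripts: readonly string[]): string {
  const blocks = transcripts
    .map((t) => t.trim())
    .filter((t) => t.length > 0)            // a failed or empty transcription adds nothing
    .map((t) => `[voice transcript] ${t}`);
  return [body.trim(), ...blocks].filter((p) => p.length > 0).join("\n\n");
}
```

Keeping this step free of network and filesystem access is what lets the failure policy live entirely in the caller: a skipped file simply contributes no entry to `transcripts`.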
Make dictation work as a packaged default rather than a manual opt-in:
- Daemon transcription is now ON by default ("auto"): it's attempted
whenever audio arrives and silently no-ops if no sidecar is reachable
(the fetch fails fast). HYDRA_TRANSCRIBE_ENABLED=0 opts out.
- start-transcribe.sh: idempotent launcher for the sidecar in a tmux
session. Accepts a backend arg (`./start-transcribe.sh mock`) for a
zero-GPU end-to-end test; canary backend refuses cleanly until set up.
- watchdog.sh: revive the sidecar when HYDRA_TRANSCRIBE_AUTOSTART is set,
reusing the same supervision pattern as the bot session.
- transcribe-server/setup.sh: one-time venv + NeMo install.
- Docs/env updated for the packaged-default flow.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pasting a doc line with an inline '# comment' into interactive zsh (which does not strip '#') passed the comment as args, making the launcher try backend '#'. Only accept mock|canary as the positional arg; ignore anything else with a warning and fall back to env/default. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Canary-Qwen runs via NeMo and needs a CUDA GPU, so it can't run on Mac. Add a Parakeet-MLX backend (NVIDIA Parakeet TDT on Apple's MLX runtime): native, ~50x realtime on M-series, ~6% English WER, no GPU.
- transcribe-server/server_mlx.py: FastAPI server, same /transcribe contract, parakeet-mlx + ffmpeg resample.
- transcribe-server/requirements-mlx.txt: light deps (no torch/NeMo).
- start-transcribe.sh: `parakeet` backend; default by platform (Darwin -> parakeet, else canary).
- setup.sh: installs the right requirements per platform / arg.
- Docs updated across README, transcribe-server/README, .env.example.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
MLX ships arm64-only wheels, and on Apple Silicon the shell frequently runs under Rosetta (x86_64), where 'pip install mlx' fails with no matching wheel. setup.sh now uses an arm64 Homebrew python@3.12 and builds the venv via 'arch -arm64' for the parakeet backend; uv/system python paths remain for canary. Also relax the parakeet-mlx pin. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tmux does not reliably inherit env, so a PARAKEET_MODEL override in .env never reached the server (it would fall back to the 0.6B default). Forward model-selection vars explicitly, only when set. Enables pinning the smaller 110M Parakeet on constrained networks where the 2.4GB 0.6B is impractical. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tests The module-top-level stub leaked into every test file loaded after it, swallowing bun test's per-test output and final summary for the rest of the suite (bun runs all files in one process). The stub was unnecessary: these tests only exercise pure helpers that never write to stderr. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Voice dictation now comes up with the daemon instead of needing a separate manual step or an explicit opt-in flag:
- start-transcribe.sh gains an --auto mode used by every supervisor: explicit HYDRA_TRANSCRIBE_AUTOSTART wins in both directions; when unset it starts the sidecar iff the backend is ready (venv built, or mock chosen), and quietly no-ops otherwise so unconfigured machines don't log a failure every watchdog cycle. Honors HYDRA_TRANSCRIBE_ENABLED=0.
- hydra up starts it right after the daemon (model loads while the byte comes up); hydra watchdog revives it each tick; hydra down stops it.
- Legacy start-daemon.sh and watchdog.sh call the same --auto path; the watchdog's AUTOSTART=1 opt-in grep is replaced by the shared gate.
- mock backend: resolve python3 up front and fall back to /usr/bin/python3, because asdf's shim fails when no python version is pinned for the dir, and mock_server.py is pure stdlib.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
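The `--auto` gate lives in a shell script, but its decision table is easy to state as a pure function. This TypeScript sketch is illustrative only: `shouldAutostart` is a hypothetical name, what counts as "ready" is passed in as `backendReady`, and checking `HYDRA_TRANSCRIBE_ENABLED=0` before the explicit autostart flag is my assumption about precedence.

```typescript
type Env = Record<string, string | undefined>;

function shouldAutostart(env: Env, backendReady: boolean): boolean {
  // Assumed precedence: a global opt-out beats everything else.
  if (env.HYDRA_TRANSCRIBE_ENABLED === "0") return false;
  // An explicit setting wins in both directions.
  const explicit = env.HYDRA_TRANSCRIBE_AUTOSTART;
  if (explicit === "1") return true;
  if (explicit === "0") return false;
  // Unset: start only when the backend is ready, otherwise quietly no-op.
  return backendReady;
}
```

In this commit "ready" means the venv is built or the mock backend is chosen; the later commits in this PR narrow that so a leftover mock configuration no longer qualifies.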
Slack voice clips arrive as files with mimetype audio/mp4 (m4a), or audio/webm;codecs=opus from browser recordings — assert detection for those shapes plus the extension fallback. Add transcribeDownloads tests with a stubbed fetch: only audio files are POSTed, sidecar failure skips the file instead of throwing, and the disabled flag short-circuits before the network. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…d 1)
Structural:
- ONE shared tmux session (hydra-transcribe) for all platform daemons: per-platform sessions raced for the same default port, and the loser reloaded the full model every watchdog tick. hydra down kills it; the other platform's watchdog revives it within a tick.
- A crashed server PARKS its session (error on screen + in log) instead of exiting, so supervisors' has-session check holds: broken configs fail once, not as a model-load crash-loop every 120s.
- The mock backend is excluded from --auto (manual or explicit AUTOSTART=1 only), because leftover test config must not keep canned transcripts flowing into real prompts. A remote HYDRA_TRANSCRIBE_URL also disables local autostart.
Environment/robustness:
- Extract only dictation keys from the state-dir .env instead of set -a sourcing the whole file: the model server has no business holding chat bot tokens (they leaked into the tmux server env when this script bootstrapped it).
- Forward PATH into the tmux pane (launchd-frozen server env lacks /opt/homebrew/bin, breaking the servers' ffmpeg lookup) and shell-quote every interpolated value (shq) so an embedded quote can't break out of the tmux command string.
- URL without an explicit port now binds the scheme default so a mismatch fails visibly instead of the sidecar silently serving a port the daemon never queries.
- start-daemon.sh: sidecar refusal no longer fails the whole script under set -e after a successful daemon start. Legacy watchdog runs the sidecar step before the daemon branches' early exits.
Daemon:
- isAudioFile: a definitive non-audio MIME (video/mp4 screen recording) is no longer re-classified as audio by its extension; only generic types fall back. Codec suffixes (audio/webm;codecs=opus) parsed correctly.
- transcribeFile checks size via statSync BEFORE reading, so the cap now protects daemon memory, not just sidecar latency.
- Tests: env save/restore moved into beforeEach/afterEach (the old describe-body restore ran at collection time and leaked env into later test files); network tests set HYDRA_TRANSCRIBE_ENABLED explicitly; new cases for video-MIME rejection and the pre-network size cap.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
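The `isAudioFile` rule described under "Daemon" can be sketched like this. The signature, the extension list, and the set of "generic" MIME types are all assumptions for illustration; only the three behaviors (audio MIME accepted, definitive non-audio MIME rejected, extension fallback for generic types) come from the commit message.

```typescript
// Assumed lists; the real ones may differ.
const AUDIO_EXTENSIONS = new Set(["m4a", "mp3", "ogg", "opus", "wav", "flac"]);
const GENERIC_MIMES = new Set(["", "application/octet-stream", "binary/octet-stream"]);

function isAudioFile(name: string, mime?: string): boolean {
  // Strip parameters such as ";codecs=opus" before comparing.
  const [first = ""] = (mime ?? "").split(";");
  const base = first.trim().toLowerCase();
  if (base.startsWith("audio/")) return true;
  // A definitive non-audio MIME is trusted over the file extension.
  if (!GENERIC_MIMES.has(base)) return false;
  // Only generic or missing MIME types fall back to the extension.
  const dot = name.lastIndexOf(".");
  const ext = dot === -1 ? "" : name.slice(dot + 1).toLowerCase();
  return AUDIO_EXTENSIONS.has(ext);
}
```

The ordering is the point: checking the extension first would let a `video/mp4` screen recording with an audio-looking name through, which is the misclassification this commit removes.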
- --auto with AUTOSTART unset now also requires BACKEND != mock: with a built venv, leftover BACKEND=mock in .env passed the venv-only gate and auto-supervised canned transcripts, the exact residue case the previous commit claimed to prevent.
- .env key extraction now parses like shell sourcing: optional 'export' prefix, quoted values kept verbatim (a # inside quotes is not a comment), unquoted values lose trailing inline comments/whitespace. The grep|cut version kept ' 0 # never' whole, silently defeating explicit opt-outs and erroring per watchdog tick on commented backends.
- Document that multi-platform machines must keep dictation config identical across platform .env files (shared session = first supervisor wins).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
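The real extraction is done in `start-transcribe.sh`; the TypeScript below only illustrates the same parsing rules plus the key allowlist. `parseEnvLine` and `extractKeys` are hypothetical names, treating an unterminated quote as malformed is my choice, and `BOT_TOKEN` in the usage note is a made-up key standing in for any chat credential.

```typescript
// Parse one .env line roughly the way shell sourcing would read it.
function parseEnvLine(line: string): [string, string] | null {
  const match = /^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$/.exec(line);
  if (match === null) return null;             // blank lines, comments, garbage
  const key = match[1] ?? "";
  const raw = (match[2] ?? "").trim();
  const quote = raw.charAt(0);
  if (quote === '"' || quote === "'") {
    const end = raw.indexOf(quote, 1);
    if (end === -1) return null;               // unterminated quote: treat as malformed
    return [key, raw.slice(1, end)];           // quoted value kept verbatim, "#" included
  }
  // Unquoted: a "#" at the start or after whitespace begins an inline comment.
  return [key, raw.replace(/(^|\s)#.*$/, "").trim()];
}

// Keep only allowlisted keys so unrelated secrets never reach the sidecar env.
function extractKeys(envText: string, allow: ReadonlySet<string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of envText.split(/\r?\n/)) {
    const kv = parseEnvLine(line);
    if (kv !== null && allow.has(kv[0])) out[kv[0]] = kv[1];
  }
  return out;
}
```

Given a file containing `BOT_TOKEN=secret` and `export HYDRA_TRANSCRIBE_ENABLED=0 # never`, an allowlist of dictation keys yields only `HYDRA_TRANSCRIBE_ENABLED` with the value `0`, which is the opt-out case the old grep|cut version got wrong.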
6136e07 to 2e5be02
What
Adds voice dictation to Hydra: inbound audio attachments (Discord voice notes, Slack voice clips / audio uploads) are transcribed to text so you can dictate prompts to Claude alongside text and images. Claude has no native audio input, so the daemon transcribes first and merges the result as a `[voice transcript] ...` block; the original audio stays in `downloaded_files`.
Rebased on the latest upstream `main` (the hydra CLI, Slack Home tab, markdown chunking, etc.). The unrelated `proj:`/`dir:` spawn-prefix commit that was previously on this branch is dropped; that's its own PR (upstream sf8193#60).
Integrated into daemon start (not a separate service to remember)
- `hydra up` starts the sidecar with the daemon (kicked off right after the daemon tmux spawn so the model loads while the byte comes up). `hydra down` stops it. Every `hydra watchdog` tick revives it. Legacy `start-daemon.sh` / `watchdog.sh` call the same path.
- `start-transcribe.sh --auto`, one shared gate: explicit `HYDRA_TRANSCRIBE_AUTOSTART=1/0` wins in both directions; when unset it starts the sidecar iff a real backend is set up (venv built) and quietly no-ops otherwise, so machines without dictation never log a failure per watchdog cycle. After the one-time `./transcribe-server/setup.sh` there is zero extra config.
- One shared tmux session (`hydra-transcribe`) serves every platform daemon; per-platform sessions would race for the same port.
- A crashed server parks its session (error on screen and in the log) instead of exiting, so a broken config fails once rather than crash-looping; `tmux kill-session -t hydra-transcribe` to retry.
- The mock backend is excluded from `--auto` (manual or explicit `AUTOSTART=1` only), so leftover test config can't keep canned transcripts flowing into real prompts. A remote `HYDRA_TRANSCRIBE_URL` disables local autostart.
Slack audio attachments
- Slack voice clips and audio uploads arrive as `message` events with subtype `file_share`/`slack_audio` and a `files[]` entry (`mimetype: audio/mp4`, `.m4a`); the gateway only filters `bot_message`, so they flow through `downloadAttachments` (bearer-auth `url_private_download`) into the transcription hook like any attachment.
- Detection covers `audio/mp4`, `audio/webm;codecs=opus` (browser recordings), and an extension fallback for generic mimetypes only; a definitive non-audio MIME (`video/mp4` screen recording) is never re-classified as voice. Both sidecar servers resample via ffmpeg, so m4a/ogg/webm all work.
- `transcribeDownloads` tests pin the contract with a stubbed fetch: only audio files are POSTed, sidecar failure skips (never blocks delivery), the size cap is enforced before the network, the disabled flag short-circuits.
Packaged default (not opt-in)
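- Transcription is on by default: it is attempted whenever audio arrives and silently no-ops if no sidecar is reachable (the fetch fails fast; timeout via `HYDRA_TRANSCRIBE_TIMEOUT_MS`). `HYDRA_TRANSCRIBE_ENABLED=0` opts out.
- Every backend serves the same contract (`POST /transcribe` → `{"text": ...}`).
Hardening (3-lens adversarial review: engineering / ops blast-radius / security)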
- Only dictation keys are extracted from the state-dir `.env` (no more `set -a` sourcing that leaked bot tokens into the model server's environment); parsing matches shell sourcing (`export` prefix, quoted values verbatim, inline comments stripped).
- `PATH` forwarded into the tmux pane (launchd-frozen server env broke ffmpeg lookup); every interpolated value shell-quoted (`shq`); mock backend falls back to system python (asdf shims fail without a pinned version).
- `transcribeFile` checks size via `statSync` before reading (the cap protects daemon memory, not just sidecar latency); a URL without an explicit port binds the scheme default so a mismatch fails visibly instead of silently serving the wrong port; `start-daemon.sh`'s sidecar step can't fail the script under `set -e` after a successful daemon start.
- Tests: removed a module-top-level `process.stderr.write` stub that swallowed every later test file's output (and hid a suite crash).
Try it now: no GPU
Real model (one-time; needs ffmpeg)
Changes
- `daemon/transcription.ts`: audio detection + transcript merge (pure, unit-tested) + HTTP client; failures logged and skipped, never block delivery.
- `daemon/router.ts`: hook into `buildNotificationPayload` after attachment download; `voice_transcript` meta marker.
- `cli/lifecycle.ts`, `cli/helpers.ts`: sidecar supervision in `hydra up/down/watchdog` (`startTranscribeAuto`, shared `transcribeTmux`).
- `start-transcribe.sh`: `--auto` gate, shared session, park-on-crash, key-allowlisted `.env` extraction, `shq` quoting.
- `start-daemon.sh`, `watchdog.sh`: call the shared `--auto` path (replaces the watchdog's `AUTOSTART=1` grep opt-in; sidecar step runs before the watchdog's early exits).
- `transcribe-server/`: `server_mlx.py` (Parakeet-MLX), `server.py` (Canary-Qwen via NeMo + ffmpeg), `mock_server.py` (stdlib stub), `setup.sh`, `requirements*.txt`, `README.md`.
- `daemon/__tests__/transcription.test.ts`: helper tests + Slack shapes + `transcribeDownloads` with stubbed fetch + proper env save/restore.
- `.env.example`, `README.md`.
Verified:
`bun build` clean on daemon.ts / bridge.ts / cli/hydra.ts; `bun test` → 298/298; mock sidecar e2e over the real code path (`transcribeDownloads` → multipart POST → merge), incl. sidecar-down fail-fast; live checks of the full `--auto` gate matrix, park-on-crash (forced bind failure → parked once, no respawn), `.env` parsing edge cases, no token leak into the sidecar env (`ps eww`), `hydra watchdog` starting and `hydra down` stopping the sidecar. Reviewed by 3 parallel independent agents (engineering / ops / security) over 4 rounds until 2 successive clean rounds.
🤖 Generated with Claude Code