feat: voice dictation — Slack/Discord audio → text, supervised with the daemon (Parakeet-MLX / Canary-Qwen) by kwliang1 · Pull Request #1 · kwliang1/hydra

kwliang1 · 2026-06-28T02:39:08Z

What

Adds voice dictation to Hydra: inbound audio attachments (Discord voice notes, Slack voice clips / audio uploads) are transcribed to text so you can dictate prompts to Claude alongside text and images. Claude has no native audio input, so the daemon transcribes first and merges the result as a [voice transcript] ... block; the original audio stays in downloaded_files.
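
A minimal sketch of the merge step, assuming each transcript is appended to the typed text as its own [voice transcript] line. The function name and the blank-line separator are hypothetical; the actual merge lives in daemon/transcription.ts and may format the block differently.

```typescript
// Hypothetical helper, not the PR's code: appends transcripts to the typed text.
// Assumption: one "[voice transcript] ..." line per audio file, separated from
// the typed message by a blank line.
function mergeTranscripts(content: string, transcripts: string[]): string {
  const blocks = transcripts
    .map((t) => t.trim())
    .filter((t) => t.length > 0) // empty transcripts add nothing
    .map((t) => `[voice transcript] ${t}`);
  if (blocks.length === 0) return content; // nothing transcribed: message unchanged
  const typed = content.trim();
  return typed.length > 0 ? `${typed}\n\n${blocks.join("\n")}` : blocks.join("\n");
}
```

A voice-only message therefore arrives as just the transcript block, while a message with typed text keeps that text first.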

Rebased on the latest upstream main (the hydra CLI, Slack Home tab, markdown chunking, etc.). The unrelated proj:/dir: spawn-prefix commit that was previously on this branch is dropped — that's its own PR (upstream sf8193#60).

Integrated into daemon start (not a separate service to remember)

hydra up starts the sidecar with the daemon (kicked off right after the daemon tmux spawn so the model loads while the byte comes up). hydra down stops it. Every hydra watchdog tick revives it. Legacy start-daemon.sh / watchdog.sh call the same path.
All supervisors go through start-transcribe.sh --auto, one shared gate: an explicit HYDRA_TRANSCRIBE_AUTOSTART=1/0 wins in both directions; when it is unset, the gate starts the sidecar iff a real backend is set up (venv built) and quietly no-ops otherwise, so machines without dictation do not log a failure on every watchdog cycle. After the one-time ./transcribe-server/setup.sh there is zero extra config.
One shared tmux session (hydra-transcribe) serves every platform daemon — per-platform sessions would race for the same port.
A crashed server parks instead of exiting: the session stays alive holding the error, so a broken config (bad port, missing ffmpeg, failed model download) fails once rather than crash-looping through a model load on every 120s watchdog tick. Fix the cause, then run tmux kill-session -t hydra-transcribe to retry.
The mock backend is never auto-supervised (manual run or explicit AUTOSTART=1 only) — leftover test config can't keep canned transcripts flowing into real prompts. A remote HYDRA_TRANSCRIBE_URL disables local autostart.
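
The gate rules in this section can be read as one small decision function. The sketch below is a TypeScript rendering of those rules, not the contents of start-transcribe.sh. The type and function names are hypothetical, and the position of the HYDRA_TRANSCRIBE_ENABLED=0 check relative to an explicit AUTOSTART value is an assumption the description does not spell out.

```typescript
type Backend = "parakeet" | "canary" | "mock";

interface GateInput {
  autostart?: string; // HYDRA_TRANSCRIBE_AUTOSTART: "1", "0", or unset
  enabled?: string;   // HYDRA_TRANSCRIBE_ENABLED: "0" opts out
  remoteUrl: boolean; // true when HYDRA_TRANSCRIBE_URL points at another host
  backend: Backend;
  venvBuilt: boolean; // true once ./transcribe-server/setup.sh has run
}

// Hypothetical rendering of the --auto gate as a pure function.
function shouldAutostart(g: GateInput): boolean {
  if (g.enabled === "0") return false;    // assumption: the opt-out is checked first
  if (g.autostart === "1") return true;   // explicit on (the only way mock is supervised)
  if (g.autostart === "0") return false;  // explicit off
  if (g.remoteUrl) return false;          // a remote sidecar disables local autostart
  if (g.backend === "mock") return false; // leftover test config is never auto-supervised
  return g.venvBuilt;                     // unset: start only when a real backend is ready
}
```

Keeping the decision in one place is what lets hydra up, the watchdog, and the legacy scripts agree on when the sidecar should run.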

Slack audio attachments

Slack voice clips arrive as message events with subtype file_share/slack_audio and a files[] entry (mimetype: audio/mp4, .m4a); the gateway only filters bot_message, so they flow through downloadAttachments (bearer-auth url_private_download) into the transcription hook like any attachment.
Detection covers Slack shapes: audio/mp4, audio/webm;codecs=opus (browser recordings), extension fallback for generic mimetypes only — a definitive non-audio MIME (video/mp4 screen recording) is never re-classified as voice. Both sidecar servers resample via ffmpeg, so m4a/ogg/webm all work.
New transcribeDownloads tests pin the contract with a stubbed fetch: only audio files are POSTed, sidecar failure skips (never blocks delivery), the size cap is enforced before the network, the disabled flag short-circuits.
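
A sketch of the detection rule described above, assuming the check sees a file name and an optional MIME type. The extension and generic-MIME sets are illustrative guesses, and the signature is hypothetical; the real isAudioFile in daemon/transcription.ts may differ.

```typescript
// Assumed lists for illustration only.
const AUDIO_EXTENSIONS = new Set(["m4a", "mp3", "ogg", "oga", "opus", "wav", "webm", "flac"]);
const GENERIC_MIMES = new Set(["", "application/octet-stream", "binary/octet-stream"]);

function isAudioFile(fileName: string, mimeType?: string): boolean {
  // Drop parameters such as ";codecs=opus" and normalise case.
  const base = (mimeType ?? "").split(";")[0].trim().toLowerCase();
  if (base.startsWith("audio/")) return true;
  // A definitive non-audio type (e.g. video/mp4) is never re-classified by extension.
  if (!GENERIC_MIMES.has(base)) return false;
  // Only generic or missing types fall back to the file extension.
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  return AUDIO_EXTENSIONS.has(ext);
}
```

With these rules a Slack clip reported as audio/mp4 and a browser recording reported as audio/webm;codecs=opus both qualify, while a screen recording reported as video/mp4 does not, whatever its extension.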

Packaged default (not opt-in)

On by default on the daemon side. Whenever a transcription sidecar is reachable, voice notes are transcribed; when it isn't, audio passes through untouched (fetch fails fast and is skipped; a live-but-slow sidecar delays only the voice message itself, bounded by HYDRA_TRANSCRIBE_TIMEOUT_MS). HYDRA_TRANSCRIBE_ENABLED=0 opts out.
Self-hosted, audio stays local. macOS (Apple Silicon) → Parakeet-MLX (default); Linux+GPU → NVIDIA Canary-Qwen 2.5B via NeMo. Swappable behind a one-line HTTP contract (POST /transcribe → {"text": ...}).

Hardening (3-lens adversarial review: engineering / ops blast-radius / security)

The sidecar env gets only dictation keys from the state-dir .env (no more set -a sourcing that leaked bot tokens into the model server's environment); parsing matches shell sourcing (export prefix, quoted values verbatim, inline comments stripped).
PATH forwarded into the tmux pane (launchd-frozen server env broke ffmpeg lookup); every interpolated value shell-quoted (shq); mock backend falls back to system python (asdf shims fail without a pinned version).
transcribeFile checks size via statSync before reading (the cap protects daemon memory, not just sidecar latency); a URL without an explicit port binds the scheme default so a mismatch fails visibly instead of silently serving the wrong port; start-daemon.sh's sidecar step can't fail the script under set -e after a successful daemon start.
Fixed a test-infra bug from the original branch: a module-level process.stderr.write stub swallowed every later test file's output (and hid a suite crash).
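
For the URL rule, a minimal sketch of the port resolution, assuming the sidecar address is an http or https URL. The function name is hypothetical and the real logic lives in the launcher script; this only illustrates binding the scheme default when no port is written.

```typescript
// Hypothetical helper: which port should the sidecar bind for a given URL?
function bindPort(sidecarUrl: string): number {
  const u = new URL(sidecarUrl);
  if (u.port !== "") return Number(u.port); // explicit, non-default port in the URL
  if (u.protocol === "http:") return 80;    // no port written: use the scheme default
  if (u.protocol === "https:") return 443;
  throw new Error(`unsupported scheme: ${u.protocol}`);
}
```

Binding the scheme default means a URL with a forgotten port fails visibly (for example on a privileged-port bind) instead of the sidecar listening somewhere the daemon never queries.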

Try it now — no GPU

./start-transcribe.sh mock        # GPU-free stub (manual only — never auto-supervised)
# send a voice note -> Claude gets "[voice transcript] This is a mock transcription..."

Real model (one-time; needs ffmpeg)

./transcribe-server/setup.sh      # venv + the right backend for your platform
# done — the sidecar now starts and stays up with the daemon

Changes

daemon/transcription.ts — audio detection + transcript merge (pure, unit-tested) + HTTP client; failures logged and skipped, never block delivery.
daemon/router.ts — hook into buildNotificationPayload after attachment download; voice_transcript meta marker.
cli/lifecycle.ts, cli/helpers.ts — sidecar supervision in hydra up / down / watchdog (startTranscribeAuto, shared transcribeTmux).
start-transcribe.sh — --auto gate, shared session, park-on-crash, key-allowlisted .env extraction, shq quoting.
start-daemon.sh, watchdog.sh — call the shared --auto path (replaces the watchdog's AUTOSTART=1 grep opt-in; sidecar step runs before the watchdog's early exits).
transcribe-server/ — server_mlx.py (Parakeet-MLX), server.py (Canary-Qwen via NeMo + ffmpeg), mock_server.py (stdlib stub), setup.sh, requirements*.txt, README.md.
daemon/__tests__/transcription.test.ts — helper tests + Slack shapes + transcribeDownloads with stubbed fetch + proper env save/restore.
.env.example, README.md.

Verified: bun build clean on daemon.ts / bridge.ts / cli/hydra.ts; bun test → 298/298; mock sidecar e2e over the real code path (transcribeDownloads → multipart POST → merge), incl. sidecar-down fail-fast; live checks of the full --auto gate matrix, park-on-crash (forced bind failure → parked once, no respawn), .env parsing edge cases, no token leak into the sidecar env (ps eww), hydra watchdog starting and hydra down stopping the sidecar. Reviewed by 3 parallel independent agents (engineering / ops / security) over 4 rounds until 2 successive clean rounds.

🤖 Generated with Claude Code

Sam's suggestion #1: the cached `status: 'live' | 'dead'` field can go stale if a session dies outside daemon control. Replace with `deadAt?: number` (records when death was detected) and `isAlive()` helper that checks tmux directly. Dead code `isSessionDead()` removed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Moves all spawn prompt construction (spawn, fork, handoff, resurrect) out of session-lifecycle.ts into dedicated prompt builders. Sharpens the set_description instruction: "Lead with the domain if one is clear. 5 words max. Rewrite it whenever your focus shifts." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…INSTRUCTION
- Move prompts.ts into daemon/prompts/session.ts alongside existing protocol prompts (build-critic, review-critic, design-*, etc.)
- Export DESCRIPTION_INSTRUCTION for protocol prompts to compose in
- Honest commit scope: this is refactor + behavioral change (10→5 words, domain-leading, lower rewrite threshold)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor: extract session prompts to daemon/prompts.ts

Adds `hydra` CLI for programmatic session management over the Unix socket. Commands: spawn, list, status, kill, health, clear-key.
- Idempotency keys prevent duplicate spawns (survives daemon restarts)
- Initiator tracked as structured field on SpawnOpts and SessionInfo
- @mentions allowFrom users in CLI-spawned threads for auto-join
- Kill race fixed: capture key before kill, overwrite after death handler
- DEFAULT_SESSION_CHANNEL updated to active server
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fetch helpers now return null on API failure (distinct from [] for success-with-no-items). pollPr() only advances lastCheckedAt when all three comment fetches succeed, so a failed poll cycle retries the same time window instead of permanently skipping comments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
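
The retry behaviour can be sketched as a pure function, assuming each fetch helper returns an array on success and null on failure as described. The names below are hypothetical and not taken from pollPr().

```typescript
// null = API failure, [] = success with no items.
type FetchResult<T> = T[] | null;

// Hypothetical helper: decide the next lastCheckedAt after one poll cycle.
function nextLastCheckedAt(
  previous: number,
  pollStartedAt: number,
  results: FetchResult<unknown>[],
): number {
  const allSucceeded = results.every((r) => r !== null);
  // Advance only when every fetch succeeded; otherwise the next cycle
  // re-reads the same time window instead of skipping it.
  return allSucceeded ? pollStartedAt : previous;
}
```

Treating [] and null differently is the point: an empty result is a successful poll, while a failed one must not move the window forward.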

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: hydra CLI — programmatic daemon interface

Re-implements PRs sf8193#39 and sf8193#42 which were merged to `live` (not `main`) and lost in the 2026-06-28 live rebuild.
- readAccessFile() now spreads parsed over defaults, so new Access fields no longer silently drop
- defaultListen on Access and GroupPolicy types
- resolveListenState cascade: thread listenOverride → group defaultListen → global defaultListen → false
- listen/unlisten commands persist listenOverride to ThreadMetadata so respawned sessions inherit the thread's listen preference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
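
The cascade reads naturally as a chain of nullish fallbacks. This is a sketch under the assumption that each level is an optional boolean; the interface and function name are hypothetical, and the real resolveListenState takes different parameters (a later fix mentions chatId).

```typescript
interface ListenInputs {
  threadOverride?: boolean; // listenOverride persisted on the thread
  groupDefault?: boolean;   // defaultListen on the group policy
  globalDefault?: boolean;  // defaultListen on the access file
}

// Hypothetical rendering of the cascade: the first defined value wins, else false.
function resolveListen(i: ListenInputs): boolean {
  return i.threadOverride ?? i.groupDefault ?? i.globalDefault ?? false;
}
```

Using ?? rather than || matters here: an explicit false at the thread level must override a true default further down the chain.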

msg is not in doSpawnSession's scope — chatId is the correct parameter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ThrottledQueue previously swallowed errors without retry — if ch.setName() threw for any reason, the visual update was permanently lost. Now re-enqueues on failure (up to 3 attempts), preserving original priority and coalescing with any newer value for the same key. Documents the empirically measured Discord shared-scope rate limit on thread renames: under burst conditions, ~2 rapid renames trigger 429 + retry-after ~600s (x-ratelimit-scope: shared). Per-channel vs global scoping unconfirmed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(discord): connectivity-aware resilience via gateway health contract

Discord enforces a shared-scope rate limit on thread renames (~2 per burst window). Mid-protocol turn transitions consumed the budget before completion could land. Thread renames now fire only on outer state changes (spawn, protocol start/end, kill, cancel). Mid-protocol progress moves to thread-visible text — review uses a single live-edited status message (gateway.edit, 5/2s rate limit), build and design embed badges in existing status messages. Design badges use formatPhaseBadge() (single source of truth). A 3-round review goes from 8+ renames to 2 — completion always lands immediately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ment, per-platform logs, byte revival, safe restart preflight

Typing `! <message>` in a thread sends Escape to the tmux session (interrupting current work), then delivers the message normally.
- Uses Bun.spawn array form (no shell, no injection surface)
- Adds initiator field to SpawnOpts for CLI compatibility
- Resolves type errors from sf8193#53 visual system refactor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: ThrottledQueue retry on failure + rate limit docs

fix: restore defaultListen + persist listen across respawns

Publishes a live session overview to the bot's Home tab using Block Kit. Auto-updates on session changes (debounced) and periodically every 5 min. Shows status, thread links, description, and age for each live session. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Implement onReaction in SlackGateway so the existing hocho handler in the daemon router works for Slack, not just Discord. Handles thread parents by deleting children first, reacts ⚠️ when threads contain undeletable messages from other users. Includes bot self-reaction guard for parity with Discord. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Reverts the thread-parent delete logic that fetched and deleted all children before retrying the parent. Single-message delete only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Track lastReplyId on sessions from both inbound messages (router) and outbound bot replies (bridge-dispatch). Dashboard and CLI list use gateway.getMessageUrl to build thread-scoped deep links that open Slack to the latest message in the thread panel. Includes debounced persist (2s coalesce), deleted-message cleanup, startup backfill via Slack's latest_reply, and shutdown flush. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add spawn input to Home tab (text field with Enter dispatch)
- Show PR watch links as context blocks under each session
- Auto-unwatch PRs on merge/close during poll cycle
- Backfill PR titles from GitHub on daemon startup
- Auth checks on home:spawn and app_home_opened (allowFrom gate)
- Deduplicate PR API call in pollPr (pass prData to fetchCheckStatus)
- Fix lastReplyId semantic split: standardize on outbound reply ID
- Escape mrkdwn injection in session descriptions
- Cap block count at 31 to stay under Slack's 100-block view limit
- Input max_length: 500 with handler-side truncation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each daemon now writes daemon-{platform}.json alongside the legacy daemon.json during bridge sync. The bridge checks the platform-keyed file first, eliminating the last-writer-wins race when two daemons share a plugin cache. CHAT_PLATFORM is now propagated to spawned sessions and the Slack byte so bridges can resolve the correct file. Fully backwards compatible — old bridges fall through to daemon.json, old daemons still write daemon.json which new bridges accept as legacy fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: platform-keyed daemon config for dual-daemon operation

Add daemon+byte lifecycle commands to the hydra CLI. Platform is always required (no default).
- `hydra up <platform>`: validate byte script exists, check for running tmux sessions and orphaned claude processes (prevents gotcha sf8193#32 ping-pong), start daemon, wait for socket, start byte
- `hydra down <platform>`: stop byte via stop-byte.sh (orphan cleanup), stop daemon, remove stale socket + PID file
- `hydra restart <platform>`: restart daemon only (picks up code changes)
No hardcoded platform enum; uses filesystem-based validation (does start-{platform}-byte.sh exist?). New platforms work with zero CLI changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: hydra up/down/restart — CLI lifecycle management

Replaces start-byte-v2.sh (discord) and start-slack-byte.sh with a single platform-agnostic start-byte.sh. The v2 daemon+bridge architecture is now the only architecture; the version suffix was a migration artifact. Old names preserved as thin deprecation wrappers (inject CHAT_PLATFORM, exec start-byte.sh, print notice to stderr).
Key changes beyond dedup:
- Shared env preamble (env-setup.sh): PATH, .env sourcing, STATE_DIR in one place, sourced by every script (including preflight.sh)
- Strict mode (set -euo pipefail) on all executable scripts
- Progressive CHAT_PLATFORM enforcement: refuses to default when multiple platform state dirs exist (N-platform aware)
- Auth tokens read from file, not interpolated into command strings
- macOS assertion makes the platform contract explicit
- Unified log paths: ~/hydra-${CHAT_PLATFORM}-{daemon,byte}.log
- Consistent #!/bin/bash shebangs across all scripts
- Script architecture documented in README (layering, conventions)
- Deprecation wrappers for backwards compat (shell history, worktrees)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor: shared preamble, unified byte script, strict mode

…adAt
Four fixes to the crash detection category:
1. Remove duplicate inline death check in bridge-server.ts socket.on('end'): checkSessionDeath() already covers this case via setTimeout. The inline copy fired in parallel, producing double death notices.
2. Health poll (daemon.ts) changed from OR to AND: only flag as crashed when BOTH tmux is dead AND bridge is disconnected. Bridge-only disconnects are handled by the bridge-server disconnect handler (3s delay + tmux check). The OR condition false-positived on temporary bridge drops and newly spawned sessions.
3. Health poll now sets info.deadAt + calls registry.persist() + threadRegistry update + refreshSessionVisual(), fully consistent with checkSessionDeath().
4. 60s spawn grace period: skip crash detection for sessions younger than SPAWN_GRACE_MS (bridge needs time to connect after spawn).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
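
Fixes 2 and 4 combine into one predicate. The sketch below assumes the health poll has a spawn timestamp and two liveness flags per session; the interface and function name are hypothetical, while SPAWN_GRACE_MS and the 60s value come from the commit message.

```typescript
const SPAWN_GRACE_MS = 60_000; // 60s grace period from the commit message

interface SessionProbe {
  spawnedAt: number;        // epoch ms when the session was spawned
  tmuxAlive: boolean;       // result of checking tmux directly
  bridgeConnected: boolean; // whether the bridge socket is connected
}

// Hypothetical health-poll predicate: AND, not OR, plus a spawn grace period.
function looksCrashed(s: SessionProbe, now: number): boolean {
  if (now - s.spawnedAt < SPAWN_GRACE_MS) return false; // bridge may still be connecting
  return !s.tmuxAlive && !s.bridgeConnected;            // both must be down
}
```

A bridge-only disconnect returns false here, which leaves that case to the bridge-server disconnect handler as the commit describes.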

fix: 60s grace period on crash detection for spawned sessions

DEFAULT_SESSION_CHANNEL was hardcoded to a Discord channel ID as fallback. This is deployment-specific config that belongs in .env, not source code. Now required from .env — daemon warns on startup if missing. Also fixes load order: export moved after .env sourcing so .env values are actually read. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: DEFAULT_SESSION_CHANNEL must read after .env sourcing

Absorb all 10 shell scripts (start-daemon, start-byte, stop-byte, restart-daemon, watchdog, preflight, env-setup, compile-check, kill-orphan-bytes) into the TypeScript CLI as cli/helpers.ts and cli/lifecycle.ts. The CLI entry point (cli/hydra.ts) is now a slim router that delegates to typed lifecycle commands. New commands: hydra watchdog <platform>, hydra preflight <platform>. Shell scripts retained for backward compat until production validated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

hydra install <platform> generates a launchd plist with correct paths for the current user, loads it, creates the state dir, and runs preflight. hydra uninstall removes it. Simplifies new user setup to: bun install → create .env → hydra install → hydra up. README rewritten to use CLI commands instead of shell scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

hydra install now accepts --cwd and --config-dir flags so users don't need env vars. All executable shell scripts now print a deprecation warning pointing to the CLI equivalent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

If git worktree remove --force fails (e.g. corrupted .git file), fall back to rm -rf but only when the path is inside a .worktrees/ directory to prevent accidental deletion of real repos. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Splits that land inside a ``` code fence now close the fence at the chunk boundary and reopen it (with the language tag) at the start of the next chunk, so multi-part messages render correctly in Slack and Discord. The new 'markdown' mode is the default; 'length' and 'newline' modes are preserved for back-compat. Split-point preference: paragraph break outside fence > line break outside fence > line break inside fence > space > hard limit. Best-effort avoidance of mid-table-row splits. Reserves 4 chars of headroom for the fence closer so chunks never exceed the stated limit. Progress guard prevents infinite loops when fence overhead >= cut size. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
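
A simplified sketch of the fence-aware part of this mode, assuming splits happen only at line breaks and that every single line fits within the limit. It does not implement the full split-point preference order, table handling, or the hard-limit fallback, and the function name is hypothetical.

```typescript
const FENCE = "`".repeat(3); // built at runtime so the example can sit inside a fenced block

// Hypothetical line-based chunker: closes an open fence at the chunk boundary
// and reopens it (with its language tag) at the start of the next chunk.
function chunkMarkdown(text: string, limit: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];          // lines of the chunk being built
  let openFence: string | null = null; // opener line while inside a fence
  const size = (lines: string[]): number => lines.join("\n").length;

  for (const line of text.split("\n")) {
    const isFence = line.trim().startsWith(FENCE);
    // Will we be inside a fence after this line is added?
    const insideAfter = isFence ? openFence === null : openFence !== null;
    // Reserve 4 chars (newline + closer) so a later forced close still fits.
    const reserve = insideAfter ? FENCE.length + 1 : 0;
    if (current.length > 0 && size([...current, line]) + reserve > limit) {
      if (openFence !== null) current.push(FENCE); // close the fence at the boundary
      chunks.push(current.join("\n"));
      current = openFence !== null ? [openFence] : []; // reopen with the language tag
    }
    current.push(line); // always consumes the line, so the loop makes progress
    if (isFence) openFence = openFence === null ? line : null;
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

Each chunk produced this way carries balanced fences, which is what lets multi-part messages render correctly on their own.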

feat(daemon): add markdown-aware message chunking mode

Transcribe inbound audio attachments (Discord voice notes, Slack audio clips) to text so users can dictate prompts to Claude alongside text and images. Claude has no native audio input, so the daemon transcribes first and merges the result into the message as a [voice transcript] block; the original audio file stays in downloaded_files.
- daemon/transcription.ts: audio detection, transcript merging (pure, unit-tested), and an HTTP client to a self-hosted STT sidecar. Failures are logged and skipped; dictation never blocks message delivery.
- daemon/router.ts: hook transcription into buildNotificationPayload after attachment download.
- transcribe-server/: self-hosted sidecar serving NVIDIA Canary-Qwen 2.5B via NeMo (top of the Open ASR leaderboard for English accuracy), plus a GPU-free mock_server.py for testing the wiring locally.
- Off by default; enable with HYDRA_TRANSCRIBE_ENABLED=1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Make dictation work as a packaged default rather than a manual opt-in:
- Daemon transcription is now ON by default ("auto"): it's attempted whenever audio arrives and silently no-ops if no sidecar is reachable (the fetch fails fast). HYDRA_TRANSCRIBE_ENABLED=0 opts out.
- start-transcribe.sh: idempotent launcher for the sidecar in a tmux session. Accepts a backend arg (`./start-transcribe.sh mock`) for a zero-GPU end-to-end test; canary backend refuses cleanly until set up.
- watchdog.sh: revive the sidecar when HYDRA_TRANSCRIBE_AUTOSTART is set, reusing the same supervision pattern as the bot session.
- transcribe-server/setup.sh: one-time venv + NeMo install.
- Docs/env updated for the packaged-default flow.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Pasting a doc line with an inline '# comment' into interactive zsh (which does not strip '#') passed the comment as args, making the launcher try backend '#'. Only accept mock|canary as the positional arg; ignore anything else with a warning and fall back to env/default. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Canary-Qwen runs via NeMo and needs a CUDA GPU, so it can't run on Mac. Add a Parakeet-MLX backend (NVIDIA Parakeet TDT on Apple's MLX runtime): native, ~50x realtime on M-series, ~6% English WER, no GPU.
- transcribe-server/server_mlx.py: FastAPI server, same /transcribe contract, parakeet-mlx + ffmpeg resample.
- transcribe-server/requirements-mlx.txt: light deps (no torch/NeMo).
- start-transcribe.sh: `parakeet` backend; default by platform (Darwin -> parakeet, else canary).
- setup.sh: installs the right requirements per platform / arg.
- Docs updated across README, transcribe-server/README, .env.example.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

MLX ships arm64-only wheels, and on Apple Silicon the shell frequently runs under Rosetta (x86_64), where 'pip install mlx' fails with no matching wheel. setup.sh now uses an arm64 Homebrew python@3.12 and builds the venv via 'arch -arm64' for the parakeet backend; uv/system python paths remain for canary. Also relax the parakeet-mlx pin. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

tmux does not reliably inherit env, so a PARAKEET_MODEL override in .env never reached the server (it would fall back to the 0.6B default). Forward model-selection vars explicitly, only when set. Enables pinning the smaller 110M Parakeet on constrained networks where the 2.4GB 0.6B is impractical. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…tests The module-top-level stub leaked into every test file loaded after it, swallowing bun test's per-test output and final summary for the rest of the suite (bun runs all files in one process). The stub was unnecessary: these tests only exercise pure helpers that never write to stderr. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Voice dictation now comes up with the daemon instead of needing a separate manual step or an explicit opt-in flag:
- start-transcribe.sh gains an --auto mode used by every supervisor: explicit HYDRA_TRANSCRIBE_AUTOSTART wins in both directions; when unset it starts the sidecar iff the backend is ready (venv built, or mock chosen), and quietly no-ops otherwise so unconfigured machines don't log a failure every watchdog cycle. Honors HYDRA_TRANSCRIBE_ENABLED=0.
- hydra up starts it right after the daemon (model loads while the byte comes up); hydra watchdog revives it each tick; hydra down stops it.
- Legacy start-daemon.sh and watchdog.sh call the same --auto path; the watchdog's AUTOSTART=1 opt-in grep is replaced by the shared gate.
- mock backend: resolve python3 up front and fall back to /usr/bin/python3, because asdf's shim fails when no python version is pinned for the dir, and mock_server.py is pure stdlib.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Slack voice clips arrive as files with mimetype audio/mp4 (m4a), or audio/webm;codecs=opus from browser recordings — assert detection for those shapes plus the extension fallback. Add transcribeDownloads tests with a stubbed fetch: only audio files are POSTed, sidecar failure skips the file instead of throwing, and the disabled flag short-circuits before the network. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…d 1)
Structural:
- ONE shared tmux session (hydra-transcribe) for all platform daemons: per-platform sessions raced for the same default port, and the loser reloaded the full model every watchdog tick. hydra down kills it; the other platform's watchdog revives it within a tick.
- A crashed server PARKS its session (error on screen + in log) instead of exiting, so supervisors' has-session check holds: broken configs fail once, not as a model-load crash-loop every 120s.
- The mock backend is excluded from --auto (manual or explicit AUTOSTART=1 only): leftover test config must not keep canned transcripts flowing into real prompts. A remote HYDRA_TRANSCRIBE_URL also disables local autostart.
Environment/robustness:
- Extract only dictation keys from the state-dir .env instead of set -a sourcing the whole file: the model server has no business holding chat bot tokens (they leaked into the tmux server env when this script bootstrapped it).
- Forward PATH into the tmux pane (launchd-frozen server env lacks /opt/homebrew/bin, breaking the servers' ffmpeg lookup) and shell-quote every interpolated value (shq) so an embedded quote can't break out of the tmux command string.
- URL without an explicit port now binds the scheme default so a mismatch fails visibly instead of the sidecar silently serving a port the daemon never queries.
- start-daemon.sh: sidecar refusal no longer fails the whole script under set -e after a successful daemon start. Legacy watchdog runs the sidecar step before the daemon branches' early exits.
Daemon:
- isAudioFile: a definitive non-audio MIME (video/mp4 screen recording) is no longer re-classified as audio by its extension; only generic types fall back. Codec suffixes (audio/webm;codecs=opus) parsed correctly.
- transcribeFile checks size via statSync BEFORE reading, so the cap now protects daemon memory, not just sidecar latency.
- Tests: env save/restore moved into beforeEach/afterEach (the old describe-body restore ran at collection time and leaked env into later test files); network tests set HYDRA_TRANSCRIBE_ENABLED explicitly; new cases for video-MIME rejection and the pre-network size cap.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
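
The shq quoting mentioned in this commit follows the usual POSIX single-quote idiom. The sketch below is a TypeScript rendering for illustration, assuming POSIX shell semantics; the PR's own shq lives in the shell launcher and is not shown here.

```typescript
// Illustrative TypeScript version of POSIX single-quote escaping.
// Inside single quotes nothing is special, so the only character that needs
// care is the single quote itself: close the quote, emit \', reopen the quote.
function shq(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}
```

Every interpolated value goes through this before being placed in the tmux command string, so an embedded quote or space stays part of the value.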

- --auto with AUTOSTART unset now also requires BACKEND != mock: with a built venv, leftover BACKEND=mock in .env passed the venv-only gate and auto-supervised canned transcripts, the exact residue case the previous commit claimed to prevent.
- .env key extraction now parses like shell sourcing: optional 'export' prefix, quoted values kept verbatim (a # inside quotes is not a comment), unquoted values lose trailing inline comments/whitespace. The grep|cut version kept ' 0 # never' whole, silently defeating explicit opt-outs and erroring per watchdog tick on commented backends.
- Document that multi-platform machines must keep dictation config identical across platform .env files (shared session = first supervisor wins).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
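
The parsing rules in the second bullet can be sketched as an allowlisted extractor. This is a TypeScript illustration of those rules, not the shell code in start-transcribe.sh; the function name is hypothetical, and escape sequences inside double quotes are deliberately not handled.

```typescript
// Hypothetical allowlisted .env extractor that mimics shell sourcing:
// optional "export" prefix, quoted values verbatim, unquoted values
// stripped of trailing inline comments and whitespace.
function extractEnv(text: string, allowed: readonly string[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split("\n")) {
    const m = /^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$/.exec(raw);
    if (m === null) continue;              // blank lines, comments, junk
    const key = m[1];
    if (!allowed.includes(key)) continue;  // only allowlisted keys reach the sidecar
    const rest = m[2].trim();
    const quoted = /^(["'])(.*?)\1/.exec(rest);
    out[key] = quoted !== null
      ? quoted[2]                                 // a # inside quotes is not a comment
      : rest.replace(/(^|\s)#.*$/, "").trim();    // drop an inline comment
  }
  return out;
}
```

Given a line such as `export HYDRA_TRANSCRIBE_AUTOSTART=0 # never`, this yields "0" rather than the whole tail, which is the opt-out case the bullet describes.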

kwliang1 force-pushed the feat/voice-dictation-canary-qwen branch from 4bbdf4d to 994ac68 on June 28, 2026 22:32

kwliang1 changed the title ~~feat: voice dictation via Canary-Qwen transcription sidecar~~ feat: voice dictation (Canary-Qwen) — packaged default Jun 28, 2026

kwliang1 changed the title ~~feat: voice dictation (Canary-Qwen) — packaged default~~ feat: voice dictation — Parakeet-MLX (macOS) / Canary-Qwen (GPU) Jun 29, 2026

dcetlin and others added 26 commits June 29, 2026 10:30

Merge pull request sf8193#61 from sf8193/refactor/extract-prompts

ac724cd

refactor: extract session prompts to daemon/prompts.ts

feat(discord): connectivity-aware resilience via gateway health contract

449167f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: address Sam's review — input validation + async test sleep

9c3a0c6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request sf8193#47 from sf8193/build/hydra-cli

6614fb7

feat: hydra CLI — programmatic daemon interface

fix: use chatId not msg in resolveListenState call

387bbc7

msg is not in doSpawnSession's scope — chatId is the correct parameter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request sf8193#51 from sf8193/feat/discord-resilience

88ba52d

feat(discord): connectivity-aware resilience via gateway health contract

Merge pull request sf8193#64 from sf8193/fix/rename-gate-outer-only

bb81eae

feat(ops): operational robustness — dup-main guard, singleton enforce…

12d5cd1

…ment, per-platform logs, byte revival, safe restart preflight

Merge pull request sf8193#50 from sf8193/feat/interrupt-prefix

1316b2c

Merge pull request sf8193#57 from sf8193/feat/operational-robustness

17b4d63

Merge pull request sf8193#62 from sf8193/fix/throttled-queue-retry

26890d0

fix: ThrottledQueue retry on failure + rate limit docs

Merge pull request sf8193#63 from sf8193/fix/restore-default-listen

ef01ed5

fix: restore defaultListen + persist listen across respawns

fix: remove cascading thread delete from hocho reaction

50e0966

Reverts the thread-parent delete logic that fetched and deleted all children before retrying the parent. Single-message delete only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dcetlin and others added 28 commits July 1, 2026 11:28

Merge pull request sf8193#66 from sf8193/feat/platform-keyed-daemon-json

e4c32cb

feat: platform-keyed daemon config for dual-daemon operation

Merge pull request sf8193#67 from sf8193/feat/cli-lifecycle

30b0fbd

feat: hydra up/down/restart — CLI lifecycle management

Merge pull request sf8193#68 from sf8193/refactor/shell-scripts

bee6f2d

refactor: shared preamble, unified byte script, strict mode

Merge pull request sf8193#69 from sf8193/fix/spawn-crash-grace

0522a14

fix: 60s grace period on crash detection for spawned sessions

Merge pull request sf8193#70 from sf8193/fix/config-env-load-order

e868791

fix: DEFAULT_SESSION_CHANNEL must read after .env sourcing

feat: auto-watch PRs mentioned in session replies (sf8193#72)

3f9dfc8

Merge pull request sf8193#71 from sf8193/kevinliang/markdown-chunk-mode

c27b77b

feat(daemon): add markdown-aware message chunking mode

docs: note PARAKEET_MODEL override (110M for constrained networks)

a992304

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

kwliang1 force-pushed the feat/voice-dictation-canary-qwen branch from 6136e07 to 2e5be02 on July 3, 2026 08:38

kwliang1 changed the title ~~feat: voice dictation — Parakeet-MLX (macOS) / Canary-Qwen (GPU)~~ feat: voice dictation — Slack/Discord audio → text, supervised with the daemon (Parakeet-MLX / Canary-Qwen) Jul 3, 2026

kwliang1 wants to merge 56 commits into main from feat/voice-dictation-canary-qwen


3 participants
