Problem / why
Recording is toggle-only: press to start, press again to stop. Thoth already runs voice-activity detection (webrtc-vad) to trim trailing silence, but it has no "press once and let it end itself" mode. Across current dictation tools, a hands-free VAD mode (press once, and recording auto-stops on silence, with no second press and no button hold) has become a standard option alongside push-to-talk and toggle. The second press is friction, especially for short utterances.
What competitors do (research, 2026)
- Cross-tool theme: hands-free VAD ("press once, auto start/stop on speech/silence") is now table stakes among power-user dictation apps, e.g. Whispering (press once; VAD records on speech and stops on silence) and Handy (Silero VAD). Sources: github.com/epicenter-so/epicenter; github.com/cjpais/handy.
Value for Thoth
- Removes the end-of-utterance second press for quick dictations.
- Reuses an existing building block (VAD is already present for trimming), so it is a mode on top of current machinery rather than new infrastructure.
Scope / non-goals (PRD)
- A configurable recording mode (toggle vs hands-free/VAD) with silence-timeout sensitivity, sitting alongside the existing toggle mode — not replacing it.
- The recording/VAD timing path has historically been entangled with tail-truncation, so the PRD must treat auto-stop timing carefully — it must never clip the end of speech. (Note: push-to-talk / hands-free recording modes were previously removed; the PRD should confirm why before re-introducing an auto-stop variant.)
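To make the scope concrete, the mode setting could be sketched roughly as below. This is an illustration only, not Thoth's actual configuration: `RecordingMode`, `HandsFree`, `silence_timeout`, and `auto_stops` are hypothetical names, and the PRD may well choose a different shape.

```rust
use std::time::Duration;

/// Hypothetical recording-mode setting (illustrative names, not Thoth's real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecordingMode {
    /// Existing behaviour: press to start, press again to stop.
    Toggle,
    /// Proposed: press once; recording ends after `silence_timeout`
    /// of continuous silence following speech.
    HandsFree { silence_timeout: Duration },
}

impl Default for RecordingMode {
    /// Toggle stays the default, so the new mode sits alongside it.
    fn default() -> Self {
        RecordingMode::Toggle
    }
}

impl RecordingMode {
    /// True when the mode ends recording without a second press.
    pub fn auto_stops(&self) -> bool {
        matches!(self, RecordingMode::HandsFree { .. })
    }
}
```

Keeping `Toggle` as the default reflects the "alongside, not replacing" constraint: existing users see no change unless they opt in.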
Related
- Existing VAD silence-trim (docs/product P06/P09). Recording mode is currently Toggle-only.
Test Requirements
To be specified in the PRD. At minimum: Rust tests for the auto-stop trigger logic against silence thresholds, plus a regression guard that auto-stop does not truncate trailing speech.
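As a rough illustration of what the auto-stop trigger tests might exercise, here is a minimal sketch of a silence-timeout detector. It assumes the VAD yields one speech/silence decision per fixed-length frame. `AutoStop`, `frame_ms`, and `silence_timeout_ms` are hypothetical names, and ignoring leading silence is an assumption rather than a decided behaviour.

```rust
/// Hypothetical auto-stop detector, fed one VAD decision per fixed-length frame.
pub struct AutoStop {
    frame_ms: u32,
    silence_timeout_ms: u32,
    heard_speech: bool,
    trailing_silence_ms: u32,
}

impl AutoStop {
    pub fn new(frame_ms: u32, silence_timeout_ms: u32) -> Self {
        Self {
            frame_ms,
            silence_timeout_ms,
            heard_speech: false,
            trailing_silence_ms: 0,
        }
    }

    /// Feed one frame's VAD result; returns true when recording should stop.
    pub fn push(&mut self, is_speech: bool) -> bool {
        if is_speech {
            // Any speech frame resets the silence counter.
            self.heard_speech = true;
            self.trailing_silence_ms = 0;
            return false;
        }
        if !self.heard_speech {
            // Assumption: silence before the first speech frame never stops recording.
            return false;
        }
        self.trailing_silence_ms = self.trailing_silence_ms.saturating_add(self.frame_ms);
        self.trailing_silence_ms >= self.silence_timeout_ms
    }
}
```

Because the stop fires only after a full timeout of silence following the last speech frame, every speech frame has already been captured by then. A regression guard could assert exactly that: the detector never returns `true` on a speech frame or before the timeout has elapsed.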
Acceptance criteria (outcomes)
- The user can choose a hands-free mode where one press starts recording and silence ends it, without clipping the end of speech, with configurable sensitivity.
Idea capture for a future PRD pass — what & why, not how. No suitable milestone exists yet.