Hands-free recording mode — auto-stop on silence (VAD-driven) #88

@poodle64

Problem / why

Recording is toggle-only: press to start, press again to stop. Thoth already runs voice-activity detection (webrtc-vad) to trim trailing silence, but there is no mode where one press starts recording and silence ends it. Across current dictation tools, a hands-free VAD mode (press once, auto-stop on silence, no second press and no button hold) has become a standard option alongside push-to-talk and toggle. The second press is friction, especially for short utterances.

What competitors do (research, 2026)

  • Cross-tool theme: hands-free VAD ("press once, auto start/stop on speech/silence") is now table stakes among power-user dictation apps, e.g. Whispering (press once; VAD records on speech, stops on silence; github.com/epicenter-so/epicenter) and Handy (Silero VAD; github.com/cjpais/handy).

Value for Thoth

  • Removes the end-of-utterance second press for quick dictations.
  • Reuses an existing building block (VAD is already present for trimming), so it is a mode on top of current machinery rather than new infrastructure.

Scope / non-goals (PRD)

  • A configurable recording mode (toggle vs hands-free/VAD) with silence-timeout sensitivity, sitting alongside the existing toggle mode — not replacing it.
  • The recording/VAD timing path has historically been entangled with tail-truncation, so the PRD must treat auto-stop timing carefully — it must never clip the end of speech. (Note: push-to-talk / hands-free recording modes were previously removed; the PRD should confirm why before re-introducing an auto-stop variant.)
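
As an illustration of the first scope item only, the setting could be modelled as a two-variant enum that keeps toggle as the default. This is a sketch under assumptions: `RecordingMode`, `HandsFree`, `silence_timeout` and `auto_stop_after` are hypothetical names, not Thoth's existing types, and the PRD may choose a different shape.

```rust
use std::time::Duration;

/// Hypothetical recording-mode setting (illustrative names, not Thoth's real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecordingMode {
    /// Existing behaviour: press to start, press again to stop.
    Toggle,
    /// Press once; recording ends after `silence_timeout` of continuous silence.
    HandsFree { silence_timeout: Duration },
}

impl Default for RecordingMode {
    /// Toggle stays the default so the new mode sits alongside it, not in place of it.
    fn default() -> Self {
        RecordingMode::Toggle
    }
}

impl RecordingMode {
    /// Returns the auto-stop timeout, or `None` when the user must stop manually.
    pub fn auto_stop_after(&self) -> Option<Duration> {
        match self {
            RecordingMode::Toggle => None,
            RecordingMode::HandsFree { silence_timeout } => Some(*silence_timeout),
        }
    }
}
```

Carrying the timeout inside the hands-free variant means toggle mode cannot accidentally pick up an auto-stop value.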

Related

  • Existing VAD silence-trim (docs/product P06/P09). Recording mode is currently Toggle-only.

Test Requirements

To be specified in the PRD. At minimum: Rust tests for the auto-stop trigger logic against silence thresholds, plus a regression guard that auto-stop does not truncate trailing speech.
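
To make the kind of logic under test concrete, here is a minimal sketch of an auto-stop trigger that consumes one VAD speech/silence decision per audio frame. Everything in it is an assumption for illustration: `AutoStop`, `new` and `push` are hypothetical names, the per-frame boolean stands in for whatever the VAD reports, and the real design is left to the PRD.

```rust
/// Hypothetical auto-stop trigger, fed one VAD decision per audio frame.
#[derive(Debug)]
pub struct AutoStop {
    /// Consecutive silent frames required before stopping.
    frames_needed: u32,
    /// Length of the current run of silent frames.
    silent_run: u32,
    /// Whether any speech has been observed since recording started.
    heard_speech: bool,
}

impl AutoStop {
    /// `silence_timeout_ms` is the configurable sensitivity; `frame_ms` is the frame length.
    pub fn new(silence_timeout_ms: u32, frame_ms: u32) -> Self {
        let frame_ms = frame_ms.max(1); // guard against division by zero
        // Round up so the effective timeout is never shorter than requested.
        let frames_needed = ((silence_timeout_ms + frame_ms - 1) / frame_ms).max(1);
        AutoStop { frames_needed, silent_run: 0, heard_speech: false }
    }

    /// Feed one frame's VAD decision; returns `true` when recording should stop.
    pub fn push(&mut self, is_speech: bool) -> bool {
        if is_speech {
            // Any speech resets the silence counter, so a pause mid-utterance
            // shorter than the timeout does not end the recording.
            self.heard_speech = true;
            self.silent_run = 0;
            return false;
        }
        if !self.heard_speech {
            // Leading silence before the first word never triggers a stop.
            return false;
        }
        self.silent_run += 1;
        self.silent_run >= self.frames_needed
    }
}
```

Because the trigger fires only after a full run of silent frames following speech, every frame the VAD marked as speech has already been captured by the time it fires. A regression guard could feed a recorded frame sequence through `push` and assert that the stop index falls after the last speech frame.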

Acceptance criteria (outcomes)

  • The user can choose a hands-free mode where one press starts recording and silence ends it, without clipping the end of speech, with configurable sensitivity.

Idea capture for a future PRD pass — what & why, not how. No suitable milestone exists yet.
