Problem / why this matters
Thoth inserts dictated text at the cursor, but once the text lands, the only way to fix or restructure it is the keyboard. Across the current dictation market, voice-driven editing has moved from a power-user nicety to a baseline expectation: you correct, select, delete and restructure what you just said without touching the keyboard. Thoth has no equivalent; its only voice-driven output control today is the formatting commands ("new paragraph" / "new line") shipped in v2026.6.4. This is the clearest new capability gap surfaced by a competitive review (mid-2026).
Thoth's actual use here is long-form dictation through a worn lapel mic into AI tools, with run-on sentences and ASR slips on technical terms and proper nouns. In that setting keyboard-free correction is high value: it keeps the user in the speak-don't-type flow instead of breaking off to fix a misheard word or reshape a rambling sentence.
What competitors do (research, 2026)
Talon Voice — deterministic, fully-local grammar: "select line/word/all", "scratch that" / "nope that" (clear the last phrase), directional selection, re-format the last phrase via formatters. No cloud, no ambiguity; the original "select X / scratch that" model. (talonvoice.com; knausj/community command set.)
Aqua Voice / Wispr Flow — LLM-rewrite approach: say "make this a list", "rephrase that", "redo the second sentence" mid-flow with no command syntax; the model rewrites in place. (aquavoice.com; wisprflow.ai.)
macOS Voice Control (the accessibility path, distinct from Dictation) — a spoken selection/correction/navigation grammar.
Two philosophies for the PRD to weigh: a deterministic local command grammar (Talon-style — predictable, offline, no LLM, fits the privacy-first positioning) vs LLM-rewrite of the last utterance (more natural, needs the enhancement model, higher surprise risk). A hybrid is plausible.
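To make the deterministic option concrete, here is a minimal sketch of a local command grammar in Rust. Everything in it is an assumption for illustration: the `EditCommand` enum, `normalize` and `parse_command` are hypothetical names, and the phrases are borrowed from the Talon examples above rather than from any existing Thoth grammar. It matches only when the whole utterance is a command, which is one way to read the "predictable, no ambiguity" property.

```rust
/// Hypothetical edit commands; not an existing Thoth type.
#[derive(Debug, PartialEq, Eq)]
pub enum EditCommand {
    ScratchThat,
    SelectLine,
    SelectWord,
    SelectAll,
}

/// Lowercase the utterance, drop punctuation the ASR may have added,
/// and collapse runs of whitespace into single spaces.
fn normalize(utterance: &str) -> String {
    let cleaned: String = utterance
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect();
    let lowered = cleaned.to_lowercase();
    let words: Vec<&str> = lowered.split_whitespace().collect();
    words.join(" ")
}

/// Returns a command only when the entire utterance is a known phrase.
/// Anything else is treated as ordinary dictated content.
pub fn parse_command(utterance: &str) -> Option<EditCommand> {
    match normalize(utterance).as_str() {
        "scratch that" | "nope that" => Some(EditCommand::ScratchThat),
        "select line" => Some(EditCommand::SelectLine),
        "select word" => Some(EditCommand::SelectWord),
        "select all" => Some(EditCommand::SelectAll),
        _ => None,
    }
}
```

With this sketch, `parse_command("Scratch that.")` yields `Some(EditCommand::ScratchThat)`, while `parse_command("please scratch that itch")` yields `None` and would be inserted as text. An LLM-rewrite variant would replace the fixed `match` with a model call, which is where the extra naturalness and the extra surprise risk both come from.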
Value for Thoth
Extends an axis Thoth just entered (voice formatting commands, v2026.6.4) into genuinely differentiating territory.
The deterministic-local variant is fully offline and on-brand — a privacy-first tool can offer keyboard-free editing where cloud competitors can't make the same privacy claim.
Directly improves the operator's primary workflow (hands-free correction during long dictation).
Scope / non-goals (for the PRD to settle)
Decide deterministic-grammar vs LLM-rewrite vs hybrid, and whether editing acts on the just-dictated buffer or on arbitrary on-screen text. The latter needs Accessibility selection control; see the privacy/permission analysis already done in "Implement accessibility API context for enhancement" (#42).
False-positive handling (a command phrase spoken as content): the v2026.6.4 voice-formatting commands already established a conservative "standalone command only" matching approach worth reusing.
Not a full hands-free computer-control suite (that is Talon's domain, out of Thoth's "one job, done well" scope).
Related
v2026.6.4 voice formatting commands ("new paragraph" / "new line") — the first step on this axis.
Test Requirements
To be specified during the PRD/implementation phase. At minimum: Rust unit tests for command detection/parsing and the edit transforms (mirroring the v2026.6.4 voice-command tests in transcription/filter.rs), plus frontend tests for any new settings/UI. Conservative-matching tests (a command phrase spoken as content must not trigger an edit) are mandatory.
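As a rough illustration of what such tests could look like, the sketch below pairs a toy edit transform with three unit tests, including a conservative-matching case. It does not mirror the real code in transcription/filter.rs, which is not shown here. `DictationBuffer`, `push_utterance` and the "restore that" phrase are hypothetical, and the buffer holds only just-dictated utterances.

```rust
/// Hypothetical buffer of just-dictated utterances. Edits never discard
/// raw text: scratched utterances are kept so they can be restored.
#[derive(Debug, Default)]
pub struct DictationBuffer {
    utterances: Vec<String>,
    scratched: Vec<String>,
}

impl DictationBuffer {
    /// Feed one recognised utterance. Returns true if it was handled as a
    /// command, false if it was appended as content. Only a standalone
    /// phrase counts as a command.
    pub fn push_utterance(&mut self, utterance: &str) -> bool {
        match utterance.trim().to_lowercase().as_str() {
            "scratch that" => {
                if let Some(last) = self.utterances.pop() {
                    self.scratched.push(last);
                }
                true
            }
            // Hypothetical recovery phrase, assumed for this sketch.
            "restore that" => {
                if let Some(last) = self.scratched.pop() {
                    self.utterances.push(last);
                }
                true
            }
            _ => {
                self.utterances.push(utterance.trim().to_string());
                false
            }
        }
    }

    /// The text as it would currently stand at the cursor.
    pub fn text(&self) -> String {
        self.utterances.join(" ")
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn scratch_that_removes_last_utterance() {
        let mut buf = DictationBuffer::default();
        buf.push_utterance("first sentence");
        buf.push_utterance("second sentence");
        assert!(buf.push_utterance("scratch that"));
        assert_eq!(buf.text(), "first sentence");
    }

    #[test]
    fn command_phrase_inside_content_is_kept() {
        let mut buf = DictationBuffer::default();
        assert!(!buf.push_utterance("I had to scratch that idea"));
        assert_eq!(buf.text(), "I had to scratch that idea");
    }

    #[test]
    fn scratched_text_is_recoverable() {
        let mut buf = DictationBuffer::default();
        buf.push_utterance("keep me");
        buf.push_utterance("scratch that");
        assert!(buf.push_utterance("restore that"));
        assert_eq!(buf.text(), "keep me");
    }
}
```

The second test is the mandatory conservative-matching case: the phrase "scratch that" appears inside content and must not trigger an edit. The third touches the recoverability outcome by checking that a scratched utterance can be brought back.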
Acceptance criteria (outcomes)
The user can correct or restructure just-dictated text by voice without the keyboard, behind a setting, with predictable behaviour, no destructive surprise edits, and the raw text recoverable.
Idea capture for a future PRD pass — describes what & why, not how. No suitable milestone exists yet; assign once a roadmap milestone is created.