Voice-driven inline editing & correction ("scratch that", "select the last sentence", "make this a list") #87

## Problem / why this matters
Thoth inserts dictated text at the cursor, but once it lands, the only way to fix or restructure it is the keyboard. Across the current dictation market, voice-driven editing has moved from a power-user nicety to a baseline expectation: you correct, select, delete and restructure what you just said without touching the keyboard. Thoth has no equivalent — its only voice-driven output control today is the formatting commands ("new paragraph" / "new line") shipped in v2026.6.4. This is the clearest *new* capability gap surfaced by a competitive review (mid-2026).

For how Thoth is actually used here — long-form dictation on a worn lapel mic into AI tools, with run-on sentences and ASR slips on technical terms/proper nouns — keyboard-free correction is high value: it keeps the user in the speak-don't-type flow instead of breaking to fix a mis-heard word or reshape a rambling sentence.

## What competitors do (research, 2026)
- **Talon Voice** — deterministic, fully-local grammar: "select line/word/all", "scratch that" / "nope that" (clear the last phrase), directional selection, re-format the last phrase via formatters. No cloud, no ambiguity; the original "select X / scratch that" model. (talonvoice.com; knausj/community command set.)
- **Aqua Voice / Wispr Flow** — LLM-rewrite approach: say "make this a list", "rephrase that", "redo the second sentence" mid-flow with no command syntax; the model rewrites in place. (aquavoice.com; wisprflow.ai.)
- **macOS Voice Control** (the accessibility path, distinct from Dictation) — a spoken selection/correction/navigation grammar.

Two philosophies for the PRD to weigh: a deterministic local command grammar (Talon-style — predictable, offline, no LLM, fits the privacy-first positioning) vs LLM-rewrite of the last utterance (more natural, needs the enhancement model, higher surprise risk). A hybrid is plausible.
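To make the deterministic option concrete, here is a minimal sketch of two Talon-style edits applied to the just-dictated buffer. Everything in it is an assumption for illustration: `EditCommand`, `apply_command`, `render`, and the choice to model the buffer as one `String` per dictated phrase are hypothetical and do not describe Thoth's existing code.

```rust
/// Hypothetical edit commands; the names are illustrative, not Thoth's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EditCommand {
    ScratchThat,
    MakeList,
}

/// Apply a command to the just-dictated buffer.
/// Assumption: the buffer holds one entry per dictated phrase.
pub fn apply_command(phrases: &mut Vec<String>, command: EditCommand) {
    match command {
        // "scratch that": drop the most recent phrase; a no-op on an empty buffer.
        EditCommand::ScratchThat => {
            phrases.pop();
        }
        // "make this a list": prefix each phrase with a Markdown bullet,
        // skipping phrases that already carry one so the edit is idempotent.
        EditCommand::MakeList => {
            for phrase in phrases.iter_mut() {
                if !phrase.starts_with("- ") {
                    *phrase = format!("- {phrase}");
                }
            }
        }
    }
}

/// Render the buffer with one phrase per line (an assumption of this sketch).
pub fn render(phrases: &[String]) -> String {
    phrases.join("\n")
}
```

Each transform is a pure function of the buffer and the command, so the same input always gives the same output. That predictability is what the LLM-rewrite option trades for naturalness.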

## Value for Thoth
- Extends an axis Thoth just entered (voice formatting commands, v2026.6.4) into genuinely differentiating territory.
- The deterministic-local variant is fully offline and on-brand: it offers keyboard-free editing with a privacy claim that cloud competitors cannot match.
- Directly improves the operator's primary workflow (hands-free correction during long dictation).

## Scope / non-goals (for the PRD to settle)
- Decide between a deterministic grammar, LLM rewrite, and a hybrid.
- Decide whether editing acts on the just-dictated buffer or on arbitrary on-screen text. The latter needs Accessibility selection control; see the privacy/permission analysis already done in #42.
- False-positive handling (a command phrase spoken as content): the v2026.6.4 voice-formatting commands already established a conservative "standalone command only" matching approach worth reusing.
- Not a full hands-free computer-control suite (that is Talon's domain, out of Thoth's "one job, done well" scope).
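One possible reading of "standalone command only" matching, sketched below: an utterance counts as a command only when the whole utterance, after normalization, equals a known phrase. This is an assumption about the approach, not a copy of the v2026.6.4 implementation; `Utterance`, `COMMAND_PHRASES`, `normalize`, and `classify` are hypothetical names.

```rust
/// Result of classifying one utterance. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum Utterance<'a> {
    /// The whole utterance was a known command phrase.
    Command(&'static str),
    /// Anything else is treated as dictated content and left untouched.
    Content(&'a str),
}

/// Illustrative phrase table taken from the examples in this issue.
const COMMAND_PHRASES: [&str; 3] = [
    "scratch that",
    "select the last sentence",
    "make this a list",
];

/// Lowercase, strip punctuation at word edges, and collapse whitespace.
fn normalize(utterance: &str) -> String {
    utterance
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Conservative match: only an exact, whole-utterance match is a command.
pub fn classify(utterance: &str) -> Utterance<'_> {
    let normalized = normalize(utterance);
    match COMMAND_PHRASES.iter().copied().find(|p| *p == normalized) {
        Some(phrase) => Utterance::Command(phrase),
        None => Utterance::Content(utterance),
    }
}
```

Under this rule, "Scratch that." on its own is a command, while "Please scratch that idea" stays content because the command phrase is embedded in a longer utterance. The cost is that a command spoken in the same breath as content is never recognized, which is the conservative side of the trade-off.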

## Related
- v2026.6.4 voice formatting commands ("new paragraph" / "new line") — the first step on this axis.
- #54 (speech cleanup profiles / quick-switch — shipped) and #42 (accessibility-API context — closed won't-do; its privacy/permission reasoning is directly relevant if editing ever touches on-screen text).

## Test Requirements
To be specified during the PRD/implementation phase. At minimum: Rust unit tests for command detection/parsing and the edit transforms (mirroring the v2026.6.4 voice-command tests in `transcription/filter.rs`), plus frontend tests for any new settings/UI. Conservative-matching tests (a command phrase spoken as content must not trigger an edit) are mandatory.

## Acceptance criteria (outcomes)
- The user can correct or restructure just-dictated text by voice without the keyboard, behind a setting, with predictable behaviour, no destructive surprise edits, and the raw text recoverable.
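As a rough illustration of "raw text recoverable", the sketch below keeps the original transcript immutable and records each voice edit as a new snapshot, so any edit can be undone and the raw text is never lost. `EditSession` and its methods are hypothetical; a real design might store diffs instead of full snapshots.

```rust
/// Hypothetical session holding the raw transcript plus a stack of edited versions.
pub struct EditSession {
    raw: String,
    history: Vec<String>,
}

impl EditSession {
    pub fn new(raw: impl Into<String>) -> Self {
        Self { raw: raw.into(), history: Vec::new() }
    }

    /// The latest edited text, or the raw text when no edit has been applied.
    pub fn current(&self) -> &str {
        self.history
            .last()
            .map(String::as_str)
            .unwrap_or(self.raw.as_str())
    }

    /// Apply an edit as a new snapshot; the raw text is never overwritten.
    pub fn apply(&mut self, edit: impl FnOnce(&str) -> String) {
        let next = edit(self.current());
        self.history.push(next);
    }

    /// Undo the most recent edit. Returns false when there is nothing to undo.
    pub fn undo(&mut self) -> bool {
        self.history.pop().is_some()
    }

    /// The untouched transcript as originally dictated.
    pub fn raw(&self) -> &str {
        &self.raw
    }
}
```

For example, after `session.apply(|t| t.replace("teh", "the"))`, calling `session.undo()` returns to the previous text, and `session.raw()` yields the original dictation regardless of how many edits were applied.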

---
*Idea capture for a future PRD pass — describes what & why, not how. No suitable milestone exists yet; assign once a roadmap milestone is created.*
