feat: add realtime whisper with OSD preview#176

Merged
goodroot merged 17 commits into main from goodroot/whisper-realtime
May 22, 2026
Conversation

@goodroot

Owner

Enables the new OpenAI real-time whisper endpoint and incorporates the deltas into the visualizer, so that you can see your voice being transcribed in real time before submitting.

@github-actions

github-actions Bot commented May 22, 2026

Continue PR Review

Automated review via Continue CLI run through opub.

Note: this review is partial because the PR diff was truncated to 120000 bytes from 245558 bytes.

Summary

Partial review only: the provided diff is truncated at 120000 of 245558 bytes, so this does not cover the full PR. Within the visible changes, I found a likely startup crash in mic-osd initialization and a regression where live preview text can be dropped whenever mic-osd is disabled/unavailable even though the realtime backend still emits partials.

Findings

  • lib/mic_osd/main.py: MicOSD.__init__() unconditionally indexes VISUALIZATIONS["waveform"] as the default fallback. In this PR, failed imports set VISUALIZATIONS = {}, so startup raises KeyError before main() can return the intended “Unavailable” message. This turns missing GTK/PyCairo deps into a hard crash instead of graceful degradation. Guard the fallback before indexing, or short-circuit MicOSD construction when _MIC_OSD_IMPORT_ERROR is set.
  • lib/main.py: realtime partial callbacks are always wired via self.whisper_manager.set_realtime_partial_callback(self._set_mic_osd_preview_text), but _set_mic_osd_preview_text() is a no-op unless self._mic_osd_runner exists. That means if mic-osd is disabled/unavailable, partial transcript events are silently discarded. If the websocket client expects the callback to forward or retain partials for any other consumer (capture mode, future UI, logs/tests), this is a behavior regression tied to OSD availability. The callback should either be registered only when preview is supported, or write through an abstraction that is independent of OSD enablement.
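
A minimal sketch of the guard suggested in the first finding. The diff was not fully visible, so everything here is a stand-in: `resolve_visualization` and `MicOSDUnavailable` are hypothetical names, and `VISUALIZATIONS` / `_MIC_OSD_IMPORT_ERROR` are simulated in the failed-import state the finding describes. It is not the project's actual code.

```python
# Simulated module state after a failed GTK/PyCairo import (assumption).
VISUALIZATIONS = {}
_MIC_OSD_IMPORT_ERROR = ImportError("No module named 'gi'")


class MicOSDUnavailable(RuntimeError):
    """Hypothetical error raised when no visualization can be constructed."""


def resolve_visualization(name, registry, import_error=None):
    """Pick a visualization without ever indexing a missing key."""
    if import_error is not None or not registry:
        raise MicOSDUnavailable(f"mic-osd unavailable: {import_error}")
    if name in registry:
        return registry[name]
    if "waveform" in registry:
        return registry["waveform"]
    # Last resort: any registered visualization instead of a KeyError.
    return next(iter(registry.values()))


def main():
    try:
        resolve_visualization("waveform", VISUALIZATIONS, _MIC_OSD_IMPORT_ERROR)
    except MicOSDUnavailable as exc:
        print(f"Unavailable: {exc}")
        return 1
    return 0
```

With this shape, missing dependencies reach the "Unavailable" path in `main()` instead of failing inside the constructor.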

Suggested follow-ups

  • Add tests for mic-osd unavailable/missing-dependency startup to verify it exits cleanly without KeyError.
  • Add tests covering realtime partial preview lifecycle: start recording, partial updates, stop/cancel/error cleanup, and behavior when mic_osd_enabled=false.
  • Review the omitted half of the diff before merge; the largest risk here is unreviewed websocket/realtime transcription changes outside the visible OSD plumbing.
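
One way the partial-preview tests could look, assuming the callback is routed through an OSD-independent abstraction as the second finding proposes. `PartialTranscriptSink` is a hypothetical class invented for this sketch. It also assumes that `set_realtime_partial_callback` accepts any callable taking the partial text, which the truncated diff does not confirm.

```python
class PartialTranscriptSink:
    """Hypothetical sink: retains the latest partial and fans out to consumers."""

    def __init__(self):
        self.latest = ""
        self._consumers = []

    def subscribe(self, consumer):
        # consumer is any callable taking the partial text, e.g. an OSD updater.
        self._consumers.append(consumer)

    def __call__(self, text):
        self.latest = text
        for consumer in self._consumers:
            consumer(text)

    def clear(self):
        # Intended for stop/cancel/error cleanup.
        self.latest = ""


def test_partials_retained_without_osd():
    sink = PartialTranscriptSink()  # no OSD subscribed (mic_osd_enabled=false)
    sink("hello")
    sink("hello world")
    assert sink.latest == "hello world"


def test_osd_receives_partials_and_cleanup():
    seen = []
    sink = PartialTranscriptSink()
    sink.subscribe(seen.append)  # stands in for the OSD preview updater
    sink("one")
    sink("one two")
    sink.clear()
    assert seen == ["one", "one two"]
    assert sink.latest == ""
```

Under these assumptions the sink itself would be registered as the partial callback, and the OSD would subscribe only when it is available.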

@goodroot goodroot merged commit b80a6c5 into main May 22, 2026
2 checks passed
@goodroot goodroot deleted the goodroot/whisper-realtime branch May 22, 2026 18:04
