Scriber

AI-powered speech-to-text workflows for desktop and web.
Live dictation, YouTube transcription, file transcription, transcript management, summaries, and export.

Status • Features • Quick Start • Usage • Architecture • API • Configuration • Development • Troubleshooting

Status

Last verified: 2026-06-01

Scriber is a local-first transcription app with a Python backend, a React web UI, and a legacy Tkinter fallback UI. The current primary runtime is Windows with tray integration, global hotkeys, microphone device monitoring, and local SQLite persistence.

Current implementation highlights:

Live microphone transcription with WebSocket status/audio/transcript events.
YouTube and file transcription with persistent jobs, retry scheduling, and resume support.
Multi-provider STT support, including cloud providers and local ONNX/NeMo paths.
SQLite transcript storage with WAL mode, metadata list loading, pagination, and FTS5 search.
DeviceMonitor for microphone hotplug handling with native Windows endpoint events where available and polling fallback.
Recording-aware PortAudio refresh: device refreshes are deferred while a recording stream is active and run once after the stream becomes idle.
Short-lived microphone device-resolution cache for selected/favorite mic lookup.
Route-level frontend lazy loading for non-default pages and a single shared WebSocket connection.

Known limits:

SCRIBER_MIC_ALWAYS_ON exists as a setting, but it is not a true app-level always-on/prewarmed microphone stream yet. Per-session streams are closed during cleanup to avoid orphaned PortAudio resources.
Frontend transcript-list virtualization/infinite loading is still open.
Vite production build can still warn about an initial chunk over 500 kB; manual vendor chunking is still open.
Some upload preprocessing and export generation still run synchronously in async request paths.
Very long live sessions can still hit O(n^2)-style string growth when appending final transcript chunks.
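The last limit above comes from repeatedly concatenating onto one growing string, which can copy the whole transcript on every append. A minimal sketch of the usual remedy follows: collect chunks in a list and join once on demand. The class and method names are hypothetical and not taken from the Scriber codebase.

```python
class TranscriptBuffer:
    """Collects final transcript chunks and joins them only on demand."""

    def __init__(self) -> None:
        self._chunks: list[str] = []

    def append_final(self, chunk: str) -> None:
        # Appending to a list is amortized O(1); no existing text is copied.
        text = chunk.strip()
        if text:
            self._chunks.append(text)

    def text(self) -> str:
        # A single join copies every character once: O(total length).
        return " ".join(self._chunks)
```

Joining with a single space is an assumption for illustration; the real separator depends on how providers deliver final chunks.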

Features

Live Microphone Dictation

Global hotkey, default Ctrl+Alt+S.
Modes:
- toggle: press once to start, press again to stop.
- push_to_talk: record while the hotkey is held.
Live WebSocket events for state, status, audio level, warnings, transcripts, session lifecycle, history updates, and errors.
Favorite microphone selection with fallback to selected/default device.
Device hotplug detection via DeviceMonitor.
Low input-level warning flow for muted/quiet microphones.
Recording overlay with preparing/recording/transcribing states.
Text injection into the active app through auto, sendinput, paste, or type.
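The two hotkey modes listed above differ only in how press and release events change the recording state. The following sketch shows that difference as a pure function. The names `Mode` and `next_recording_state` and the event strings "press" and "release" are hypothetical; the real hotkey handling in Scriber may be structured differently.

```python
from enum import Enum


class Mode(str, Enum):
    TOGGLE = "toggle"
    PUSH_TO_TALK = "push_to_talk"


def next_recording_state(mode: Mode, recording: bool, event: str) -> bool:
    """Return the new recording flag after a hotkey event ("press" or "release")."""
    if mode is Mode.TOGGLE:
        # Only presses matter: each press flips the state.
        return (not recording) if event == "press" else recording
    # push_to_talk: record exactly while the key is held.
    if event == "press":
        return True
    if event == "release":
        return False
    return recording
```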

YouTube Transcription

YouTube search and video lookup through the YouTube Data API.
Download and audio extraction through yt-dlp and ffmpeg.
Persistent job lifecycle with retry/resume support.
Transcript entries are saved as youtube records.

File Transcription

Multipart upload through POST /api/file/transcribe.
Supported audio formats: .mp3, .wav, .m4a, .flac, .aac, .ogg.
Supported video formats: .mp4, .mov, .webm, .avi, .mkv, .m4v.
Video audio extraction through ffmpeg.
Default audio upload limit: 200 MB.
Raw video upload hard limit: 2048 MB.
Extracted/compressed audio is limited by the final audio/provider limit.
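The extension and size rules above can be sketched as a single validation step. The function name `classify_upload` is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption; check the backend for the exact unit and for the separate limit on extracted audio.

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg"}
VIDEO_EXTS = {".mp4", ".mov", ".webm", ".avi", ".mkv", ".m4v"}
MB = 1024 * 1024  # assumption: limits are counted in MiB


def classify_upload(filename: str, size_bytes: int,
                    audio_max_mb: int = 200, video_max_mb: int = 2048) -> str:
    """Return "audio" or "video", or raise ValueError for rejected uploads."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_EXTS:
        kind, limit_mb = "audio", audio_max_mb
    elif ext in VIDEO_EXTS:
        kind, limit_mb = "video", video_max_mb
    else:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > limit_mb * MB:
        raise ValueError(f"{kind} upload exceeds {limit_mb} MB")
    return kind
```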

STT Providers

Provider coverage includes:

Soniox realtime and async
Mistral realtime and async
AssemblyAI Universal-3-Pro async
Deepgram
OpenAI
Azure Speech
Azure MAI Transcribe
Gladia
Groq
Speechmatics
ElevenLabs
Google
AWS Transcribe
Smallest
ONNX local models
NeMo local models

Provider routing, retry scheduling, and circuit-breaker logic exist in the backend. Verify provider-specific behavior in code before changing a provider contract.

Transcript Management

SQLite persistence in transcripts.db.
Transcript list pagination with offset/limit.
Type filtering by mic, youtube, or file.
FTS5-backed search.
Detail view with full content and summary.
Delete, cancel, summarize, export.
Export as PDF or DOCX.
Optional automatic summarization after job completion.
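As an illustration of offset/limit pagination combined with type filtering, here is a minimal sketch against an in-memory SQLite database. The table layout (`id`, `type`, `title`) and the function `list_transcripts` are hypothetical and do not reflect the actual schema in transcripts.db; FTS5 search is left out.

```python
import sqlite3


def list_transcripts(conn, offset=0, limit=50, type_filter=None):
    """Return (id, type, title) rows, newest first, one page at a time."""
    sql = "SELECT id, type, title FROM transcripts"
    params = []
    if type_filter is not None:
        sql += " WHERE type = ?"
        params.append(type_filter)
    sql += " ORDER BY id DESC LIMIT ? OFFSET ?"
    params += [limit, offset]
    return conn.execute(sql, params).fetchall()


# Hypothetical sample data for trying the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcripts (id INTEGER PRIMARY KEY, type TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO transcripts (type, title) VALUES (?, ?)",
    [("mic", "standup"), ("youtube", "talk"), ("file", "interview"), ("mic", "memo")],
)
```

Parameters are bound with `?` placeholders rather than string formatting, so a filter value cannot alter the query.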

Local Models

ONNX model list, download, status, delete.
Quantization options: int8, fp16, fp32.
Optional ONNX GPU flag.
NeMo model list, download, delete.

Screenshots

[Screenshots: Live Mic, YouTube, File Upload, Transcript Detail, Settings]

Quick Start

Prerequisites

Python 3.10+
Node.js 20+ for the web UI
ffmpeg available on PATH for YouTube/file audio extraction
Windows recommended for tray, global hotkey, overlay, and microphone device monitoring

Windows

git clone https://github.com/MyButtermilk/Scriber.git
cd Scriber
start.bat

start.bat handles:

Python check
virtual environment setup
dependency installation when needed
initial .env creation if missing
tray/web startup when Node and Frontend/ are available
Tkinter fallback when the web UI cannot be started
backend health check at http://127.0.0.1:8765/api/health
browser open at http://localhost:5000

Linux/macOS

./start.sh

The shell script sets up dependencies and starts the Tkinter path. The full tray/hotkey/device-monitor experience is Windows-focused.

Manual Backend and Frontend

# Backend only
python -m src.web_api

# Frontend client only
cd Frontend
npm install
npm run dev:client

# Frontend Express/Vite dev host
cd Frontend
npm run dev

# Frontend production build and start
cd Frontend
npm run build
npm start

Default URLs:

Backend: http://127.0.0.1:8765
Web UI: http://localhost:5000
WebSocket: ws://127.0.0.1:8765/ws

Additional entrypoints:

python -m src.tray
python -m src.main

Usage

Web Routes

/: Live Mic
/youtube: YouTube transcription
/file: File transcription
/transcript/:id: Transcript detail
/settings: Settings

Live Mic

Select the STT provider and microphone in Settings.
Optional: set a favorite microphone. It is preferred when available.
Start from the UI or with the configured hotkey.
Wait for the overlay/state to switch from preparing to recording before speaking.
Stop recording through the UI or with the hotkey.
The final transcript is saved as a mic entry and can be summarized/exported.

Important microphone behavior:

DeviceMonitor keeps the frontend microphone list updated after USB/dock changes.
PortAudio refresh is deferred during active recording to avoid native races.
Mic selection is cached briefly to avoid repeated device scans on consecutive starts.
SCRIBER_MIC_ALWAYS_ON=1 does not yet keep a reusable app-level mic stream alive.
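The deferred-refresh behavior described above can be sketched as a small coordinator: refresh requests run immediately when idle, and are coalesced into one refresh after recording stops. The class `DeferredRefresher` and its method names are hypothetical, not the DeviceMonitor API, and this sketch is not thread-safe; a real version would likely need a lock.

```python
from typing import Callable


class DeferredRefresher:
    """Runs refresh() immediately when idle; otherwise once after recording stops."""

    def __init__(self, refresh: Callable[[], None]) -> None:
        self._refresh = refresh
        self._recording = False
        self._pending = False

    def request_refresh(self) -> None:
        if self._recording:
            # Coalesce any number of requests into a single deferred refresh.
            self._pending = True
        else:
            self._refresh()

    def recording_started(self) -> None:
        self._recording = True

    def recording_stopped(self) -> None:
        self._recording = False
        if self._pending:
            self._pending = False
            self._refresh()
```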

YouTube

Set YOUTUBE_API_KEY.
Search or paste a video URL/ID.
Start transcription.
Track job progress in the UI and transcript history.

File Upload

Open /file.
Drop or select an audio/video file.
The backend validates size/type, extracts audio for videos, and starts a transcription job.
Results appear in transcript history.

Settings

The backend settings API manages:

hotkey and recording mode
STT provider and provider-specific models
language
microphone and favorite microphone
injection method
API keys
ONNX/NeMo local models
summarization model, prompt, and auto-summary setting
visualizer bar count

AWS credentials are not fully managed through apiKeys; use the standard AWS environment variables.

Architecture

flowchart LR
    User["Browser / Hotkey / Tray"] -->|"HTTP + WebSocket"| Backend["Python Backend\nsrc.web_api"]
    Backend --> Controller["ScriberWebController"]
    Controller --> Pipeline["ScriberPipeline\nProviderRouter"]
    Controller --> DB[("SQLite\ntranscripts.db")]
    Controller --> Jobs["JobStore\nRetryScheduler"]
    Controller --> Monitor["DeviceMonitor\nMic Resolution Cache"]
    Pipeline --> Providers["STT Providers\nCloud + Local"]
    Pipeline --> Mic["MicrophoneInput\nsounddevice"]
    Backend <--> Frontend["React UI\nFrontend/client"]

Runtime Paths

Live Mic:
- POST /api/live-mic/start|stop|toggle
- microphone stream
- Pipecat/STT pipeline
- WebSocket events
- transcript persistence
- optional text injection
YouTube:
- YouTube Data API lookup
- yt-dlp download
- ffmpeg audio extraction
- STT pipeline/direct provider path
- job persistence and retry/resume
File:
- multipart upload
- size/type validation
- optional ffmpeg extraction/compression
- STT pipeline/direct provider path
- transcript persistence
Frontend:
- REST for commands and data
- single shared WebSocket for live events
- React Query for server state

Backend Modules

src/web_api.py: REST, WebSocket, settings, jobs, transcript API.
src/pipeline.py: provider creation, STT pipeline, analyzer cache, mic resolution.
src/microphone.py: sounddevice transport and audio callback.
src/audio_devices.py: deduplication, host API priority, compatibility.
src/device_monitor.py: hotplug detection and PortAudio refresh.
src/database.py: SQLite persistence and FTS.
src/runtime/: provider router and retry scheduler.
src/core/: state machine, circuit breaker, error taxonomy, event contracts, tracing.

Frontend Architecture

Vite 7 + React 19 + TypeScript.
Wouter routing.
TanStack Query for API data.
Single WebSocketProvider.
LiveMic is eagerly loaded for the default route.
YouTube, File, Settings, TranscriptDetail, and NotFound are lazy-loaded chunks.
Tailwind v4 CSS-first setup through Frontend/client/src/index.css.
Radix/shadcn-style primitives and existing neumorphic classes.

API

System

GET /api/health
GET /api/state
GET /api/metrics/hot-path?limit=n

limit for hot-path metrics is clamped to 1..500.
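Clamping a query parameter like this is typically a parse-then-bound step. A minimal sketch follows; the helper name `clamp_limit` is hypothetical, and falling back to a default on missing or non-numeric input is an assumption, since this README does not state the default for the hot-path endpoint.

```python
def clamp_limit(raw, default, lo, hi):
    """Parse a query-string limit; fall back to default, then clamp to [lo, hi]."""
    try:
        value = int(raw) if raw is not None else default
    except (TypeError, ValueError):
        value = default
    return max(lo, min(hi, value))
```

For example, `clamp_limit("9999", 100, 1, 500)` yields 500 and `clamp_limit("0", 100, 1, 500)` yields 1.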

WebSocket

GET /ws

Core event types:

state
status
transcript
audio_level
input_warning
transcribing
session_started
session_finished
history_updated
error
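A client usually routes these events by type. The sketch below assumes each WebSocket text frame is a JSON object with a `type` field holding one of the names above; that payload shape is an assumption, as is the `dispatch` helper itself. Check the event contracts in src/core/ for the real format.

```python
import json
from typing import Callable

KNOWN_EVENTS = {
    "state", "status", "transcript", "audio_level", "input_warning",
    "transcribing", "session_started", "session_finished",
    "history_updated", "error",
}


def dispatch(raw: str, handlers: dict[str, Callable[[dict], None]]) -> bool:
    """Route one WebSocket text frame; return True if a handler ran."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(message, dict):
        return False
    event_type = message.get("type")  # assumed field name
    if not isinstance(event_type, str) or event_type not in KNOWN_EVENTS:
        return False
    handler = handlers.get(event_type)
    if handler is None:
        return False
    handler(message)
    return True
```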

Live Mic

POST /api/live-mic/start
POST /api/live-mic/stop
POST /api/live-mic/toggle

Transcripts

GET /api/transcripts?offset=0&limit=50&type={mic|youtube|file}&q={query}
GET /api/transcripts/{id}
DELETE /api/transcripts/{id}
POST /api/transcripts/{id}/summarize
POST /api/transcripts/{id}/cancel
GET /api/transcripts/{id}/export/{format}

limit defaults to 50 and is clamped to 1..100. Export format is pdf or docx.
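A client can mirror these rules when building the list URL. This is a sketch only: the helper `transcripts_url` is hypothetical, and it assumes the server accepts requests that omit the `type` and `q` parameters.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8765"  # default backend URL from this README


def transcripts_url(offset=0, limit=50, type_filter=None, query=None):
    """Build a GET /api/transcripts URL, omitting filters that are not set."""
    params = {"offset": max(0, offset), "limit": max(1, min(100, limit))}
    if type_filter is not None:
        if type_filter not in {"mic", "youtube", "file"}:
            raise ValueError(f"unknown transcript type: {type_filter}")
        params["type"] = type_filter
    if query:
        params["q"] = query
    return f"{BASE_URL}/api/transcripts?{urlencode(params)}"
```

Clamping on the client is a convenience; the server still applies its own bounds.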

YouTube

GET /api/youtube/search?q={query}&maxResults={n}&pageToken={token}
GET /api/youtube/video?id={id}
GET /api/youtube/video?url={url}
POST /api/youtube/transcribe

File

POST /api/file/transcribe

Expected body: multipart/form-data with field file.
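For clients without an HTTP library that handles multipart encoding, the body can be assembled by hand. The sketch below builds a single-part body with the field name `file`; the helper `build_multipart` is hypothetical, and the response format of the endpoint is not shown because this README does not document it.

```python
import uuid


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/octet-stream"):
    """Return (body, content_type_header) for a single-file multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"
```

The returned body and header can then be sent with `urllib.request.Request(url, data=body, headers={"Content-Type": header}, method="POST")`.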

Settings, Devices, Autostart

GET /api/settings
PUT /api/settings
GET /api/microphones
GET /api/autostart
POST /api/autostart

Local Models

GET /api/onnx/models
GET /api/onnx/models/{model_id}
POST /api/onnx/download
DELETE /api/onnx/models/{model_id}
GET /api/nemo/models
POST /api/nemo/download
DELETE /api/nemo/models/{model_id}

ONNX model status/delete can use an optional quantization query parameter.

Configuration

Configuration is loaded from environment variables and .env. Multi-line summarization prompt state can also be stored in settings.json.

Do not commit .env, settings.json, transcripts.db, downloads/, or generated local artifacts.

Web/API

SCRIBER_WEB_HOST=127.0.0.1
SCRIBER_WEB_PORT=8765
SCRIBER_ALLOWED_ORIGINS=

Default CORS allows localhost, 127.0.0.1, and ::1. SCRIBER_ALLOWED_ORIGINS=* allows all origins.
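The origin rule above can be approximated as follows. This is a sketch under assumptions: it treats SCRIBER_ALLOWED_ORIGINS as a comma-separated list and keeps the local hosts allowed even when extra origins are configured. The function `origin_allowed` here is illustrative and is not the backend implementation.

```python
from urllib.parse import urlsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def origin_allowed(origin: str, allowed_origins: str = "") -> bool:
    """Check an Origin header against a comma-separated allow-list setting."""
    configured = {o.strip() for o in allowed_origins.split(",") if o.strip()}
    if "*" in configured or origin in configured:
        return True
    try:
        # hostname drops the port and the brackets around IPv6 literals.
        host = urlsplit(origin).hostname
    except ValueError:
        return False  # malformed origin, e.g. an unterminated IPv6 literal
    return host in LOCAL_HOSTS
```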

Frontend

VITE_BACKEND_URL=http://127.0.0.1:8765
PORT=5000

Recording and Provider Selection

SCRIBER_HOTKEY=ctrl+alt+s
SCRIBER_MODE=toggle
SCRIBER_DEFAULT_STT=soniox
SCRIBER_STT_FALLBACKS=
SCRIBER_LANGUAGE=auto
SCRIBER_DEBUG=0
SCRIBER_CUSTOM_VOCAB=

Provider Models

SCRIBER_SONIOX_MODE=realtime
SCRIBER_SONIOX_ASYNC_MODEL=stt-async-v4
SCRIBER_SONIOX_RT_MODEL=stt-rt-v4
SCRIBER_MISTRAL_RT_MODEL=voxtral-mini-transcribe-realtime-2602
SCRIBER_MISTRAL_ASYNC_MODEL=voxtral-mini-2602
SCRIBER_OPENAI_STT_MODEL=gpt-4o-mini-transcribe-2025-12-15
SCRIBER_AZURE_MAI_REGION=northeurope

Microphone and Injection

SCRIBER_MIC_DEVICE=default
SCRIBER_FAVORITE_MIC=
SCRIBER_MIC_ALWAYS_ON=0
SCRIBER_MIC_BLOCK_SIZE=512
SCRIBER_MIC_DEVICE_CACHE_TTL_SEC=10.0
SCRIBER_MIC_LOW_RMS_THRESHOLD=0.001
SCRIBER_MIC_LOW_RMS_CLEAR_THRESHOLD=0.0025
SCRIBER_MIC_LOW_RMS_WARN_AFTER_SECS=6.0
SCRIBER_INJECT_METHOD=auto
SCRIBER_PASTE_PRE_DELAY_MS=80
SCRIBER_PASTE_RESTORE_DELAY_MS=1500

SCRIBER_MIC_ALWAYS_ON is currently not a real persistent prewarm stream. Leave it off unless you are testing the surrounding setting flow.
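The three SCRIBER_MIC_LOW_RMS_* settings above suggest a warning with hysteresis: raise it after the level stays below the lower threshold for a while, and clear it only once the level rises above the higher threshold. The sketch below is one plausible reading of those settings, not the backend logic; the class `LowInputWarning` is hypothetical.

```python
class LowInputWarning:
    """Raises a warning after sustained low RMS and clears it with hysteresis."""

    def __init__(self, low=0.001, clear=0.0025, warn_after_secs=6.0):
        self.low, self.clear, self.warn_after_secs = low, clear, warn_after_secs
        self._low_since = None  # time at which the level first dropped below `low`
        self.active = False

    def update(self, rms: float, now: float) -> bool:
        if rms < self.low:
            if self._low_since is None:
                self._low_since = now
            if now - self._low_since >= self.warn_after_secs:
                self.active = True
        else:
            self._low_since = None
            # Levels between `low` and `clear` keep the current warning state.
            if rms >= self.clear:
                self.active = False
        return self.active
```

The gap between the two thresholds keeps the warning from flickering when the input level hovers near a single cutoff.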

Uploads, Jobs, Timeouts

SCRIBER_UPLOAD_MAX_MB=200
SCRIBER_UPLOAD_MAX_BYTES=
SCRIBER_DOWNLOADS_DIR=downloads
SCRIBER_JOB_MAX_ATTEMPTS=3
SCRIBER_JOB_RETRY_BASE_SEC=5
SCRIBER_JOB_RETRY_MAX_SEC=120
SCRIBER_TIMEOUT_FILE_TRANSCRIBE_SEC=600
SCRIBER_TIMEOUT_YOUTUBE_TRANSCRIBE_SEC=600
SCRIBER_TIMEOUT_YOUTUBE_DOWNLOAD_SEC=300
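The retry settings above name a base delay, a maximum delay, and an attempt cap. This README does not state the backoff curve, so the sketch below assumes plain exponential doubling capped at the maximum; the function names are hypothetical.

```python
def retry_delay_sec(attempt: int, base_sec: float = 5.0, max_sec: float = 120.0) -> float:
    """Delay before retry number `attempt` (1-based), doubling up to max_sec."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Assumption: exponential backoff without jitter.
    return min(max_sec, base_sec * 2 ** (attempt - 1))


def should_retry(attempts_made: int, max_attempts: int = 3) -> bool:
    """True while the job has attempts left."""
    return attempts_made < max_attempts
```

With the defaults shown, the first three delays would be 5, 10, and 20 seconds, and the cap of 120 seconds is reached from the sixth retry onward.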

Circuit Breaker and Diagnostics

SCRIBER_BREAKER_FAILURE_THRESHOLD=3
SCRIBER_BREAKER_COOLDOWN_SEC=30
SCRIBER_VALIDATE_WS_CONTRACTS=0
SCRIBER_HOTKEY_DISPATCH_DEBOUNCE_SEC=0.25
SCRIBER_LOG_STDERR=1
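To illustrate how the two breaker settings interact, here is a generic circuit-breaker sketch: it opens after a number of consecutive failures and allows a trial call once the cooldown has passed. This is a textbook pattern with hypothetical names, not the implementation in src/core/.

```python
class CircuitBreaker:
    """Opens after N consecutive failures; allows a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_sec=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_sec = cooldown_sec
        self._failures = 0
        self._opened_at = None  # time of the failure that opened the breaker

    def allow(self, now: float) -> bool:
        if self._opened_at is None:
            return True
        # After the cooldown, let a trial call through (half-open).
        return now - self._opened_at >= self.cooldown_sec

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self, now: float) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = now
```

If the trial call fails, `record_failure` sets a new open time and the cooldown starts again.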

Summarization

SCRIBER_SUMMARIZATION_MODEL=gemini-flash-latest
SCRIBER_AUTO_SUMMARIZE=0
SCRIBER_SUMMARY_MIN_WORDS=180
SCRIBER_SUMMARY_MAX_WORDS=2200
SCRIBER_SUMMARIZATION_PROMPT=...

Current default summarization model: gemini-flash-latest.

API Keys

SONIOX_API_KEY=...
MISTRAL_API_KEY=...
ASSEMBLYAI_API_KEY=...
DEEPGRAM_API_KEY=...
OPENAI_API_KEY=...
AZURE_SPEECH_KEY=...
AZURE_SPEECH_REGION=...
GLADIA_API_KEY=...
GROQ_API_KEY=...
SPEECHMATICS_API_KEY=...
ELEVENLABS_API_KEY=...
GOOGLE_API_KEY=...
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
YOUTUBE_API_KEY=...

AWS uses standard SDK environment variables:

AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=...

Local Models and UI

SCRIBER_ONNX_MODEL=nemo-parakeet-tdt-0.6b-v3
SCRIBER_ONNX_QUANTIZATION=int8
SCRIBER_ONNX_USE_GPU=0
SCRIBER_NEMO_MODEL=parakeet-primeline
SCRIBER_VISUALIZER_BAR_COUNT=60

Development

Backend Commands

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python check_imports.py
python -m src.web_api

Frontend Commands

cd Frontend
npm install
npm run dev:client
npm run check
npm run build
npm start

Do not run npm run dev:client and npm run dev at the same time on the default port.

Tests

pytest
pytest tests/test_device_monitor.py
pytest tests/test_microphone_device_resolution.py tests/test_microphone_callback.py
pytest tests/test_web_api_security.py::test_origin_allowed_defaults
pytest -k origin_allowed

Current test layout includes backend, runtime, core, data, contract, and perf tests under tests/.

Useful focused tests:

Device monitor and mic selection:
- pytest tests/test_device_monitor.py tests/test_microphone_device_resolution.py
Microphone callback/channel handling:
- pytest tests/test_microphone_channel_selection.py tests/test_microphone_callback.py
Pipeline lifecycle:
- pytest tests/test_pipeline_stop.py tests/test_web_api_lifecycle.py
WebSocket contracts:
- pytest tests/contract/test_ws_events.py
Provider routing/circuit breaker:
- pytest tests/runtime/test_provider_router.py tests/core/test_provider_circuit_breaker.py

Quality Checks

python -m py_compile src\microphone.py src\pipeline.py src\web_api.py
git diff --check

cd Frontend
npm run check
npm run build

Project Structure

Scriber/
├── src/
│   ├── web_api.py                  # aiohttp REST + WebSocket API
│   ├── pipeline.py                 # STT pipeline and provider factory
│   ├── microphone.py               # sounddevice input transport
│   ├── audio_devices.py            # mic normalization/dedup/compatibility
│   ├── device_monitor.py           # hotplug detection and PortAudio refresh
│   ├── audio_file_input.py         # ffmpeg file input transport
│   ├── config.py                   # env + settings.json configuration
│   ├── database.py                 # SQLite persistence and FTS
│   ├── injector.py                 # text injection
│   ├── summarization.py            # Gemini/OpenAI summaries
│   ├── youtube_api.py              # YouTube Data API
│   ├── youtube_download.py         # yt-dlp + ffmpeg extraction
│   ├── export.py                   # PDF/DOCX export
│   ├── overlay.py                  # recording overlay
│   ├── tray.py                     # tray lifecycle
│   ├── main.py                     # Tkinter fallback
│   ├── core/                       # state, contracts, tracing, breakers
│   ├── data/                       # job and metrics stores
│   └── runtime/                    # provider routing and retry scheduling
├── Frontend/
│   ├── client/                     # React app
│   ├── server/                     # Express/Vite host
│   └── shared/                     # shared TS schema/types
├── tests/                          # pytest suite
├── docs/                           # architecture and status docs
├── start.bat
├── start.sh
├── requirements.txt
└── README.md

Troubleshooting

Backend does not start

Run:

python -m src.web_api

Then check latest.log / structured logs if present. Also run:

python check_imports.py

Web UI does not load

Check:

backend health: http://127.0.0.1:8765/api/health
frontend port: http://localhost:5000
VITE_BACKEND_URL if backend host/port is customized
CORS via SCRIBER_ALLOWED_ORIGINS

No microphone appears

Check:

GET /api/microphones
Windows microphone privacy settings
selected/favorite mic in Settings
dock/USB reconnect

The DeviceMonitor should pick up hotplug changes. During active recording, PortAudio refresh is intentionally deferred until after stop.

Favorite microphone is not used

Confirm the device label in GET /api/microphones.
Clear or update SCRIBER_FAVORITE_MIC.
Device resolution is cached briefly; mic setting changes and hotplug events invalidate the cache.
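The caching behavior described above can be pictured as a small time-to-live cache with explicit invalidation. The class `DeviceResolutionCache` and its methods are hypothetical; the 10-second default mirrors SCRIBER_MIC_DEVICE_CACHE_TTL_SEC, and the injectable clock exists only to make the sketch testable.

```python
import time


class DeviceResolutionCache:
    """Caches resolved device lookups for a short time-to-live."""

    def __init__(self, ttl_sec: float = 10.0, clock=time.monotonic):
        self._ttl = ttl_sec
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, resolve):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        # Miss or expired: run the (expensive) device scan and remember it.
        value = resolve()
        self._entries[key] = (now, value)
        return value

    def invalidate(self) -> None:
        # Call on settings changes and hotplug events.
        self._entries.clear()
```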

First words are cut off

Wait until the overlay/state switches from preparing to recording.
SCRIBER_MIC_ALWAYS_ON is not true app-level prewarming yet.
Check docs/Mic-Performance-Enhancement.md for current mic latency status.

YouTube transcription fails

Set YOUTUBE_API_KEY.
Verify yt-dlp and ffmpeg availability.
Check timeout settings and provider API keys.

File upload fails

Verify extension and size limits.
For video, ensure ffmpeg can extract audio.
Check provider-specific upload limits in backend logs/settings.

Local models are missing

Check ONNX/NeMo dependencies.
Use the Settings UI or /api/onnx/models and /api/nemo/models.
Ensure model directories are writable.

Roadmap / Open Engineering Work

Real app-level microphone prewarming for SCRIBER_MIC_ALWAYS_ON.
Frontend transcript-list virtualization or infinite query.
Vite manual vendor chunking for smaller initial chunks.
WebSocket no-client fast path before JSON serialization and task scheduling.
Background/off-thread upload preprocessing and export generation.
Removal of the O(n^2) live transcript content append behavior in very long sessions.
More hardware regression tests for dock/USB mic add/remove and favorite fallback.
Stronger typed API contract between backend and frontend.
Smaller backend modules by splitting src/web_api.py into domains.

License

The project metadata declares the MIT license. A standalone root LICENSE file is not currently present.

Efficient, resumable, multi-provider speech-to-text workflows.
