
feat: Audio (Voice) mode + model routing & selection redesign #409

Open
alichherawalla wants to merge 138 commits into feat/pro-feature-registry from feat/audio-mode-pro


@alichherawalla

Collaborator

Summary

This branch builds Audio (Voice) mode end to end and reworks how models are routed, loaded, and selected across the app. The audio feature lives entirely in the private pro submodule behind a slot/hook seam, so free builds keep their default behaviour and never link pro code.

It is the full feat/audio-mode-pro line of work, branched off feat/pro-feature-registry (137 commits).

What's in it

Voice mode (pro submodule, behind the slot/hook seam)

  • Pluggable TTS engine interface with a Kokoro adapter; OuteTTS was dropped to keep a single, well-supported engine. Adding an engine requires zero changes to core UI/stores.
  • Streaming TTS: the assistant message is synthesized and played sentence-by-sentence as it streams, with thinking content never spoken.
  • Audio message UI: waveform + an overlaid seekbar with a cumulative (monotonic) playback position across streamed sentences, real-time speed control, and a warm reusable audio context to avoid start-up lag.
  • Chat/Audio mode toggle inlined into the chat input; chat-mode long-press "Select text" for partial copy.
  • RevenueCat init is isolated so pro features still load when billing is unavailable, with entitlement gating covered by tests.
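The cumulative (monotonic) playback position across streamed sentences can be pictured as a small clock that banks the duration of each finished segment and adds the offset into the current one. This is a minimal sketch under that assumption, not the pro implementation; `StreamPlaybackClock` and its methods are hypothetical names.

```typescript
// Hypothetical sketch: a seekbar position that keeps growing across
// sentence-sized audio segments instead of resetting to 0 per sentence.
class StreamPlaybackClock {
  private completedSeconds = 0; // total duration of segments already played
  private currentOffset = 0; // playback offset inside the current segment
  private lastReported = 0; // highest position reported so far

  // Called when a segment finishes; its duration is banked.
  finishSegment(durationSeconds: number): void {
    this.completedSeconds += durationSeconds;
    this.currentOffset = 0;
  }

  // Called on each playback tick with the offset inside the current segment.
  tick(offsetSeconds: number): void {
    this.currentOffset = offsetSeconds;
  }

  // Cumulative position, clamped so it never moves backwards when a tick
  // reports a smaller offset than an earlier one.
  position(): number {
    const raw = this.completedSeconds + this.currentOffset;
    this.lastReported = Math.max(this.lastReported, raw);
    return this.lastReported;
  }

  reset(): void {
    this.completedSeconds = 0;
    this.currentOffset = 0;
    this.lastReported = 0;
  }
}
```

Clamping with `Math.max` is what makes the seekbar monotonic; the trade-off is that a genuine backwards seek needs an explicit `reset()` or a separate seek path.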

Model routing & residency

  • A residency manager is now authoritative for model memory (memory budget + eviction); the old fast/optimised loading-strategy setting was removed.
  • Text-vs-image routing via a SmolLM2 classifier (auto-provisioned), with the last text model remembered and preloaded on chat open.
  • Models load inline instead of taking over the full screen; selected models are warmed at boot in order (text → image → TTS → STT).

Models UX redesign

  • Home screen: the per-type model cards collapse into one Models control (Text/Image/Voice/Speech captioned icons) that opens a shared manager sheet and per-type pickers. The card gets the standard surface + shadow treatment.
  • Chat header: a generic "Models" selector opening the same shared manager sheet, instead of the inline model name + image badge.
  • New Voice Models tab (pro panel + non-pro upsell) and a multi-model Transcription (speech-to-text) tab (select/download/delete per model).
  • Download Manager surfaces voice + transcription models under a single Voice Models filter.
  • Settings: the standalone Voice Transcription and Text to Speech settings screens were removed (their controls moved into the model tabs / are always-on).

Housekeeping

  • Shared model sheets moved to src/components/models/ so home and chat use one implementation.
  • knip added (tuned config) and the dead code it surfaced removed.
  • Screen tests updated to the redesigned UI; added coverage for streaming TTS, multi-model transcription, voice routing, and the new model controls.

Pro stays private

pro is a git submodule (offgrid-pro). This repo only tracks its commit pointer — no pro source is committed here. None of the commits in this PR modify the pro gitlink or .gitmodules.

Verification

  • Full JS/TS suite green (5,629 passing), tsc --noEmit clean, eslint clean.
  • The Android ggml-hexagon/*.so binaries are intentionally left unstaged.

alichherawalla and others added 30 commits April 7, 2026 16:41
Implements on-device text-to-speech using OuteTTS 0.3 (454 MB) +
WavTokenizer (73 MB) via llama.rn, with react-native-audio-api for playback.

Two interface modes (user-switchable from Settings):
- Chat Mode: play/stop TTSButton on each assistant message bubble
- Audio Mode: waveform bubbles with auto-TTS after streaming, transcript expand,
  speed cycling, and PCM audio persisted to disk per message for repeat playback

New files:
- src/constants/ttsModels.ts — model URLs, RAM thresholds, cache config
- src/services/ttsService.ts — download, load, generate, persist, play
- src/stores/ttsStore.ts — Zustand store with Chat + Audio Mode actions
- src/hooks/useTTS.ts — convenience hook with RAM gate and weighted progress
- src/components/TTSButton/index.tsx — Chat Mode play/stop per message
- src/components/AudioMessageBubble/index.tsx — waveform bubble component
- src/screens/TTSSettingsScreen/index.tsx — download, mode, speed, cache

Modified:
- Message type: audioPath, waveformData, audioDurationSeconds, isGeneratingAudio
- ChatMessage: Audio Mode branch + TTSButton in meta row
- SettingsScreen: Text to Speech nav row
- Navigation: TTSSettings route
- stores/index.ts, services/index.ts: exports

Tests: 42 unit + integration tests covering service, store, and full flows

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Revert ChatMessage to main (avoids pre-existing complexity lint failure
  when the file enters the push-range diff)
- Add Audio Mode + TTSButton to MessageRenderer instead — clean, under limit
- Move audioPath/waveformData/audioDurationSeconds/isGeneratingAudio fields
  from types/index.ts to types/tts.ts via module augmentation (keeps index.ts
  under the 350-line max)
- Add react-native-audio-api global mock to jest.setup.ts so all test suites
  that transitively import ttsService can resolve the native module

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In finalizeStreamingMessage, after addMessage() saves the assistant reply,
check if Audio Mode is active and model is loaded — if so, fire
useTTSStore.generateAndSave() in the background so the waveform bubble
auto-generates instead of spinning indefinitely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, TTSButton placement

Critical fixes for TTS Audio Mode:

- Add updateMessageAudio() to chatStore — writes audioPath, waveformData,
  audioDurationSeconds, isGeneratingAudio back to the conversation message
  (without this, the waveform bubble spun forever after generation)

- Wire auto-TTS trigger in useChatScreen via useEffect on isStreamingForThisConversation:
  detects streaming → stopped, checks Audio Mode + model loaded, calls
  triggerAudioModeGeneration() which sets isGeneratingAudio:true, fires
  generateAndSave, then writes audio fields or clears the flag on error

- Fix isGenerating logic: show spinner only when isGeneratingAudio===true,
  not for every assistant message missing audioPath (which made all old
  messages spin forever in Audio Mode)

- Fix TTSButton placement: add metaExtra prop to ChatMessage/MessageMetaRow
  so TTSButton renders inline in the timestamp row rather than below the bubble

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a Voice row (volume icon + Chat/Audio/N/A badge) to the quick
settings popover in the chat input. Tapping it:
- Toggles between Chat and Audio mode when models are downloaded
- Auto-loads/unloads the TTS model on switch
- Navigates to TTSSettings when models are not yet downloaded

This makes Audio Mode accessible without leaving the chat screen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The ChatInput test mock for src/stores was missing useTTSStore, causing
Popovers.tsx (which now uses useTTSStore) to throw on render.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. checkDownloadStatus() never called on TTSSettingsScreen mount
   → store always showed models as not downloaded after fresh app start

2. speak() race condition: stop() during generation didn't prevent playback
   → set isSpeakingFlag=true before generate(), check it after, use finally

3. RNFS.stat() on directory reports block size (~0), not total file size
   → replaced with readDir() recursive sum of individual .pcm file sizes

4. Historical messages without audio showed broken play button in Audio Mode
   → AudioMessageBubble only rendered when msg.audioPath || msg.isGeneratingAudio

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
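The speak()/stop() race fix in item 2 can be sketched as follows, assuming injected `generate` and `play` functions; the `Speaker` class and both function types are hypothetical stand-ins for the real service.

```typescript
// Hypothetical sketch of the race fix: the flag is set before the slow
// generate() call and re-checked after it, so a stop() issued while audio
// is still being synthesized suppresses playback.
type Generate = (text: string) => Promise<Float32Array>;
type Play = (pcm: Float32Array) => Promise<void>;

class Speaker {
  private isSpeakingFlag = false;
  private readonly generate: Generate;
  private readonly play: Play;

  constructor(generate: Generate, play: Play) {
    this.generate = generate;
    this.play = play;
  }

  // Resolves true if audio was played, false if stop() cancelled it.
  async speak(text: string): Promise<boolean> {
    this.isSpeakingFlag = true; // mark before the first await
    try {
      const pcm = await this.generate(text);
      if (!this.isSpeakingFlag) return false; // stop() ran during generation
      await this.play(pcm);
      return true;
    } finally {
      this.isSpeakingFlag = false; // always cleared, even if generate() throws
    }
  }

  stop(): void {
    this.isSpeakingFlag = false;
  }
}
```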
Replaced stat() mock with readDir() mocks matching the new recursive
file-size summation approach.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces slider controls with a [–] value [+] stepper row for
precise numeric input in settings screens. Supports min/max/step,
optional decimal formatting, and testID for E2E automation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
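The value logic behind a min/max/step stepper with optional decimal formatting can be sketched as a pair of pure helpers. `StepperConfig`, `stepValue`, and `formatValue` are hypothetical names; the component's real props are not shown in this PR.

```typescript
// Hypothetical helpers for a [-] value [+] stepper: step in either
// direction, clamp to [min, max], and round away the floating-point drift
// that repeated 0.1 steps would otherwise accumulate.
interface StepperConfig {
  min: number;
  max: number;
  step: number;
  decimals?: number; // digits to keep; 0 for integer controls such as GPU layers
}

function stepValue(value: number, direction: 1 | -1, cfg: StepperConfig): number {
  const next = value + direction * cfg.step;
  const clamped = Math.min(cfg.max, Math.max(cfg.min, next));
  const factor = 10 ** (cfg.decimals ?? 0);
  return Math.round(clamped * factor) / factor;
}

function formatValue(value: number, cfg: StepperConfig): string {
  return value.toFixed(cfg.decimals ?? 0);
}
```

Without the rounding step, 0.7 + 0.1 evaluates to 0.7999999999999999 in IEEE 754 doubles, which would leak into the displayed value.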
Removes @react-native-community/slider from GenerationSettingsModal,
ModelSettingsScreen, and TTSSettingsScreen. Every numeric control
(temperature, top-p, GPU layers, speed, etc.) now uses the stepper
for touch-friendly precise adjustment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- MediaAttachment gains audioFormat and audioDurationSeconds fields
- audioRecorderService.stopRecording() now returns { path, durationSeconds }
  instead of just the path, enabling accurate audio bubble scrubbing
- ChatInput/Attachments.addAudioAttachment stores the duration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…send

In Audio Mode, user voice recordings now appear as right-aligned audio
bubbles instead of text messages, making both sides of the conversation
audio-native.

- Voice.ts: adds file-based transcription path (audioRecorderService +
  whisperService.transcribeFile) and onAutoSend callback for atomic send
  with audio attachment. Multimodal models skip transcription entirely.
- ChatInput: passes onAutoSend in Audio Mode; builds MediaAttachment
  inline to avoid async state-update race; uses attachmentsRef for sync reads.
- AudioMessageBubble: adds isUser prop for right-aligned primary-tinted style.
- MessageRenderer: renders user audio attachments as AudioMessageBubble
  before the normal message path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The streaming-complete useEffect only listed isStreamingForThisConversation
in its deps, so activeConversation was captured stale. When streaming ended,
the last message was always the old value — TTS generation was never triggered.

Fix: read conversation and last message directly from useChatStore.getState()
inside the effect instead of relying on the closed-over activeConversation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
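The stale-closure bug and its fix can be reproduced without React. In this sketch a tiny hand-rolled store with a Zustand-style `getState()` stands in for `useChatStore`, and the returned callbacks stand in for the effect body; all names are hypothetical.

```typescript
// Hypothetical illustration of the stale closure and the getState() fix.
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface ChatState {
  messages: Message[];
}

function createStore(initial: ChatState) {
  let state = initial;
  return {
    getState: (): ChatState => state,
    setState: (next: ChatState): void => {
      state = next;
    },
  };
}

const chatStore = createStore({ messages: [{ role: "user", content: "hi" }] });

// Buggy pattern: the conversation is captured once, when the effect is set
// up, so it still reflects the state from before the reply was streamed.
function makeStaleEffect(): () => Message {
  const activeConversation = chatStore.getState();
  return () => activeConversation.messages[activeConversation.messages.length - 1];
}

// Fixed pattern: read the store at the moment the effect runs.
function makeFreshEffect(): () => Message {
  return () => {
    const { messages } = chatStore.getState();
    return messages[messages.length - 1];
  };
}
```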
When no Whisper model is installed and the user taps the mic, show a
CustomAlert offering to download Whisper Small (466 MB) immediately,
rather than navigating away to VoiceSettings.

UnavailableButton also now shows a download icon + percentage while
the model is being fetched, so feedback is in-place.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a TEXT TO SPEECH section alongside IMAGE GENERATION and TEXT
GENERATION in the chat settings modal. Shows mode toggle (chat/audio),
enable switch, speed stepper, and auto-play toggle. Deep-links to
TTSSettingsScreen for full configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WHISPER_MODELS grows from 5 to 10 entries covering English-only and
Multilingual variants for tiny/base/small/medium, plus Large v3 Turbo
and Large v3.

whisperService.downloadFromUrl(url, modelId) downloads any ggml .bin
file from an arbitrary URL — enables installing community models from
HuggingFace. whisperStore exposes it as downloadFromUrl action.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrites the voice settings screen with three sections:
- Active model card with inline download progress and remove action
- Curated models grouped by English-only / Multilingual (all sizes,
  tiny → large-v3)
- Live HuggingFace search bar (500 ms debounce) that queries ASR repos;
  tap a repo to expand and browse its ggml .bin files; tap a file to
  confirm and download via downloadFromUrl

huggingFaceService gains searchWhisperRepos() and getWhisperFiles()
to power the HF search without coupling to the LLM model browser.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
llmMessages builds an input_audio content block from audio attachments
when the active model reports audio support, bypassing Whisper entirely.
llm.ts exposes getMultimodalSupport() so the voice layer can detect this.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ttsStore: adds interfaceMode, speed, autoPlay, enabled settings;
  generateAndSave flow for Audio Mode; updateMessageAudio
- ttsService: OuteTTS generate+save path for AI audio bubbles
- TTSButton: play/stop per-message with generation spinner
- KokoroTTSManager + kokoroModels: scaffold for Tier 1 Kokoro TTS
  (not yet wired to react-native-executorch, marked not started)
- App.tsx: mounts KokoroTTSManager near root
- packages: react-native-executorch, background-downloader, dr.pogodin/react-native-fs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ChatMessage: long-press action sheet gains Speak option (delegates to ttsStore)
- ModelSettingsScreen: suppress pre-existing exhaustive-deps lint warning
- Tests: update GenerationSettingsModal and ModelSettingsScreen tests for
  NumericStepper (gpu-layers-stepper-increment) replacing slider testIDs
- TTS_IMPLEMENTATION_PLAN: rewritten to reflect Audio Mode bidirectional
  voice conversation, stale closure fix, and implementation status

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sages

Two bugs causing broken Audio Mode:

1. AudioRecorder was recording at the system default rate (~44.1 kHz),
   producing WAV that Whisper interprets as static ('TV static' / [SOUND]).
   Fix: pass a preset with sampleRate:16000, BitDepth.Bit16 so the file
   is Whisper-compatible 16 kHz mono int16 PCM from the start.

2. buildOAIMessages was always including audio attachments as input_audio
   content blocks, even for models that don't support audio input (e.g.
   remote Qwen 3.5 2B / Gemma 42B). Those models replied 'I cannot hear
   audio'. Fix: buildOAIMessages now accepts supportsAudio flag (default
   false) and only emits input_audio parts when the model declares audio
   support. llm.ts passes multimodalSupport.audio when calling it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
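The `supportsAudio` gate in fix 2 can be sketched as below. The content-part shapes follow the OpenAI chat-completions format (`text` and `input_audio` parts); `buildUserContent`, `AudioAttachment`, and the transcript fallback for text-only models are assumptions for illustration, not the actual `buildOAIMessages` code.

```typescript
// Hypothetical sketch: emit input_audio parts only for audio-capable models.
type AudioFormat = "wav" | "mp3";

type ContentPart =
  | { type: "text"; text: string }
  | { type: "input_audio"; input_audio: { data: string; format: AudioFormat } };

interface AudioAttachment {
  base64: string;
  format: AudioFormat;
  transcript?: string; // assumed: Whisper output, if the recording was transcribed
}

function buildUserContent(
  text: string,
  audio: AudioAttachment[],
  supportsAudio = false, // defaults to false, as in the commit
): ContentPart[] {
  const parts: ContentPart[] = [];
  if (text) parts.push({ type: "text", text });
  for (const a of audio) {
    if (supportsAudio) {
      parts.push({ type: "input_audio", input_audio: { data: a.base64, format: a.format } });
    } else if (a.transcript) {
      // Assumed fallback: a text-only model gets the transcript instead.
      parts.push({ type: "text", text: a.transcript });
    }
  }
  return parts;
}
```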
playFromFile was treating WAV bytes as raw Float32 PCM — designed for
OuteTTS output only. WAV files have a 44-byte RIFF header plus int16
samples; reinterpreting them as Float32 produces pure static.

Fix: use AudioContext.decodeAudioData(filePath) which properly parses
the WAV header and decodes samples. The file:// prefix is added if
missing.

MessageRenderer now wraps user and assistant audio bubbles in a
container View with paddingHorizontal:16 and marginVertical:8,
matching the ChatMessage container layout so bubbles align correctly
with the chat edges instead of touching screen borders.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
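The fix relies on `decodeAudioData`, but the underlying problem is easy to show by hand. This minimal sketch converts a canonical 16-bit PCM WAV into Float32 samples. It assumes the simple 44-byte layout with the `data` chunk at byte 36; real decoders walk the RIFF chunks instead. `wavPcm16ToFloat32` is a hypothetical helper.

```typescript
// Hypothetical sketch: canonical 44-byte-header, 16-bit PCM WAV -> Float32
// samples in [-1, 1). Rejects files that do not match that simple layout.
function wavPcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE") throw new Error("not a RIFF/WAVE file");
  if (tag(36) !== "data") throw new Error("data chunk not at byte 36; use a full RIFF parser");
  const bitsPerSample = view.getUint16(34, true);
  if (bitsPerSample !== 16) throw new Error(`expected 16-bit PCM, got ${bitsPerSample}-bit`);
  const dataBytes = Math.min(view.getUint32(40, true), bytes.byteLength - 44);
  const count = Math.floor(dataBytes / 2);
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    // Little-endian signed 16-bit sample scaled into [-1, 1).
    out[i] = view.getInt16(44 + i * 2, true) / 32768;
  }
  return out;
}
```

Reinterpreting the same bytes as Float32 instead reads the header and each pair of int16 samples as one 32-bit float, which is why the old path produced static.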
Audio type attachments were falling through to the FadeInImage branch,
causing Image to try to load the WAV file path — resulting in a broken
image placeholder that stretched the user bubble very wide (the 'super
long' bubble issue).

Audio attachments now render as a compact mic icon + 'Voice message'
badge (matching the document badge style), keeping the bubble compact.
In Audio Mode they never reach this code — they render as AudioMessageBubble.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add isAudioModeMessage to Message type and updateMessageAudio signature.
Set flag in triggerAudioModeGeneration so mode switches don't reformat
old text messages. MessageRenderer now checks msg.isAudioModeMessage
instead of global ttsMode for assistant audio bubbles.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bug 2: handlePlayPause calls speak() for AI bubbles (empty audioPath)
instead of playMessage with empty string. Remove isGenerating spinner.
Bug 3: WaveformBars gets flex:1 + overflow:hidden, WAVEFORM_BARS 40→28,
bubble overflow:hidden, maxWidth 80%→88%.
Bug 4: user bubble flips play row order (speed+duration left, play right).
Bug 5: voice cycling chip on AI bubbles reads/writes kokoroVoiceId.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
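Reducing audio to a fixed number of waveform bars is commonly done with the mean absolute amplitude of each bucket. This sketch assumes that approach, with 28 bars to match the new WAVEFORM_BARS and a 6px floor to match the later minimum-height fix; `meanAbs`, `barHeights`, and the 32px maximum are hypothetical.

```typescript
// Hypothetical sketch: PCM samples -> fixed number of bar heights in pixels.
function meanAbs(samples: Float32Array, start: number, end: number): number {
  let sum = 0;
  for (let i = start; i < end; i++) sum += Math.abs(samples[i]);
  return end > start ? sum / (end - start) : 0;
}

function barHeights(samples: Float32Array, bars = 28, minPx = 6, maxPx = 32): number[] {
  const levels: number[] = [];
  for (let b = 0; b < bars; b++) {
    // Evenly sized buckets covering the whole clip.
    const start = Math.floor((b * samples.length) / bars);
    const end = Math.floor(((b + 1) * samples.length) / bars);
    levels.push(meanAbs(samples, start, end));
  }
  const peak = Math.max(...levels, 1e-9); // avoids dividing by zero on silence
  // Normalize to the loudest bucket, keeping a visible floor for quiet bars.
  return levels.map((l) => Math.round(minPx + (l / peak) * (maxPx - minPx)));
}
```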
- Fix guard: was checking isModelLoaded (OuteTTS, always false) instead
  of kokoroReady — so isAudioModeMessage was never stamped and all AI
  messages rendered as text in audio mode
- Add sentence-level streaming TTS: Kokoro now starts speaking each
  sentence as soon as LLM finishes generating it, instead of waiting
  for the full response
- Fix waveform invisible in idle state: min bar height 3→6px and
  empty waveform now renders a sine-wave placeholder instead of
  nearly-invisible flat bars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds memory-rag capability and conversationRagService spec so Jarvis
can retrieve relevant context from past conversations and inject it
into the system prompt — giving it cross-chat intelligence without
requiring the user to repeat themselves.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Stamp isAudioModeMessage BEFORE checking TTS engine readiness — so
  AI messages always render as audio bubbles even when Kokoro hasn't
  downloaded yet
- Add minWidth: 220 to audio bubble so flex:1 waveform container has
  space to expand (previously collapsed to 0 since bubble shrinks to
  content in flex-end alignment)
- Audio mode input: hide text pill, show centered VoiceRecordButton
  with 'Hold to speak' / 'Release to send' hint — clearly communicates
  the interface mode
- User voice recordings now render as AudioMessageBubble in BOTH chat
  and audio mode — tap play to hear your recording back regardless of
  which interface is active

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- MessageRenderer now renders ALL assistant messages as audio bubbles
  when interfaceMode=audio (not just isAudioModeMessage-stamped ones),
  fixing old messages showing as text after enabling audio mode
- Removed voiceChip from play row; added dedicated voice row below
  controls with mic icon + voice name + chevron-right to cycle voices
- AudioMessageBubble: streaming-only messages (no audioPath) correctly
  fall through to speak(transcript) for on-demand playback
- ChatInput audio mode: added +/settings buttons back on left side so
  users can attach photos and configure tools while in audio mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
alichherawalla and others added 30 commits June 20, 2026 00:35
Type the whisper HF jest.fn mocks with rest params so spreading args
into them type-checks under tsc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add OuteTTSEngine download tests: routes through the shared background
download engine, falls back to RNFS when unavailable, treats truncated
on-disk files as not-downloaded, and rejects/cleans up incomplete
downloads. Bumps the pro submodule to the rerouted engines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Transcription tab had a custom search bar (icon + border) that looked
different from the Text/Image tabs. Reuse the Models screen's shared
searchContainer/searchInput styles (and deviceBanner) so the search field
is identical across all model tabs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- CLAUDE.md: require searching for and reusing existing components,
  styles, hooks, and services before building new ones (prevents UI/logic
  drift like the divergent search field).
- docs/design/MODEL_ROUTING.md: design plan for dynamic model
  routing/orchestration — text-vs-image classification, load-on-demand
  with memory-budget eviction, STT/TTS as I/O modalities, and a phased
  rollout grounded in the existing intentClassifier + activeModelService.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- §5.3: default classifier is SmolLM2-135M-Instruct (~100MB, runs on the
  existing llama.rn runtime, kept pinned/reserved in the budget);
  heuristics-first; all-MiniLM-L6-v2 (embeddings) noted as the better-per-MB
  upgrade if a small embeddings runtime is ever added.
- §6: routing + the classifier only run when 2+ generation models are
  available; a single model is used directly with zero overhead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Core of model routing's memory guarantee (docs/design/MODEL_ROUTING.md
§5.1-5.2): keep only what's needed resident in RAM.

- policy.ts (pure): computeBudgetMB derives a RAM budget from device memory
  (the smaller of 60% of total RAM and total minus 1.5GB of headroom);
  planEviction picks victims so an incoming model fits: generation models
  (text/image) are mutually exclusive, pinned models (the ~100MB SMOL
  classifier) are never evicted, and everything else is evicted LRU.
- index.ts: ModelResidencyManager.ensureResident(spec, {load, unload}) runs
  the plan, unloads victims, then loads the target; register() accounts for
  already-loaded/pinned models; evictAll() for memory warnings. Load/unload
  are injected, so it's decoupled from the text/image/whisper/tts services
  and unit-testable.

Not yet wired into the live send path (next step). 11 unit tests cover the
budget math, mutual exclusion, pinned protection, LRU, and the manager.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
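The budget math and eviction plan described in this commit can be sketched as two pure functions. This is a minimal sketch, not the actual policy.ts. It takes 1.5GB as 1536 MB, and it omits the text/image mutual-exclusion rule, which a later commit replaces with purely budget-driven eviction. The field names on `ResidentModel` are assumptions.

```typescript
// Hypothetical sketch of the residency policy: RAM budget from device
// memory, then an LRU eviction plan that never touches pinned models.
interface ResidentModel {
  id: string;
  sizeMB: number; // estimated runtime RAM
  pinned: boolean; // e.g. the small classifier
  lastUsedAt: number; // timestamp used for LRU ordering
}

const HEADROOM_MB = 1536; // 1.5 GB left for the OS and the app itself

function computeBudgetMB(totalMB: number): number {
  return Math.max(0, Math.min(totalMB * 0.6, totalMB - HEADROOM_MB));
}

// Returns the ids to unload (oldest first), or null if the incoming model
// cannot fit even after evicting every non-pinned model.
function planEviction(
  incomingMB: number,
  resident: ResidentModel[],
  budgetMB: number,
): string[] | null {
  let used = resident.reduce((sum, m) => sum + m.sizeMB, 0);
  const victims: string[] = [];
  const candidates = resident
    .filter((m) => !m.pinned)
    .sort((a, b) => a.lastUsedAt - b.lastUsedAt);
  for (const m of candidates) {
    if (used + incomingMB <= budgetMB) break;
    victims.push(m.id);
    used -= m.sizeMB;
  }
  return used + incomingMB <= budgetMB ? victims : null;
}
```

On an 8 GB device the budget is the 60% term (about 4.9 GB); on a 3 GB device the headroom term wins and the budget is 1536 MB.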
"Enhance Image Prompts" runs the prompt through a text model, so it can't
work without one. Both toggles (Model Settings + the generation settings
modal) are now disabled and dimmed with a "Download a text model to enable"
hint when no text model is downloaded. The generation service already skips
enhancement when no text model is loaded, so this is the matching UI gate.

Tests updated to seed a text model where the enabled state is asserted, plus
a new test for the disabled-without-text-model case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Route text + image model loads through modelResidencyManager so memory is
managed by a single budget (docs/design/MODEL_ROUTING.md), overriding the
old hardcoded "<=4GB unloads text" logic:

- activeModelService load paths call makeRoomFor() to evict (LRU, by
  estimated runtime RAM) and fit the device budget before loading, register
  the model on load, and release on unload.
- The old per-load critical-memory gate is replaced by the residency
  budget: a model that can't fit even after eviction is blocked.
- Policy is budget-driven, not hard mutual-exclusion: a high-RAM device can
  keep a text + image model co-resident if they fit; a constrained device
  evicts to make room. makeRoomFor reports whether the model fits.

Tests updated for the budget-driven behaviour (co-resident when it fits,
evict/block when it doesn't); residency manager beforeEach reset added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tting

The fast/optimised swap-strategy setting is obsolete now that the residency
manager owns model swapping (docs/design/MODEL_ROUTING.md). Remove it
everywhere:

- appStore: drop the setting + default; rehydrate strips it from old
  persisted state.
- ModelLoadingStrategy type removed.
- intentClassifier: always restore the original text model after
  classifying (the residency manager fits it back into memory); the
  "memory mode keeps classifier loaded" branch is gone.
- Remove both UI toggles (Model Settings + generation settings modal) and
  the chat preload's strategy gate.

Tests updated/removed accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
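Stripping a removed setting from old persisted state can be done with a small pure function applied during rehydration. The key name `modelLoadingStrategy` is an assumption; the commit only names the `ModelLoadingStrategy` type.

```typescript
// Hypothetical sketch: drop obsolete keys from previously persisted state so
// upgraded installs do not carry a setting nothing reads any more.
type PersistedSettings = Record<string, unknown>;

const OBSOLETE_KEYS = ["modelLoadingStrategy"] as const; // assumed key name

function stripObsoleteSettings(persisted: PersistedSettings): PersistedSettings {
  const cleaned: PersistedSettings = { ...persisted }; // never mutate the input
  for (const key of OBSOLETE_KEYS) delete cleaned[key];
  return cleaned;
}
```

With Zustand's `persist` middleware, a function like this could be called from the `migrate` or `merge` option; which hook appStore actually uses is not stated here.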
- appStore.lastTextModelId: persisted preference set when the user picks a
  text model (in useModelLoading). Unlike activeModelId it is not cleared
  when the residency manager evicts the model, so routing can reload it on
  demand.
- ChatScreen preloads lastTextModelId in the background on open (when no
  generation model is already loaded), so the user can start typing while
  it loads.

Foundation for on-demand text-model routing (next: classify + load/select).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the chat has only an image model (or none) loaded and the user sends a
message, it now classifies the request and routes correctly instead of
always generating an image:

- shouldRouteToImageGenerationFn classifies by fast heuristics when no text
  model is loaded (a chat request returns false instead of forcing image).
- handleSendFn: for a chat request with no text model, ensureTextModelForChat
  loads the last-selected text model (residency evicts the image model to
  fit) or opens the model selector when none was ever chosen.
- startGenerationFn routes by live model state (llmService.isModelLoaded), so
  a model loaded mid-send generates text rather than mis-routing to image.
- ChatMessageArea shows a "Loading <model>" bar above the input while the
  model loads (also covers the chat-open background preload).

Tests cover heuristic routing with no text model and the load-or-select branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
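A keyword heuristic for the no-text-model case can be as simple as a few regular expressions. The patterns and function names below are hypothetical; the PR does not list the real heuristics, and the next commit keeps heuristics only as the fallback when no classifier model is downloaded.

```typescript
// Hypothetical heuristic: route to image generation only when the prompt
// clearly asks for a picture; everything else is treated as chat.
const IMAGE_PATTERNS: RegExp[] = [
  /\b(draw|paint|sketch|illustrate|render)\b/i,
  /\b(generate|create|make)\s+(an?\s+)?(image|picture|photo|drawing|illustration)\b/i,
  /\b(image|picture|photo)\s+of\b/i,
];

type Route = "image" | "chat";

function routeWithoutTextModel(prompt: string): Route {
  return IMAGE_PATTERNS.some((re) => re.test(prompt)) ? "image" : "chat";
}
```

Defaulting to "chat" matches the commit's intent: a chat request returns false instead of forcing an image.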
When no text model is loaded, route text-vs-image with the configured
classifier model (SmolLM2) via the LLM intent classifier for real
intelligence, instead of keyword heuristics. Falls back to heuristics only
when no classifier model is downloaded. Shows the "Understanding your
request..." bar while classifying.

The classifier loads through activeModelService, so the residency manager
accounts for it; with llama.rn's single context it can't be pinned
separately from the main text model, so it stays loaded until a text model
is needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
So LLM routing works out of the box: when image-only routing needs a
classifier but none is configured, download SmolLM2-135M-Instruct (~100-145MB)
in the background via the normal text-model path (visible in the Download
Manager) and select it as settings.classifierModelId on completion. Fetches
the GGUF from HuggingFace dynamically (prefers Q8_0) so it's robust to exact
filenames. Heuristics handle that first turn; subsequent turns use the SMOL
model. Guarded against duplicate downloads and no-ops once a classifier is set.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
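The file choice and the duplicate-download guard can be sketched as below. File names would come from the repository listing (the Hugging Face model API exposes them as `siblings[].rfilename`). The fall-back-to-any-GGUF rule and both function names are assumptions.

```typescript
// Hypothetical sketch: prefer a Q8_0 GGUF, else fall back to any GGUF.
function pickClassifierFile(filenames: string[]): string | undefined {
  const ggufs = filenames.filter((f) => f.toLowerCase().endsWith(".gguf"));
  return ggufs.find((f) => /q8_0/i.test(f)) ?? ggufs[0];
}

// Share one in-flight download between concurrent callers, and do nothing
// once a classifier is already configured.
let inFlight: Promise<string | undefined> | null = null;

function ensureClassifier(
  currentClassifierId: string | undefined,
  download: () => Promise<string | undefined>,
): Promise<string | undefined> {
  if (currentClassifierId) return Promise.resolve(currentClassifierId);
  if (!inFlight) {
    inFlight = download().finally(() => {
      inFlight = null; // allow a retry after success or failure
    });
  }
  return inFlight;
}
```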
DEV_UNLOCK_PRO = __DEV__, which is true in jest, so loadProFeatures always
activated Pro and the "no activation without entitlement" assertions failed.
Set __DEV__ = false in these suites (restored after) so they exercise the
production gating they're meant to verify. Full suite now green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chat showed a full-screen LoadingScreen whenever a model was loading,
replacing the conversation. Remove it so the chat stays visible and model
loading is shown inline via the "Loading model" bar above the input (the
image model already loaded inline). Bumps pro (voice-model spinner).

Fixes a test that passed vacuously because the full-screen loading hid the
chat body (its loadedSettings was incomplete, so hasPendingSettings was
actually true) — now loadedSettings matches the full settings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…STT)

Kill the cold-start wait: on launch, preload the user's selected models in
priority order, in the background, sequentially (one native load at a time so
the UI stays responsive).

- modelResidency.canLoadWithoutEviction: a model is preloaded only if it fits
  the RAM budget without evicting a higher-priority one already warmed, so it
  self-limits on small devices (text always wins; the rest fill the remainder).
- modelPreloader walks text/image/STT (and TTS via the audio.preload hook),
  loading each only if available + fits + not already loaded.
- whisperStore registers the STT model with the residency manager on load so
  the budget accounts for it.
- App boot fires preloadSelectedModels() after the UI is shown (fire-and-forget).

Bumps pro (audio.preload hook). Tests cover the fits check, priority ordering,
skip-on-no-fit, run-once, and the empty case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
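The boot preloader's walk can be sketched as a priority-ordered loop that loads one model at a time and skips anything that would need an eviction. `Candidate`, `preloadSelected`, and the simple used-versus-budget accounting are hypothetical simplifications of `canLoadWithoutEviction`.

```typescript
// Hypothetical sketch: preload selected models in priority order,
// sequentially, only when each fits the remaining budget without eviction.
type Kind = "text" | "image" | "stt" | "tts";

interface Candidate {
  kind: Kind;
  id: string;
  sizeMB: number;
  available: boolean; // downloaded and selected
  loaded: boolean;
}

const PRIORITY: Kind[] = ["text", "image", "stt", "tts"];

async function preloadSelected(
  candidates: Candidate[],
  budgetMB: number,
  usedMB: number,
  load: (id: string) => Promise<void>,
): Promise<string[]> {
  const loadedNow: string[] = [];
  let used = usedMB;
  const ordered = [...candidates].sort(
    (a, b) => PRIORITY.indexOf(a.kind) - PRIORITY.indexOf(b.kind),
  );
  for (const c of ordered) {
    if (!c.available || c.loaded) continue;
    if (used + c.sizeMB > budgetMB) continue; // would need an eviction: skip
    await load(c.id); // one native load at a time
    used += c.sizeMB;
    loadedNow.push(c.id);
  }
  return loadedNow;
}
```

Because higher-priority models are loaded first, a later model that does not fit is simply skipped, so the text model always wins on small devices.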
When a chat request arrived with no text model, routing opened the model
selector but dropped the message (the input had already cleared), forcing a
retype. Now the message is stashed when the selector opens and replayed
automatically once the user picks a text model.

- handleSendFn calls setPendingMessage(text, attachments) before aborting.
- useChatScreen.handleModelSelect replays the stashed message after the model
  loads, then clears it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
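The stash-and-replay flow can be sketched as a single pending slot. `PendingSend` and `PendingMessage` are hypothetical names, and attachments are simplified to strings.

```typescript
// Hypothetical sketch: keep the message that arrived with no text model
// and resend it once the user has picked one.
interface PendingMessage {
  text: string;
  attachments: string[];
}

class PendingSend {
  private pending: PendingMessage | null = null;

  stash(text: string, attachments: string[] = []): void {
    this.pending = { text, attachments };
  }

  // Called after the selected model has loaded. Returns false if nothing
  // was stashed.
  async replay(send: (m: PendingMessage) => Promise<void>): Promise<boolean> {
    const msg = this.pending;
    if (!msg) return false;
    this.pending = null; // cleared before sending
    await send(msg);
    return true;
  }
}
```

The commit describes replaying and then clearing. Clearing before the `await`, as here, is a design choice that stops a second model switch from sending the same message twice while the first send is in flight.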
jest.fn mocks were called with args via spread; type them with rest params
so tsc passes (babel ran them fine, but the type-check failed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- new audio.onStreamingToken hook fired from chatStore streaming sink; pro
  consumes it for real-time sentence-by-sentence TTS.
- chat-mode "Select text" action: long-press menu opens a selectable sheet for
  partial copy (gated to chat mode; audio bubbles unaffected).
- bump pro submodule (streaming TTS, unified playback, live speed, no-text
  audio mode, select-text-free transcript handling).
- accumulated session work: model routing/residency, warm preload, hardware
  SoC/NPU detection, generation service, RAG embedding.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntences

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the two bulky model cards with a single compact Models control: a
labelled strip of four type icons (Text/Image/Voice/Speech), emerald when that
type has an active model. Tap → a manager bottom sheet with drill-in rows:
- Text/Image reuse the existing model picker
- Speech → Whisper picker (single active STT model, download+select)
- Voice → the pro voice picker (Kokoro voices)

Adds a reactive voiceSummary to the core ui-mode store (mirrored from pro) so the
Voice icon reflects voice state without core importing pro. Bumps pro
(single-Kokoro voice picker).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the header's text-model name + image badge with a compact "Models ▾"
affordance that opens the same Models bottom sheet used on home. Text/Image rows
open the existing chat model selector; Speech/Voice open the Whisper/Kokoro
pickers. Decouple ModelsManagerSheet from the home-only LoadingState type so it's
screen-agnostic (chat cross-imports the sheets for now; shared extraction is a
follow-up).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…screens

Voice and transcription are now managed entirely via the Models flow (Models tab
+ home/chat Models sheet), so both standalone settings screens are removed:
- delete the two Settings rows, the VoiceSettings/TTSSettings routes, and
  VoiceSettingsScreen (+ its test)
- clean the paths that opened them: chat generation-settings link, and the
  audio-mode toggle now routes to Models → Voice
- bump pro (drops the TTS settings screen registration)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Whisper now tracks every model present on disk (presentModelIds), not just the
single active one. The Transcription tab + the Speech picker show each on-disk
model as downloaded with the active one checked; tapping a downloaded-but-
inactive model selects it (no re-download), download is per-model, and delete is
per-model. Adds selectModel / deleteModelById / refreshPresentModels.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
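The multi-model transcription state can be sketched with two pure transitions. The function names mirror the commit, but these bodies are illustrative only. Clearing the selection when the active model is deleted, rather than falling back to another on-disk model, is an assumption.

```typescript
// Hypothetical sketch: every on-disk model is tracked, one is active,
// selecting a present model never re-downloads.
interface WhisperState {
  presentModelIds: string[];
  activeModelId: string | null;
}

function selectModel(
  state: WhisperState,
  id: string,
): { state: WhisperState; needsDownload: boolean } {
  if (state.presentModelIds.includes(id)) {
    return { state: { ...state, activeModelId: id }, needsDownload: false };
  }
  return { state, needsDownload: true }; // caller starts a per-model download
}

function deleteModelById(state: WhisperState, id: string): WhisperState {
  return {
    presentModelIds: state.presentModelIds.filter((m) => m !== id),
    // Assumed: deleting the active model leaves no model selected.
    activeModelId: state.activeModelId === id ? null : state.activeModelId,
  };
}
```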
…hook; bump pro

ActiveModelsSection was orphaned when the home cards became the Models strip.
Remove it (+ its test mock), the audioSummaryLabel HOOKS entry, and fix the
autoPlay-referencing tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uting

- waveform helpers (meanAbsAmplitude / buildWaveformEnvelope / waveformFromText)
- stream playback clock
- streamingSpeech coordinator: gating (voice mode + engine ready), thinking is
  never spoken, queue drains through the engine, trailing-partial flush, reset
- whisperStore multi-model: refreshPresentModels / selectModel (no re-download)
  / deleteModelById (active + non-active)
- ttsStore: play→synthesize routing, seek no-op for streaming, setEngine
  fallback to default, live engine speed on updateSettings

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
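The coordinator behaviours listed above (thinking never spoken, sentences emitted as they complete, trailing partial flushed, reset) can be sketched as one small class. It assumes thinking is delimited by `<think>`/`</think>` tags and that a sentence ends at `.`, `!` or `?` followed by whitespace. `StreamingSpeech` is a hypothetical name, and engine gating and the playback queue are left out.

```typescript
// Hypothetical sketch of a streaming-speech coordinator.
type Speak = (sentence: string) => void;

const THINK_OPEN = "<think>";
const THINK_CLOSE = "</think>";

// Length of the longest suffix of `s` that is a proper prefix of `tag`,
// i.e. a tag that the next token might complete.
function partialTagSuffix(s: string, tag: string): number {
  for (let n = Math.min(tag.length - 1, s.length); n > 0; n--) {
    if (s.endsWith(tag.slice(0, n))) return n;
  }
  return 0;
}

class StreamingSpeech {
  private pending = ""; // raw text not yet classified as speech or thinking
  private speech = ""; // visible text waiting for a sentence boundary
  private inThinking = false;
  private readonly speak: Speak;

  constructor(speak: Speak) {
    this.speak = speak;
  }

  push(token: string): void {
    this.pending += token;
    for (;;) {
      const tag = this.inThinking ? THINK_CLOSE : THINK_OPEN;
      const idx = this.pending.indexOf(tag);
      if (idx >= 0) {
        // Text before an opening tag is speech; text before a closing tag is thinking.
        if (!this.inThinking) this.speech += this.pending.slice(0, idx);
        this.pending = this.pending.slice(idx + tag.length);
        this.inThinking = !this.inThinking;
        continue;
      }
      // No full tag yet: hold back a possible partial tag, consume the rest.
      const keep = partialTagSuffix(this.pending, tag);
      if (!this.inThinking) this.speech += this.pending.slice(0, this.pending.length - keep);
      this.pending = this.pending.slice(this.pending.length - keep);
      break;
    }
    this.emitCompleteSentences();
  }

  // End of stream: speak whatever visible text is left.
  finish(): void {
    if (!this.inThinking) this.speech += this.pending;
    const tail = this.speech.trim();
    if (tail) this.speak(tail);
    this.reset();
  }

  reset(): void {
    this.pending = "";
    this.speech = "";
    this.inThinking = false;
  }

  // Requiring whitespace after the punctuation keeps "3.5" intact and
  // leaves the final sentence for finish().
  private emitCompleteSentences(): void {
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let m: RegExpExecArray | null;
    while ((m = boundary.exec(this.speech)) !== null) {
      const end = m.index + m[0].length;
      const sentence = this.speech.slice(start, end).trim();
      if (sentence) this.speak(sentence);
      start = end;
    }
    this.speech = this.speech.slice(start);
  }
}
```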
… home Models card

Relocate ModelsSummaryRow/ModelsManagerSheet/VoiceModelsSheet/WhisperPickerSheet
from src/screens/HomeScreen/components to src/components/models so the home and
chat screens consume one shared implementation instead of two parallel copies.
Give the collapsed home Models card the surface+shadows.small treatment to match
the other home cards. Add RNTL coverage for ModelsSummaryRow and
ModelsManagerSheet, and rewrite the VoiceModelsPanel test for the voice-picker
behaviour.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add knip.json scoped to the core app (src/**, with App.tsx + tests + the pro
loader as entries; pro/ is a separate repo with its own usage and stays out of
the project graph). Remove the dead code knip found and grep confirmed unused
across both src and pro:

- delete unused LoadingOverlay screen component
- drop the legacy ChatToolbar alias (only QueueRow is consumed)
- remove the dead LoadingScreen chat component and its orphaned import
- remove unused exports: items extractQuantization (huggingface has its own),
  getHook, showLoadingAlert, processSSELines, parseSSEFromText (+ barrel
  re-export), PILL_ICONS_WIDTH, SYSTEM_PROMPT_RESERVE, CONTEXT_SAFETY_MARGIN,
  PRO_URL
- drop redundant default exports duplicating named ones (DebugLogsScreen,
  RemoteServerModal, RemoteServersScreen)

Kept intentional surfaces knip flags but that are real: the _clear*ForTesting
seams, the provider ModelLoadState/ProviderFactory types, ThemeMode, and
stripMarkdownForSpeech (consumed by pro, invisible to a src-scoped graph).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The earlier home/chat/settings redesign left several RNTL suites asserting UI
that no longer exists. Update the tests (no app code) to assert the current
behaviour:

- HomeScreen / HomeScreenSpotlight: assert the collapsed ModelsSummaryRow
  (Models label + Text/Image/Voice/Speech captions) and that per-model details,
  load/unload, eject, and "browse more" now live inside the manager + picker
  sheets opened from it; drop the deleted LoadingOverlay mock.
- ChatScreen: header now shows a generic "Models" selector that opens the
  manager sheet (then the per-type picker), not the model name inline; reset
  lastTextModelId between tests for isolation.
- SettingsScreen: the Voice Transcription and Text to Speech rows were removed;
  assert their absence and cover the remaining rows.
- ChatInputModeToggle: not-ready tap navigates to ModelsTab { initialTab:
  'voice' }.
- TranscriptionModelsTab: cover the multi-model store API (presentModelIds,
  selectModel without re-download, per-model deleteModelById).

Full suite: 5629 passing, tsc + eslint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Advance the pro pointer to the conflict-resolved merge of
feat/email-calendar-tools into fix/kokoro-install-status, so this branch
references a pro commit that exists on origin and the pro PR merges cleanly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>