A drop-in replacement for the stock Android TTS engine. Synthesis happens
locally on the phone via sherpa-onnx
— no network, no account, no telemetry. Every app on the device that
calls into TextToSpeech (TalkBack, navigation, reader apps, the browser's
read-aloud) speaks with whichever voice you install.
- 600+ voices across 8 model families (Piper, VITS, Matcha, Kokoro,
Kitten, Supertonic, ZipVoice, Pocket), tens of languages. Full index
at `docs/MODELS.md` (auto-regenerated weekly by the catalog-refresh workflow, which scrapes both the upstream docs and the `tts-models` release assets).
- In-app catalog browser with per-(speaker, language) audition clips
rendered offline by
`tools/samples/render_samples.py`, hosted at HayaiTTS-samples.
- Voice cloning for ZipVoice / Pocket voices: in-app mic recorder,
file import via SAF, reference transcript + target text, generated
through `SherpaTtsRuntime.synthesizeCloned`. Surfaced as "Clone this voice" in the Voice Detail overflow menu whenever the installed voice's family supports reference-audio cloning.
- Streaming runtime: `SherpaTtsRuntime.synthesizeStreaming` invokes `OfflineTts.generateWithCallback`, exposing `FloatArray` chunks as the JNI layer produces them; the TTS service still buffers for now.
- System TTS engine, `MANAGE_VOICES` activity-alias for the system settings cog, foreground-service downloader, in-app update channel with auto-poll.
- Grab the universal
`*.apk` from the latest release (`hayaitts-vX.Y.Z.apk`). Sideload it: the Play Store doesn't permit apps that ship GPL-3 binaries in this configuration.
- Open the app. The bundled English voice (Piper Amy) ships inside the APK, so first-run synthesis works offline immediately. Browse to download more.
- Settings → System → Languages & input → Text-to-speech → HayaiTTS.
The exact path varies by OEM (Samsung One UI calls it "Speech").
A `MANAGE_VOICES` activity-alias is wired in the manifest, so the cog next to HayaiTTS on that page lands you directly on Library.

Update channels (Stable / Beta / Nightly) are switched from Settings → Update channel; nightlies publish to a separate HayaiApp/HayaiTTS-nightly repo so the main Releases page stays focused on production builds.
- Catalog scraper walks two upstream sources:
  - The sherpa-onnx docs index: rich per-voice metadata (named speakers, sample rate, license, demo URL).
  - A direct walk of every `.tar.bz2` asset on the `tts-models` release, which fills the gap for families k2-fsa publishes but hasn't documented yet (currently the entire ZipVoice / Pocket cloning catalog plus hundreds of Piper variants).
- Result is 600+ voices across the 8 runtime-supported families
(Piper, VITS, Matcha, Kokoro, Kitten, Supertonic, ZipVoice, Pocket).
Live count + per-family breakdown in
`docs/MODELS.md`.
- Catalog auto-refreshed weekly from the
sherpa-onnx TTS model index;
see
`catalog-refresh.yml`.
- Per-voice sha256 verification on download; resumable downloads via WorkManager with backoff.
- WiFi-only download toggle, configurable storage location (internal / external SD), live storage-migrator for moving installed voices between locations without losing them.
- Custom voice import: sherpa-onnx-compatible
`.tar.bz2` bundles can be installed from disk via the SAF import flow.
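
The verify-and-resume steps above can be sketched with nothing but the JDK. This is a minimal illustration, not the app's download worker: `sha256Hex`, `isBundleIntact` and `resumeRangeHeader` are hypothetical names, and resuming assumes the host honours HTTP `Range` requests.

```kotlin
import java.io.File
import java.security.MessageDigest

// Streams the file through SHA-256 in 64 KiB reads, so a multi-hundred-MB
// voice bundle never has to fit in memory.
fun sha256Hex(file: File): String {
    val digest = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(64 * 1024)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break
            digest.update(buffer, 0, read)
        }
    }
    // Lower-case hex, two characters per byte.
    return digest.digest().joinToString("") { "%02x".format(it) }
}

// Compares against the catalog's expected hash (hex, case-insensitive).
fun isBundleIntact(file: File, expectedSha256: String): Boolean =
    file.isFile && sha256Hex(file).equals(expectedSha256, ignoreCase = true)

// Builds a Range header that resumes after the bytes already on disk,
// or returns null when there is nothing to resume.
fun resumeRangeHeader(partial: File): String? {
    val have = if (partial.isFile) partial.length() else 0L
    return if (have > 0) "bytes=$have-" else null
}
```

A worker would send the header only when it is non-null, append to the partial file, and run the hash check once the transfer completes; a mismatch means the partial file should be discarded rather than resumed again.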
- Filter by tier (low / mid / high), gender, language, model family, capability (e.g. "Voice cloning"). All filters compose; result count reflects the active filter set.
- Inline search field in the floating top bar, plus dedicated search for the long languages list inside the filter sheet.
- Global allowed-languages setting in Settings → Defaults gates the whole catalog so users with a narrow language interest never see the other 60+ tags.
- Per-(speaker, language) audition: tap play on any catalog row to hear a 5-second clip before downloading.
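
One way to get filters that compose is to treat each dimension as an optional constraint and AND them together, with the global allowed-languages setting applied before everything else. `CatalogVoice`, `VoiceFilter` and `visibleVoices` below are hypothetical stand-ins; the real `VoiceCard` carries more fields.

```kotlin
enum class Tier { LOW, MID, HIGH }

// Hypothetical, trimmed-down catalog entry.
data class CatalogVoice(
    val id: String,
    val family: String,
    val tier: Tier,
    val gender: String?,
    val languages: Set<String>,
    val supportsCloning: Boolean,
)

// Each dimension is optional: an empty set (or blank query) means "no constraint".
data class VoiceFilter(
    val tiers: Set<Tier> = emptySet(),
    val genders: Set<String> = emptySet(),
    val languages: Set<String> = emptySet(),
    val families: Set<String> = emptySet(),
    val cloningOnly: Boolean = false,
    val query: String = "",
)

// All active constraints must hold, so filters compose by conjunction.
fun VoiceFilter.matches(v: CatalogVoice): Boolean =
    (tiers.isEmpty() || v.tier in tiers) &&
        (genders.isEmpty() || v.gender?.let { it in genders } == true) &&
        (languages.isEmpty() || v.languages.any { it in languages }) &&
        (families.isEmpty() || v.family in families) &&
        (!cloningOnly || v.supportsCloning) &&
        (query.isBlank() || v.id.contains(query.trim(), ignoreCase = true))

// The global allowed-languages gate runs first, so it hides whole languages
// before any per-screen filter is considered.
fun visibleVoices(
    catalog: List<CatalogVoice>,
    allowedLanguages: Set<String>,
    active: VoiceFilter,
): List<CatalogVoice> =
    catalog
        .filter { voice -> allowedLanguages.isEmpty() || voice.languages.any { it in allowedLanguages } }
        .filter { active.matches(it) }
```

The result count is then simply `visibleVoices(...).size`, so it always reflects the active filter set.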
- Voice Detail: hero card, license, sample-rate / family / tier chip
strip, speaker picker (avatars for multi-speaker models, friendly
fallback labels for anonymous
`speaker_N` placeholders).
- Studio (Playground): live speed / pitch / length / noise / noise-W sliders, persisted per voice. Waveform amplitude indicator, sample history with delete + replay. In-bar voice picker so users tune multiple voices without leaving the tab.
- Default voice per locale: each installed voice can be pinned as the default for any locale it ships. The TTS engine routes by locale at synthesis time.
- Live download / extraction progress (determinate, matches the rest of the app).
- On-device synthesis telemetry: RTF, synth ms, audio ms, char count, caller package per request. Filterable.
- Cache + storage stats.
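
Routing by locale can be sketched as a lookup that tries the full language tag, then the bare language, then a global fallback. The map of pinned voices and the function name are assumptions for illustration. Android's `SynthesisRequest` reports three-letter ISO codes, so a real engine would first normalize those into the same key format; that step is omitted here.

```kotlin
import java.util.Locale

// Hypothetical resolver: pinnedByLocale maps a BCP-47 tag ("pt-BR") or a bare
// language ("pt") to an installed voice id.
fun resolveVoice(
    requested: Locale,
    pinnedByLocale: Map<String, String>,
    fallbackVoiceId: String,
): String {
    val fullTag = requested.toLanguageTag() // e.g. "pt-BR"
    val languageOnly = requested.language   // e.g. "pt"
    return pinnedByLocale[fullTag]
        ?: pinnedByLocale[languageOnly]
        ?: fallbackVoiceId
}
```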
- M3 Expressive theme, monochrome by design, dynamic-color opted out.
- Floating-pill top bar that doesn't collapse on scroll — content scrolls under the bar; status-bar inset handled once at the bar level.
- M3 navigation bar with five tabs; bottom-nav insets correctly applied so content never gets cut off by the system navigation.
- Bottom-sheet quick switcher
(`HayaiQuickSwitcher`) hoisted at the activity root so every screen can flip the default voice with one tap.
- Auto-updater: 6 h debounce on launch-time auto-check; manual "Check for updates" in Settings; in-app APK install via FileProvider.
- Crash reporter: an
`UncaughtExceptionHandler` redirects to an activity in a separate `:crash` process that auto-copies the stack trace + device metadata to the clipboard for pasting into a GitHub issue.
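
The launch-time debounce reduces to a timestamp comparison. A minimal sketch using `java.time`, assuming the last-check instant is persisted somewhere (for example in DataStore); `shouldAutoCheck` and `AUTO_CHECK_INTERVAL` are hypothetical names.

```kotlin
import java.time.Duration
import java.time.Instant

// Mirrors the 6 h debounce described above.
val AUTO_CHECK_INTERVAL: Duration = Duration.ofHours(6)

// Returns true when a launch-time auto-check should run.
// A manual "Check for updates" tap would bypass this entirely.
fun shouldAutoCheck(
    lastCheck: Instant?,
    now: Instant,
    interval: Duration = AUTO_CHECK_INTERVAL,
): Boolean {
    if (lastCheck == null) return true
    // A clock that moved backwards is treated as stale rather than
    // suppressing checks until it catches up.
    if (now.isBefore(lastCheck)) return true
    return Duration.between(lastCheck, now) >= interval
}
```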
- No network during synthesis. The only network calls are:
  - Catalog JSON refresh (one `raw.githubusercontent.com` GET).
  - The voice bundle download you explicitly trigger.
  - The audition MP3 streamed from the samples repo when you tap play.
  - The update check (auto-poll on launch, 6 h debounce) and the APK download when you install an update.
- No analytics, no crash reporting service, no accounts.
- All voice data lives in
filesDir/voices/(or external SD if selected).
The catalog is the source of truth for everything Browse shows.
- Machine-readable:
`catalog/v1/models.json`: one `VoiceCard` per entry; fields match the Kotlin `VoiceCard` data class exactly, so the on-device JSON parser is a single `Json.decodeFromString`.
- Human-readable:
`docs/MODELS.md`: auto-regenerated from the JSON on every catalog refresh (see `tools/catalog/build_model_list.py`).
- Refresh cadence: weekly, every Monday 06:30 UTC. Hand-trigger via
Actions → catalog-refresh → Run workflow.

| Family | Cloning | Notes |
|---|---|---|
| Piper | — | VITS, 10–60 MB, ~70 languages, hundreds of named voices |
| VITS | — | Bare VITS checkpoints (LJSpeech, VCTK, Coqui, etc.) |
| Kokoro | — | StyleTTS 2-based, higher quality, 80–360 MB, multi-speaker |
| Kitten | — | Tiny English-only, fastest synthesis |
| Matcha | — | Flow-matching acoustic model + separate vocoder |
| Supertonic | — | 2026 model, 30 languages × 10 speakers in one bundle |
| ZipVoice | ✓ | Reference-audio cloning, flow-matching (Apache-2.0) |
| Pocket | ✓ | Voice-embedding cloning, smaller weights (Apache-2.0) |
For exact counts and per-voice details (bundle URL, sample rate, license,
speakers, languages, size) see docs/MODELS.md — that
file is auto-generated from catalog/v1/models.json after every refresh.
Reference-audio cloning maps to sherpa-onnx's
OfflineTts.generateWithConfig(text, GenerationConfig),
where GenerationConfig carries:

- `referenceAudio: FloatArray`: mono clip in `[-1, 1]`
- `referenceSampleRate: Int`: e.g. 16000
- `referenceText: String`: exact transcript of what the reference says
- `numSteps: Int`: number of flow-matching sampling steps (8 by default)

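Recordings and imported files usually arrive as 16-bit PCM, sometimes stereo, while `referenceAudio` must be a mono clip in `[-1, 1]`. A minimal conversion sketch follows; `pcm16ToMonoFloat` and `ReferenceClip` are hypothetical helpers, not part of sherpa-onnx, and averaging channels is one downmix choice among several.

```kotlin
// Converts interleaved 16-bit PCM into a mono FloatArray in [-1, 1]
// by averaging the channels of each frame.
fun pcm16ToMonoFloat(interleaved: ShortArray, channels: Int): FloatArray {
    require(channels >= 1) { "channels must be >= 1" }
    require(interleaved.size % channels == 0) { "sample count must be a multiple of channels" }
    val frames = interleaved.size / channels
    return FloatArray(frames) { frame ->
        var sum = 0f
        for (ch in 0 until channels) {
            // Dividing by 32768 maps the full Short range onto [-1, 1).
            sum += interleaved[frame * channels + ch] / 32768f
        }
        sum / channels
    }
}

// Hypothetical container mirroring the referenceAudio / referenceSampleRate /
// referenceText fields, validating the constraints listed above.
class ReferenceClip(val samples: FloatArray, val sampleRate: Int, val transcript: String) {
    init {
        require(sampleRate > 0) { "sampleRate must be positive" }
        require(transcript.isNotBlank()) { "transcript must be the exact words spoken in the clip" }
        require(samples.all { it in -1f..1f }) { "samples must lie in [-1, 1]" }
    }

    val durationSeconds: Float get() = samples.size.toFloat() / sampleRate
}
```

Resampling is not shown: the sketch records whatever rate the clip already has in `sampleRate`, which would feed `referenceSampleRate`.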
The runtime entry point is
SherpaTtsRuntime.synthesizeCloned(...);
the user-facing flow is the
Voice Cloning screen
reached from any installed cloning-capable voice. Specific upstream
bundles that will work as soon as the scraper sees them:

- `sherpa-onnx-zipvoice-zh-en-emilia` (full precision)
- `sherpa-onnx-zipvoice-distill-zh-en-emilia` / `-int8` / `-fp32`
- `sherpa-onnx-pocket-tts-2026-01-26` / `-int8`

Built on Kotlin + Jetpack Compose (M3 Expressive), Koin for DI,
Room for installed-voice state, WorkManager for downloads,
OkHttp for catalog refresh, and the upstream
sherpa-onnx AAR
(vendored at app/libs/sherpa-onnx-1.13.2.aar) for the JNI runtime.
```
ui/ ───────► UseCase ──────► Repository ──────► Data source ──────► sherpa-onnx JNI
                                                └──► Room (installed voices)
                                                └──► DataStore (settings)
```
Dependency direction is enforced by
Konsist tests in
`app/src/test/java/.../core/konsist/`; the suite gates CI.

- `domain/` is pure Kotlin: no `android.*`, no `androidx.*` (except `androidx.annotation`).
- `data/` may import `domain` but never `ui`.
- `ui/` may import both, plus Compose.
- Threading: every coroutine launch flows through the injected
`DispatcherProvider`; importing `kotlinx.coroutines.Dispatchers` outside `core/` is a Konsist-blocked violation.
- Errors surface as
`Outcome<T>`; repositories never throw.
- The top app bar is a single shared composable
(`HayaiTopBar`) used by every screen; a plain Material `TopAppBar` is forbidden.

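The "repositories never throw" rule is easiest to see with a small result type. This sketch is an assumption about the shape of `Outcome<T>`, not the project's definition; `outcomeOf` is a hypothetical helper that turns exceptions into `Failure` while letting coroutine cancellation propagate.

```kotlin
import kotlin.coroutines.cancellation.CancellationException

// Hypothetical shape for Outcome<T>.
sealed interface Outcome<out T> {
    data class Success<T>(val value: T) : Outcome<T>
    data class Failure(val error: Throwable) : Outcome<Nothing>
}

// Wraps a throwing call so the exception becomes a value instead of escaping.
inline fun <T> outcomeOf(block: () -> T): Outcome<T> =
    try {
        Outcome.Success(block())
    } catch (e: CancellationException) {
        throw e // never swallow cancellation
    } catch (e: Exception) {
        Outcome.Failure(e)
    }

// Transforms a success value and passes failures through untouched.
inline fun <T, R> Outcome<T>.map(transform: (T) -> R): Outcome<R> = when (this) {
    is Outcome.Success -> Outcome.Success(transform(value))
    is Outcome.Failure -> Outcome.Failure(error)
}
```

Callers then branch with an exhaustive `when` instead of `try`/`catch`, which keeps error handling visible at each call site.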
Full guide: docs/ARCHITECTURE.md. Build commands +
release signing + history: ONBOARDING.md.
JDK 21 required (Android Studio ships JBR 21). Gradle toolchain is pinned,
so the system JDK version doesn't matter as long as org.gradle.java.home
points at a JDK-21 install.
```bash
git clone https://github.com/HayaiApp/HayaiTTS.git
cd HayaiTTS
./gradlew assembleDebug
adb install -r app/build/outputs/apk/debug/app-arm64-v8a-debug.apk
```

PowerShell on Windows:

```powershell
$env:JAVA_HOME = "C:\Program Files\Android\Android Studio\jbr"
.\gradlew assembleDebug "-Dorg.gradle.java.home=$env:JAVA_HOME"
```

Lint is treated as a hard error, the Konsist architecture suite runs on
`:app:test`, and `lintVitalRelease` runs as part of `assembleRelease`;
keep all three green.

| Workflow | Trigger | Effect |
|---|---|---|
| `build_check.yml` | PR / push | Build `assembleDebug` + `assembleRelease`, run Konsist + unit tests |
| `build_push.yml` | manual `workflow_dispatch` | Signed release. `v…` tags → main repo (stable + beta), `r…` tags → `HayaiApp/HayaiTTS-nightly` |
| `catalog-refresh.yml` | weekly + manual | Re-scrape upstream sherpa-onnx index, regenerate `catalog/v1/models.json` and `docs/MODELS.md`, commit to `main` |
| `render-samples.yml` | weekly + manual | Render per-(speaker, language) audition MP3s, publish to the HayaiTTS-samples releases |

The app version name baked into the APK comes from the git tag (the
release workflow sets HAYAITTS_VERSION_NAME=${VERSION_TAG}), so the
"Current version" row in Settings → Updates shows e.g.
Beta · 2.0.0-b3 (203) or Nightly · r142 (142).
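
The tag-to-label mapping described above can be sketched as follows. It assumes beta tags carry a `-b<N>` suffix (as in `2.0.0-b3`) and that the `v` prefix is stripped for display; the version code is passed in because the rule that derives `203` from `2.0.0-b3` isn't documented here. All names are hypothetical.

```kotlin
enum class Channel(val label: String) { STABLE("Stable"), BETA("Beta"), NIGHTLY("Nightly") }

// Assumed convention: a "-b<N>" suffix marks a beta pre-release.
private val BETA_SUFFIX = Regex("-b\\d+$")

// "r…" tags are nightlies; "v…" tags are stable or beta.
fun channelForTag(tag: String): Channel = when {
    tag.startsWith("r") -> Channel.NIGHTLY
    tag.startsWith("v") && BETA_SUFFIX.containsMatchIn(tag) -> Channel.BETA
    tag.startsWith("v") -> Channel.STABLE
    else -> throw IllegalArgumentException("unrecognised release tag: $tag")
}

// Formats the "Current version" row shown in Settings → Updates.
fun currentVersionLabel(tag: String, versionCode: Int): String {
    val name = tag.removePrefix("v")
    return "${channelForTag(tag).label} · $name ($versionCode)"
}
```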
Tracked in this repo's issues. Near-term:
- Catalog scraper: pick up ZipVoice + Pocket bundles even when the
upstream documentation index doesn't list them. Currently the only
thing blocking voice cloning from being end-to-end usable is that
`build_catalog.py` discovers slugs from the docs `searchindex.js`, and the cloning models are only on the release page.
- Streaming playback in
`HayaiTtsService`. The runtime API (`synthesizeStreaming`) is in place; the service still buffers the full `FloatArray` before pushing PCM to the framework callback, which blocks low-latency read-aloud apps. Wiring chunk-by-chunk `callback.audioAvailable(...)` is the next step.
- Per-speaker metadata: most upstream catalogs ship
`speaker_0…N` with no gender / age / style annotations. The display layer already collapses anonymous placeholders to "Voice N"; ingesting community-curated metadata where it exists is the upgrade path.
- More filter dimensions: voice description, training corpus, release year, style tags.

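Chunk-by-chunk forwarding mostly comes down to converting each float chunk to 16-bit PCM and respecting the framework's per-call size limit (Android's `SynthesisCallback` exposes `audioAvailable(buffer, offset, length)` and `getMaxBufferSize()`). The sketch below keeps the framework out of the picture: `forwardChunk` is a hypothetical helper and `emit` stands in for `audioAvailable`.

```kotlin
// Converts one float chunk (as produced by a streaming callback) into 16-bit
// little-endian PCM and hands it on in slices of at most maxBytes bytes.
fun forwardChunk(samples: FloatArray, maxBytes: Int, emit: (ByteArray, Int, Int) -> Unit) {
    require(maxBytes >= 2) { "maxBytes must hold at least one 16-bit sample" }
    val pcm = ByteArray(samples.size * 2)
    for (i in samples.indices) {
        // Clamp first: model output can overshoot [-1, 1] slightly.
        val s = (samples[i].coerceIn(-1f, 1f) * Short.MAX_VALUE).toInt()
        pcm[2 * i] = (s and 0xFF).toByte()            // low byte
        pcm[2 * i + 1] = ((s shr 8) and 0xFF).toByte() // high byte
    }
    // Keep every slice sample-aligned (an even number of bytes).
    val slice = maxBytes - (maxBytes % 2)
    var offset = 0
    while (offset < pcm.size) {
        val len = minOf(slice, pcm.size - offset)
        emit(pcm, offset, len)
        offset += len
    }
}
```

In the service, each chunk from the streaming callback would go straight to a helper like this with `maxBytes` set from the callback's maximum buffer size, instead of being concatenated into one large `FloatArray` first.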
PRs welcome. Quick rules:
- For UI changes attach a screenshot or short recording. Compose previews and Konsist don't capture motion or haptics, which is where most of the UX value lives.
- Strings ship in all 10 supported locales
(`res/values-*`). CI does not block on missing translations yet, but lint will warn, so copy your key into every locale file (`ar`, `de`, `es`, `fr`, `it`, `ja`, `ko`, `pt-rBR`, `ru`, `zh-rCN`) before opening the PR.
- New configuration options go through DI in
`app/di/AppModule.kt` rather than direct singletons.
- See
`CLAUDE.md` for the project's coding conventions and forbidden patterns.

Why isn't it on Google Play? Bundling GPL-3 binaries (sherpa-onnx) violates Play's distribution terms when combined with the app's own GPL-3 licensing. Sideload the APK from Releases.
Will it drain my battery? Synthesis is sub-second on a 2020+ phone. The engine sleeps between requests; the foreground-service downloader stops itself once the queue is empty.
iOS?
Not possible — iOS doesn't permit third-party processes to replace
AVSpeechSynthesizer. (This is also why HayaiTTS isn't a wrapper around
a cloud TTS API.)
Where do the voices come from? sherpa-onnx curates them from upstream projects: Piper, Kokoro-82M, Kitten, Matcha-TTS, Supertonic, ZipVoice, Pocket.
Is the cloning model going to be open?
ZipVoice and Pocket are research models from k2-fsa; both are
Apache-2.0-licensed on Hugging Face. HayaiTTS surfaces them once the
`sherpa-onnx-zipvoice-…tar.bz2` releases land on the upstream index.
GPL-3.0 — see LICENSE. Individual voice licenses vary and are captured per-entry in the catalog (most are MIT or Apache-2.0). Check the upstream source on Hugging Face / GitHub before redistributing any individual voice commercially.
- Hayai: the manga and novel reader that uses HayaiTTS for read-aloud.
- HayaiTTS-samples: per-(speaker, language) audition clips, regenerated weekly.
- HayaiTTS-nightly: nightly
`r…` builds for users on the Nightly update channel.