HayaiTTS

Offline, on-device neural text-to-speech for Android.

A drop-in replacement for the stock Android TTS engine. Synthesis happens locally on the phone via sherpa-onnx — no network, no account, no telemetry. Every app on the device that calls into TextToSpeech (TalkBack, navigation, reader apps, the browser's read-aloud) speaks with whichever voice you install.

600+ voices across 8 model families (Piper, VITS, Matcha, Kokoro, Kitten, Supertonic, ZipVoice, Pocket), in dozens of languages. Full index at docs/MODELS.md (regenerated weekly by the catalog-refresh workflow, which scrapes both the upstream docs and the tts-models release assets).
In-app catalog browser with per-(speaker, language) audition clips rendered offline by tools/samples/render_samples.py, hosted at HayaiTTS-samples.
Voice cloning for ZipVoice / Pocket voices: in-app mic recorder, file-import via SAF, reference transcript + target text, generate through SherpaTtsRuntime.synthesizeCloned. Surfaced as "Clone this voice" in the Voice Detail overflow menu whenever the installed voice's family supports reference-audio cloning.
Streaming runtime — SherpaTtsRuntime.synthesizeStreaming invokes OfflineTts.generateWithCallback, exposing FloatArray chunks as the JNI layer produces them; the TTS service still buffers for now.
System TTS engine, MANAGE_VOICES activity-alias for system settings cog, foreground-service downloader, in-app update channel with auto-poll.

Install

1. Grab the universal APK from the latest release (hayaitts-vX.Y.Z.apk) and sideload it. The Play Store doesn't permit apps that ship GPL-3 binaries in this configuration.
2. Open the app. The bundled English voice (Piper Amy) ships inside the APK, so first-run synthesis works offline immediately. Open Browse to download more voices.
3. Select HayaiTTS as the preferred engine under Settings → System → Languages & input → Text-to-speech. The exact path varies by OEM (Samsung One UI calls it "Speech"). The MANAGE_VOICES activity-alias is wired in the manifest, so the cog next to HayaiTTS on that page opens Library directly.

Update channels (Stable / Beta / Nightly) are switched from Settings → Update channel; nightlies publish to a separate HayaiApp/HayaiTTS-nightly repo so the main Releases page stays focused on production builds.

Features

Catalog & install

Catalog scraper walks two upstream sources:
1. The sherpa-onnx docs index — rich per-voice metadata (named speakers, sample rate, license, demo URL).
2. A direct walk of every .tar.bz2 asset on the tts-models release — fills the gap for families k2-fsa publishes but hasn't documented yet (currently the entire ZipVoice / Pocket cloning catalog plus hundreds of Piper variants).
Result is 600+ voices across the 8 runtime-supported families (Piper, VITS, Matcha, Kokoro, Kitten, Supertonic, ZipVoice, Pocket). Live count + per-family breakdown in docs/MODELS.md.
Catalog auto-refreshed weekly from the sherpa-onnx TTS model index; see catalog-refresh.yml.
Per-voice sha256 verification on download; resumable downloads via WorkManager with backoff.
WiFi-only download toggle, configurable storage location (internal / external SD), live storage-migrator for moving installed voices between locations without losing them.
Custom voice import: sherpa-onnx-compatible .tar.bz2 bundles can be installed from disk via the SAF import flow.
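
The per-voice sha256 check listed above can be sketched as a streaming hash over the downloaded bundle. This is a minimal illustration in Python (the language of the repo's tools/ scripts), not the app's Kotlin code; `sha256_of` and `verify_bundle` are hypothetical names, and the expected digest is assumed to come from the catalog entry.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so a large bundle never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(path: Path, expected_hex: str) -> bool:
    """Return True only if the file's digest matches the expected value."""
    # Normalize case and whitespace before comparing the two hex strings.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A downloader would typically delete the file and retry when `verify_bundle` returns False, rather than extracting a bundle it cannot vouch for.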

Browse + filters

Filter by tier (low / mid / high), gender, language, model family, capability (e.g. "Voice cloning"). All filters compose; result count reflects the active filter set.
Inline search field in the floating top bar, plus dedicated search for the long languages list inside the filter sheet.
Global allowed-languages setting in Settings → Defaults gates the whole catalog so users with a narrow language interest never see the other 60+ tags.
Per-(speaker, language) audition: tap play on any catalog row to hear a 5-second clip before downloading.
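
"All filters compose" can be read as a logical AND over independent predicates. The sketch below shows that idea in Python; the `Voice` fields and the helper names are assumptions for illustration and do not mirror the real VoiceCard class.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Voice:
    # Hypothetical subset of catalog fields, for illustration only.
    name: str
    family: str
    tier: str
    languages: tuple[str, ...]
    cloning: bool = False


Predicate = Callable[[Voice], bool]


def by_tier(tier: str) -> Predicate:
    return lambda v: v.tier == tier


def by_language(tag: str) -> Predicate:
    return lambda v: tag in v.languages


def by_cloning() -> Predicate:
    return lambda v: v.cloning


def apply_filters(voices: Iterable[Voice], active: Iterable[Predicate]) -> list[Voice]:
    """Keep the voices that satisfy every active filter (logical AND)."""
    active = list(active)
    return [v for v in voices if all(p(v) for p in active)]
```

The result count shown in the UI would then be `len(apply_filters(catalog, active))`; an empty filter set returns the whole catalog.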

Per-voice surfaces

Voice Detail: hero card, license, sample-rate / family / tier chip strip, speaker picker (avatars for multi-speaker models, friendly fallback labels for anonymous speaker_N placeholders).
Studio (Playground): live speed / pitch / length / noise / noise-W sliders, persisted per voice. Waveform amplitude indicator, sample history with delete + replay. In-bar voice picker so users tune multiple voices without leaving the tab.
Default voice per locale: each installed voice can be pinned as the default for any locale it ships. The TTS engine routes by locale at synthesis time.
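
Routing by locale could look roughly like the following Python sketch. The README only says that the engine routes by locale; the prefix fallback ("de-AT" falls back to "de") and the voice ids used here are assumptions, not confirmed behaviour.

```python
def pick_voice(requested: str, defaults: dict[str, str], fallback: str) -> str:
    """Resolve a locale tag such as "en-GB" or "en_GB" to a pinned voice id."""
    # Index the pinned defaults by lower-cased tag for case-insensitive lookup.
    pinned = {tag.replace("_", "-").lower(): voice for tag, voice in defaults.items()}
    parts = requested.replace("_", "-").lower().split("-")
    # Try the full tag first, then progressively shorter prefixes.
    while parts:
        candidate = "-".join(parts)
        if candidate in pinned:
            return pinned[candidate]
        parts.pop()
    return fallback
```

With defaults pinned for "en-US" and "de", a request for "de-AT" resolves to the "de" voice, while "en-GB" falls through to the fallback because no bare "en" default is pinned.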

Activity tab

Live download / extraction progress (determinate, matches the rest of the app).
Synthesis metrics (kept on-device): RTF (real-time factor), synthesis time in ms, audio duration in ms, character count, and caller package per request. Filterable.
Cache + storage stats.
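
RTF is conventionally the synthesis time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The Python sketch below shows that arithmetic; the `SynthStats` type is hypothetical, and deriving the audio duration from sample count and sample rate is an assumption about how the app measures it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthStats:
    synth_ms: float    # wall-clock time spent synthesizing
    sample_count: int  # number of PCM samples produced
    sample_rate: int   # samples per second
    char_count: int    # length of the input text

    @property
    def audio_ms(self) -> float:
        """Duration of the generated audio in milliseconds."""
        return 1000.0 * self.sample_count / self.sample_rate

    @property
    def rtf(self) -> float:
        """Real-time factor; infinite when no audio was produced."""
        if self.sample_count == 0:
            return float("inf")
        return self.synth_ms / self.audio_ms
```

For example, 250 ms spent producing 44100 samples at 22050 Hz (2000 ms of audio) gives an RTF of 0.125.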

Quality of life

M3 Expressive theme, monochrome by design, dynamic-color opted out.
Floating-pill top bar that doesn't collapse on scroll — content scrolls under the bar; status-bar inset handled once at the bar level.
M3 navigation bar with five tabs; bottom-nav insets correctly applied so content never gets cut off by the system navigation.
Bottom-sheet quick switcher (HayaiQuickSwitcher) hoisted at the activity root so every screen can flip the default voice with one tap.
Auto-updater: 6 h debounce on launch-time auto-check; manual "Check for updates" in Settings; in-app APK install via FileProvider.
Crash reporter: an UncaughtExceptionHandler redirects to a :crash process activity that auto-copies the stack trace + device metadata to clipboard for paste into a GitHub issue.
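
The 6 h debounce on the launch-time update check amounts to comparing the current time against a stored timestamp. A minimal Python sketch, assuming the last check time is persisted as epoch seconds (the storage mechanism and the function name are not taken from the repo):

```python
from typing import Optional

SIX_HOURS_S = 6 * 60 * 60


def should_auto_check(
    now_s: float,
    last_check_s: Optional[float],
    min_interval_s: int = SIX_HOURS_S,
) -> bool:
    """Decide whether a launch-time update check is due."""
    if last_check_s is None:
        return True  # never checked before
    elapsed = now_s - last_check_s
    # A negative elapsed time means the clock moved backwards; check again
    # rather than wait for the clock to catch up.
    return elapsed < 0 or elapsed >= min_interval_s
```

The manual "Check for updates" action would bypass this gate entirely.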

Privacy

No network during synthesis. The only network calls are:
- Catalog JSON refresh (one raw.githubusercontent.com GET).
- The voice bundle download you explicitly trigger.
- The audition MP3 streamed from the samples repo when you tap play.
No analytics, no crash reporting service, no accounts.
All voice data lives in filesDir/voices/ (or external SD if selected).

Voice catalog

The catalog is the source of truth for everything Browse shows.

Machine-readable: catalog/v1/models.json — one VoiceCard per entry, fields match the Kotlin VoiceCard data class exactly so the on-device JSON parser is a single Json.decodeFromString.
Human-readable: docs/MODELS.md — auto-regenerated from the JSON on every catalog refresh (see tools/catalog/build_model_list.py).
Refresh cadence: weekly, every Monday 06:30 UTC. Hand-trigger via Actions → catalog-refresh → Run workflow.
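
A per-family breakdown like the one in docs/MODELS.md can be derived from the JSON with a simple count. The Python sketch below is not tools/catalog/build_model_list.py; it assumes the file is a top-level JSON array and that each entry has a `family` key, neither of which is confirmed here.

```python
import json
from collections import Counter


def family_breakdown(models_json: str, family_key: str = "family") -> list[tuple[str, int]]:
    """Count catalog entries per family, largest family first."""
    entries = json.loads(models_json)
    counts = Counter(entry[family_key] for entry in entries)
    # Sort by count descending, then by name so the output is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Summing the counts gives the total voice count for the same refresh.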

Family	Cloning	Notes
Piper	—	VITS, 10–60 MB, ~70 languages, hundreds of named voices
VITS	—	Bare VITS checkpoints (LJSpeech, VCTK, Coqui, etc.)
Kokoro	—	Higher-quality VITS variant, 80–360 MB, multi-speaker
Kitten	—	Tiny English-only, fastest synthesis
Matcha	—	Diffusion + side-vocoder
Supertonic	—	2026 model, 30 languages × 10 speakers in one bundle
ZipVoice	✓	Reference-audio cloning, flow-matching (Apache-2.0)
Pocket	✓	Voice-embedding cloning, smaller weights (Apache-2.0)

For exact counts and per-voice details (bundle URL, sample rate, license, speakers, languages, size) see docs/MODELS.md — that file is auto-generated from catalog/v1/models.json after every refresh.

Reference-audio cloning maps to sherpa-onnx's OfflineTts.generateWithConfig(text, GenerationConfig), where GenerationConfig carries:

referenceAudio: FloatArray — mono clip in [-1, 1]
referenceSampleRate: Int — e.g. 16000
referenceText: String — exact transcript of what the reference says
numSteps: Int — flow-matching diffusion steps (8 by default)
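
Since referenceAudio must be a mono clip in [-1, 1], a recording captured as interleaved 16-bit PCM needs downmixing and scaling first. The Python sketch below illustrates that preparation step; it is not the app's recorder code, the helper names are hypothetical, and the dictionary only mirrors the four field names listed above.

```python
from array import array


def pcm16_to_mono_float(samples: array, channels: int) -> list[float]:
    """Downmix interleaved signed 16-bit PCM to mono floats in [-1, 1)."""
    if channels < 1 or len(samples) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    mono = []
    for i in range(0, len(samples), channels):
        frame = samples[i:i + channels]
        # Average the channels, then scale by the int16 magnitude.
        mono.append(sum(frame) / (channels * 32768.0))
    return mono


def cloning_request(samples: array, channels: int, sample_rate: int,
                    transcript: str, num_steps: int = 8) -> dict:
    """Bundle the reference clip with the fields GenerationConfig is said to carry."""
    return {
        "referenceAudio": pcm16_to_mono_float(samples, channels),
        "referenceSampleRate": sample_rate,
        "referenceText": transcript,
        "numSteps": num_steps,
    }
```
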

The runtime entry point is SherpaTtsRuntime.synthesizeCloned(...); the user-facing flow is the Voice Cloning screen reached from any installed cloning-capable voice. Specific upstream bundles that will work as soon as the scraper sees them:

sherpa-onnx-zipvoice-zh-en-emilia (full precision)
sherpa-onnx-zipvoice-distill-zh-en-emilia / -int8 / -fp32
sherpa-onnx-pocket-tts-2026-01-26 / -int8

Architecture

Built on Kotlin + Jetpack Compose (M3 Expressive), Koin for DI, Room for installed-voice state, WorkManager for downloads, OkHttp for catalog refresh, and the upstream sherpa-onnx AAR (vendored at app/libs/sherpa-onnx-1.13.2.aar) for the JNI runtime.

ui/  ───────► UseCase ──────► Repository ──────► Data source ──────► sherpa-onnx JNI
                                                       └──► Room (installed voices)
                                                       └──► DataStore (settings)

Dependency direction is enforced by Konsist tests in app/src/test/java/.../core/konsist/; the suite gates CI.

domain/ is pure Kotlin. No android.*, no androidx.* (except androidx.annotation).
data/ may import domain but never ui.
ui/ may import both, plus Compose.
Threading: every coroutine launch flows through the injected DispatcherProvider; importing kotlinx.coroutines.Dispatchers outside core/ is a Konsist-blocked violation.
Errors surface as Outcome<T>; repositories never throw.
The top bar is a single shared composable (HayaiTopBar) used by every screen; a plain Material TopAppBar is forbidden.
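
The domain-purity rule above can be approximated with a plain text scan of import lines. The project enforces it with Konsist tests in Kotlin; the Python sketch below is only an illustration of the rule itself, and `domain_violations` is a hypothetical helper.

```python
import re

# Matches the package path of each Kotlin import statement.
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)", re.MULTILINE)


def domain_violations(kotlin_source: str) -> list[str]:
    """Return imports that a pure-Kotlin domain file should not contain."""
    bad = []
    for name in IMPORT_RE.findall(kotlin_source):
        if name.startswith("android."):
            bad.append(name)
        elif name.startswith("androidx.") and not name.startswith("androidx.annotation."):
            # androidx.annotation is the one allowed exception.
            bad.append(name)
    return bad
```

A real architecture test works on the parsed source rather than on raw text, so this scan would miss fully qualified references that appear without an import.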

Full guide: docs/ARCHITECTURE.md. Build commands + release signing + history: ONBOARDING.md.

Build

JDK 21 required (Android Studio ships JBR 21). Gradle toolchain is pinned, so the system JDK version doesn't matter as long as org.gradle.java.home points at a JDK-21 install.

git clone https://github.com/HayaiApp/HayaiTTS.git
cd HayaiTTS
./gradlew assembleDebug
adb install -r app/build/outputs/apk/debug/app-arm64-v8a-debug.apk

PowerShell on Windows:

$env:JAVA_HOME = "C:\Program Files\Android\Android Studio\jbr"
.\gradlew assembleDebug "-Dorg.gradle.java.home=$env:JAVA_HOME"

Lint is treated as a hard error, the Konsist architecture suite runs on :app:test, and lintVitalRelease runs as part of assembleRelease — keep all three green.

CI / release pipeline

Workflow	Trigger	Effect
`build_check.yml`	PR / push	Build `assembleDebug` + `assembleRelease`, run Konsist + unit tests
`build_push.yml`	manual `workflow_dispatch`	Signed release. `v…` tags → main repo (stable + beta), `r…` tags → HayaiApp/HayaiTTS-nightly
`catalog-refresh.yml`	weekly + manual	Re-scrape upstream sherpa-onnx index, regenerate `catalog/v1/models.json` and `docs/MODELS.md`, commit to `main`
`render-samples.yml`	weekly + manual	Render per-(speaker, language) audition MP3s, publish to the `HayaiTTS-samples` releases

The version name baked into the APK comes from the git tag (the release workflow sets HAYAITTS_VERSION_NAME=${VERSION_TAG}), so the "Current version" row in Settings → Updates shows, for example, Beta · 2.0.0-b3 (203) or Nightly · r142 (142).
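
The tag conventions in the table above suggest a simple mapping from tag to update channel. The Python sketch below is a guess at that mapping, not the release workflow's logic: treating a `-b<N>` suffix as the beta marker is inferred from the single 2.0.0-b3 example, and `channel_for_tag` is a hypothetical name.

```python
import re


def channel_for_tag(tag: str) -> str:
    """Classify a release tag as "stable", "beta", or "nightly"."""
    if re.fullmatch(r"r\d+", tag):
        return "nightly"  # r… tags publish to the nightly repo
    if tag.startswith("v"):
        # Assumption: a "-b<N>" suffix marks a beta build.
        return "beta" if re.search(r"-b\d+$", tag) else "stable"
    raise ValueError(f"unrecognized release tag: {tag!r}")
```
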

Roadmap

Tracked in this repo's issues. Near-term:

Catalog scraper: pick up ZipVoice + Pocket bundles even when the upstream documentation index doesn't list them. Currently the only thing blocking end-to-end voice cloning is that build_catalog.py discovers slugs from the docs searchindex.js, while the cloning models appear only on the release page.
Streaming playback in HayaiTtsService. The runtime API (synthesizeStreaming) is in place; the service still buffers the full FloatArray before pushing PCM to the framework callback, which blocks low-latency read-aloud apps. Wiring chunk-by-chunk callback.audioAvailable(...) is the next step.
Per-speaker metadata: most upstream catalogs ship speaker_0…N with no gender / age / style annotations. The display layer already collapses anonymous placeholders to "Voice N"; ingesting community-curated metadata where it exists is the upgrade path.
More filter dimensions: voice description, training corpus, release year, style tags.
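
For the streaming item above, the core of a chunk-by-chunk path is converting each float chunk to 16-bit PCM and handing it on in bounded slices instead of buffering the whole utterance. The Python sketch below shows that conversion only; the real service would do this in Kotlin, and the slice limit stands in for whatever buffer size the framework callback accepts.

```python
import struct
from typing import Iterable, Iterator


def float_chunk_to_pcm16(chunk: Iterable[float]) -> bytes:
    """Clamp floats to [-1, 1] and pack them as little-endian signed 16-bit PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in chunk]
    return struct.pack(f"<{len(ints)}h", *ints)


def stream_pcm(chunks: Iterable[Iterable[float]], max_bytes: int) -> Iterator[bytes]:
    """Yield PCM slices no larger than max_bytes as each chunk arrives."""
    if max_bytes < 2 or max_bytes % 2 != 0:
        raise ValueError("max_bytes must be a positive even number")
    for chunk in chunks:
        pcm = float_chunk_to_pcm16(chunk)
        # An even slice size keeps every slice aligned to whole samples.
        for start in range(0, len(pcm), max_bytes):
            yield pcm[start:start + max_bytes]
```

Because `stream_pcm` is a generator, the first slice is available as soon as the first chunk is converted, which is the property low-latency read-aloud callers need.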

Contributing

PRs welcome. Quick rules:

For UI changes attach a screenshot or short recording. Compose previews and Konsist don't capture motion or haptics, which is where most of the UX value lives.
Strings ship in all 10 supported locales (res/values-*). CI does not block on missing translations yet, but lint will warn — copy your key into every locale file (ar, de, es, fr, it, ja, ko, pt-rBR, ru, zh-rCN) before opening the PR.
New configuration options go through DI in app/di/AppModule.kt rather than direct singletons.
See CLAUDE.md for the project's coding conventions and forbidden patterns.

FAQ

Why isn't it on Google Play? Bundling GPL-3 binaries (sherpa-onnx) violates Play's distribution terms when combined with the app's own GPL-3 licensing. Sideload the APK from Releases.

Will it drain my battery? Synthesis is sub-second on a 2020+ phone. The engine sleeps between requests; the foreground-service downloader stops itself once the queue is empty.

iOS? Not possible — iOS doesn't permit third-party processes to replace AVSpeechSynthesizer. (This is also why HayaiTTS isn't a wrapper around a cloud TTS API.)

Where do the voices come from? sherpa-onnx curates them from upstream projects: Piper, Kokoro-82M, Kitten, Matcha-TTS, Supertonic, ZipVoice, Pocket.

Is the cloning model going to be open? ZipVoice and Pocket are research models from k2-fsa; both are Apache-2.0-licensed on Hugging Face. HayaiTTS surfaces them once the sherpa-onnx-zipvoice-…tar.bz2 releases land on the upstream index.

License

GPL-3.0 — see LICENSE. Individual voice licenses vary and are captured per-entry in the catalog (most are MIT or Apache-2.0). Check the upstream source on Hugging Face / GitHub before redistributing any individual voice commercially.
