Native Unity voice-AI plugin. Fully offline. English. Privacy-first. Mic → Whisper → memory + RAG → Qwen3 GGUF → Kokoro → audio. One package. Zero cloud.
Sauti ("voice" in Swahili) lets a Unity game or VR experience hold a real spoken conversation with an AI character — entirely on the player's device, with no API keys, no cloud bill, and no audio ever leaving the headset.
- 🎤 Speech in. Whisper Small / Tiny ONNX, English, ~300 ms TTFA on desktop CPU.
- 🧠 Three-layer memory. Conversation history + temporary KV facts + RAG over a knowledge base you author yourself.
- 🤖 LLM brain. Qwen3-1.7B GGUF via llama.cpp on flagship; smaller variants on Quest.
- 🔊 Voice out. Kokoro 82M ONNX with 11 voices.
- 🎮 Drop-in for Unity 6+. Three UPM packages, one Editor menu, done.
- 🖱️ Two parallel APIs (v1.3+). Pure C# for programmers (new KokoroTtsRunner(...)), or drag-and-drop SautiSpeaker / SautiKnowledgeBase / SautiAgent MonoBehaviours + Voice Profile / Knowledge Config / LLM Config ScriptableObjects for designers. Same runtime; choose either.
🎤 Mic → Whisper ONNX (STT) → text → Memory: history + RAG + temp KV (three-layer enriched prompt) → Qwen3 GGUF (LLM) → tokens → Kokoro ONNX (TTS) → 🔊 Audio
Two strictly-partitioned runtimes (ONNX Runtime + llama.cpp) — they share no memory and no GPU context, only C# strings. See memory/voice_ai_architecture.md for the full spec.
You have two ways to consume Sauti.
git clone https://github.com/SeedeXR/sauti-unity-plugin.git
cd sauti-unity-plugin
# Then: Unity Hub → Add project from disk → select this folder
One command, tools/setup-sauti.sh (macOS/Linux/WSL), handles all three install steps plus the model downloads:
# From a checked-out copy of this repo:
./tools/setup-sauti.sh --project-path /path/to/YourUnityProject
# Or, if you only have the script and want a fresh install:
curl -fsSL https://raw.githubusercontent.com/SeedeXR/sauti-unity-plugin/main/tools/setup-sauti.sh -o setup-sauti.sh
chmod +x setup-sauti.sh
./setup-sauti.sh --project-path /path/to/YourUnityProject
What it does, in order:
- Writes the bootstrap to Packages/manifest.json: Sauti dep (via Git URL by default) + npmjs scoped registry + com.github.asus4.onnxruntime peer. Idempotent: re-runs are no-ops if the entries are already present.
- Invokes Unity in batchmode to run Sauti.Editor.Setup.SautiSetupWizard.FixAllHeadless, which adds the remaining peer deps (LLMUnity, whisper.unity, Collections, Mathematics) and writes the scripting-define symbols across Standalone/Android/iOS/WebGL.
- Downloads the AI models from Hugging Face into <project>/Assets/StreamingAssets/VoiceAI/ with SHA-256 verification. Default profile (--models essential, ~1.4 GB): Kokoro 82M + 1 voice + MiniLM + Whisper Tiny + Qwen3-1.7B. --models all adds the other 10 voices + Whisper Small. --models none skips downloads.
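The SHA-256 check in the last step can also be reproduced by hand on a single model file. The following is a minimal sketch, assuming either sha256sum (GNU coreutils) or shasum (macOS) is on the PATH. The function names sha256_of and verify_sha256 are hypothetical and are not taken from setup-sauti.sh, whose internals may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of setup-sauti.sh): compare a file's SHA-256
# against an expected lowercase hex digest.

sha256_of() {
  # Prefer GNU sha256sum; fall back to shasum, which ships with macOS.
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

verify_sha256() {
  local file="$1" expected="$2" actual
  # Exit code 2: the file is not there at all.
  [ -f "$file" ] || { echo "missing: $file" >&2; return 2; }
  actual="$(sha256_of "$file")"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $file"
  else
    # Exit code 1: the file exists but its digest differs.
    echo "mismatch: $file (got $actual)" >&2
    return 1
  fi
}
```

Called as verify_sha256 path/to/model.onnx EXPECTED_HEX, where both arguments are placeholders, the helper returns 0 on a match, 1 on a mismatch, and 2 when the file is missing, so it can gate a later step in a script.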
Common variants:
# Local tarball install (no internet for Sauti's source — still needs HF for models):
./tools/setup-sauti.sh --project-path <proj> \
--source tarball --tarball dist/com.sauti.voice-ai-1.3.2.tgz
# Just verify the model files match their SHA-256s, don't redownload anything:
./tools/setup-sauti.sh --project-path <proj> --no-bootstrap --no-wizard --verify
# Bootstrap only, defer the rest:
./tools/setup-sauti.sh --project-path <proj> --no-wizard --models none
Run ./tools/setup-sauti.sh --help for the full option list.
Step 1 — Paste this bootstrap into your project's Packages/manifest.json:
{
"scopedRegistries": [
{
"name": "npmjs",
"url": "https://registry.npmjs.com",
"scopes": ["com.github.asus4"]
}
],
"dependencies": {
"com.sauti.voice-ai": "https://github.com/SeedeXR/sauti-unity-plugin.git?path=packaging/com.sauti.voice-ai",
"com.github.asus4.onnxruntime": "0.4.7"
}
}
That's the minimum bootstrap: Sauti itself + the one peer dep + the scoped registry Unity needs to find it.
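Before opening Unity, it can help to confirm that the three bootstrap entries actually landed in the manifest. The following is a rough sketch under stated assumptions: check_manifest is a hypothetical helper, not part of Sauti, and it uses plain grep on the key names, so it checks presence only and does not validate the JSON.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Sauti): report which bootstrap
# entries a manifest.json lacks. Presence check only; JSON is not parsed.

check_manifest() {
  local manifest="$1" missing=0 key
  for key in '"scopedRegistries"' \
             '"com.sauti.voice-ai"' \
             '"com.github.asus4.onnxruntime"'; do
    # -F matches the key as a fixed string, quotes included.
    if ! grep -qF -- "$key" "$manifest"; then
      echo "missing entry: $key" >&2
      missing=1
    fi
  done
  # 0 when all three keys are present, 1 otherwise.
  return "$missing"
}
```

For example, check_manifest Packages/manifest.json && echo "bootstrap present" prints the message only when all three keys are found.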
Step 2 — Open the project in Unity. First import takes 1–3 min (Git clone + UPM resolution). The Sauti Setup Wizard auto-opens; if it doesn't, run Sauti → Verify Setup from the menu bar.
Step 3 — Click "Fix everything I can" in the wizard. It writes the remaining peer deps (LLMUnity, whisper.unity, Unity Collections, Unity Mathematics) into your manifest.json and sets the two scripting-define symbols (SAUTI_LLMUNITY_AVAILABLE, SAUTI_WHISPER_UNITY_AVAILABLE) across Standalone/Android/iOS/WebGL. Unity re-resolves packages once.
Step 4 — Download the ~1.6 GB of AI models (the only thing Sauti can't auto-fetch — Hugging Face license walls). Clone the source repo and copy Assets/StreamingAssets/VoiceAI/, or wait for the post-v1.3 model downloader.
Same logic as the GUI wizard, no dialogs:
unity -batchmode -quit -projectPath <project> \
  -executeMethod Sauti.Editor.Setup.SautiSetupWizard.FixAllHeadless
- Tarball file: download com.sauti.voice-ai-<version>.tgz from Releases, put it under Packages/tarballs/, and replace the Git URL in Step 1 with "file:tarballs/com.sauti.voice-ai-1.3.2.tgz".
- Package Manager GUI: Window → Package Manager → ➕ → Install package from tarball → select the .tgz. You still need the scoped registry + ONNX line from Step 1.
- Build the tarball yourself: run tools/package-sauti.sh --skip-tests from a checked-out source repo → dist/com.sauti.voice-ai-<version>.tgz.
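The tarball-file route above comes down to one copy and one manifest line. The following is a minimal sketch, assuming the .tgz has already been downloaded or built. stage_tarball is a hypothetical helper, not a script shipped with Sauti. It does not edit manifest.json; it only prints the dependency line to paste in place of the Git URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Sauti): stage a package tarball under
# <project>/Packages/tarballs/ and print the matching dependency line.

stage_tarball() {
  local project="$1" tarball="$2" name
  name="$(basename "$tarball")"
  mkdir -p "$project/Packages/tarballs"
  cp "$tarball" "$project/Packages/tarballs/$name"
  # The path is written relative to Packages/, matching the
  # "file:tarballs/..." form used in the tarball-file step.
  printf '"com.sauti.voice-ai": "file:tarballs/%s"\n' "$name"
}
```

For example, stage_tarball /path/to/YourUnityProject dist/com.sauti.voice-ai-1.3.2.tgz copies the file and prints the line to use in the dependencies block. The scoped registry and the ONNX Runtime entry from Step 1 are still required.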
# 1. Open project in Unity (auto-imports ~1.6 GiB of AI models from ai-models/)
# 2. Build the RAG knowledge base:
# Menu: Sauti → Build Knowledge Base
# 3. Open one of the six experiment scenes:
# experiments/01-tts-hello/HelloScene.unity (smallest — just text-to-speech)
# experiments/05-full-voice-loop/VoiceLoopScene.unity (the integrated demo)
# 4. Press Play.
See the Quickstart guide for the full walkthrough.
No-code path: drop in a JSON template, set a voice id, ship.
- NPC dialogue — single character, configurable persona / voice / knowledge tag
- Quest narrator — branching world narrator with chapter cues
- Voice command routing — speech → game action mapping
- VR companion — location-aware persistent companion (Quest)
- Knowledge feed — bulk ingestion of game lore into the RAG database
- Structured output — let the LLM trigger deterministic game mechanics
Code-first path: composable subsystems with clean C# interfaces.
- Sauti.Memory.TemporaryMemory: session-scoped KV facts
- Sauti.Memory.SautiRag: injectable RAG retrieval wrapper
- Sauti.Editor.Rag.KnowledgeBaseChunker: paragraph-boundary chunker
- Sauti.Editor.Rag.MiniLmRagEmbedder: 384-dim sentence-transformer embedder
- Sauti.Tts.KokoroTtsRunner: Kokoro 82M TTS with 11 built-in voices
- Sauti.Editor.Rag.RagDatabaseBuilder: [MenuItem("Sauti/Build Knowledge Base")]
All subsystems are dependency-injectable, fence upstream packages behind preprocessor symbols, and have 33+ NUnit EditMode tests.
Each is a Unity scene with a single MonoBehaviour orchestrator + a README explaining what it proves.
| # | Experiment | Demonstrates |
|---|---|---|
| 1 | 01-tts-hello | Type → Kokoro → audio |
| 2 | 02-stt-loopback | Push-to-talk → Whisper → text |
| 3 | 03-llm-chat | Text → Qwen3 → streamed tokens + sentence events |
| 4 | 04-rag-grounding | A/B toggle proving RAG changes the LLM's answer |
| 5 | 05-full-voice-loop | The integrated headline demo |
| 6 | 06-vr-quest-npc | Spatialised VR NPC on Quest with controller trigger |
- No internet connection required or used at runtime.
- No telemetry, no analytics, no model downloads after install.
- All four models live on disk in Assets/StreamingAssets/VoiceAI/ and load from there.
- User audio and conversation history stay on the device. Per-session memory clears on app exit.
- Android caveat: models copy from the compressed .jar to Application.persistentDataPath on first launch.
| Platform | STT | LLM | Embeddings | TTS |
|---|---|---|---|---|
| Windows / macOS / Linux | Whisper Small | Qwen3-1.7B Q5_K_M | MiniLM | Kokoro |
| iOS / Android (flagship) | Whisper Small | Qwen3-1.7B Q5_K_M | MiniLM | Kokoro |
| Meta Quest 2 / 3 | Whisper Tiny | Qwen3-1.7B Q5_K_M* | MiniLM | Kokoro |
| Android (low-end) | Whisper Tiny | Qwen3-1.7B Q5_K_M* | MiniLM | Kokoro |
* v1.2 Quest path uses Qwen3-1.7B (1.26 GB; tight on Quest 3's 8 GB RAM but functional). Gemma3-1B Q4_K_M was the original Quest pick but is deferred to a future release pending Gemma TOS acceptance. See per-platform notes.
Engineered + tested. All four pipeline stages compile cleanly in Unity 6.4. 38/38 EditMode tests pass. Real knowledge.db builds in 226 ms from the Frostmere sample knowledge base. Scene assembly + hardware validation on Quest are the remaining human-side tasks.
See SHIP_READINESS.md for the step-by-step go-live guide.
| Surface | State |
|---|---|
| Compile | ✓ 0 errors, 0 warnings |
| EditMode tests (Sauti) | ✓ 50 / 50 pass — Unit 35, Integration 6, Regression 9 |
| Upstream tests (whisper.unity, onnxruntime-unity) | ✓ 3 / 3 |
| Knowledge.db build | ✓ End-to-end against real MiniLM weights |
| Six experiment scaffolds | ✓ Code + READMEs + scene-creation guides |
| UPM tarball build (tools/package-sauti.sh) | ✓ End-to-end, 88 KB tarball, SHA-256 emitted |
| GitHub Actions: docs + package | ✓ Wired to main push + v* tag |
| Six .unity scene files | ⏳ Manual creation (Editor GUI) |
| Quest hardware validation | ⏳ Needs physical device |
| Topic | Where |
|---|---|
| Canonical pipeline spec | memory/voice_ai_architecture.md |
| Ship readiness checklist | SHIP_READINESS.md |
| Full docs site (mkdocs) | https://SeedeXR.github.io/sauti-unity-plugin |
| Session log (audit trail) | memory/handover_session.md |
| Memory + agent files | memory/ (15 docs) |
| Per-experiment guides | experiments/*/README.md |
sauti-unity-plugin/
├── Assets/ Unity asset tree (repo root is the Unity project)
│ ├── Sauti/Runtime/ C# memory + TTS runner subsystems
│ ├── Sauti/Editor/ MiniLM embedder + RAG menu builder
│ ├── Sauti/Tests/Editor/ 50 NUnit EditMode tests (unit + integration + regression)
│ └── StreamingAssets/VoiceAI/ 1.6 GiB of AI models (runtime location)
├── Packages/manifest.json 6 UPM dependencies (auto-fetched)
├── ProjectSettings/ Unity project config
├── packaging/com.sauti.voice-ai/ UPM package source (Runtime/, Editor/, Tests/, Samples~/, Documentation~/)
├── tools/ Build scripts (package-sauti.sh)
├── ai-models/ Source-of-truth model checkout
├── docs/ MkDocs source tree (this docs site)
├── experiments/ Six runnable demos
├── knowledge-base/ Plain-text source for the RAG database
├── memory/ Append-only doc + session log
├── templates/ JSON narrative templates
├── instructions/ Engineering operations guide
├── .github/workflows/ docs.yml + package.yml
├── mkdocs.yml Docs site config
├── README.md This file
└── SHIP_READINESS.md Step-by-step go-live guide
┌──────────────────────────────────────────────────────────────────┐
│ Sauti voice-AI pipeline │
│ │
│ ┌──────────┐ ┌─────────────────┐ ┌─────────┐ ┌────────────┐ │
│ │ Whisper │→ │ Three-Layer │→ │ Qwen3 │→ │ Kokoro │ │
│ │ STT ONNX │ │ Memory: │ │ GGUF │ │ TTS ONNX │ │
│ │ │ │ • L1 history │ │ │ │ │ │
│ │ │ │ • L2 KV facts │ │ │ │ │ │
│ │ │ │ • L3 RAG (MiniLM│ │ │ │ │ │
│ │ │ │ over knowledge│ │ │ │ │ │
│ │ │ │ .db) │ │ │ │ │ │
│ └──────────┘ └─────────────────┘ └─────────┘ └────────────┘ │
│ │ │ │
│ └──────────────── String only ──────────────────┘ │
│ │
│ ┌───────────────────────────────┐ ┌─────────────────────────┐ │
│ │ ONNX Runtime │ │ llama.cpp (LLMUnity) │ │
│ │ (asus4/onnxruntime-unity) │ │ (undreamai/LLMUnity) │ │
│ │ STT • Embeddings • TTS │ │ LLM only │ │
│ │ DirectML│CoreML│NNAPI│CUDA │ │ Metal│Vulkan│NEON│CPU │ │
│ └───────────────────────────────┘ └─────────────────────────┘ │
│ ── no shared memory · no shared GPU context · strings only ── │
└──────────────────────────────────────────────────────────────────┘
Sauti is built on a session-based workflow with append-only handover logs. See contributing and memory/handover_session.md for the audit trail.
Apache 2.0. See LICENSE (TBD — Apache-2.0 confirmed per memory/project_context.md § 1).
Each bundled AI model has its own license, recorded per-entry in ai-models/<stage>/manifest.json:
| Model | License |
|---|---|
| Whisper Small / Tiny INT8 | MIT |
| Qwen3-1.7B Q5_K_M | Apache-2.0 |
| all-MiniLM-L6-v2 INT8 | Apache-2.0 |
| Kokoro 82M INT8 + voices | Apache-2.0 |
- Whisper by OpenAI · ONNX export by onnx-community
- Qwen3 by Alibaba · GGUF quant by unsloth
- all-MiniLM-L6-v2 by sentence-transformers · INT8 by Xenova
- Kokoro 82M · ONNX by onnx-community
- whisper.unity by Macoron
- LLMUnity by undreamai
- onnxruntime-unity by asus4