Sauti Unity Plugin

Native Unity voice-AI plugin. Fully offline. English. Privacy-first. Mic → Whisper → memory + RAG → Qwen3 GGUF → Kokoro → audio. One package. Zero cloud.

Unity 6+ LTS · Apache 2.0 · Offline-first · Docs


What it is

Sauti ("voice" in Swahili) lets a Unity game or VR experience hold a real spoken conversation with an AI character — entirely on the player's device, with no API keys, no cloud bill, and no audio ever leaving the headset.

  • 🎤 Speech in. Whisper Small / Tiny ONNX, English, ~300 ms TTFA on desktop CPU.
  • 🧠 Three-layer memory. Conversation history + temporary KV facts + RAG over a knowledge base you author yourself.
  • 🤖 LLM brain. Qwen3-1.7B GGUF via llama.cpp. Quest currently runs the same model; a smaller Quest variant is deferred to a future release (see Platform support).
  • 🔊 Voice out. Kokoro 82M ONNX with 11 voices.
  • 🎮 Drop-in for Unity 6+. Three UPM packages, one Editor menu, done.
  • 🖱️ Two parallel APIs (v1.3+). Pure C# for programmers (new KokoroTtsRunner(...)), drag-and-drop SautiSpeaker/SautiKnowledgeBase/SautiAgent MonoBehaviours + Voice Profile/Knowledge Config/LLM Config ScriptableObjects for designers. Same runtime — choose either.
🎤 Mic  →  Whisper ONNX  →  text  →  Memory (history + RAG + temp KV)  →  Qwen3 GGUF  →  tokens  →  Kokoro ONNX  →  🔊 Audio
            STT                          Three-layer enriched prompt           LLM                           TTS

Two strictly-partitioned runtimes (ONNX Runtime + llama.cpp) — they share no memory and no GPU context, only C# strings. See memory/voice_ai_architecture.md for the full spec.


Quick install

You have two ways to consume Sauti.

A. Clone the full repo (recommended for first explore)

git clone https://github.com/SeedeXR/sauti-unity-plugin.git
cd sauti-unity-plugin
# Then: Unity Hub → Add project from disk → select this folder

B. Install as a UPM package (recommended for downstream projects)

One command — tools/setup-sauti.sh (macOS/Linux/WSL) — handles all three install steps + the model downloads:

# From a checked-out copy of this repo:
./tools/setup-sauti.sh --project-path /path/to/YourUnityProject

# Or, if you only have the script and want a fresh install:
curl -fsSL https://raw.githubusercontent.com/SeedeXR/sauti-unity-plugin/main/tools/setup-sauti.sh -o setup-sauti.sh
chmod +x setup-sauti.sh
./setup-sauti.sh --project-path /path/to/YourUnityProject

What it does, in order:

  1. Writes the bootstrap to Packages/manifest.json — Sauti dep (via Git URL by default) + npmjs scoped registry + com.github.asus4.onnxruntime peer. Idempotent — re-runs are no-ops if the entries are already present.
  2. Invokes Unity in batchmode to run Sauti.Editor.Setup.SautiSetupWizard.FixAllHeadless — adds the remaining peer deps (LLMUnity, whisper.unity, Collections, Mathematics) and writes the scripting-define symbols across Standalone/Android/iOS/WebGL.
  3. Downloads the AI models from Hugging Face into <project>/Assets/StreamingAssets/VoiceAI/ with SHA-256 verification. Default profile (--models essential, ~1.4 GB): Kokoro 82M + 1 voice + MiniLM + Whisper Tiny + Qwen3-1.7B. --models all adds the other 10 voices + Whisper Small. --models none skips downloads.
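
The SHA-256 check in step 3 can be reproduced by hand when a download looks suspect. The following is a minimal sketch, not the logic inside setup-sauti.sh: the function names sha256_of and verify_model are hypothetical, and the expected digest is assumed to come from wherever you trust it (for example, the per-entry manifest mentioned under License).

```shell
# Sketch only: sha256_of and verify_model are hypothetical helper names,
# not functions taken from tools/setup-sauti.sh.

# Print the lowercase hex SHA-256 of a file, using whichever tool is installed
# (sha256sum on most Linux systems, shasum on macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Return 0 when the file exists and its digest matches the expected value,
# 2 when the file is missing, 1 when the digest differs.
verify_model() {
  local file="$1" expected="$2" actual
  [ -f "$file" ] || { echo "missing: $file" >&2; return 2; }
  actual="$(sha256_of "$file")"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

Call it as verify_model <path-to-model-file> <expected-digest>. For routine checks the script's own --verify flag, shown under Common variants, is the supported route.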

Common variants:

# Local tarball install (no internet for Sauti's source — still needs HF for models):
./tools/setup-sauti.sh --project-path <proj> \
    --source tarball --tarball dist/com.sauti.voice-ai-1.3.2.tgz

# Just verify the model files match their SHA-256s, don't redownload anything:
./tools/setup-sauti.sh --project-path <proj> --no-bootstrap --no-wizard --verify

# Bootstrap only, defer the rest:
./tools/setup-sauti.sh --project-path <proj> --no-wizard --models none

Run ./tools/setup-sauti.sh --help for the full option list.

Or do it manually — three lines + one click

Step 1 — Paste this bootstrap into your project's Packages/manifest.json:

{
  "scopedRegistries": [
    {
      "name": "npmjs",
      "url": "https://registry.npmjs.com",
      "scopes": ["com.github.asus4"]
    }
  ],
  "dependencies": {
    "com.sauti.voice-ai":           "https://github.com/SeedeXR/sauti-unity-plugin.git?path=packaging/com.sauti.voice-ai",
    "com.github.asus4.onnxruntime": "0.4.7"
  }
}

That's the minimum bootstrap — Sauti itself + the one peer dep + the scoped registry Unity needs to find it.
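
Before opening Unity, it can help to confirm that all three pieces of the bootstrap made it into the manifest. The sketch below uses a hypothetical helper, check_sauti_bootstrap, that does a plain-text search for the expected strings. It does not parse JSON, so it only shows that the entries are present, not that the file is valid.

```shell
# Sketch only: check_sauti_bootstrap is a hypothetical helper.
# It searches for literal strings and does not validate the JSON structure.
check_sauti_bootstrap() {
  local manifest="$1" missing=0 needle
  for needle in \
    '"com.sauti.voice-ai"' \
    '"com.github.asus4.onnxruntime"' \
    '"com.github.asus4"' \
    'registry.npmjs.com'
  do
    if grep -qF -- "$needle" "$manifest"; then
      echo "found:   $needle"
    else
      echo "missing: $needle"
      missing=1
    fi
  done
  # Non-zero exit when at least one entry was not found.
  return "$missing"
}
```

Run it as check_sauti_bootstrap Packages/manifest.json from the project root. The third search string matches only the registry scope, because in the dependency name the same prefix is followed by a dot rather than a closing quote.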

Step 2 — Open the project in Unity. First import takes 1–3 min (Git clone + UPM resolution). The Sauti Setup Wizard auto-opens; if it doesn't, run Sauti → Verify Setup from the menu bar.

Step 3 — Click "Fix everything I can" in the wizard. It writes the remaining peer deps (LLMUnity, whisper.unity, Unity Collections, Unity Mathematics) into your manifest.json and sets the two scripting-define symbols (SAUTI_LLMUNITY_AVAILABLE, SAUTI_WHISPER_UNITY_AVAILABLE) across Standalone/Android/iOS/WebGL. Unity re-resolves packages once.

Step 4 — Download the AI models (~1.6 GB for the full set). The wizard cannot fetch these itself because of Hugging Face license walls. Use the model-download step of tools/setup-sauti.sh described above, or clone the source repo and copy its Assets/StreamingAssets/VoiceAI/ folder into your project.

Headless / CI install

Same logic as the GUI wizard, no dialogs:

unity -batchmode -quit -projectPath <project> \
  -executeMethod Sauti.Editor.Setup.SautiSetupWizard.FixAllHeadless
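
In a CI job the same command is usually wrapped so that the Editor path is configurable and a failed run fails the job. A minimal sketch follows. The names run_sauti_headless and UNITY_BIN are hypothetical, and the extra -logFile - argument (a standard Unity Editor option that sends the log to standard output) is an addition, not part of the command above.

```shell
# Sketch only: run_sauti_headless and UNITY_BIN are hypothetical names.
# Point UNITY_BIN at the Unity Editor executable on your build agent.
run_sauti_headless() {
  local project="$1"
  local unity="${UNITY_BIN:-unity}"
  [ -d "$project" ] || { echo "no such project: $project" >&2; return 2; }
  # The function's exit status is Unity's exit status, so a failing
  # setup run propagates to the calling CI step.
  "$unity" -batchmode -quit -projectPath "$project" \
    -executeMethod Sauti.Editor.Setup.SautiSetupWizard.FixAllHeadless \
    -logFile -
}
```

Typical use is UNITY_BIN=/path/to/Unity run_sauti_headless /path/to/YourUnityProject as one step of the pipeline.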

Alternative install methods

  • Tarball file: download com.sauti.voice-ai-<version>.tgz from Releases, put it under Packages/tarballs/, replace the Git URL in Step 1 with "file:tarballs/com.sauti.voice-ai-1.3.2.tgz".
  • Package Manager GUI: Window → Package Manager → ➕ → Install package from tarball → select the .tgz. You still need the scoped registry + ONNX line from Step 1.
  • Build the tarball yourself: tools/package-sauti.sh --skip-tests from a checked-out source repo → dist/com.sauti.voice-ai-<version>.tgz.
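
For the tarball route, the copy and the manifest value have to agree on the file name. The sketch below shows one way to keep them in sync. stage_sauti_tarball is a hypothetical helper that copies the .tgz into Packages/tarballs/ and prints the dependency line to paste into manifest.json; it does not edit the manifest.

```shell
# Sketch only: stage_sauti_tarball is a hypothetical helper.
# It copies the tarball and prints the manifest entry; it never edits manifest.json.
stage_sauti_tarball() {
  local tarball="$1" project="$2" name
  [ -f "$tarball" ] || { echo "no such tarball: $tarball" >&2; return 2; }
  name="$(basename "$tarball")"
  mkdir -p "$project/Packages/tarballs"
  cp "$tarball" "$project/Packages/tarballs/$name"
  # The file: path is written relative to the project's Packages/ folder.
  printf '"com.sauti.voice-ai": "file:tarballs/%s"\n' "$name"
}
```

For example, stage_sauti_tarball dist/com.sauti.voice-ai-1.3.2.tgz /path/to/YourUnityProject prints the line that replaces the Git URL entry from Step 1.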

Quickstart (5 min)

# 1. Open project in Unity (auto-imports ~1.6 GiB of AI models from ai-models/)
# 2. Build the RAG knowledge base:
#    Menu: Sauti → Build Knowledge Base
# 3. Open one of the six experiment scenes:
#    experiments/01-tts-hello/HelloScene.unity  (smallest — just text-to-speech)
#    experiments/05-full-voice-loop/VoiceLoopScene.unity  (the integrated demo)
# 4. Press Play.

See the Quickstart guide for the full walkthrough.


What you get

For game designers

No-code path: drop in a JSON template, set a voice id, ship.

  • NPC dialogue — single character, configurable persona / voice / knowledge tag
  • Quest narrator — branching world narrator with chapter cues
  • Voice command routing — speech → game action mapping
  • VR companion — location-aware persistent companion (Quest)
  • Knowledge feed — bulk ingestion of game lore into the RAG database
  • Structured output — let the LLM trigger deterministic game mechanics

→ Designer guide

For Unity developers

Code-first path: composable subsystems with clean C# interfaces.

  • Sauti.Memory.TemporaryMemory — session-scoped KV facts
  • Sauti.Memory.SautiRag — injectable RAG retrieval wrapper
  • Sauti.Editor.Rag.KnowledgeBaseChunker — paragraph-boundary chunker
  • Sauti.Editor.Rag.MiniLmRagEmbedder — 384-dim sentence-transformer embedder
  • Sauti.Tts.KokoroTtsRunner — Kokoro 82M TTS with 11 built-in voices
  • Sauti.Editor.Rag.RagDatabaseBuilder, exposed in the Editor as [MenuItem("Sauti/Build Knowledge Base")]

All subsystems are dependency-injectable, fence upstream packages behind preprocessor symbols, and have 33+ NUnit EditMode tests.

→ Developer guide

Six runnable experiments

Each is a Unity scene with a single MonoBehaviour orchestrator + a README explaining what it proves.

| # | Experiment | Demonstrates |
|---|------------|--------------|
| 1 | 01-tts-hello | Type → Kokoro → audio |
| 2 | 02-stt-loopback | Push-to-talk → Whisper → text |
| 3 | 03-llm-chat | Text → Qwen3 → streamed tokens + sentence events |
| 4 | 04-rag-grounding | A/B toggle proving RAG changes the LLM's answer |
| 5 | 05-full-voice-loop | The integrated headline demo |
| 6 | 06-vr-quest-npc | Spatialised VR NPC on Quest with controller trigger |

→ Experiments overview


Privacy & offline-first

  • No internet connection required or used at runtime.
  • No telemetry, no analytics, no model downloads after install.
  • All four models live on disk in Assets/StreamingAssets/VoiceAI/ and load from there.
  • User audio and conversation history stay on the device. Per-session memory clears on app exit.
  • Android caveat: models copy from the compressed .jar to Application.persistentDataPath on first launch.

Platform support

| Platform | STT | LLM | Embeddings | TTS |
|----------|-----|-----|------------|-----|
| Windows / macOS / Linux | Whisper Small | Qwen3-1.7B Q5_K_M | MiniLM | Kokoro |
| iOS / Android (flagship) | Whisper Small | Qwen3-1.7B Q5_K_M | MiniLM | Kokoro |
| Meta Quest 2 / 3 | Whisper Tiny | Qwen3-1.7B Q5_K_M* | MiniLM | Kokoro |
| Android (low-end) | Whisper Tiny | Qwen3-1.7B Q5_K_M* | MiniLM | Kokoro |

* v1.2 Quest path uses Qwen3-1.7B (1.26 GB; tight on Quest 3's 8 GB RAM but functional). Gemma3-1B Q4_K_M was the original Quest pick but is deferred to a future release pending Gemma TOS acceptance. See per-platform notes.


Project status

Engineered + tested. All four pipeline stages compile cleanly in Unity 6.4. 50/50 EditMode tests pass. A real knowledge.db builds in 226 ms from the Frostmere sample knowledge base. Scene assembly and hardware validation on Quest are the remaining human-side tasks.

See SHIP_READINESS.md for the step-by-step go-live guide.

| Surface | State |
|---------|-------|
| Compile | ✓ 0 errors, 0 warnings |
| EditMode tests (Sauti) | ✓ 50 / 50 pass (Unit 35, Integration 6, Regression 9) |
| Upstream tests (whisper.unity, onnxruntime-unity) | ✓ 3 / 3 |
| Knowledge.db build | ✓ End-to-end against real MiniLM weights |
| Six experiment scaffolds | ✓ Code + READMEs + scene-creation guides |
| UPM tarball build (tools/package-sauti.sh) | ✓ End-to-end, 88 KB tarball, SHA-256 emitted |
| GitHub Actions: docs + package | ✓ Wired to main push + v* tag |
| Six .unity scene files | ⏳ Manual creation (Editor GUI) |
| Quest hardware validation | ⏳ Needs physical device |

Documentation

| Topic | Where |
|-------|-------|
| Canonical pipeline spec | memory/voice_ai_architecture.md |
| Ship readiness checklist | SHIP_READINESS.md |
| Full docs site (mkdocs) | https://SeedeXR.github.io/sauti-unity-plugin |
| Session log (audit trail) | memory/handover_session.md |
| Memory + agent files | memory/ (15 docs) |
| Per-experiment guides | experiments/*/README.md |

Repository map

sauti-unity-plugin/
├── Assets/                              Unity asset tree (repo root is the Unity project)
│   ├── Sauti/Runtime/                   C# memory + TTS runner subsystems
│   ├── Sauti/Editor/                    MiniLM embedder + RAG menu builder
│   ├── Sauti/Tests/Editor/              50 NUnit EditMode tests (unit + integration + regression)
│   └── StreamingAssets/VoiceAI/         1.6 GiB of AI models (runtime location)
├── Packages/manifest.json               6 UPM dependencies (auto-fetched)
├── ProjectSettings/                     Unity project config
├── packaging/com.sauti.voice-ai/        UPM package source (Runtime/, Editor/, Tests/, Samples~/, Documentation~/)
├── tools/                               Build + setup scripts (package-sauti.sh, setup-sauti.sh)
├── ai-models/                           Source-of-truth model checkout
├── docs/                                MkDocs source tree (this docs site)
├── experiments/                         Six runnable demos
├── knowledge-base/                      Plain-text source for the RAG database
├── memory/                              Append-only doc + session log
├── templates/                           JSON narrative templates
├── instructions/                        Engineering operations guide
├── .github/workflows/                   docs.yml + package.yml
├── mkdocs.yml                           Docs site config
├── README.md                            This file
└── SHIP_READINESS.md                    Step-by-step go-live guide

Architecture at a glance

┌──────────────────────────────────────────────────────────────────┐
│                       Sauti voice-AI pipeline                     │
│                                                                   │
│  ┌──────────┐  ┌─────────────────┐  ┌─────────┐  ┌────────────┐  │
│  │ Whisper  │→ │ Three-Layer     │→ │ Qwen3   │→ │ Kokoro     │  │
│  │ STT ONNX │  │ Memory:         │  │ GGUF    │  │ TTS ONNX   │  │
│  │          │  │ • L1 history    │  │         │  │            │  │
│  │          │  │ • L2 KV facts   │  │         │  │            │  │
│  │          │  │ • L3 RAG (MiniLM│  │         │  │            │  │
│  │          │  │   over knowledge│  │         │  │            │  │
│  │          │  │   .db)          │  │         │  │            │  │
│  └──────────┘  └─────────────────┘  └─────────┘  └────────────┘  │
│       │                                                  │        │
│       └────────────────  String only  ──────────────────┘        │
│                                                                   │
│  ┌───────────────────────────────┐ ┌─────────────────────────┐  │
│  │ ONNX Runtime                  │ │ llama.cpp (LLMUnity)    │  │
│  │ (asus4/onnxruntime-unity)     │ │ (undreamai/LLMUnity)    │  │
│  │ STT • Embeddings • TTS        │ │ LLM only                │  │
│  │ DirectML│CoreML│NNAPI│CUDA    │ │ Metal│Vulkan│NEON│CPU   │  │
│  └───────────────────────────────┘ └─────────────────────────┘  │
│  ── no shared memory · no shared GPU context · strings only ──   │
└──────────────────────────────────────────────────────────────────┘

Contributing

Sauti is built on a session-based workflow with append-only handover logs. See contributing and memory/handover_session.md for the audit trail.


License

Apache 2.0. See LICENSE (TBD — Apache-2.0 confirmed per memory/project_context.md § 1).

Each bundled AI model has its own license, recorded per-entry in ai-models/<stage>/manifest.json:

| Model | License |
|-------|---------|
| Whisper Small / Tiny INT8 | MIT |
| Qwen3-1.7B Q5_K_M | Apache-2.0 |
| all-MiniLM-L6-v2 INT8 | Apache-2.0 |
| Kokoro 82M INT8 + voices | Apache-2.0 |

Credits
