An open-source Unity framework for building real-time AI-powered virtual characters with local LLM chat, speech recognition, expressive facial animation, and procedural lip sync.
Designed for developers creating:
- AI NPCs
- VR/AR assistants
- Virtual companions
- Interactive training simulations
- Metaverse avatars
- Conversational agents
- Digital humans
- Fully local AI inference support through `LLMAgent`
- Offline-friendly architecture
- Real-time streaming responses
- Extensible prompt and memory systems
- Microphone voice capture
- Local speech-to-text transcription using Whisper Tiny
- Push-to-talk or toggle microphone modes
- Real-time conversational interaction
- Local neural speech synthesis using Sentis TTS
- Audio generation with phoneme output
- Queue-based playback system
- Interruptible speech support
- Expression state machine:
- Idle
- Listening
- Thinking
- Speaking
- Emotion-driven character feedback
- Animator-friendly architecture
- Phoneme-to-viseme mapping
- Procedural mouth animation
- Compatible with blendshapes and rigged characters
- Optional integration layer
- Built with Unity UI Toolkit
- Chat window toggle support
- Send button + enter key submission
- Voice and text hybrid interaction
This project can be used for:
- AI NPCs in games
- VR training simulations
- Interactive museum guides
- Metaverse social avatars
- AI customer support agents
- Educational assistants
- AI-powered storytellers
- Companion characters
- Research projects involving conversational AI
User Input (Voice/Text)
↓
Speech-To-Text (Optional)
↓
LLMAgent (Local AI)
↓
AI Response Text
↓
SentisTTS
↓
Audio + Phonemes
↓
SpeechController
↓
Lip Sync + Facial Expressions
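The flow above can be sketched as a chain of asynchronous stages. This is a minimal illustration rather than the project's actual wiring: `ConversationPipeline`, `SpeechResult`, and the delegate signatures are hypothetical stand-ins for the speech-to-text, `LLMAgent`, `SentisTTS`, and `SpeechController` components, whose real APIs may differ.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical container for TTS output: audio samples plus the phonemes used for lip sync.
public sealed class SpeechResult
{
    public float[] Samples = Array.Empty<float>();
    public string[] Phonemes = Array.Empty<string>();
}

// Hypothetical orchestrator; each stage is injected as a delegate so it can be swapped or stubbed.
public sealed class ConversationPipeline
{
    private readonly Func<float[], Task<string>> transcribe;      // Speech-to-text (optional stage)
    private readonly Func<string, Task<string>> chat;             // LLM request/response
    private readonly Func<string, Task<SpeechResult>> synthesize; // Text-to-speech
    private readonly Action<SpeechResult> speak;                  // Playback + lip sync hand-off

    public ConversationPipeline(
        Func<float[], Task<string>> transcribe,
        Func<string, Task<string>> chat,
        Func<string, Task<SpeechResult>> synthesize,
        Action<SpeechResult> speak)
    {
        this.transcribe = transcribe;
        this.chat = chat;
        this.synthesize = synthesize;
        this.speak = speak;
    }

    // Voice path: transcribe first, then continue along the text path.
    public async Task<string> HandleVoiceAsync(float[] micSamples)
    {
        string userText = await transcribe(micSamples);
        return await HandleTextAsync(userText);
    }

    // Text path: skips speech-to-text entirely.
    public async Task<string> HandleTextAsync(string userText)
    {
        string reply = await chat(userText);
        SpeechResult speech = await synthesize(reply);
        speak(speech);
        return reply;
    }
}
```

Injecting the stages as delegates keeps the speech-to-text step optional: typed input calls `HandleTextAsync` directly, while voice input goes through `HandleVoiceAsync`.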
Assets/
│
├── UI/
│   └── ChatUI.uxml
│
├── Scripts/
│   ├── PCVoiceAIController.cs
│   ├── SpeechController.cs
│   ├── LipSyncCopyVisemes.cs
│   ├── FacialExpressionController.cs
│   └── ...
│
├── Models/
│   ├── STT/
│   └── TTS/
│
└── Scenes/
`PCVoiceAIController` is the main interaction controller, responsible for:
- UI binding
- Chat input handling
- Microphone control
- AI request/response flow
- State management
`SpeechController` handles:
- Audio playback
- Speech queueing
- Playback interruption
- Lip sync forwarding
- Voice state events
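Queueing and interruption can be sketched without any Unity dependencies. The `SpeechQueue<TClip>` class below is a hypothetical illustration, not the project's `SpeechController`: it assumes the caller starts audio when `ClipStarted` fires and reports back through `OnClipFinished` once the clip ends.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical queue: plays clips one at a time and can be interrupted at any point.
public sealed class SpeechQueue<TClip>
{
    private readonly Queue<TClip> pending = new Queue<TClip>();

    public TClip Current { get; private set; }
    public bool IsSpeaking { get; private set; }

    public event Action<TClip> ClipStarted; // Caller starts audio playback here.
    public event Action SpeechFinished;     // Raised when the queue drains or is interrupted.

    // Add a clip; start playback immediately if nothing is playing.
    public void Enqueue(TClip clip)
    {
        pending.Enqueue(clip);
        if (!IsSpeaking) PlayNext();
    }

    // Caller reports that the current clip has finished playing.
    public void OnClipFinished()
    {
        if (IsSpeaking) PlayNext();
    }

    // Drop all queued clips and end the current utterance.
    public void Interrupt()
    {
        pending.Clear();
        if (!IsSpeaking) return;
        Stop();
    }

    private void PlayNext()
    {
        if (pending.Count == 0)
        {
            Stop();
            return;
        }
        Current = pending.Dequeue();
        IsSpeaking = true;
        ClipStarted?.Invoke(Current);
    }

    private void Stop()
    {
        IsSpeaking = false;
        Current = default;
        SpeechFinished?.Invoke();
    }
}
```

In a Unity scene, `TClip` would likely be an `AudioClip`, with the `ClipStarted` handler calling `AudioSource.Play()` and the stop button calling `Interrupt()`.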
`LipSyncCopyVisemes` is responsible for:
- Phoneme parsing
- Viseme mapping
- Blendshape driving
- Mouth animation timing
Optional if you only require audio playback.
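As a rough illustration of phoneme-to-viseme mapping and timing, the sketch below uses a lookup table and spreads the phonemes evenly over the clip duration. Everything here is an assumption made for the example: the `Viseme` set, the phoneme symbols, and the even spacing. The phoneme format actually emitted by the TTS model and the mapping used in `LipSyncCopyVisemes` may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reduced viseme set; a real rig may expose more mouth shapes.
public enum Viseme { Silence, PP, FF, AA, E, O, U }

public static class VisemeMap
{
    // Illustrative phoneme symbols only; unknown symbols fall back to Silence.
    private static readonly Dictionary<string, Viseme> Table =
        new Dictionary<string, Viseme>(StringComparer.OrdinalIgnoreCase)
        {
            { "p", Viseme.PP }, { "b", Viseme.PP }, { "m", Viseme.PP },
            { "f", Viseme.FF }, { "v", Viseme.FF },
            { "a", Viseme.AA },
            { "e", Viseme.E }, { "i", Viseme.E },
            { "o", Viseme.O },
            { "u", Viseme.U },
        };

    public static Viseme Lookup(string phoneme)
    {
        if (phoneme != null && Table.TryGetValue(phoneme, out Viseme viseme))
            return viseme;
        return Viseme.Silence;
    }

    // Assumes evenly spaced phonemes; per-phoneme durations would be more accurate if available.
    public static List<(float time, Viseme viseme)> BuildTimeline(
        IReadOnlyList<string> phonemes, float clipSeconds)
    {
        var timeline = new List<(float time, Viseme viseme)>(phonemes.Count);
        if (phonemes.Count == 0) return timeline;

        float step = clipSeconds / phonemes.Count;
        for (int i = 0; i < phonemes.Count; i++)
            timeline.Add((i * step, Lookup(phonemes[i])));
        return timeline;
    }
}
```

Each timeline entry could then drive a blendshape weight at the matching playback time, for example through `SkinnedMeshRenderer.SetBlendShapeWeight`.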
`FacialExpressionController` controls procedural facial states:
- Idle
- Listening
- Thinking
- Speaking
Can be extended for:
- Emotions
- Mood systems
- Eye movement
- Blinking
- AI-driven expressions
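The four facial states can be modeled as a small state machine that raises an event on each change. The `ExpressionState` enum and `ExpressionStateMachine` class below are hypothetical names used for illustration; the project's `FacialExpressionController` may be structured differently.

```csharp
using System;

// The four states named in this README.
public enum ExpressionState { Idle, Listening, Thinking, Speaking }

// Hypothetical state holder: tracks the current state and notifies listeners on change.
public sealed class ExpressionStateMachine
{
    public ExpressionState Current { get; private set; } = ExpressionState.Idle;

    // Arguments: previous state, new state.
    public event Action<ExpressionState, ExpressionState> Changed;

    // Returns true if the state actually changed; repeated sets are ignored.
    public bool Set(ExpressionState next)
    {
        if (next == Current) return false;

        ExpressionState previous = Current;
        Current = next;
        Changed?.Invoke(previous, next);
        return true;
    }
}
```

A controller would typically call `Set(ExpressionState.Listening)` when recording starts, `Thinking` while waiting on inference, `Speaking` when TTS playback begins, and `Idle` once playback ends. Extensions such as emotions or blinking could subscribe to `Changed` rather than modify the state machine itself.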
- Unity 6.x/2024+
- Windows PC (recommended for local inference)
- Microphone input device
- Unity Sentis-compatible setup
Official Unity Sentis Whisper model:
https://huggingface.co/unity/inference-engine-whisper-tiny
Official Unity Sentis TTS model:
https://huggingface.co/unity/inference-engine-jets-text-to-speech
git clone https://github.com/yourusername/LLM-Character.git
Open the project using:
- Unity 6.x
- Unity 2024+
Download:
- Whisper Tiny
- Jets TTS
Place them inside:
Assets/Models/
Open the demo scene and configure the following references inside PCVoiceAIController:
| Field | Description |
|---|---|
| `stt` | Speech-to-text component |
| `tts` | Text-to-speech component |
| `llmAgent` | Local LLM agent |
| `speechController` | Audio playback manager |
| `facialController` | Facial animation controller |
| `micIcon` | Microphone icon |
| `stopIcon` | Stop button icon |
| `sendIcon` | Send icon |
| `msgIcon` | Chat icon |
In the Unity Inspector, assign:
- Whisper `.onnx` model → `SentisSTT`
- Jets TTS `.onnx` model → `SentisTTS`
- Type a message
- Press Enter or click Send
- Receive AI-generated responses
- Click the microphone button
- Speak naturally
- The AI transcribes and responds automatically
- AI responses are synthesized into speech
- Playback can be interrupted using the stop button
Character states update automatically:
- Listening while recording
- Thinking during inference
- Speaking during TTS playback
This framework is intentionally modular.
You can easily add:
- VR support
- Multiplayer avatars
- Emotion engines
- Memory systems
- Streaming LLM APIs
- OpenAI integration
- Local GGUF models
- Animation rigs
- Eye tracking
- Gesture systems
This project aims to provide a clean foundation for:
- AI avatar research
- Indie game development
- VR/AR interaction systems
- Offline conversational agents
- Digital human experimentation
Contributions, pull requests, and feature suggestions are welcome.
Possible contribution areas:
- Better lip sync systems
- Emotion AI
- VR integrations
- Optimization
- Cross-platform support
- Mobile deployment
- Improved UI/UX
- Additional AI backends
- VR integration support
- Full-body animation support
- Streaming LLM responses
- Memory/context persistence
- Emotion-aware dialogue
- Multi-character conversations
- WebGPU inference support
- Multiplayer avatar synchronization
- Animation graph integration
- Mobile optimization
This project is currently in active development.
Expect:
- Frequent updates
- API changes
- Experimental systems
- Community-driven improvements
MIT License
Copyright (c) 2026
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software.
Built with:
- Unity
- Unity Sentis
- Hugging Face
Special thanks to the open-source AI and Unity communities.