A fully-offline, movie-style voice assistant for Windows — English + Hindi, 24 skills, biometric security, hand & eye control, and Doctor-Strange-style hand magic.
Speech recognition runs locally (Vosk). It talks back, shows a live arc-reactor HUD, recognizes your face, obeys only your voice, and — with an optional AI brain — answers anything.
| 🗣️ Fully offline speech | Vosk-powered recognition — no cloud needed for the core experience |
| 🌐 Bilingual | Understands & replies in English and Hindi (native Hinglish, not translation) |
| 🧠 Optional AI brain | Plug in a Claude API key for conversational answers with memory |
| 🎬 Holographic HUD | Fullscreen sci-fi dashboard — 3D reactor, globe, live waveform, system gauges |
| 👤 Face recognition | Greets you by name, self-learning, fully offline (OpenCV LBPH) |
| 🔒 Voice lock | Obeys only your enrolled voiceprint — ignores everyone else |
| ✋ Hand & eye control | Your hand becomes the mouse; or drive the cursor with your head/eyes |
| 🪄 Hand magic | Doctor-Strange / Iron-Man gesture spells — portals, lightning, energy balls |
| 🛠️ 24 voice skills | Apps, web, media, timers, maths, notes, system reports & more |
| 📦 Standalone build | One click → a portable .exe you can share (no Python needed) |
# 1. Install dependencies
py -3.13 -m venv .venv
.venv\Scripts\python.exe -m pip install -r requirements.txt
# 2. Download speech models (see table below)
# 3. Run
run.batWhen the reactor pulses blue, it's listening:
🎙️ "Jarvis, open chrome" · "Jarvis, system status" · "Jarvis, magic mode" · "Jarvis, speak Hindi"
| Language | Download | Extract into |
|---|---|---|
| English | vosk-model-small-en-in-0.4 (~36 MB) | model\ |
| Hindi | vosk-model-small-hi-0.22 (~42 MB) | model-hi\ |
Fuzzy phrase matching — natural sentences work, you don't need exact words.
| Say "Jarvis, ..." | Does |
|---|---|
open chrome / vs code / <any installed app> |
Opens it (auto-discovers Start Menu apps) |
search for <x> / youtube <x> / wikipedia <x> |
Web search |
volume up / mute, play / pause / next song |
Volume & media |
minimize / close window, switch app, show desktop |
Window control |
set a timer for 5 minutes / remind me in 10 minutes |
Timers & reminders (survive restart) |
what is 15 percent of 2000 |
Maths & percentages |
take a note <x> / read my notes |
Notes |
what's the capital of japan / any question |
Looks it up, reads the answer aloud |
system status / battery / diagnostics |
Spoken CPU / RAM / battery report |
magic mode / iron man / doctor strange |
Fullscreen cinematic hand-magic ✨ |
hand tracking / virtual mouse |
Hand becomes the mouse |
eye control |
Hands-free head/eye cursor |
speak hindi / speak english |
Switch language |
take a screenshot / type <words> |
Screenshot / dictation |
lock the computer / shutdown / restart |
System (confirms first) |
…and more — see the full list in the sections below.
All launched by voice:
- "magic mode" → a fullscreen, cinematic 3D magic window. Starts LOCKED — perform your secret key spell (default: fist → open palm → horns 🤘) to unlock. Then cast spells with real 3D depth: portals, tilted shields, lightning between your hands, a 3D energy ball you grow and throw, a force-field dome, and time-freeze. Only your enrolled voice can launch it.
- "hand tracking" → your hand becomes the mouse (in-air and on-surface).
- "eye control" → hands-free cursor; move your head, blink to click.
Say "Jarvis, show yourself" — the whole screen becomes a sci-fi dashboard: a 3D arc reactor with live voice spectrum, a live clock, real CPU/RAM/battery gauges, a rotating 3D wireframe globe, your live voice waveform, and a scrolling comms log. Esc/F to return.
- Voice lock — run
enroll.batonce; JARVIS builds a 256-number voiceprint and from then on obeys only your voice (in testing: owner matched 0.99, others rejected at 0.45). - Face recognition — run
enroll_face.bat; greets you by name, self-learns over time, fully offline (OpenCV LBPH). (Your enrolled face/voice data stays local and is git-ignored.) - Hidden settings — the API-key panel stays invisible until you speak a secret word ("override protocol").
Say "speak Hindi" → JARVIS switches to natural Hinglish ("Mera naam JARVIS hai, sir", "Abhi 3 baj rahe hain") — replies, greetings, and hourly chimes all switch. It also understands Hindi commands ("awaaz badhao", "chrome chalu karo", "agla gaana") via a Hindi speech model running alongside the English one.
Say "override protocol" to unlock Settings, paste a Claude API key, and JARVIS answers any question conversationally — with short-term memory for follow-ups ("capital of Japan?" → "and its population?"). Without a key, offline DuckDuckGo/Wikipedia answers work fine. (The key is stored locally in config.json, which is git-ignored — copy config.json.example to start.)
jarvis.py → main app + window/UI loop
core.py → speech recognition, TTS, command routing
skills.py → all 24 voice skills (extend here)
brain.py → optional Claude AI integration + conversation memory
answers.py → offline web answers (DuckDuckGo / Wikipedia)
hud.py → holographic fullscreen dashboard
boot.py → cinematic boot sequence
faceauth.py → offline face recognition (OpenCV LBPH)
voiceauth.py → voiceprint speaker verification
proactive.py → self-initiated greetings / chimes
reminders.py → persistent timers & reminders
appindex.py → Start-Menu app discovery
Add your own command: open skills.py, copy a Skill class, set its triggers, implement run(), register it in build_skills(). App aliases live in appindex.py.
Double-click build.bat → dist\JARVIS\JARVIS.exe, a portable folder you can zip and share (no Python needed). See DISTRIBUTION.md.