A terminal emulator with a built-in voice & text AI agent.
Runs as a web app or a native desktop app — same codebase, same features.
- Full terminal emulation — xterm.js with 256-color support, scrollback, and bracketed paste
- Multi-tab sessions — open, close, and switch between independent terminal tabs
- Voice agent — hold a button and talk; the agent sees your terminal, runs commands, and speaks back
- Text agent — type a message instead; same capabilities, no microphone needed
- Dictation — voice-to-text transcription pasted directly into the terminal
- Drag-and-drop file upload — drop a file onto the terminal and its path is inserted at the cursor
- Aliases — map URL paths to shell commands via aliases.conf
- PWA support — installable from the browser with offline caching
- Dark theme — designed for extended terminal use
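The alias feature in the list above can be pictured as a simple lookup. The following is only a minimal sketch, not the project's actual code: the `ALIASES` dictionary stands in for aliases.conf (whose file format is not shown in this README), the `python3` command is an invented example value, and `resolve_target` is a hypothetical helper name. The fallback to a path under the home directory mirrors the launch-target examples given later for `./envoy-desktop`.

```python
from pathlib import PurePosixPath

# Hypothetical alias table standing in for aliases.conf;
# the real file format is not documented here.
ALIASES = {"/python": "python3"}


def resolve_target(target, aliases, home):
    """Resolve a launch target to either a shell command or a directory.

    Returns ("command", cmd) when the target is a known alias, otherwise
    ("directory", path) with the target treated as relative to `home`.
    """
    if target in aliases:
        return ("command", aliases[target])
    # Not an alias: strip the leading slash and join onto the home directory.
    return ("directory", str(PurePosixPath(home) / target.lstrip("/")))
```

Under these assumptions, `resolve_target("/python", ALIASES, "/home/user")` yields a command, while `resolve_target("/projects/foo", ALIASES, "/home/user")` yields a directory under the home folder.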
Browser / pywebview
├── xterm.js              terminal emulation
├── app.js                tabs, voice, drag-drop, settings
└── transport layer
    ├── PywebviewTransport    (desktop: JS ↔ Python bridge)
    └── BrowserTransport      (web: HTTP JSON to server.py)

Python backend
├── app_core.py           PTY session management, file uploads
├── voice_chat.py         Gemini agent with terminal tools
├── speech.py             TTS synthesis with Inworld-first, Google fallback
├── agent.py              Gemini tool-calling runtime
└── env_config.py         API key management

Both modes share the same runtime (app_core.py) and frontend (app.js). The only difference is the transport layer.
uv venv .venv
uv pip install --python .venv/bin/python -r requirements-web.txt
python server.py

Open http://localhost:8080/envoy/

uv venv .venv
uv pip install --python .venv/bin/python -r requirements-desktop.txt
./envoy-desktop

An optional launch target can open a specific alias or path:

./envoy-desktop /python # resolved via aliases.conf
./envoy-desktop /projects/foo # resolved relative to $HOME

Voice and agent features require API keys. Set them as environment variables, in a .env file, or through the in-app settings dialog.

| Key | Required for |
|---|---|
| GOOGLE_API_KEY | Voice & text agent (Gemini) |
| GROQ_API_KEY | Dictation (Whisper) |
| INWORLD_API_KEY | Spoken agent responses (TTS, preferred when set) |

Spoken responses use Inworld TTS when INWORLD_API_KEY is set; otherwise they fall back to Google TTS via GOOGLE_API_KEY.
The terminal itself works without any keys configured.
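
The TTS fallback rule above can be expressed as a small selection function. This is a sketch under stated assumptions rather than the logic in speech.py: `pick_tts_provider` and the returned provider labels are hypothetical names, and only the precedence (Inworld first, then Google, otherwise no speech) is taken from this README.

```python
import os


def pick_tts_provider(env):
    """Choose a TTS provider from a mapping of environment variables.

    Inworld is preferred when its key is set; Google is the fallback.
    Returns None when neither key is present (no spoken responses).
    """
    if env.get("INWORLD_API_KEY"):
        return "inworld"
    if env.get("GOOGLE_API_KEY"):
        return "google"
    return None


if __name__ == "__main__":
    # Inspect the current process environment.
    print(pick_tts_provider(os.environ))
```

Passing the environment in as a mapping, instead of reading `os.environ` inside the function, keeps the rule easy to check with plain dictionaries.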

| Shortcut | Action |
|---|---|
| Ctrl+T | New tab |
| Ctrl+W | Close tab |
| Ctrl+Tab | Next tab |
| Ctrl+\ | Toggle toolbar |
| Ctrl+Shift+Space | Dictation |
| Ctrl+Shift+A | Voice agent |
| Ctrl+Shift+E | Text agent / paste editor |
| Ctrl+Shift+0 / + / - | Reset / increase / decrease font size |

MIT
