envoy

A terminal emulator with a built-in voice & text AI agent.
Runs as a web app or a native desktop app — same codebase, same features.

Features

Full terminal emulation — xterm.js with 256-color support, scrollback, and bracketed paste
Multi-tab sessions — open, close, and switch between independent terminal tabs
Voice agent — hold a button and talk; the agent sees your terminal, runs commands, and speaks back
Text agent — type a message instead; same capabilities, no microphone needed
Dictation — voice-to-text transcription pasted directly into the terminal
Drag-and-drop file upload — drop a file onto the terminal and its path is inserted at the cursor
Aliases — map URL paths to shell commands via aliases.conf
PWA support — installable from the browser with offline caching
Dark theme — designed for extended terminal use

Architecture

Browser / pywebview
  ├── xterm.js          terminal emulation
  ├── app.js            tabs, voice, drag-drop, settings
  └── transport layer
        ├── PywebviewTransport   (desktop: JS ↔ Python bridge)
        └── BrowserTransport     (web: HTTP JSON to server.py)

Python backend
  ├── app_core.py       PTY session management, file uploads
  ├── voice_chat.py     Gemini agent with terminal tools
  ├── speech.py         TTS synthesis (Inworld first, Google fallback)
  ├── agent.py          Gemini tool-calling runtime
  └── env_config.py     API key management

Both modes share the same runtime (app_core.py) and frontend (app.js). The only difference is the transport layer.
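The shared-runtime idea can be illustrated with a small sketch. Everything here is hypothetical and is not taken from the envoy source: `Core` stands in for the runtime in app_core.py, and `BridgeAdapter` and `HttpJsonAdapter` stand in for the Python side of the two transports. The point is only that both adapters funnel into the same dispatch function, so a feature added to the core is available in both modes.

```python
import json
from typing import Any, Dict


class Core:
    """Hypothetical stand-in for the shared runtime (not the real app_core.py)."""

    def __init__(self) -> None:
        self._next_id = 1
        self.sessions: Dict[int, str] = {}

    def open_session(self, title: str = "shell") -> Dict[str, Any]:
        # Register a new session and return a JSON-serializable description.
        sid = self._next_id
        self._next_id += 1
        self.sessions[sid] = title
        return {"session_id": sid, "title": title}


def dispatch(core: Core, method: str, params: Dict[str, Any]) -> Any:
    """Single entry point used by every transport."""
    if method.startswith("_"):
        raise ValueError(f"private method not allowed: {method}")
    handler = getattr(core, method, None)
    if not callable(handler):
        raise ValueError(f"unknown method: {method}")
    return handler(**params)


class BridgeAdapter:
    """Desktop-style transport: calls arrive as Python objects."""

    def __init__(self, core: Core) -> None:
        self.core = core

    def call(self, method: str, params: Dict[str, Any]) -> Any:
        return dispatch(self.core, method, params)


class HttpJsonAdapter:
    """Web-style transport: calls arrive as a JSON request body."""

    def __init__(self, core: Core) -> None:
        self.core = core

    def handle(self, body: bytes) -> bytes:
        request = json.loads(body)
        result = dispatch(self.core, request["method"], request.get("params", {}))
        return json.dumps({"result": result}).encode("utf-8")
```

With this shape, the frontend only needs to know which adapter it is talking to; the request names and payloads stay identical across modes.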

Quickstart

Web mode

uv venv .venv
uv pip install --python .venv/bin/python -r requirements-web.txt
.venv/bin/python server.py

Open http://localhost:8080/envoy/

Desktop mode

uv venv .venv
uv pip install --python .venv/bin/python -r requirements-desktop.txt
./envoy-desktop

An optional launch target can open a specific alias or path:

./envoy-desktop /python        # resolved via aliases.conf
./envoy-desktop /projects/foo  # resolved relative to $HOME
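The two resolution rules in the comments above could be sketched roughly as follows. This is an illustration under stated assumptions, not envoy's actual code: the function name `resolve_target` is hypothetical, aliases are assumed to be checked before paths, and the alias table is passed in as an already-parsed mapping because the format of aliases.conf is not described here.

```python
from pathlib import PurePosixPath
from typing import Mapping, Tuple


def resolve_target(
    target: str, aliases: Mapping[str, str], home: PurePosixPath
) -> Tuple[str, str]:
    """Return ("alias", command) or ("path", absolute_path) for a launch target.

    Assumption: alias keys are stored with a leading slash, e.g. "/python".
    """
    cleaned = target.strip("/")
    key = "/" + cleaned
    if key in aliases:
        # Alias match: the target maps to a shell command.
        return ("alias", aliases[key])
    # No alias: treat the target as a path relative to the home directory.
    return ("path", str(home / cleaned))


# Example with a hypothetical alias table and home directory.
example_aliases = {"/python": "python3"}
example_home = PurePosixPath("/home/me")
print(resolve_target("/python", example_aliases, example_home))
print(resolve_target("/projects/foo", example_aliases, example_home))
```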

API Keys

Voice and agent features require API keys. Set them as environment variables, in a .env file, or through the in-app settings dialog.

| Key | Required for |
| --- | --- |
| `GOOGLE_API_KEY` | Voice & text agent (Gemini) |
| `GROQ_API_KEY` | Dictation (Whisper) |
| `INWORLD_API_KEY` | Spoken agent responses (TTS, preferred when set) |

Spoken responses use Inworld TTS when INWORLD_API_KEY is set; otherwise they fall back to Google TTS via GOOGLE_API_KEY.

The terminal itself works without any keys configured.
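The TTS fallback rule described above amounts to a short priority check. The sketch below is a minimal illustration, not the contents of speech.py: the function name `pick_tts_provider` and the returned labels are hypothetical, and only the two environment variable names come from this README.

```python
import os
from typing import Mapping, Optional


def pick_tts_provider(env: Mapping[str, str]) -> Optional[str]:
    """Choose a TTS provider label from the configured keys.

    Inworld is preferred when its key is set; Google is the fallback.
    Returns None when neither key is present (no spoken responses).
    """
    if env.get("INWORLD_API_KEY", "").strip():
        return "inworld"
    if env.get("GOOGLE_API_KEY", "").strip():
        return "google"
    return None


# Usage: inspect the current process environment.
provider = pick_tts_provider(os.environ)
```

Treating a blank value the same as a missing key is a design choice in this sketch; it avoids selecting a provider when a .env file contains an empty placeholder line.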

Keyboard Shortcuts

| Shortcut | Action |
| --- | --- |
| `Ctrl+T` | New tab |
| `Ctrl+W` | Close tab |
| `Ctrl+Tab` | Next tab |
| `Ctrl+\` | Toggle toolbar |
| `Ctrl+Shift+Space` | Dictation |
| `Ctrl+Shift+A` | Voice agent |
| `Ctrl+Shift+E` | Text agent / paste editor |
| `Ctrl+Shift+0` / `+` / `-` | Reset / increase / decrease font size |

License

MIT
