Autonomous coding agent with deep code understanding
Terminal-native. Desktop-ready. Knows your codebase like you do.
Theo Code is an autonomous coding agent that reads, plans, edits, and verifies code changes inside large repositories. It packages four things into one workspace:
- Code intelligence: Tree-Sitter parser (14 languages) with agentic-search retrieval; the agent reaches for code via `grep`, `codesearch`, and `glob` tools.
- Agent runtime: state machine (Plan → Act → Observe → Reflect), sub-agent fan-out, budget enforcer, sandboxed tool execution.
- Provider abstraction: 26 LLM provider specs (Anthropic, OpenAI, xAI, Mistral, Groq, Cohere, Vertex, Bedrock, Ollama, vLLM, …) sharing one streaming/retry/converter pipeline.
- Surfaces: `theo` CLI (14 subcommands), Tauri desktop, Vite UI, and a Python benchmark harness in `apps/theo-benchmark/`.
Every claim on this page is a gate you can run yourself.
```bash
git clone https://github.com/usetheodev/theo-code
cd theo-code
cargo build --workspace --exclude theo-code-desktop --release
./target/release/theo --help
```

System requirements: Rust 1.83+ (2024 edition), pkg-config. The desktop app additionally needs libgtk-3-dev and Tauri prerequisites on Linux; everything else builds without system deps.
```bash
# Authenticate with a provider (OAuth device flow or API key)
theo login                        # interactive picker
theo login --provider anthropic   # pin a provider

# Single-shot task
theo "find every place that constructs a session token"

# Autonomous loop (Plan → Act → Observe → Reflect until done)
theo pilot "remove the panic on stale tool name and add a regression test"

# Interactive TUI
theo
```

```bash
theo memory lint                            # memory subsystem hygiene
theo dashboard                              # observability HTTP server
theo subagent ls                            # persisted sub-agent runs
theo checkpoints ls                         # workdir shadow-git checkpoints
theo skill ls                               # installed skills
theo mcp list                               # cached MCP servers
theo trajectory export-rlhf --out f.jsonl   # export rated runs
```

`theo --help` lists 14 top-level subcommands, pinned by the smoke test `apps/theo-cli/tests/cli_help_smoke.rs::test_top_level_subcommand_count_is_fourteen`:
```
agent        Interactive REPL or single-shot task execution
pilot        Autonomous loop until promise is fulfilled
memory       Memory subsystem utilities (lint, inspect)
login        Authenticate with a provider (OAuth device flow or API key)
logout       Remove saved credentials
dashboard    Start the observability dashboard HTTP server
subagent     Manage persisted sub-agent runs
checkpoints  Manage workdir checkpoints (shadow git repos)
agents       Manage project agents approval
mcp          Manage MCP discovery cache
skill        Skill catalog: list / view / delete user-installed skills
trajectory   Trajectory export tooling
evals        Context Engineering evals (CDLC L2-L4)
help         Print help for any subcommand
```
Cargo workspace with 14 lib crates + 3 binary apps under one Rust 2024 edition tree.
```
crates/
├── theo-domain                 pure types, state machines, zero deps
├── theo-engine-parser          Tree-Sitter extraction (14 langs)
├── theo-engine-retrieval       search + context assembly
├── theo-governance             policy engine, sandbox cascade
├── theo-isolation              bwrap / landlock / noop fallback
├── theo-infra-llm              26 provider specs, streaming, retry
├── theo-infra-auth             OAuth PKCE, device flow, env keys
├── theo-infra-mcp              Model Context Protocol client
├── theo-infra-memory           memory persistence (ADR-008 pending)
├── theo-test-memory-fixtures   fixtures for memory tests
├── theo-tooling                49 registered tools + registry
├── theo-agent-runtime          agent loop, sub-agents, observability
├── theo-api-contracts          serializable DTOs for IPC
└── theo-application            use-cases, facade, CLI runtime re-exports
apps/
├── theo-cli (pkg `theo`)       the CLI binary
├── theo-marklive               markdown live renderer
├── theo-desktop                Tauri shell (excluded from cargo test: GTK)
├── theo-benchmark              Python harness (outside Rust workspace)
└── theo-ui                     Vite/TS UI (outside Rust workspace)
```
```
theo-domain            → (nothing)
theo-engine-parser     → theo-domain
theo-engine-retrieval  → theo-domain, theo-engine-parser
theo-governance        → theo-domain
theo-infra-*           → theo-domain
theo-tooling           → theo-domain
theo-agent-runtime     → theo-domain, theo-governance,
                         theo-infra-llm, theo-infra-auth, theo-tooling,
                         theo-isolation, theo-infra-mcp
theo-api-contracts     → theo-domain
theo-application       → all crates above
apps/*                 → theo-application, theo-api-contracts
```
scripts/check-arch-contract.sh enforces this on every PR.
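
To make the dependency contract concrete, here is a minimal Rust sketch of the kind of check such a script could perform. It is not the project's script: the `Contract` alias, `allowed_deps`, and `violations` are hypothetical names, only a subset of the listing above is transcribed, and the input is assumed to be a flat list of workspace-internal `(crate, dependency)` edges gathered elsewhere.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical representation of the contract: crate name -> allowed direct deps.
pub type Contract = BTreeMap<&'static str, BTreeSet<&'static str>>;

/// A subset of the dependency listing above, transcribed by hand for this sketch.
pub fn allowed_deps() -> Contract {
    let mut m = Contract::new();
    m.insert("theo-domain", BTreeSet::new());
    m.insert("theo-engine-parser", BTreeSet::from(["theo-domain"]));
    m.insert(
        "theo-engine-retrieval",
        BTreeSet::from(["theo-domain", "theo-engine-parser"]),
    );
    m.insert("theo-governance", BTreeSet::from(["theo-domain"]));
    m.insert("theo-tooling", BTreeSet::from(["theo-domain"]));
    m.insert("theo-api-contracts", BTreeSet::from(["theo-domain"]));
    m
}

/// Returns every (crate, dependency) edge the contract does not permit.
/// Crates missing from the contract are flagged too, so the table must stay complete.
pub fn violations<'a>(
    allowed: &Contract,
    actual: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    let mut out = Vec::new();
    for &(krate, dep) in actual {
        let permitted = allowed
            .get(krate)
            .map_or(false, |deps| deps.contains(dep));
        if !permitted {
            out.push((krate, dep));
        }
    }
    out
}
```

Calling `violations(&allowed_deps(), &[("theo-domain", "theo-tooling")])` reports that edge, since `theo-domain` may depend on nothing. Treating unknown crates as violations is a design choice of this sketch: it forces new crates to be added to the contract explicitly.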
Every provider lives in crates/theo-infra-llm/src/provider/catalog/ as a
ProviderSpec const. Adding one means dropping a new const and wiring
its auth strategy.
```
amazon-bedrock          azure                    azure-cognitive-services
anthropic               cerebras                 chatgpt-codex
cloudflare-ai-gateway   cloudflare-workers-ai    cohere
deepinfra               github-copilot           gitlab
google-vertex           google-vertex-anthropic  groq
lm-studio               mistral                  ollama
openai                  openrouter               perplexity
sap-ai-core             togetherai               vercel
vllm                    xai
```
OAuth device flow is supported for anthropic and chatgpt-codex. The
rest use API keys (env or config).
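
The "one const per provider" idea can be illustrated with a small sketch. The field names, the `AuthStrategy` variants, and the `example-*` providers below are all assumptions made for illustration; the real `ProviderSpec` in `theo-infra-llm` may look quite different.

```rust
/// Hypothetical auth strategies; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthStrategy {
    /// Read an API key from the named environment variable.
    ApiKeyEnv(&'static str),
    /// OAuth device flow.
    OAuthDeviceFlow,
    /// No credentials (for example, a local server).
    None,
}

/// Hypothetical shape of a provider spec: plain data that can live in a `const`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProviderSpec {
    pub id: &'static str,
    pub base_url: &'static str,
    pub auth: AuthStrategy,
}

/// Made-up provider entries, used only to show the pattern.
pub const EXAMPLE_LOCAL: ProviderSpec = ProviderSpec {
    id: "example-local",
    base_url: "http://localhost:8000/v1",
    auth: AuthStrategy::None,
};

pub const EXAMPLE_CLOUD: ProviderSpec = ProviderSpec {
    id: "example-cloud",
    base_url: "https://api.example.invalid/v1",
    auth: AuthStrategy::ApiKeyEnv("EXAMPLE_API_KEY"),
};

/// Adding a provider in this sketch means adding one const and listing it here.
pub const CATALOG: &[ProviderSpec] = &[EXAMPLE_LOCAL, EXAMPLE_CLOUD];

/// Looks up a spec by its id.
pub fn find(id: &str) -> Option<&'static ProviderSpec> {
    CATALOG.iter().find(|spec| spec.id == id)
}
```

Keeping specs as `const` data means the catalog can be inspected and tested without constructing any client or touching the network.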
C, C++, C#, Go, Java, JavaScript, Kotlin, PHP, Python, Ruby, Rust, Scala, Swift, TypeScript.
49 tools registered in DefaultRegistry (pinned by snapshot test
default_registry_tool_id_snapshot_is_pinned) plus 9 meta-tools
injected by theo-agent-runtime at dispatch time.
Registry tools (49):
| Category | Tool IDs |
|---|---|
| Filesystem | read, write, edit, apply_patch, glob, grep |
| Shell & process | bash, env_info |
| Git | git_status, git_diff, git_log, git_commit |
| HTTP | http_get, http_post, webfetch |
| Cognitive | think, reflect, memory, task_create, task_update |
| Memory | store_memory, recall_memory |
| Planning | plan_create, plan_update_task, plan_advance_phase, plan_log, plan_summary, plan_next_task, plan_replan, plan_failure_status |
| Multimodal | read_image, screenshot |
| Code intelligence | codebase_context, docs_search |
| Test generation | gen_property_test, gen_mutation_test |
| LSP sidecar | lsp_status, lsp_definition, lsp_references, lsp_hover, lsp_rename |
| Browser sidecar | browser_status, browser_open, browser_click, browser_screenshot, browser_type, browser_eval, browser_wait_for_selector, browser_close |
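
The snapshot test named above pins the set of registered tool IDs. The sketch below shows the general technique, not the project's test: `registered_tool_ids`, `PINNED_IDS`, and `snapshot_drift` are hypothetical names, and only the six filesystem tool IDs from the table are used as sample data.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the live registry: returns the registered tool IDs.
pub fn registered_tool_ids() -> Vec<&'static str> {
    vec!["read", "write", "edit", "apply_patch", "glob", "grep"]
}

/// The pinned snapshot, kept sorted so diffs stay readable in review.
pub const PINNED_IDS: &[&str] = &["apply_patch", "edit", "glob", "grep", "read", "write"];

/// Compares live IDs against the snapshot.
/// Returns (added, removed): IDs present only in `live`, and IDs present only in `pinned`.
pub fn snapshot_drift(live: &[&str], pinned: &[&str]) -> (Vec<String>, Vec<String>) {
    let live: BTreeSet<&str> = live.iter().copied().collect();
    let pinned: BTreeSet<&str> = pinned.iter().copied().collect();
    let added = live.difference(&pinned).map(|s| s.to_string()).collect();
    let removed = pinned.difference(&live).map(|s| s.to_string()).collect();
    (added, removed)
}
```

A pinning test would assert that both returned lists are empty, so adding or removing a tool requires updating the snapshot in the same change.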
Meta-tools (9, injected by runtime):
| Tool | Purpose |
|---|---|
| `done` | Signal task completion |
| `skill` | Invoke auto-discovered skills |
| `delegate_task_single` | Spawn a sub-agent |
| `delegate_task_parallel` | Fan-out multiple sub-agents |
| `delegate_task_legacy` | Legacy delegation format |
| `batch` | Run up to 25 independent tool calls in parallel |
| `batch_execute` | Programmatic tool calling |
| `batch_for_subagent` | Batch variant for sub-agents |
| `tool_search` | Keyword lookup over deferred tools |
Not registered (code exists, not wired to runtime):
25 additional tool implementations exist in crates/theo-tooling/src/
but are not in DefaultRegistry. These include: 11 DAP/debug tools,
3 wiki tools, codesearch, websearch, computer_action, ls,
lsp (umbrella), multiedit, plan_exit, question, task,
skill (registry version).
| Sidecar | Status | Notes |
|---|---|---|
| LSP | validated | E2E with rust-analyzer; LspSessionManager::from_path() discovers servers. 5 tools registered. |
| Browser | partial | Playwright sidecar at crates/theo-tooling/scripts/playwright_sidecar.js; requires Node + chromium. 8 tools registered. |
| DAP | implemented, not wired | 11 debug_* tools with 140+ unit tests and DapSessionManager (415 LOC). Not registered in DefaultRegistry; no E2E smoke test. |
| Computer Use | implemented, not wired | computer_action tool (384 LOC) + platform driver (503 LOC, xdotool/cliclick). Not registered in DefaultRegistry. |
make audit runs the composite suite. Each technique is independently
runnable:
| Technique | Command | What it enforces |
|---|---|---|
| Architecture contract | `make check-arch` | ADR-010 dep direction |
| File / function size | `make check-sizes` | 800 LOC / file ceiling, allowlist with sunsets |
| Unwrap / expect | `make check-unwrap` | No unwrap/expect in production paths |
| Panic / todo | `make check-panic` | No panic!()/todo!()/unimplemented!() in production paths |
| Unsafe SAFETY comment | `make check-unsafe` | Every unsafe block has // SAFETY: within 8 lines above |
| Inline I/O tests | `make check-io-tests` | I/O tests live in tests/, not inline |
| Secrets scan | `make check-secrets` | gitleaks (or grep fallback) |
| Composite SOTA DoD | `make check-sota-dod` | Every Tier 1 + Tier 2 DoD criterion |
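
As an illustration of how one of these gates can work, the sketch below implements the "SAFETY comment within 8 lines above" rule as a plain text scan. It is a naive heuristic written for this page, not the project's `check-unsafe` script: `missing_safety_comments` is a hypothetical name, and the scan only recognises the literal text `unsafe {` and ignores string literals and block comments.

```rust
/// Returns the 1-based line numbers of `unsafe {` blocks that have no
/// `// SAFETY:` comment within `window` lines directly above them.
/// Naive text scan: does not parse Rust, so strings and block comments can fool it.
pub fn missing_safety_comments(source: &str, window: usize) -> Vec<usize> {
    let lines: Vec<&str> = source.lines().collect();
    let mut missing = Vec::new();

    for (i, line) in lines.iter().enumerate() {
        // Drop any trailing line comment so commented-out code is not counted.
        let code = line.split("//").next().unwrap_or("");
        if !code.contains("unsafe {") {
            continue;
        }
        // Look at up to `window` lines immediately above the unsafe block.
        let start = i.saturating_sub(window);
        let documented = lines[start..i]
            .iter()
            .any(|l| l.trim_start().starts_with("// SAFETY:"));
        if !documented {
            missing.push(i + 1);
        }
    }
    missing
}
```

With `window = 8`, a comment nine lines above the block no longer counts, which keeps the justification close to the code it covers.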
CI workflow .github/workflows/audit.yml runs every gate on every PR.
Pre-existing debt is tracked, not forgiven. Each `.claude/rules/*-allowlist.txt` has one entry per violation with a sunset date column, and the `check-*` scripts fail once a sunset has elapsed.
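
The sunset mechanism can be sketched as follows. The `path | sunset` line layout, the `Entry` struct, and both function names are assumptions made for this example; the real allowlist files may use a different format. Dates are assumed to be ISO-8601 (`YYYY-MM-DD`), which compare correctly as plain strings.

```rust
/// One allowlist entry. The `path | sunset` layout is an assumption of this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct Entry {
    pub path: String,
    pub sunset: String,
}

/// Parses allowlist text, skipping blank lines, `#` comments, and malformed lines.
pub fn parse_allowlist(text: &str) -> Vec<Entry> {
    text.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| {
            let (path, sunset) = l.split_once('|')?;
            Some(Entry {
                path: path.trim().to_string(),
                sunset: sunset.trim().to_string(),
            })
        })
        .collect()
}

/// Returns entries whose sunset is strictly before `today`.
/// ISO-8601 dates sort lexicographically, so string comparison is enough here.
pub fn expired<'a>(entries: &'a [Entry], today: &str) -> Vec<&'a Entry> {
    entries
        .iter()
        .filter(|e| e.sunset.as_str() < today)
        .collect()
}
```

A gate built on this would exit non-zero whenever `expired` returns anything, which turns each allowlist entry into a deadline instead of a permanent exemption.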
```bash
cargo test --workspace --exclude theo-code-desktop --lib --tests --no-fail-fast
```

Tests that pin invariants of the production surface:

- `test_top_level_subcommand_count_is_fourteen`: pins the 14 subcommand count
- `every_subcommand_responds_to_help_with_exit_zero`: every subcommand responds to `--help`
- `build_registry`: every `DefaultRegistry` entry is reachable
- `agent_loop_new`: the agent loop builder accepts only valid configs

scripts/check-*.sh run as part of the SOTA DoD composite.
See Quality model.
```bash
make check-bench-preflight   # validate scenarios + harness
cd apps/theo-benchmark
python run_benchmark.py --suite smoke
```

The repo carries rule files in `.claude/rules/` that block CI when violated:

- `architecture.md`: crate dep direction (ADR-010), prohibited imports
- `testing.md`: TDD (regression test before fix), AAA, deterministic, independent
- `rust-conventions.md`: `thiserror`, no `unwrap` in prod, newtypes
- `integration-first.md`: features must be wired and tested end-to-end
- `domain-boundary.md`: Memory/Context separation (D1, D4)

Together with .claude/rules/*-allowlist.txt files (each with sunsets),
they form the project's hygiene contract.
- Read `CLAUDE.md` before changing anything.
- TDD is non-negotiable. Bug fixes need a regression test before the fix.
- Update the changelog. Every PR adds an entry under `[Unreleased]` in `CHANGELOG.md` with a `(#PR)` reference.
- Don't break the dependency contract. `make check-arch` must pass.
- Don't widen allowlists without an ADR. Each entry has an ADR pointer and a sunset date.