A self-hosted web app that turns any locally-running Ollama model (Gemma, Llama, Qwen, DeepSeek, Mistral — anything with function-calling) into a Claude-Code-style assistant: chat with conversation history, run shell commands, edit files, drive your desktop, search the web, query APIs — with a per-conversation permission gate on every tool call.
```
┌────────────────┐          ┌─────────────────┐          ┌──────────────┐
│  React + Vite  │   SSE    │     FastAPI     │   chat   │    Ollama    │
│  + shadcn/ui   │◄────────►│   agent loop    │◄────────►│  any model   │
│  (port 5173)   │  /api/*  │   (port 8000)   │  :11434  │              │
└────────────────┘          └─────────────────┘          └──────────────┘
```
⚠ Capability tracks the model you run. A bigger / better model gives you sharper reasoning, more reliable tool use, and longer plans; a small model will sometimes drop tool calls or hallucinate paths. If something feels broken, try a larger model first.
Prerequisites (any OS): Python 3.12+, Node 20+, Ollama running locally on http://localhost:11434, at least one function-calling Ollama model. On Windows the launcher scripts add the inbound LAN firewall rules automatically (UAC prompt); on Linux they try ufw then firewall-cmd then print a manual fallback.
Pull a model first either way:
```bash
ollama pull gemma4:e4b   # recommended default
```

The repo ships matching .bat (Windows) and .sh (Linux + macOS) wrappers. They're idempotent — re-run after git pull to refresh deps + rebuild the frontend.
Windows (PowerShell or cmd.exe):
```powershell
.\install.bat     # venv + deps + frontend build + firewall rule + patched llama.cpp fetch
.\start.bat       # production single-port at http://localhost:8000
.\dev.bat         # dev mode with hot reload at http://localhost:5173
.\uninstall.bat   # undo everything install.bat created (asks before deleting data\)
```

Linux / macOS (bash):
```bash
chmod +x ./install.sh ./start.sh ./dev.sh ./uninstall.sh   # one time
./install.sh     # venv + deps + frontend build + firewall rule (ufw/firewalld)
./start.sh       # production single-port at http://localhost:8000
./dev.sh         # dev mode with hot reload at http://localhost:5173
./uninstall.sh   # undo everything install.sh created
```

install auto-fetches the patched llama.cpp build (~100 MB, Windows x64 only today) into ~/.gigachat/llama-cpp/ so split-mode chats survive transient peer / iGPU blips without crashing — see vendor/llama.cpp-patches/README.md for the patch source + how to build for Linux / macOS yourself. Single-device chat works without it.
uninstall removes .venv/, frontend/node_modules/, frontend/dist/, and the firewall rules. data/ is kept by default (asks first) so a re-install picks up your chat history and P2P identity.
If you prefer the raw commands, or you're on a Linux distro whose firewall manager isn't ufw or firewalld:
```bash
# 1. Backend virtualenv + deps
python3 -m venv .venv
source .venv/bin/activate              # POSIX
# .venv\Scripts\activate.bat           # Windows cmd
# .\.venv\Scripts\Activate.ps1         # Windows PowerShell
python -m pip install --upgrade pip
python -m pip install -r backend/requirements.txt

# 2. Frontend deps + production build
cd frontend && npm install && npm run build && cd ..

# 3. (Optional, Windows x64 only — single-device chat skips this)
#    Auto-fetch the patched llama.cpp release zip to ~/.gigachat/llama-cpp/.
python -c "from backend.p2p_llama_server import fetch_patched_llama_cpp; print(fetch_patched_llama_cpp())"

# 4. (Optional) Open inbound TCP 8000 / 50052 / 50053 / 8090 on your firewall
#    so other Gigachat installs on the same LAN can pair with this device.
#    Loopback-only usage (single device) doesn't need any firewall changes.

# 5. Run.
python -m backend.server               # production (port 8000)
# OR for dev mode (two terminals):
python -m uvicorn backend.app:app --host 0.0.0.0 --port 8000 --reload
( cd frontend && npm run dev )         # second terminal — port 5173
```

Other Gigachat installs on the same Wi-Fi can pair with this device via Settings → Compute pool once the backend is up — chat traffic stays local on each install.
- Chat with full history, persisted to SQLite — search, pin, tag, group by project, edit-and-regenerate the last user message.
- Run real tools — shell commands, file read/write/edit, screenshot + click, Chrome browser automation, web search, OpenAPI calls, sandboxed Docker, SSH, email, Home Assistant, audio transcription. Full catalog: docs/TOOLS.md.
- Computer use — the model can see your desktop and drive your mouse/keyboard (multimodal model required). Coordinate-grid screenshots, accessibility-tree clicks, bounded waits, batched primitives.
- Per-message permission gate — every write-class tool call pauses with a diff or command preview until you approve. Read-only / Plan / Approve edits / Allow everything modes.
- Watch the reasoning — desktop side strip shows the active tool, its args, a pulsing "Thinking" card. No need to scroll the transcript to see what's running.
- Quality modes — same model, more compute. Refine (self-critique + revise), Consensus (sample + synthesize), Personas (diverse reasoning overlays), Auto. Aims to narrow the gap to GPT-4 / Claude class on small-to-mid models.
- Compute pool — pair other Gigachat installs on your LAN over a 6-digit PIN (Bluetooth-style). Big models that don't fit one machine layer-split across the pool via llama.cpp `--rpc`. Speculative decoding recruits idle peers.
- Public pool — opt-in global swarm. Use other peers' GPUs for models you don't have locally; donate idle compute back. End-to-end encrypted (X25519 + ChaCha20-Poly1305). See docs/P2P.md.
- Long-running tasks — schedule prompts at an ISO datetime / recurring interval, set a chat into autonomous loop mode, monitor a file/URL until a condition flips.
- Memory + skills — long-term facts (per-conversation OR global), procedural skills the agent can save and recall, lifecycle hooks that fire on agent events.
- Survive crashes — every conversation has an `idle` / `running` / `error` state; the startup resumer either re-enters interrupted turns or flips state back to idle.
- Stream tokens live over Server-Sent Events; queue follow-up messages while a turn is in flight without locking the composer.
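The crash-recovery bullet can be sketched with a small startup routine. This is an illustration only: the `conversations` table, its `id` / `state` columns, and the `resumable` set are assumptions, not the project's actual schema or resumer logic.

```python
import sqlite3


def resume_interrupted(conn: sqlite3.Connection, resumable: set[str]) -> list[str]:
    """Handle conversations a crash left in the 'running' state.

    Returns the ids that should be re-entered; every other interrupted
    conversation is flipped back to 'idle'. (Hypothetical schema.)
    """
    rows = conn.execute(
        "SELECT id FROM conversations WHERE state = 'running'"
    ).fetchall()
    to_resume = [cid for (cid,) in rows if cid in resumable]
    to_reset = [(cid,) for (cid,) in rows if cid not in resumable]
    conn.executemany(
        "UPDATE conversations SET state = 'idle' WHERE id = ?", to_reset
    )
    conn.commit()
    return to_resume
```

Conversations in the `error` state are left untouched here, so the failure stays visible after a restart.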
| Model | Size | Notes |
|---|---|---|
| `gemma4:e2b` | 7.2 GB | fastest, fits in 8 GB VRAM |
| `gemma4:e4b` / `gemma4:latest` | 9.6 GB | recommended default — best quality on 16 GB RAM + 8 GB VRAM |
| `gemma4:26b` | 18 GB | usually too big for ≤16 GB RAM |
| `gemma4:31b` | 20 GB | requires a workstation |
| `llama3.1:8b`, `qwen2.5:7b`, `mistral-nemo` | 4-5 GB | good chat alternatives; desktop-use needs a multimodal variant |
| `llava`, `qwen2.5-vl`, `gemma4:*` | varies | pick one for computer-use / screenshot tools |
The model picker at the top of every chat lets you switch per-conversation. Default filter shows only models whose Ollama capabilities list includes tools; flip the wrench-icon footer toggle to Show all models when you want to try one without that flag (Gigachat then auto-falls-back to prompt-space tool calling).
Auto-tuned default: at first run the backend probes RAM / VRAM / GPU kind and picks the largest recommended Gemma 4 variant that fits. Override via Settings → General.
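The selection step might look roughly like the sketch below. The sizes come from the model table; the fit rule (VRAM plus half of system RAM) is an assumed heuristic chosen only so that a 16 GB RAM + 8 GB VRAM machine lands on `gemma4:e4b`, and it is not taken from the backend.

```python
# Download sizes in GB from the model table, smallest first.
GEMMA_VARIANTS = [
    ("gemma4:e2b", 7.2),
    ("gemma4:e4b", 9.6),
    ("gemma4:26b", 18.0),
    ("gemma4:31b", 20.0),
]


def pick_default_model(ram_gb: float, vram_gb: float) -> str:
    """Pick the largest variant that fits an assumed memory budget."""
    # ASSUMPTION: all VRAM plus half of system RAM is usable for weights.
    budget_gb = vram_gb + 0.5 * ram_gb
    fitting = [name for name, size_gb in GEMMA_VARIANTS if size_gb <= budget_gb]
    # Fall back to the smallest variant when nothing fits the budget.
    return fitting[-1] if fitting else GEMMA_VARIANTS[0][0]
```

Whatever the real probe decides, Settings → General remains the override.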
⚠ Known split-mode incompatibility — Gemma 3n PLE variants (`gemma4:e2b`, `gemma4:e4b`, `gemma4:latest`). These models use the new Per-Layer Embedding architecture, which the patched llama.cpp build (pinned at upstream `b9002`) doesn't fully recognize — split-mode chat fails at model load with `wrong number of tensors; expected 2131, got 720`. Workaround: these models still work fine on a single device through Ollama (the standard chat path). Only the `--rpc` split-mode path is affected. Fix on the way: rebasing the Gigachat resilience patch onto a newer llama.cpp tag (`b9100`+) will restore split-mode for Gemma 3n. Until then, pick a standard transformer model (`llama3.1:8b`, `qwen2.5:7b`, `mistral-nemo`, `dolphin-mixtral:8x7b`) when you want pool-wide split.
A header dropdown picks how tool calls are gated, per conversation:
| Mode | Icon | Read tools | Write tools |
|---|---|---|---|
| Read-only | 👁 | run silently | refused before approval card |
| Plan mode | 📋 | run silently | refused; agent must end with [PLAN READY] to unlock the Execute plan button |
| Approve edits (default) | 🛡 | run silently | pause with diff/command/reason card, wait for click |
| Allow everything | ⚡ | run silently | run silently |
Approval cards show the full command (bash), the unified diff (write/edit), and the model's reason field. Side-by-side diff toggle is one click.
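The mode table reduces to a small decision function. The sketch below mirrors the four rows; the enum and function names are illustrative, not the backend's own.

```python
from enum import Enum


class Mode(Enum):
    READ_ONLY = "read_only"
    PLAN = "plan"
    APPROVE_EDITS = "approve_edits"
    ALLOW_EVERYTHING = "allow_everything"


class Decision(Enum):
    RUN = "run"        # execute without asking
    REFUSE = "refuse"  # reject before any approval card is shown
    ASK = "ask"        # pause and show the approval card


def gate(mode: Mode, is_write_tool: bool) -> Decision:
    """Map a permission mode and tool class to a gating decision."""
    if not is_write_tool:
        return Decision.RUN  # read tools run silently in every mode
    if mode in (Mode.READ_ONLY, Mode.PLAN):
        return Decision.REFUSE
    if mode is Mode.APPROVE_EDITS:
        return Decision.ASK
    return Decision.RUN  # Allow everything
```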
⚠ Use Allow everything only when actively watching. A hostile tool output can try to prompt-inject the model into firing destructive tools. The default Approve edits is the safe choice. Full threat model: docs/SECURITY.md.
A second header dropdown picks a per-conversation quality mode. Every mode uses only the chat model the user picked — small models close the gap to GPT-4 / Claude class by spending more compute on the same model, not by routing to a stronger judge.
| Mode | Compute | Best for |
|---|---|---|
| Standard (default) | 1× | Cheap chat, low latency. |
| Refine | ~2× | Code, writing, reasoning. Same model critiques its own answer (under JSON-schema-constrained decoding) and revises if needed. |
| Consensus | ~3-4× | Math and logic. Sample additional candidates at varied temperatures, synthesize the best answer. |
| Personas | ~4× | Hard, open-ended questions. Same model, different reasoning-style overlays per sample (analyst / pragmatist / skeptic), synthesize. |
| Auto | adaptive | Best default for varied chat. Difficulty heuristic picks refine / consensus / personas — or skips on trivial turns. |
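As an illustration of the Refine row, here is a minimal draft, critique, revise loop. The `chat` callable, the critique schema, and the prompt wording are all assumptions made for this sketch; the real mode may structure its prompts and schema differently.

```python
import json
from typing import Callable, Optional

# Hypothetical schema handed to the model for the constrained critique step.
CRITIQUE_SCHEMA = {
    "type": "object",
    "properties": {
        "needs_revision": {"type": "boolean"},
        "issues": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["needs_revision", "issues"],
}

# chat(prompt, json_schema_or_None) -> model reply as text (assumed interface).
ChatFn = Callable[[str, Optional[dict]], str]


def refine(prompt: str, chat: ChatFn) -> str:
    """Draft an answer, let the same model critique it, revise if needed."""
    draft = chat(prompt, None)
    critique_prompt = (
        "Critique the answer below and reply as JSON.\n\n"
        f"Question: {prompt}\n\nAnswer: {draft}"
    )
    critique = json.loads(chat(critique_prompt, CRITIQUE_SCHEMA))
    if not critique["needs_revision"]:
        return draft  # two calls total when the draft passes
    issues = "\n".join(f"- {item}" for item in critique["issues"])
    revise_prompt = (
        f"Question: {prompt}\n\nDraft answer: {draft}\n\n"
        f"Fix these issues and return the improved answer:\n{issues}"
    )
    return chat(revise_prompt, None)  # three calls total
```

The call count (two or three per turn) is consistent with the "~2×" compute figure in the table.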
One sidebar footer button (⚙ Settings) hosts nine tabs:
- General — default chat model, hardware summary, recommended-but-not-installed model hint (no automatic background download — pulls happen only when you pick a model that isn't local).
- Compute pool — identity, public-pool toggle, LAN discovery, paired devices with live status + per-workload routing toggles. Single source of truth for "other devices doing work for me."
- Memories — global memory CRUD (one entry per row, optional `topic` for grouping; edits propagate immediately, no save button).
- Secrets — named API tokens / credentials referenced via `{{secret:NAME}}`. Values hidden by default; click reveal to show one.
- Schedules — every queued prompt with next-run / interval / cwd. Add / delete from the UI; rows back the agent's `schedule_task` tool too.
- Tools — user-defined Python tools (review, pause, delete, or add new ones with code + schema + deps form).
- Hooks — register shell commands at agent lifecycle points (`user_prompt_submit`, `pre_tool` / `post_tool`, `turn_done`). Each receives a structured JSON payload on stdin; stdout is injected back as a system-note. Full guide: docs/HOOKS.md.
- Docs — URL-indexed documentation sites for `docs_search`. Live status chip per site; reindex button.
- MCP — external Model Context Protocol servers.
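For the Hooks tab, a hook command can be any executable that reads JSON from stdin and prints a note to stdout. The sketch below assumes a `pre_tool` payload with `tool` and `args` fields; those field names are guesses for illustration, and docs/HOOKS.md is the reference for the real payload shape.

```python
import json
import sys


def build_note(payload: dict) -> str:
    """Return a short system-note for risky shell commands, else ''.

    ASSUMPTION: the payload carries "tool" and "args" keys.
    """
    command = str(payload.get("args", {}).get("command", ""))
    if payload.get("tool") == "bash" and "rm -rf" in command:
        return "Reminder: this command deletes files recursively; check the path."
    return ""


def main() -> None:
    """Read the hook payload from stdin and print the note, if any."""
    raw = sys.stdin.read()
    note = build_note(json.loads(raw)) if raw.strip() else ""
    if note:
        print(note)  # stdout is what gets injected back as a system-note
```

A real hook script would end with a call to `main()` and be registered under the matching lifecycle point.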
Pair other Gigachat installs in Settings → Compute pool and the host automatically uses their CPU + RAM + GPU + VRAM alongside its own. The router decides per-request whether to keep things on host (fast, no LAN hop), dispatch to a paired peer (LAN-encrypted), or — for models too big to fit one machine — layer-split across the pool via llama.cpp --rpc. Speculative decoding recruits idle peers as draft-model accelerators.
Public pool (default ON) joins the global Gigachat swarm via a stateless rendezvous service:
- You donate spare GPU/CPU cycles when idle.
- You can use other peers' GPUs when a model isn't on your local devices. The router always prefers local; the swarm is the fallback.
- All compute traffic is end-to-end encrypted (X25519 + ChaCha20-Poly1305 envelopes, sender-ephemeral forward secrecy).
- Model bytes never flow peer-to-peer — when nobody local has the model, it auto-pulls from the OFFICIAL Ollama registry.
The project ships pointing at a public Cloud Run rendezvous so a fresh install joins the swarm the moment Public Pool toggles on. Self-host your own from rendezvous/ if you want full control.
Deep dive: docs/P2P.md (encryption, rendezvous, TURN relay, TLS pinning, fairness scheduler) and docs/COMPUTE_POOL.md (routing internals, llama.cpp flags, speculative decoding).
Default: backend binds 0.0.0.0. Two layers of access control then filter every request:
- Loopback (the local browser on the same machine) — full access.
- LAN clients (RFC1918 / IPv6-ULA / link-local) — only the P2P endpoints (encrypted compute proxy + pair handshake). The chat UI returns a clear 403 with "loopback only — install Gigachat on the other device and pair via Compute pool". P2P endpoints carry their own X25519 + Ed25519 envelope crypto, so no password layer is needed on top.
- Public IPs / Tailscale CGNAT — flat 403. The app stays on the user's own physical network.
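The three tiers above can be approximated with the standard `ipaddress` module. This is a sketch of the classification idea only; the function names and the exact network list are assumptions rather than the backend's middleware.

```python
import ipaddress

# Ranges treated as "LAN" in this sketch.
LAN_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
        "169.254.0.0/16", "fe80::/10",                    # link-local
        "fc00::/7",                                       # IPv6 ULA
    )
]


def classify_client(ip: str) -> str:
    """Return 'loopback', 'lan', or 'public' for a client address."""
    addr = ipaddress.ip_address(ip)
    if addr.is_loopback:
        return "loopback"
    if any(addr in net for net in LAN_NETWORKS):
        return "lan"
    # Everything else, including Tailscale CGNAT (100.64.0.0/10), is refused.
    return "public"


def is_allowed(ip: str, is_p2p_endpoint: bool) -> bool:
    """Loopback gets everything; LAN clients get only the P2P endpoints."""
    tier = classify_client(ip)
    return tier == "loopback" or (tier == "lan" and is_p2p_endpoint)
```

Listing the networks explicitly avoids relying on `is_private`, whose coverage has changed between Python versions.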
There is no password feature. Two devices on the same Wi-Fi see each other via mDNS, pair with a 6-digit PIN (Bluetooth-style), and from then on share compute over an end-to-end encrypted channel — no shared secret to set up, no auth.json to write. Each device's chat UI runs on its own loopback.
If you want hard isolation (no P2P discovery, no compute-pool participation — e.g. on an untrusted public Wi-Fi), set GIGACHAT_HOST=127.0.0.1 and the backend binds loopback-only.
Each conversation has a cwd that all commands run from. The chat-header dialog has a Browse… button that opens the native OS folder picker; the chosen path is validated server-side. Once set, cwd is immutable.
AGENTS.md / CLAUDE.md auto-injection — on every turn the backend walks from cwd up to the filesystem root and concatenates every AGENTS.md and CLAUDE.md it finds into the system prompt (outermost first, innermost last — nearer-in instructions win). Both names are treated equally so a repo that ships only one still works; nested sub-projects can override parent rules.
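A minimal sketch of that walk, assuming the two files in the same directory are read in the order `AGENTS.md` then `CLAUDE.md` (the source does not specify this) and that the pieces are joined with blank lines. Function names are illustrative.

```python
from pathlib import Path

INSTRUCTION_FILES = ("AGENTS.md", "CLAUDE.md")


def collect_instruction_files(cwd: Path) -> list[Path]:
    """Instruction files from the filesystem root down to cwd, outermost first."""
    cwd = cwd.resolve()
    chain = [cwd, *cwd.parents]  # innermost directory first
    found: list[Path] = []
    for directory in reversed(chain):  # flip so the root comes first
        for name in INSTRUCTION_FILES:
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
    return found


def build_injected_prompt(cwd: Path) -> str:
    """Concatenate the files so the nearest instructions come last."""
    return "\n\n".join(
        path.read_text(encoding="utf-8") for path in collect_instruction_files(cwd)
    )
```

Putting the innermost file last is what lets a nested sub-project override its parent's rules.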
File checkpoints — every write_file / edit_file snapshots the prior contents under data/checkpoints/<conv_id>/<stamp>/<hash>.bin and exposes a one-click restore.
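- Default bind is 0.0.0.0, but only loopback clients get the chat UI; LAN clients reach nothing except the encrypted P2P endpoints. Set GIGACHAT_HOST=127.0.0.1 to bind loopback-only.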
- Approve edits is the safe default. A hostile tool output can try to prompt-inject the model.
- Use a strong random password in LAN mode (`python -c "import secrets; print(secrets.token_urlsafe(24))"`).
- Computer use controls your real desktop. Close private windows before handing the mouse over. Don't ask the agent to type passwords or 2FA codes.
- Scheduled tasks run unattended in Allow-everything mode. Be specific in the prompt.
Full threat model + risk catalog: docs/SECURITY.md.
```bash
python -m pytest -m smoke   # fast tier, ~70 s, 488 tests
python -m pytest            # everything (Windows-only tests skipped on Linux)

# One-time setup so `git push` runs the smoke tier automatically:
git config core.hooksPath .githooks
```
The isolated_db fixture rewires db.DB_PATH to a tmp file per test, so the suite never touches data/app.db.
| Symptom | Fix |
|---|---|
| "Ollama not reachable" toast | Run ollama serve in a terminal. |
| Model picker is empty | ollama list to confirm; ollama pull gemma4:e4b. |
| Responses very slow | Likely swapping model weights to RAM. Try gemma4:e2b. |
| Approval click does nothing | Check the backend console for errors. |
| Dev server port 5173 in use | Kill the other Vite process or change the port in frontend/vite.config.js. |
| `web_search` rate-limited | DuckDuckGo occasionally rate-limits. Wait a minute; persistent? `pip install -U ddgs`. |
| `doc_index` / `doc_search`: "no vector" | `ollama pull nomic-embed-text`. |
| Settings → Compute pool: rendezvous "Disconnected" / "Not configured" | Confirm Public Pool toggle is on. The default Cloud Run URL ships with the app; override or self-host via the URL editor. |
| `PermissionError: [Errno 13] Permission denied: '...\\AppData\\Roaming\\Python\\Python3xx\\site-packages\\typing_extensions.py'` on backend startup | Mixed system + user site-packages install. Cleanest fix: `.\install.bat` (creates .venv\ and installs every dep there; the scheduled task uses that venv exclusively). Quick patch without venv: `del "%APPDATA%\Python\Python3xx\site-packages\typing_extensions.py"` then re-run `install.bat`. |
| `pip uninstall typing-extensions` fails with `uninstall-no-record-file` | The existing copy was put there manually / by a partial install — pip can't safely remove it. Run `.\install.bat` to use the venv path, which bypasses the global install entirely. |
| Browser on another LAN device gets a "loopback only" 403 | By design — the chat UI is loopback-only on every install. Use Gigachat from the same machine it's running on; for cross-device compute, install Gigachat on both and pair via Settings → Compute pool. |
| Another device on the LAN can't pair with mine (mDNS / direct connect fails) | Confirm Windows Defender has the OpenSSH SSH Server / Gigachat firewall rule enabled for the Private profile and that the Ethernet/Wi-Fi adapter is classified Private (not Public). Tailscale CGNAT (100.64.0.0/10) is intentionally refused; both devices must be on the same physical network. |
More: see docs/SECURITY.md for risk-specific knobs and docs/COMPUTE_POOL.md for pool-routing diagnostics.
- docs/TOOLS.md — full catalog of every tool the agent can call.
- docs/COMPUTE_POOL.md — routing internals, llama.cpp flags, speculative decoding, override-file mechanism.
- docs/P2P.md — P2P encryption, rendezvous, TURN relay, TLS pinning, fairness scheduler, API surface.
- docs/SECURITY.md — full threat model + risk catalog.
- docs/HOOKS.md — lifecycle hooks deep dive + recipes.
- ARCHITECTURE.md — for contributors: turn flow, load-bearing invariants, where to change what.
- rendezvous/README.md — deploying the rendezvous service to Cloud Run / a VPS.
