Gigachat

A self-hosted web app that turns any locally-running Ollama model (Gemma, Llama, Qwen, DeepSeek, Mistral — anything with function-calling) into a Claude-Code-style assistant: chat with conversation history, run shell commands, edit files, drive your desktop, search the web, query APIs — with a per-conversation permission gate on every tool call.

┌────────────────┐          ┌─────────────────┐          ┌──────────────┐
│  React + Vite  │  SSE     │  FastAPI        │  chat    │   Ollama     │
│  + shadcn/ui   │◄────────►│  agent loop     │◄────────►│   any model  │
│  (port 5173)   │  /api/*  │  (port 8000)    │  :11434  │              │
└────────────────┘          └─────────────────┘          └──────────────┘

⚠ Capability tracks the model you run. A bigger / better model gives you sharper reasoning, more reliable tool use, and longer plans; a small model will sometimes drop tool calls or hallucinate paths. If something feels broken, try a larger model first.

Quickstart

Prerequisites (any OS): Python 3.12+, Node 20+, Ollama running locally on http://localhost:11434, at least one function-calling Ollama model. On Windows the launcher scripts add the inbound LAN firewall rules automatically (UAC prompt); on Linux they try ufw then firewall-cmd then print a manual fallback.

Pull a model first either way:

ollama pull gemma4:e4b   # recommended default

Option 1 — Helper scripts (recommended)

The repo ships matching .bat (Windows) and .sh (Linux + macOS) wrappers. They're idempotent — re-run after git pull to refresh deps + rebuild the frontend.

Windows (PowerShell or cmd.exe):

.\install.bat       # venv + deps + frontend build + firewall rule + patched llama.cpp fetch
.\start.bat         # production single-port at http://localhost:8000
.\dev.bat           # dev mode with hot reload at http://localhost:5173
.\uninstall.bat     # undo everything install.bat created (asks before deleting data\)

Linux / macOS (bash):

chmod +x ./install.sh ./start.sh ./dev.sh ./uninstall.sh   # one time
./install.sh        # venv + deps + frontend build + firewall rule (ufw/firewalld)
./start.sh          # production single-port at http://localhost:8000
./dev.sh            # dev mode with hot reload at http://localhost:5173
./uninstall.sh      # undo everything install.sh created

install auto-fetches the patched llama.cpp build (~100 MB, Windows x64 only today) into ~/.gigachat/llama-cpp/ so split-mode chats survive transient peer / iGPU blips without crashing — see vendor/llama.cpp-patches/README.md for the patch source + how to build for Linux / macOS yourself. Single-device chat works without it.

uninstall removes .venv/, frontend/node_modules/, frontend/dist/, and the firewall rules. data/ is kept by default (asks first) so a re-install picks up your chat history and P2P identity.

Option 2 — Manual setup (any OS, no scripts)

If you prefer the raw commands, or you're on a Linux distro whose firewall manager isn't ufw or firewalld:

# 1. Backend virtualenv + deps
python3 -m venv .venv
source .venv/bin/activate                          # POSIX
# .venv\Scripts\activate.bat                       # Windows cmd
# .\.venv\Scripts\Activate.ps1                     # Windows PowerShell
python -m pip install --upgrade pip
python -m pip install -r backend/requirements.txt

# 2. Frontend deps + production build
cd frontend && npm install && npm run build && cd ..

# 3. (Optional, Windows x64 only — single-device chat skips this)
#    Auto-fetch the patched llama.cpp release zip to ~/.gigachat/llama-cpp/.
python -c "from backend.p2p_llama_server import fetch_patched_llama_cpp; print(fetch_patched_llama_cpp())"

# 4. (Optional) Open inbound TCP 8000 / 50052 / 50053 / 8090 on your firewall
#    so other Gigachat installs on the same LAN can pair with this device.
#    Loopback-only usage (single device) doesn't need any firewall changes.

# 5. Run.
python -m backend.server                           # production (port 8000)
# OR for dev mode (two terminals):
python -m uvicorn backend.app:app --host 0.0.0.0 --port 8000 --reload
( cd frontend && npm run dev )                     # second terminal — port 5173

Other Gigachat installs on the same Wi-Fi can pair with this device via Settings → Compute pool once the backend is up — chat traffic stays local on each install.

What it can do

Chat with full history, persisted to SQLite — search, pin, tag, group by project, edit-and-regenerate the last user message.
Run real tools — shell commands, file read/write/edit, screenshot + click, Chrome browser automation, web search, OpenAPI calls, sandboxed Docker, SSH, email, Home Assistant, audio transcription. Full catalog: docs/TOOLS.md.
Computer use — the model can see your desktop and drive your mouse/keyboard (multimodal model required). Coordinate-grid screenshots, accessibility-tree clicks, bounded waits, batched primitives.
Per-conversation permission gate: in the default Approve edits mode, every write-class tool call pauses with a diff or command preview until you approve. The other modes are Read-only, Plan, and Allow everything.
Watch the reasoning — desktop side strip shows the active tool, its args, a pulsing "Thinking" card. No need to scroll the transcript to see what's running.
Quality modes: same model, more compute. Refine (self-critique + revise), Consensus (sample + synthesize), Personas (diverse reasoning overlays), Auto. Intended to narrow the gap between small-to-mid models and GPT-4 / Claude class systems.
Compute pool — pair other Gigachat installs on your LAN over a 6-digit PIN (Bluetooth-style). Big models that don't fit one machine layer-split across the pool via llama.cpp --rpc. Speculative decoding recruits idle peers.
Public pool — opt-in global swarm. Use other peers' GPUs for models you don't have locally; donate idle compute back. End-to-end encrypted (X25519 + ChaCha20-Poly1305). See docs/P2P.md.
Long-running tasks — schedule prompts at an ISO datetime / recurring interval, set a chat into autonomous loop mode, monitor a file/URL until a condition flips.
Memory + skills — long-term facts (per-conversation OR global), procedural skills the agent can save and recall, lifecycle hooks that fire on agent events.
Survive crashes — every conversation has an idle/running/error state; the startup resumer either re-enters interrupted turns or flips state back to idle.
Stream tokens live over Server-Sent Events; queue follow-up messages while a turn is in flight without locking the composer.
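
For readers who want to consume the token stream outside the bundled UI, the Server-Sent Events framing can be decoded with a few lines of Python. This is a minimal sketch of generic SSE parsing only: the function name sse_data is illustrative, and what Gigachat puts inside each data payload is not specified in this README.

```python
from typing import Iterable, Iterator


def sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a line stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored here.


stream = ["data: Hel\n", "\n", ": keep-alive\n", "data: lo\n", "data: !\n", "\n"]
events = list(sse_data(stream))
```

Multiple data lines inside one event are joined with a newline, so the second event above carries two lines of text.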

Picking a model

| Model | Size | Notes |
| --- | --- | --- |
| `gemma4:e2b` | 7.2 GB | fastest, fits in 8 GB VRAM |
| `gemma4:e4b` / `gemma4:latest` | 9.6 GB | recommended default; best quality on 16 GB RAM + 8 GB VRAM |
| `gemma4:26b` | 18 GB | usually too big for ≤16 GB RAM |
| `gemma4:31b` | 20 GB | requires a workstation |
| `llama3.1:8b`, `qwen2.5:7b`, `mistral-nemo` | 4-5 GB | good chat alternatives; desktop-use needs a multimodal variant |
| `llava`, `qwen2.5-vl`, `gemma4:*` | varies | pick one for computer-use / screenshot tools |

The model picker at the top of every chat lets you switch per-conversation. Default filter shows only models whose Ollama capabilities list includes tools; flip the wrench-icon footer toggle to Show all models when you want to try one without that flag (Gigachat then auto-falls-back to prompt-space tool calling).
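
The default filter described above amounts to a membership test on each model's capabilities list. A minimal sketch, assuming every installed model is represented as a dict with name and capabilities keys; the dict layout, the function name tool_capable, and the sample model names are all illustrative rather than taken from the Gigachat source.

```python
from typing import Iterable


def tool_capable(models: Iterable[dict], show_all: bool = False) -> list[str]:
    """Return the model names the picker would list.

    Assumption: each entry looks like {"name": str, "capabilities": [str, ...]}.
    """
    names = []
    for model in models:
        capabilities = model.get("capabilities") or []
        if show_all or "tools" in capabilities:
            names.append(model["name"])
    return names


# Made-up entries, for illustration only.
installed = [
    {"name": "model-a", "capabilities": ["completion", "tools"]},
    {"name": "model-b", "capabilities": ["completion", "vision"]},
]
default_view = tool_capable(installed)               # tools-only filter
full_view = tool_capable(installed, show_all=True)   # "Show all models" toggle
```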

Auto-tuned default: at first run the backend probes RAM / VRAM / GPU kind and picks the largest recommended Gemma 4 variant that fits. Override via Settings → General.

⚠ Known split-mode incompatibility: Gemma 3n PLE variants (gemma4:e2b, gemma4:e4b, gemma4:latest). These models use the new Per-Layer Embedding architecture, which the patched llama.cpp build (pinned at upstream b9002) doesn't fully recognize, so split-mode chat fails at model load with "wrong number of tensors; expected 2131, got 720".

Workaround: these models still work on a single device through Ollama (the standard chat path); only the --rpc split-mode path is affected. Until the fix lands, pick a standard transformer model (llama3.1:8b, qwen2.5:7b, mistral-nemo, dolphin-mixtral:8x7b) when you want pool-wide split.

Planned fix: rebasing the Gigachat resilience patch onto a newer llama.cpp tag (b9100+) is expected to restore split-mode for Gemma 3n.

Permission modes

A header dropdown picks how tool calls are gated, per conversation:

| Mode | Icon | Read tools | Write tools |
| --- | --- | --- | --- |
| Read-only | 👁 | run silently | refused before approval card |
| Plan mode | 📋 | run silently | refused; agent must end with `[PLAN READY]` to unlock the Execute plan button |
| Approve edits (default) | 🛡 | run silently | pause with diff/command/reason card, wait for click |
| Allow everything | ⚡ | run silently | run silently |

Approval cards show the full command (bash), the unified diff (write/edit), and the model's reason field. Side-by-side diff toggle is one click.
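
The four modes reduce to a small decision function over the mode and the tool class. The sketch below restates the table in code; the names Mode and gate and the returned strings are illustrative, not the backend's actual identifiers.

```python
from enum import Enum


class Mode(Enum):
    READ_ONLY = "read_only"
    PLAN = "plan"
    APPROVE_EDITS = "approve_edits"
    ALLOW_EVERYTHING = "allow_everything"


def gate(mode: Mode, is_write: bool) -> str:
    """Return 'run', 'refuse', or 'ask' for a single tool call."""
    if not is_write:
        return "run"      # read tools run silently in every mode
    if mode in (Mode.READ_ONLY, Mode.PLAN):
        return "refuse"   # write tools never reach an approval card
    if mode is Mode.APPROVE_EDITS:
        return "ask"      # pause and wait for the user's click
    return "run"          # ALLOW_EVERYTHING
```

Keeping the read-tool check first makes it hard to add a mode that accidentally blocks read tools.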

⚠ Use Allow everything only when actively watching. A hostile tool output can try to prompt-inject the model into firing destructive tools. The default Approve edits is the safe choice. Full threat model: docs/SECURITY.md.

Quality modes

A second header dropdown picks a per-conversation quality mode. Every mode uses only the chat model you picked: the extra quality comes from spending more compute on the same model, not from routing to a stronger judge. The aim is to narrow the gap between small models and GPT-4 / Claude class systems.

| Mode | Compute | Best for |
| --- | --- | --- |
| Standard (default) | 1× | Cheap chat, low latency. |
| Refine | ~2× | Code, writing, reasoning. Same model critiques its own answer (under JSON-schema-constrained decoding) and revises if needed. |
| Consensus | ~3-4× | Math and logic. Sample additional candidates at varied temperatures, synthesize the best answer. |
| Personas | ~4× | Hard, open-ended questions. Same model, different reasoning-style overlays per sample (analyst / pragmatist / skeptic), synthesize. |
| Auto | adaptive | Best default for varied chat. Difficulty heuristic picks refine / consensus / personas, or skips on trivial turns. |
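
To make the compute multipliers concrete, here is a rough sketch of how a Consensus-style pass could work: sample the same model several times, then ask it once more to synthesize. The generate callable, the temperature values, and the synthesis prompt wording are all assumptions for illustration; the real pipeline may differ.

```python
from typing import Callable

# Hypothetical model interface: (prompt, temperature) -> reply text.
Generate = Callable[[str, float], str]


def consensus(generate: Generate, prompt: str,
              temperatures: tuple[float, ...] = (0.2, 0.7, 1.0)) -> str:
    """Sample candidates at varied temperatures, then synthesize one answer."""
    candidates = [generate(prompt, t) for t in temperatures]
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(candidates))
    synthesis_prompt = (
        f"Question:\n{prompt}\n\nCandidate answers:\n{numbered}\n\n"
        "Write the single best answer, resolving any disagreements."
    )
    # One extra call at temperature 0 to merge the candidates.
    return generate(synthesis_prompt, 0.0)
```

With three samples plus one synthesis call, the pass costs four model calls, which lines up with the ~3-4× figure in the table.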

Settings drawer

One sidebar footer button (⚙ Settings) hosts nine tabs:

General — default chat model, hardware summary, recommended-but-not-installed model hint (no automatic background download — pulls happen only when you pick a model that isn't local).
Compute pool — identity, public-pool toggle, LAN discovery, paired devices with live status + per-workload routing toggles. Single source of truth for "other devices doing work for me."
Memories — global memory CRUD (one entry per row, optional topic for grouping; edits propagate immediately, no save button).
Secrets — named API tokens / credentials referenced via {{secret:NAME}}. Values hidden by default; click reveal to show one.
Schedules — every queued prompt with next-run / interval / cwd. Add / delete from the UI; rows back the agent's schedule_task tool too.
Tools — user-defined Python tools (review, pause, delete, or add new ones with code + schema + deps form).
Hooks — register shell commands at agent lifecycle points (user_prompt_submit, pre_tool / post_tool, turn_done). Each receives a structured JSON payload on stdin; stdout is injected back as a system-note. Full guide: docs/HOOKS.md.
Docs — URL-indexed documentation sites for docs_search. Live status chip per site; reindex button.
MCP — external Model Context Protocol servers.
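
As an illustration of the Hooks tab contract (JSON payload on stdin, stdout injected back as a system-note), a hook written in Python might look like the sketch below. The payload keys event and tool are assumptions made for this example; docs/HOOKS.md is the authority on the real field names.

```python
import json
import sys


def build_note(payload: dict) -> str:
    """Turn a hook payload into the text to hand back as a system-note.

    Assumption: the payload carries "event" and "tool" keys.
    """
    event = payload.get("event", "unknown")
    tool = payload.get("tool")
    if tool:
        return f"hook saw {event} for tool {tool}"
    return f"hook saw {event}"


def run_hook(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one JSON payload from stdin and write the note to stdout."""
    stdout.write(build_note(json.load(stdin)))
```

A script registered as a hook would call run_hook() as its last line; keeping the formatting in build_note makes it testable without piping anything through stdin.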

Compute pool & P2P

Pair other Gigachat installs in Settings → Compute pool and the host automatically uses their CPU + RAM + GPU + VRAM alongside its own. The router decides per-request whether to keep things on host (fast, no LAN hop), dispatch to a paired peer (LAN-encrypted), or — for models too big to fit one machine — layer-split across the pool via llama.cpp --rpc. Speculative decoding recruits idle peers as draft-model accelerators.

Public pool (default ON) joins the global Gigachat swarm via a stateless rendezvous service:

You donate spare GPU/CPU cycles when idle.
You can use other peers' GPUs when a model isn't on your local devices. The router prefers local always; the swarm is the fallback.
All compute traffic is end-to-end encrypted (X25519 + ChaCha20-Poly1305 envelopes, sender-ephemeral forward secrecy).
Model bytes never flow peer-to-peer — when nobody local has the model, it auto-pulls from the OFFICIAL Ollama registry.

The project ships pointing at a public Cloud Run rendezvous so a fresh install joins the swarm the moment Public Pool toggles on. Self-host your own from rendezvous/ if you want full control.

Deep dive: docs/P2P.md (encryption, rendezvous, TURN relay, TLS pinning, fairness scheduler) and docs/COMPUTE_POOL.md (routing internals, llama.cpp flags, speculative decoding).

Network model (LAN, P2P, public)

Default: backend binds 0.0.0.0. Two layers of access control then filter every request:

Loopback (the local browser on the same machine) — full access.
LAN clients (RFC1918 / IPv6-ULA / link-local) — only the P2P endpoints (encrypted compute proxy + pair handshake). The chat UI returns a clear 403 with "loopback only — install Gigachat on the other device and pair via Compute pool". P2P endpoints carry their own X25519 + Ed25519 envelope crypto, so no password layer is needed on top.
Public IPs / Tailscale CGNAT — flat 403. The app stays on the user's own physical network.

There is no password feature. Two devices on the same Wi-Fi see each other via mDNS, pair with a 6-digit PIN (Bluetooth-style), and from then on share compute over an end-to-end encrypted channel — no shared secret to set up, no auth.json to write. Each device's chat UI runs on its own loopback.

If you want hard isolation (no P2P discovery, no compute-pool participation — e.g. on an untrusted public Wi-Fi), set GIGACHAT_HOST=127.0.0.1 and the backend binds loopback-only.
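
The three access tiers can be expressed with Python's standard ipaddress module. This is a sketch of the classification only, assuming the address ranges named in this section; the function name access_tier and its return strings are illustrative, not the backend's middleware.

```python
import ipaddress

# LAN ranges named in this section: RFC1918, IPv6 ULA, and link-local.
LAN_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
    "fc00::/7",                                        # IPv6 ULA
    "169.254.0.0/16", "fe80::/10",                     # link-local
)]


def access_tier(client_ip: str) -> str:
    """Classify a client address as 'loopback', 'lan', or 'denied'."""
    addr = ipaddress.ip_address(client_ip)
    if addr.is_loopback:
        return "loopback"  # full access, including the chat UI
    if any(addr.version == net.version and addr in net for net in LAN_NETWORKS):
        return "lan"       # P2P endpoints only
    return "denied"        # public IPs and Tailscale CGNAT (100.64.0.0/10)
```

The explicit network list is deliberate: it keeps the CGNAT range out of the LAN tier without relying on how a helper property such as is_private happens to classify it.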

Working directory

Each conversation has a cwd that all commands run from. The chat-header dialog has a Browse… button that opens the native OS folder picker; the chosen path is validated server-side. Once set, cwd is immutable.

AGENTS.md / CLAUDE.md auto-injection — on every turn the backend walks from cwd up to the filesystem root and concatenates every AGENTS.md and CLAUDE.md it finds into the system prompt (outermost first, innermost last — nearer-in instructions win). Both names are treated equally so a repo that ships only one still works; nested sub-projects can override parent rules.
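
The upward walk can be sketched with pathlib. The function name collect_instructions is illustrative, and the order of the two file names within a single directory is an assumption, since the README only fixes the outermost-to-innermost order across directories.

```python
from pathlib import Path

INSTRUCTION_FILES = ("AGENTS.md", "CLAUDE.md")


def collect_instructions(cwd: Path) -> str:
    """Concatenate instruction files from the filesystem root down to cwd."""
    cwd = cwd.resolve()
    # cwd.parents runs innermost to outermost; reverse so the root comes first.
    directories = [*reversed(cwd.parents), cwd]
    sections = []
    for directory in directories:
        for name in INSTRUCTION_FILES:
            candidate = directory / name
            if candidate.is_file():
                sections.append(candidate.read_text(encoding="utf-8"))
    return "\n\n".join(sections)
```

Because the innermost file is appended last, its instructions appear closest to the end of the prompt, which is what lets a nested sub-project override its parent.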

File checkpoints — every write_file / edit_file snapshots the prior contents under data/checkpoints/<conv_id>/<stamp>/<hash>.bin and exposes a one-click restore.

Safety basics

The backend binds 0.0.0.0 by default, but the chat UI answers only loopback requests. LAN clients reach just the encrypted P2P endpoints, and public IPs get a flat 403.
Approve edits is the safe default. A hostile tool output can try to prompt-inject the model.
There is no password to set. Devices pair with a 6-digit PIN; on an untrusted network, set GIGACHAT_HOST=127.0.0.1 so the backend binds loopback-only.
Computer use controls your real desktop. Close private windows before handing the mouse over. Don't ask the agent to type passwords or 2FA codes.
Scheduled tasks run unattended in Allow-everything mode. Be specific in the prompt.

Full threat model + risk catalog: docs/SECURITY.md.

Tests

python -m pytest -m smoke         # fast tier, ~70 s, 488 tests
python -m pytest                  # everything (Windows-only tests skipped on Linux)

# One-time setup so `git push` runs the smoke tier automatically:
git config core.hooksPath .githooks

The isolated_db fixture rewires db.DB_PATH to a tmp file per test, so the suite never touches data/app.db.

Troubleshooting

| Symptom | Fix |
| --- | --- |
| "Ollama not reachable" toast | Run `ollama serve` in a terminal. |
| Model picker is empty | `ollama list` to confirm; `ollama pull gemma4:e4b`. |
| Responses very slow | Likely swapping model weights to RAM. Try `gemma4:e2b`. |
| Approval click does nothing | Check the backend console for errors. |
| Dev server port 5173 in use | Kill the other Vite process or change the port in `frontend/vite.config.js`. |
| `web_search` rate-limited | DuckDuckGo occasionally rate-limits. Wait a minute; if it persists, run `pip install -U ddgs`. |
| `doc_index` / `doc_search`: "no vector" | `ollama pull nomic-embed-text`. |
| Settings → Compute pool: rendezvous "Disconnected" / "Not configured" | Confirm Public Pool toggle is on. The default Cloud Run URL ships with the app; override or self-host via the URL editor. |
| `PermissionError: [Errno 13] Permission denied: '...\\AppData\\Roaming\\Python\\Python3xx\\site-packages\\typing_extensions.py'` on backend startup | Mixed system + user site-packages install. Cleanest fix: `.\install.bat` (creates `.venv\` and installs every dep there; the scheduled task uses that venv exclusively). Quick patch without venv: `del "%APPDATA%\Python\Python3xx\site-packages\typing_extensions.py"` then re-run `install.bat`. |
| `pip uninstall typing-extensions` fails with `uninstall-no-record-file` | The existing copy was put there manually or by a partial install, so pip can't safely remove it. Run `.\install.bat` to use the venv path, which bypasses the global install entirely. |
| Browser on another LAN device gets a "loopback only" 403 | By design: the chat UI is loopback-only on every install. Use Gigachat from the same machine it's running on; for cross-device compute, install Gigachat on both and pair via Settings → Compute pool. |
| Another device on the LAN can't pair with mine (mDNS / direct connect fails) | Confirm Windows Defender has the OpenSSH SSH Server / Gigachat firewall rule enabled for the Private profile and that the Ethernet/Wi-Fi adapter is classified Private (not Public). Tailscale CGNAT (`100.64.0.0/10`) is intentionally refused; both devices must be on the same physical network. |

More: see docs/SECURITY.md for risk-specific knobs and docs/COMPUTE_POOL.md for pool-routing diagnostics.

Documentation index

docs/TOOLS.md — full catalog of every tool the agent can call.
docs/COMPUTE_POOL.md — routing internals, llama.cpp flags, speculative decoding, override-file mechanism.
docs/P2P.md — P2P encryption, rendezvous, TURN relay, TLS pinning, fairness scheduler, API surface.
docs/SECURITY.md — full threat model + risk catalog.
docs/HOOKS.md — lifecycle hooks deep dive + recipes.
ARCHITECTURE.md — for contributors: turn flow, load-bearing invariants, where to change what.
rendezvous/README.md — deploying the rendezvous service to Cloud Run / a VPS.
