Filo is a high-performance AI coding assistant written in modern C++.
It runs in multiple runtime modes:
- interactive terminal app (TUI)
- non-interactive prompter mode for scripts/CI
- MCP server over stdio
- HTTP daemon exposing MCP and/or OpenAI/Anthropic-compatible API endpoints
Filo focuses on speed, control, and local-first workflows without giving up multi-provider flexibility.
- Local providers are first-class: Ollama over localhost and embedded llama.cpp for in-process GGUF inference (FILO_ENABLE_LLAMACPP=ON).
- Router guardrails can exempt providers flagged as local (enforce_on_local: false), keeping embedded local backends available when remote limits are hit.
- The daemon listens on 127.0.0.1 by default.
- In-process router engine with policy rules and strategies: smart, fallback, latency, load_balance.
- Automatic fallback chains with per-candidate retries.
- Guardrails for spend and quota reserves (max_session_cost_usd, token/request/window reserve ratios).
- Auto-classifier that scores prompt complexity and routes to fast/balanced/powerful tiers.
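The fallback behaviour described above can be sketched in a few lines of Python. This is an illustration only, not Filo's router: the Candidate class, run_fallback_chain, and fake_call are hypothetical names, and it assumes that retries means extra attempts after the first failure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    provider: str
    model: str
    retries: int = 0  # assumption: extra attempts after the first failure


def run_fallback_chain(candidates: Sequence[Candidate],
                       call: Callable[[Candidate], str]):
    """Try candidates in order; return (candidate, reply) for the first success."""
    errors = []
    for cand in candidates:
        for attempt in range(cand.retries + 1):
            try:
                return cand, call(cand)
            except Exception as exc:  # any failure moves on to the next attempt
                errors.append(
                    f"{cand.provider}/{cand.model} attempt {attempt + 1}: {exc}"
                )
    raise RuntimeError("all candidates failed: " + "; ".join(errors))


chain = [
    Candidate("local", "qwen2.5-coder-7b", retries=0),
    Candidate("ollama", "llama3", retries=0),
    Candidate("grok", "grok-code-fast-1", retries=1),
]


def fake_call(cand: Candidate) -> str:
    # Stand-in for a real provider request: only the remote provider answers.
    if cand.provider != "grok":
        raise ConnectionError("backend unavailable")
    return f"reply from {cand.model}"


winner, reply = run_fallback_chain(chain, fake_call)
print(winner.provider, reply)
```

Collecting the per-attempt errors keeps the final failure message useful when every candidate in the chain is exhausted.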
- Built-in python tool executes code inside an embedded interpreter.
- Interpreter state persists across calls (variables/imports/functions carry over).
- Optional venv isolation via FILO_PYTHON_VENV.
- C++26 core with streaming-first provider protocols
- TUI built with FTXUI
- Context mentions (@file, quoted paths, and escaped paths like @My\ Folder/file.txt)
- Agent Skills support with .filo/skills and on-demand activation
- Ctrl+V clipboard paste support (text paste and clipboard-image insertion as @"<path>")
- Session persistence and resume
- Global + workspace config layering
- MCP dispatcher shared across stdio and HTTP transports
- OAuth and API-key credentials
- CMake >= 3.28
- C++26 compiler (GCC 15+ or Clang 17+ recommended)
- OpenSSL
- Python 3 (required when FILO_ENABLE_PYTHON=ON, which is the default)
Linux:
cmake --preset linux-debug
cmake --build --preset linux-debug
ctest --preset linux-debug --output-on-failure
./build/Linux/linux-debug/filo

macOS:
cmake --preset xcode-debug
cmake --build --preset xcode-debug
ctest --preset xcode-debug --output-on-failure
./build/Darwin/xcode-debug/Debug/filo

Linux (embedded llama.cpp build):
cmake --preset linux-debug -DFILO_ENABLE_LLAMACPP=ON
cmake --build --preset linux-debug

Minimal local provider example:
{
  "default_provider": "local",
  "providers": {
    "local": {
      "api_type": "llamacpp",
      "model": "qwen2.5-coder-7b",
      "model_path": "/absolute/path/to/model.gguf",
      "context_size": 8192,
      "gpu_layers": 35
    }
  }
}

| Mode | Command |
|---|---|
| Interactive TUI | filo |
| Prompter (single-shot) | filo --prompt "Summarize this diff" |
| MCP over stdio | filo --mcp --headless or filo --mcp stdio --headless |
| MCP over TCP (HTTP /mcp endpoint) | filo --mcp tcp --headless --port 8080 |
| API gateway only | filo --api --headless --port 8080 |
| MCP + API gateway | filo --mcp tcp --headless --api --port 8080 |
Daemon transport notes:
- --mcp without a value defaults to stdio.
- --mcp tcp starts the HTTP daemon and exposes MCP on /mcp.
- --daemon is still accepted as a deprecated alias for --mcp tcp.
- Set FILO_MCP_BEARER_TOKEN to require Authorization: Bearer <token> on /mcp.
- For LAN worker deployments, use --host 0.0.0.0 only with a bearer token and network access controls.
- The API gateway is off by default to keep daemon startup minimal and local-first.
- --api starts the same HTTP daemon and exposes OpenAI/Anthropic-compatible proxy endpoints:
  - GET /v1/models
  - POST /v1/chat/completions (OpenAI-style)
  - POST /v1/messages (Anthropic-style)
- Combine --api with --mcp tcp if you want both /mcp and /v1/* on one port.
- Model routing in API gateway endpoints:
  - policy/<policy_name> routes through the named Filo smart router policy.
  - <provider>/<model> routes directly to a configured provider/model.
  - <provider> routes to that provider's default configured model.
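To make the three model-routing forms concrete, here is a minimal Python sketch of how a gateway could classify the model field of an incoming request. It is not taken from Filo's source: the Route type, the kind labels, and parse_model_field are hypothetical names, and the treatment of malformed values is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    kind: str  # "policy", "provider_model", or "provider_default" (hypothetical labels)
    provider: Optional[str] = None
    model: Optional[str] = None
    policy: Optional[str] = None


def parse_model_field(value: str) -> Route:
    """Classify a model string into one of the three documented routing forms."""
    if not value:
        raise ValueError("empty model field")
    head, sep, tail = value.partition("/")
    if head == "policy" and tail:
        return Route(kind="policy", policy=tail)
    if sep and head and tail:
        return Route(kind="provider_model", provider=head, model=tail)
    if not sep:
        return Route(kind="provider_default", provider=head)
    # Assumption: values such as "policy/" or "/model" are rejected.
    raise ValueError(f"malformed model field: {value!r}")


for example in ("policy/local-first", "grok/grok-code-fast-1", "ollama"):
    print(example, "->", parse_model_field(example).kind)
```

Splitting on the first slash only means a model name that itself contains a slash still ends up whole in the model field.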
Useful CLI flags:
- --mcp [stdio|tcp]: run as MCP server (default transport: stdio)
- --daemon: deprecated alias for --mcp tcp
- --api: enable optional OpenAI/Anthropic-compatible proxy mode
- --login <provider>: authenticate and exit (openai uses ChatGPT OAuth)
- --list-sessions: list resumable sessions
- --resume [id|index]: resume a saved session
- --prompter: force non-interactive mode
- --prompt, -p: prompt text
- --output-format, -o: one of text, json, stream-json
- --input-format: one of text, stream-json
- --include-partial-messages: include deltas in stream-json
- --continue: continue the latest project-scoped session in prompter mode
Prompter examples:
# Direct prompt
filo --prompt "Review this patch for regressions"
# Stdin only
git diff | filo
# Prompt + stdin context
cat README.md | filo --prompt "Summarize the key setup steps"
# JSON output for automation
filo -p "Generate release notes from these commits" -o json
# Stream JSON events
filo -p "Explain the architecture" -o stream-json --include-partial-messages
# Continue latest project-scoped session
filo --continue -p "Now apply the follow-up refactor"

Filo supports both API-key and OAuth-based providers.
Typical API-key setup:
export XAI_API_KEY="..."
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GEMINI_API_KEY="..."
export MISTRAL_API_KEY="..."
export KIMI_API_KEY="..."
export ZAI_API_KEY="..."
export DASHSCOPE_API_KEY="..."

For local Ollama, the default endpoint is:
http://localhost:11434
Config files are layered in this order:
- ~/.config/filo/config.json
- ~/.config/filo/auth_defaults.json
- ~/.config/filo/settings.json
- ./.filo/config.json
- ./.filo/settings.json
- ~/.config/filo/profile_defaults.json
- ~/.config/filo/model_defaults.json
Use config.json for providers/router/subagents.
Use settings.json for managed UI/workflow preferences.
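The sketch below illustrates one plausible reading of config layering. The merge semantics are an assumption, since the text above only gives the order: here later layers override earlier ones, nested objects are merged key by key, and missing files are skipped. The names deep_merge and load_layers are hypothetical, not Filo functions.

```python
import json
from pathlib import Path


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict where overlay wins; nested dicts are merged recursively."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_layers(paths) -> dict:
    """Fold JSON files together in the given order, skipping files that do not exist."""
    config: dict = {}
    for path in paths:
        p = Path(path).expanduser()
        if p.is_file():
            config = deep_merge(config, json.loads(p.read_text(encoding="utf-8")))
    return config


# Global layer first, workspace layer second (assumed: workspace overrides global).
global_cfg = {"default_provider": "openai", "router": {"enabled": True}}
workspace_cfg = {"default_provider": "local", "router": {"default_policy": "local-first"}}
print(deep_merge(global_cfg, workspace_cfg))

config = load_layers(["~/.config/filo/config.json", "./.filo/config.json"])
```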
Optional context compression can be enabled in config.json:
{
  "context_compression": "light"
}

The light mode keeps small tool results exact, but stores oversized read_file and shell outputs in the agent history as compact summaries with size, digest, signal lines, and head/tail excerpts. The default is off.
Use "full" for the native context-cache mode: first or changed read_file calls stay exact, repeated unchanged reads collapse to small path-based cache stubs, instruction files are always preserved, and common shell outputs such as git status, git diff, build, and test logs are retained as signal-focused summaries. Diff summaries preserve structural diff lines verbatim.
Use "ultra" when token pressure matters more than keeping broad excerpts: it applies the same native cache and command-family summaries as full, but with tighter budgets for read_file and shell output. Instruction files are still preserved exactly.
In the interactive TUI, use /compression to pick a mode from a menu, or /compression ultra / /compress ultra to switch directly.
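As a rough illustration of the summary shape used by the light mode (size, digest, signal lines, head/tail excerpts), here is a Python sketch. Every specific in it is an assumption made for the example: the 4096-byte threshold, SHA-256 as the digest, the keyword list used to pick signal lines, and the function name summarize_output.

```python
import hashlib


def summarize_output(text: str, max_bytes: int = 4096, excerpt_lines: int = 5,
                     signal_words=("error", "warning", "failed")) -> str:
    """Keep small outputs exact; replace large ones with a compact summary."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text  # small results stay exact
    lines = text.splitlines()
    digest = hashlib.sha256(data).hexdigest()[:12]
    # Lines that mention one of the (assumed) signal keywords.
    signals = [ln for ln in lines if any(w in ln.lower() for w in signal_words)]
    head = lines[:excerpt_lines]
    tail = lines[-excerpt_lines:] if len(lines) > excerpt_lines else []
    parts = [f"[summary] size={len(data)} bytes, lines={len(lines)}, sha256={digest}"]
    if signals:
        parts += ["signal lines:"] + signals[:10]
    parts += ["head:"] + head
    if tail:
        parts += ["tail:"] + tail
    return "\n".join(parts)


log = "\n".join(f"compiling unit {i}" for i in range(1000)) + "\nerror: link failed"
print(summarize_output(log))
```

The digest lets a later step tell whether two large outputs were identical without keeping either one in full.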
Profiles let you keep multiple named configuration overlays and switch between them instantly.
This is useful for context switching (for example: work, oss, local), without rewriting
your main config each time.
What profiles support:
- Define named overlays under the profiles key in config.json
- Inherit from one or more parent profiles with extends_from
- Override normal config fields (provider/model selection, mode, approval mode, router, MCP servers, subagents, UI defaults)
- Switch in TUI with /profile <name> and apply changes live in the current session
- Persist the active profile in ~/.config/filo/profile_defaults.json for future launches
Example profile config:
{
  "profiles": {
    "work": {
      "description": "Company defaults",
      "default_provider": "openai",
      "default_mode": "BUILD"
    },
    "oss": {
      "extends_from": ["work"],
      "default_provider": "grok",
      "default_approval_mode": "prompt"
    }
  }
}

Quick usage:
- Define profiles under the profiles key in ~/.config/filo/config.json or ./.filo/config.json.
- In TUI, run /profile (or /profile list) to see active and available profiles.
- Switch profile with /profile <name> (for example /profile work).
- Remove the persisted profile with /profile clear.
Commands:
/profile
/profile list
/profile work
/profile oss
/profile clear

Precedence note: FILO_PROFILE=<name> forces a profile for that process and overrides the persisted selection until unset.
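The inheritance and precedence rules can be sketched as follows. This is a simplified model, not Filo's implementation: resolve_profile and active_profile_name are hypothetical names, parents are assumed to apply in list order with the child's own fields winning, and overrides are shallow (key by key).

```python
import os

PROFILES = {
    "work": {
        "description": "Company defaults",
        "default_provider": "openai",
        "default_mode": "BUILD",
    },
    "oss": {
        "extends_from": ["work"],
        "default_provider": "grok",
        "default_approval_mode": "prompt",
    },
}


def resolve_profile(profiles: dict, name: str, _seen: tuple = ()) -> dict:
    """Flatten a profile by applying its parents first, then its own fields."""
    if name not in profiles:
        raise KeyError(f"unknown profile: {name}")
    if name in _seen:
        raise ValueError("profile inheritance cycle: " + " -> ".join(_seen + (name,)))
    profile = profiles[name]
    resolved: dict = {}
    for parent in profile.get("extends_from", []):
        resolved.update(resolve_profile(profiles, parent, _seen + (name,)))
    resolved.update({k: v for k, v in profile.items() if k != "extends_from"})
    return resolved


def active_profile_name(persisted, env=None):
    """FILO_PROFILE, when set, wins over the persisted selection."""
    env = os.environ if env is None else env
    return env.get("FILO_PROFILE") or persisted


print(resolve_profile(PROFILES, "oss"))
print(active_profile_name("work", env={"FILO_PROFILE": "oss"}))
```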
Filo supports Agent Skills-style instruction packages. Each skill is a directory
with a SKILL.md file containing YAML frontmatter with at least name and
description, followed by Markdown instructions.
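For illustration, the Python sketch below splits a SKILL.md file into frontmatter and instructions and checks the two required keys. It assumes the usual frontmatter convention of a block delimited by --- lines and handles only flat key: value pairs; a real loader would use a full YAML parser. The function name parse_skill_md and the sample skill are hypothetical.

```python
def parse_skill_md(text: str):
    """Return (metadata, body) from a SKILL.md string with simple frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter") from None
    meta = {}
    for line in lines[1:end]:
        # Simplification: only top-level "key: value" pairs are understood.
        if ":" in line and not line.startswith((" ", "#")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip("'\"")
    missing = [k for k in ("name", "description") if not meta.get(k)]
    if missing:
        raise ValueError(f"missing required frontmatter keys: {missing}")
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


SAMPLE = """---
name: code-review
description: Review a patch for regressions
---
# Steps

Read the diff before commenting.
"""

meta, body = parse_skill_md(SAMPLE)
print(meta["name"], "-", meta["description"])
```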
The official project-local location is ./.filo/skills/<name>/SKILL.md.
Filo-native skill roots have precedence over compatibility roots such as
.claude/skills and .agents/skills; project-local .filo/skills has the
highest precedence. Use .filo/skills for skills that are specific to Filo or
this repository.
Instruction skills are disclosed to the model as a compact catalog and loaded
on demand through the activate_skill tool. Bundled scripts/, references/,
and assets/ files are listed during activation and can be read by calling
activate_skill again with resource_path.
To use the public Agent Skills collection in one project:
mkdir -p .filo
git clone https://github.com/addyosmani/agent-skills.git .filo/agent-skills
ln -s agent-skills/skills .filo/skills

To install it globally instead:
mkdir -p ~/.config/filo
git clone https://github.com/addyosmani/agent-skills.git ~/.config/filo/agent-skills
ln -s agent-skills/skills ~/.config/filo/skills

If .filo/skills or ~/.config/filo/skills already exists, copy individual
skill directories into it instead of replacing the directory.
After installation, start Filo from the project. The model will see skill names
such as using-agent-skills, code-review-and-quality, and
frontend-ui-engineering in its system catalog and should call activate_skill
before using the matching workflow. When a skill references a bundled resource,
for example scripts/idea-refine.sh, the model can load it with
activate_skill and the same resource_path.
Skills without entry_point are also available as slash commands in the TUI:
/<skill-name> [arguments]. The body may use $ARGUMENTS as a placeholder.
Filo-specific Python tool skills continue to use entry_point and the existing
get_schema() / execute() Python contract.
{
  "router": {
    "enabled": true,
    "default_policy": "local-first",
    "guardrails": {
      "max_session_cost_usd": 5.0,
      "min_requests_remaining_ratio": 0.20,
      "min_tokens_remaining_ratio": 0.20,
      "min_window_remaining_ratio": 0.20,
      "enforce_on_local": false
    },
    "policies": {
      "local-first": {
        "strategy": "fallback",
        "defaults": [
          { "provider": "local", "model": "qwen2.5-coder-7b", "retries": 0 },
          { "provider": "ollama", "model": "llama3", "retries": 0 },
          { "provider": "grok", "model": "grok-code-fast-1", "retries": 1 }
        ],
        "rules": [
          {
            "name": "deep-reasoning",
            "priority": 10,
            "strategy": "fallback",
            "when": {
              "min_prompt_chars": 260,
              "any_keywords": ["debug", "root cause", "architecture", "migration"]
            },
            "candidates": [
              { "provider": "claude", "model": "claude-sonnet-4-6", "retries": 1 },
              { "provider": "grok-reasoning", "model": "grok-4.20-reasoning", "retries": 1 }
            ]
          }
        ]
      }
    }
  }
}

- src/core/llm/: provider abstraction, protocols, routing
- src/core/tools/: tool execution (shell/files/patch/search/python)
- src/core/mcp/: MCP dispatcher and client/session handling
- src/tui/: terminal UI components
- src/exec/: stdio MCP server, daemon, and prompter entrypoints
- src/core/auth/: API key and OAuth flows
Apache License 2.0. See LICENSE.
