Run Claude Code-style subagents across your local model fleet.
subagent-fleet is a config-first Python CLI for mapping coding subagents to the best Ollama model and machine you own, then generating LiteLLM and Claude Code-style agent configuration.
Local model users often have more than one useful machine: a laptop, a Mac mini, a workstation, a home server, or a spare GPU box. Most coding harnesses still point at one model endpoint.
subagent-fleet turns that setup into a private local subagent fleet:
planner -> small fast model on a lightweight node
implementer -> larger coding model on a bigger node
reviewer -> larger coding model on a bigger node
summarizer -> small local model on the controller
It does not replace Ollama, LiteLLM, or Claude Code. It generates the glue between them:
Claude Code / coding harness
|
v
LiteLLM gateway generated by subagent-fleet
|
+-- Ollama node: laptop
+-- Ollama node: Mac mini 64GB
+-- Ollama node: workstation
- Validate a declarative fleet.yaml.
- Discover models from configured Ollama nodes via /api/tags.
- Generate litellm_config.yaml with ollama_chat/ routes.
- Generate Claude Code-style .claude/agents/*.md files.
- Generate .env.subagent-fleet for Claude Code/LiteLLM environment variables.
- Warm configured Ollama models with keep_alive.
- Show node health and agent routing tables.
- Keep unreachable nodes isolated so one offline machine does not crash the whole workflow.
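The discovery and isolation behaviour listed above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the names `http_get_json`, `discover_node`, and `discover_fleet`, and the injectable `fetch` parameter, are assumptions made for this example. It relies only on Ollama's `/api/tags` endpoint, which returns a JSON object containing a `models` list.

```python
import json
import urllib.request
from typing import Callable


def http_get_json(url: str, timeout: float = 3.0) -> dict:
    """Fetch a URL and decode its JSON body (raises on network errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def discover_node(endpoint: str, fetch: Callable[[str], dict] = http_get_json) -> list:
    """Return the model names one Ollama node reports via /api/tags."""
    payload = fetch(endpoint.rstrip("/") + "/api/tags")
    return [model["name"] for model in payload.get("models", [])]


def discover_fleet(nodes: dict, fetch: Callable[[str], dict] = http_get_json) -> dict:
    """Query every node; an unreachable node is recorded instead of raised."""
    report = {}
    for name, endpoint in nodes.items():
        try:
            report[name] = {"ok": True, "models": discover_node(endpoint, fetch)}
        except (OSError, ValueError) as exc:
            # OSError covers connection failures and timeouts,
            # ValueError covers a body that is not valid JSON.
            report[name] = {"ok": False, "error": str(exc)}
    return report
```

Catching the error per node is what keeps one offline machine from aborting the whole run; the caller can still inspect which nodes failed and why.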
MVP CLI implemented.
Available commands:
subagent-fleet init
subagent-fleet validate
subagent-fleet discover
subagent-fleet generate
subagent-fleet warmup
subagent-fleet status
subagent-fleet doctor
subagent-fleet clean
subagent-fleet skills list
subagent-fleet skills install
subagent-fleet plugins install
Choose one of the install paths below.
Install the CLI directly from PyPI:
python -m pip install subagent-fleet
Or install it as an isolated command with pipx:
pipx install subagent-fleet
Verify:
subagent-fleet --help
To contribute to the project, install from a source checkout instead:
git clone https://github.com/adityak74/subagent-fleet.git
cd subagent-fleet
python -m pip install -e ".[dev]"
Run tests:
python -m pytest
To start from Claude Code, install the plugin first, then let the bundled bootstrap skill install the CLI:
/plugin marketplace add https://github.com/adityak74/subagent-fleet
/plugin install subagent-fleet
After install, ask Claude Code:
Use the subagent-fleet bootstrap skill to install the CLI and set up this repo.
The bootstrap skill will run or recommend:
python -m pip install subagent-fleet
subagent-fleet skills install
Install this repository as a local Codex marketplace:
codex plugin marketplace add .
codex plugin add subagent-fleet@subagent-fleet
Then ask Codex:
Use the subagent-fleet bootstrap skill to install the CLI and set up this repo.
Create a starter config:
subagent-fleet init
Edit fleet.yaml with your Ollama node endpoints and model names, then validate it:
subagent-fleet validate
Check which nodes are reachable:
subagent-fleet discover
Generate LiteLLM, Claude agent, and environment files:
subagent-fleet generate
Start LiteLLM:
export LITELLM_MASTER_KEY="sk-local-dev"
litellm \
  --config ./litellm_config.yaml \
  --host 127.0.0.1 \
  --port 4000
Point Claude Code at the local gateway:
source .env.subagent-fleet
claude
subagent-fleet is driven by fleet.yaml.
project:
  name: local-dev
gateway:
  provider: litellm
  host: 127.0.0.1
  port: 4000
  master_key_env: LITELLM_MASTER_KEY
nodes:
  m5-local:
    endpoint: http://localhost:11434
    tags: [controller, local, fast]
  m4-mini-64gb:
    endpoint: http://192.168.1.50:11434
    tags: [heavy, coder, reviewer]
  m4-mini-16gb:
    endpoint: http://192.168.1.51:11434
    tags: [small, planner, summarizer]
models:
  heavy-coder:
    node: m4-mini-64gb
    ollama_model: qwen2.5-coder:32b
    litellm_alias: claude-sonnet-local
    context: 32768
    timeout: 600
    max_parallel: 1
  small-coder:
    node: m4-mini-16gb
    ollama_model: qwen2.5-coder:7b
    litellm_alias: claude-haiku-local
    context: 8192
    timeout: 300
    max_parallel: 1
agents:
  planner:
    model: small-coder
    description: Use for planning, file discovery, task decomposition, and summarization.
    tools: [Read, Grep, Glob]
    prompt: |
      You are a fast local planning agent.
      Do not edit files.
      Return a concise response with:
      - plan
      - relevant files
      - risks
      - next recommended agent
  implementer:
    model: heavy-coder
    description: Use for implementation, bug fixes, refactors, and patch creation.
    tools: [Read, Grep, Glob, Edit, MultiEdit, Bash]
  reviewer:
    model: heavy-coder
    description: Use after implementation to review diffs, tests, regressions, and maintainability.
    tools: [Read, Grep, Glob, Bash]
Running:
subagent-fleet generate
creates:
litellm_config.yaml
.claude/agents/planner.md
.claude/agents/implementer.md
.claude/agents/reviewer.md
.env.subagent-fleet
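How a models entry in fleet.yaml could become a LiteLLM route can be sketched as follows. This is an illustration under assumptions, not the generator's real code: the function name `build_route` is hypothetical, and the field mapping is inferred from the example configuration and the example route in this README.

```python
def build_route(model: dict, nodes: dict) -> dict:
    """Map one fleet.yaml model entry to a LiteLLM model_list entry."""
    # A KeyError here means the model references a node that is not defined.
    node = nodes[model["node"]]
    return {
        "model_name": model["litellm_alias"],
        "litellm_params": {
            # The ollama_chat/ prefix selects LiteLLM's Ollama chat provider.
            "model": f"ollama_chat/{model['ollama_model']}",
            "api_base": node["endpoint"],
            "api_key": "ollama",
            "timeout": model["timeout"],
        },
        "model_info": {"max_input_tokens": model["context"]},
    }


# Values copied from the example fleet.yaml in this README.
nodes = {"m4-mini-64gb": {"endpoint": "http://192.168.1.50:11434"}}
heavy_coder = {
    "node": "m4-mini-64gb",
    "ollama_model": "qwen2.5-coder:32b",
    "litellm_alias": "claude-sonnet-local",
    "context": 32768,
    "timeout": 600,
}
route = build_route(heavy_coder, nodes)
```

The alias is what the coding harness asks for; the node endpoint and Ollama model name stay hidden behind the gateway.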
Example LiteLLM route:
model_list:
  - model_name: claude-sonnet-local
    litellm_params:
      model: ollama_chat/qwen2.5-coder:32b
      api_base: http://192.168.1.50:11434
      api_key: ollama
      timeout: 600
    model_info:
      max_input_tokens: 32768
Example Claude agent:
---
name: planner
description: Use for planning, file discovery, task decomposition, and summarization.
model: claude-haiku-local
tools: Read, Grep, Glob
---
You are a fast local planning agent.
Do not edit files.
Return a concise response with:
- plan
- relevant files
- risks
- next recommended agent

| Command | Purpose |
|---|---|
| subagent-fleet init | Create a starter fleet.yaml. |
| subagent-fleet validate | Validate schema, references, URLs, aliases, and agent names. |
| subagent-fleet discover | Query configured Ollama nodes for available models. |
| subagent-fleet generate | Generate LiteLLM config, Claude agents, and env file. |
| subagent-fleet warmup | Preload configured Ollama models with keep_alive. |
| subagent-fleet status | Show node health and agent routing. |
| subagent-fleet doctor | Show validation and local-network safety guidance. |
| subagent-fleet clean | List or remove generated files. |
| subagent-fleet skills list | List bundled assistant skills and supported targets. |
| subagent-fleet skills install | Install assistant-facing setup and operations skills. |
| subagent-fleet plugins install | Install Claude Code and Codex plugin marketplace bundles. |
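To illustrate the kind of reference checks the validate command is described as performing, the sketch below confirms that every model points at a defined node, every agent points at a defined model, and no two models share a LiteLLM alias. The function name `check_references`, the error wording, and the alias-uniqueness rule are assumptions for this example; the real command may check more, or report problems differently.

```python
def check_references(config: dict) -> list:
    """Return a list of problems; an empty list means references are consistent."""
    errors = []
    nodes = config.get("nodes", {})
    models = config.get("models", {})
    alias_owner = {}  # alias -> name of the first model that used it

    for name, model in models.items():
        if model.get("node") not in nodes:
            errors.append(f"model {name!r} references unknown node {model.get('node')!r}")
        alias = model.get("litellm_alias")
        if alias in alias_owner:
            errors.append(f"model {name!r} reuses alias {alias!r} of {alias_owner[alias]!r}")
        else:
            alias_owner[alias] = name

    for name, agent in config.get("agents", {}).items():
        if agent.get("model") not in models:
            errors.append(f"agent {name!r} references unknown model {agent.get('model')!r}")

    return errors
```

Collecting every problem before reporting, rather than stopping at the first one, lets a user fix a config in a single pass.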
JSON output is available for discovery and status:
subagent-fleet discover --json
subagent-fleet status --json
subagent-fleet ships assistant-facing skills that teach Claude Code, Codex, OpenCode, and similar tools how to set up and operate the fleet from inside a repository.
List bundled skills and supported targets:
subagent-fleet skills list
Install all bundled skills for all supported targets:
subagent-fleet skills install
This writes:
.claude/skills/subagent-fleet-setup/SKILL.md
.claude/skills/subagent-fleet-operations/SKILL.md
.codex/skills/subagent-fleet-setup/SKILL.md
.codex/skills/subagent-fleet-operations/SKILL.md
.opencode/skills/subagent-fleet-setup/SKILL.md
.opencode/skills/subagent-fleet-operations/SKILL.md
Install for a specific assistant:
subagent-fleet skills install --target codex
subagent-fleet skills install --target claude-code
subagent-fleet skills install --target opencode
Install one bundled skill:
subagent-fleet skills install --skill subagent-fleet-setup
Existing skill files are not overwritten unless you pass --force.
This repository also ships plugin marketplace metadata so users can install the assistant skill first, then let that skill install and verify the Python CLI.
Included plugin artifacts:
.claude-plugin/marketplace.json
.agents/plugins/marketplace.json
plugins/subagent-fleet/.claude-plugin/plugin.json
plugins/subagent-fleet/.codex-plugin/plugin.json
plugins/subagent-fleet/skills/subagent-fleet-bootstrap/SKILL.md
plugins/subagent-fleet/skills/subagent-fleet-setup/SKILL.md
plugins/subagent-fleet/skills/subagent-fleet-operations/SKILL.md
The bootstrap skill teaches Claude Code or Codex how to install the CLI:
python -m pip install subagent-fleet
and then install repo-local assistant skills:
subagent-fleet skills install
Claude Code plugin install flow:
/plugin marketplace add https://github.com/adityak74/subagent-fleet
/plugin install subagent-fleet
Codex local marketplace flow:
codex plugin marketplace add .
codex plugin add subagent-fleet@subagent-fleet
To generate the same marketplace/plugin bundle into another directory:
subagent-fleet plugins install --out /path/to/marketplace-root
Install only one target:
subagent-fleet plugins install --target claude-code
subagent-fleet plugins install --target codex
Existing plugin marketplace files are not overwritten unless you pass --force.
On each worker machine, make Ollama reachable from your controller over a private network. The macOS example below binds Ollama to all interfaces (0.0.0.0), so restrict access with a firewall or a private subnet:
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
launchctl setenv OLLAMA_KEEP_ALIVE "-1"
launchctl setenv OLLAMA_NUM_PARALLEL "1"
launchctl setenv OLLAMA_MAX_LOADED_MODELS "1"
killall Ollama
open -a Ollama
From the controller:
curl http://NODE_IP:11434/api/tags
subagent-fleet assumes private local networking.
Do:
- Use LAN, firewall rules, Tailscale, WireGuard, or a private subnet.
- Keep LITELLM_MASTER_KEY set for LiteLLM access.
- Treat generated .env.subagent-fleet files as local developer configuration.
Do not:
- Expose Ollama directly to the public internet.
- Expose LiteLLM without authentication.
- Commit real API keys, LAN secrets, or machine-specific private .env files.
Run:
subagent-fleet doctor
for local setup and safety reminders.
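One way to test the private-networking assumption is to check whether a node endpoint's host is a loopback or private address. The sketch below uses Python's standard `ipaddress` module. The function name `is_private_endpoint` is hypothetical, and whether subagent-fleet doctor performs a check like this is not stated here. Hostnames are not resolved: anything other than localhost or a literal IP address is reported as unknown (`None`).

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit

# The carrier-grade NAT range, which Tailscale uses for node addresses.
# Python's is_private is False for it, so it is checked explicitly.
CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_private_endpoint(endpoint: str) -> Optional[bool]:
    """True/False for literal IPs and localhost, None when the host is a DNS name."""
    host = urlsplit(endpoint).hostname
    if host is None:
        return None
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return None  # a DNS name; this sketch does not resolve it
    return addr.is_loopback or addr.is_private or addr in CGNAT


for url in ("http://192.168.1.50:11434", "http://8.8.8.8:11434"):
    print(url, is_private_endpoint(url))
```

Returning `None` for unresolved names keeps the check honest: it separates "known public" from "could not tell", so a warning is only raised on solid evidence.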
Install dev dependencies:
python -m pip install -e ".[dev]"
Run tests:
python -m pytest
Run a focused test:
python -m pytest tests/test_config.py
Check CLI wiring:
python -m subagent_fleet.cli --help
src/subagent_fleet/
cli.py
config.py
discovery.py
plugins.py
warmup.py
status.py
skills.py
generators/
skill_templates/
templates/
examples/
plugins/
tests/
MVP:
- fleet.yaml schema
- Ollama node health checks
- Ollama model discovery via /api/tags
- LiteLLM config generation
- Claude Code agent generation
- Environment file generation
- Model warmup with keep_alive
- Status and routing tables
Next:
- Latency benchmarking
- Recommended agent-to-node assignment
- Role-based routing templates
- Tailscale-aware node discovery
- OpenAI-compatible harness examples
- Release packaging
Later:
- Dynamic routing by task type
- Fallback model generation
- Queue-aware scheduling
- Agent execution trace viewer
- Support for vLLM, LM Studio, llama.cpp, OpenRouter, and cloud APIs
Issues and pull requests are welcome.
Good first areas:
- More generator tests
- Additional example fleets
- Better status formatting
- More robust Ollama error reporting
- Documentation for real multi-machine setups
Before opening a PR:
python -m pytest
subagent-fleet is not:
- an inference engine
- a replacement for Ollama
- a replacement for LiteLLM
- a model sharding framework
- Kubernetes for local LLMs
- a public model hosting platform
It is a small workflow layer for private local subagent orchestration.
MIT. See LICENSE.