Minimal Swift CLI agent with tool-calling support and an OpenAI-compatible client. The project is educational: it exists for experimenting with small-model agent patterns. It is heavily inspired by build-your-own-openclaw.
The app can run the assistant in two ways. You choose how the model interacts with tools — this matters a lot for small local models (e.g. Llama 3.2 3B), which handle OpenAI-style function calling poorly.

| Mode | CLI / env value | When to use |
|---|---|---|
| Intent Router | `intent` | Default for Ollama. The model never receives tool schemas; it only emits structured JSON intents. Your Swift code executes tools. Synthesis is a separate LLM call. |
| Tool calling | `tool-calling` | Default for OpenAI. Classic chat completions with `tools` in the request; the model returns `tool_calls` like the OpenAI API. |

Defaults (if you do not set routing):

| Provider | Default routing |
|---|---|
| `ollama` | `intent` |
| `openai` | `tool-calling` |

Override defaults:

    # Force OpenAI-style tool calling against Ollama (e.g. large model)
    swift run ToyBot --provider ollama --routing tool-calling
    # Force Intent Router against OpenAI (e.g. experiment with small models)
    swift run ToyBot --provider openai --routing intent

Environment variable: `TOYBOT_ROUTING=intent` or `TOYBOT_ROUTING=tool-calling`.

On startup the app prints the active routing mode, for example:
routing: intent
The Intent Router pattern splits work into three roles so that a small model does only one simple job at a time:

- Router (LLM): Classifies the user message and conversation context into a single intent (`read_file`, `bash`, `search_file`, or `direct_chat`). The model does not see tool definitions. It is steered with structured outputs: `response_format` with a JSON Schema matching `IntentResponseDTO` (OpenAI-compatible / supported by Ollama structured outputs). Router prompts are in English for better compatibility with local models.
- Executor (Swift): Maps `Intent` to real work: `ToolRegistry` runs `read_file` / `bash`, or `search_file` uses a fixed `find` command built in code. The model does not run arbitrary shell; the app decides how to search.
- Synthesizer (LLM): After tools have run, a second call asks the model to answer or summarize using only the collected context. For long file content, synthesis uses a short message list (focused system prompt + user request + tool context) instead of the full chat history, so small models are less likely to return empty replies. Truncation and retries apply; if the model still returns nothing, a deterministic excerpt of the gathered context is shown.

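
The router step can be pictured as decoding a small JSON object into a closed set of intents. The following is a minimal sketch, not the project's code: `RouterReply`, its `argument` field, and `parseRouterReply` are hypothetical names, and the real `IntentResponseDTO` may be shaped differently.

```swift
import Foundation

// Hypothetical router reply; the real IntentResponseDTO may use other field names.
struct RouterReply: Codable {
    enum Kind: String, Codable {
        case readFile = "read_file"
        case bash
        case searchFile = "search_file"
        case directChat = "direct_chat"
    }
    let intent: Kind
    let argument: String?   // assumed single payload: a path, command, or search pattern
}

// Decodes the router's JSON; returns nil when the reply does not match the schema.
func parseRouterReply(_ json: String) -> RouterReply? {
    try? JSONDecoder().decode(RouterReply.self, from: Data(json.utf8))
}

let routed = parseRouterReply(#"{"intent":"read_file","argument":"Package.swift"}"#)
let rejected = parseRouterReply(#"{"intent":"delete_everything"}"#)
print(routed?.intent.rawValue ?? "none", rejected == nil)
```

Because the intent is a closed enum, an off-schema reply fails to decode instead of reaching the executor, which is one plausible reason to pair structured outputs with a strict type.
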
Between router calls, DeterministicIntentResolver can choose the next intent without calling the LLM when the next step is obvious from the last tool result, for example:

- After `search_file` returns exactly one path → `read_file` with that path.
- After several paths → pick a shallow path (fewer `/` segments) → `read_file`.
- After a successful `read_file` → `direct_chat` (hand off to synthesis).
- After `bash` returns a single path-like line → `read_file`.

That reduces drift and duplicate “find again” loops on small models.
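
The rules above could be expressed as a pure function over the last intent and its output. This is a sketch under assumptions: the `Intent` cases, `resolveNext`, and the "path-like" heuristic are illustrative and may not match `DeterministicIntentResolver`.

```swift
// Hypothetical types; the project's Intent and resolver signatures may differ.
enum Intent: Equatable {
    case readFile(path: String)
    case bash(command: String)
    case searchFile(pattern: String)
    case directChat
}

// Counts "/" separators, so a lower value means a shallower path.
func depth(_ path: String) -> Int {
    path.filter { $0 == "/" }.count
}

// Returns the next intent when it is unambiguous, or nil to ask the LLM router.
// Assumes the caller passes only successful tool output.
func resolveNext(after last: Intent, output: String) -> Intent? {
    let lines = output.split(separator: "\n").map(String.init)
    switch last {
    case .searchFile:
        // One hit is read directly; with several hits the shallowest path wins.
        guard let best = lines.min(by: { depth($0) < depth($1) }) else { return nil }
        return .readFile(path: best)
    case .readFile:
        return .directChat   // hand off to synthesis
    case .bash:
        // "Path-like" is approximated here as one line containing "/" and no spaces.
        guard lines.count == 1, lines[0].contains("/"), !lines[0].contains(" ") else { return nil }
        return .readFile(path: lines[0])
    case .directChat:
        return nil
    }
}
```

Returning `nil` keeps the LLM router as the fallback, so the deterministic path only short-circuits the cases it is sure about.
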
Tool-calling mode uses `InMemoryAgentSession` with `ChatAgent`: the full conversation and OpenAI-style function tool schemas are sent on every turn. The model returns `tool_calls`; the app executes the tools and calls the LLM again until a reply contains no tool calls. This matches what large models (GPT-4, Claude, strong Ollama models) expect.
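
The loop described above can be sketched with the model and the tools passed in as closures. Everything here is illustrative: `ToolCall`, `runToolLoop`, and the `maxSteps` safety cap are assumptions, not the project's API.

```swift
// Hypothetical minimal types; ToolCall and the closure signatures are assumptions.
struct ToolCall {
    let id: String
    let name: String
    let arguments: String
}

enum Message {
    case system(content: String)
    case user(content: String)
    case assistant(content: String, toolCalls: [ToolCall])
    case tool(content: String, toolCallId: String)
}

// Calls the model, executes any requested tools, and repeats until a plain reply.
// maxSteps is an added guard against endless loops, not something the README states.
func runToolLoop(
    history: inout [Message],
    complete: ([Message]) -> Message,
    runTool: (ToolCall) -> String,
    maxSteps: Int = 8
) -> String {
    for _ in 0..<maxSteps {
        let reply = complete(history)
        history.append(reply)
        guard case let .assistant(content, toolCalls) = reply else { return "" }
        if toolCalls.isEmpty { return content }   // plain assistant reply ends the turn
        for call in toolCalls {
            // Each tool result is linked back to the call that requested it.
            history.append(.tool(content: runTool(call), toolCallId: call.id))
        }
    }
    return ""   // step budget exhausted
}
```

Passing `complete` and `runTool` as closures keeps the sketch testable without a network; a real session would wire them to the LLM client and the tool registry.
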
Layered layout:

- `ChatLoop` (Presentation): terminal I/O, `exit` / `quit` / `q`.
- `AgentSession` (Domain/Interfaces): `chat(_:)` → final `Message`.
- `InMemoryAgentSession`: tool-calling loop (history + LLM + tools until plain assistant reply).
- `IntentRoutedSession`: router → executor loop → synthesizer; optional deterministic resolution between steps.
- `Agent` / `ChatAgent`: `LLMClient`, system prompt, `ToolRegistry` (used only by tool-calling path).
- `IntentRouter` / `LLMIntentRouter`: classify → `Intent`.
- `ActionExecutor` / `LocalActionExecutor`: `Intent` → tool strings.
- `Synthesizer` / `LLMSynthesizer`: final natural-language answer from collected context.
- `DeterministicIntentResolver`: code-only next-step intents when unambiguous.
- `ToolRegistry` (Application/Tools): dispatches tools by name.
- `Tool` (Domain/Interfaces): name, description, `parametersSchema`, `execute`.
- `OpenAIClient` (Data): chat completions; optional `LLMStructuredOutput` for intent JSON schema.
- `IntentResponseDTO` (Data): JSON shape for router + schema for structured outputs.

toy-bot supports file-based micro-skills in the skills/ directory. Each skill is a .md file with:

- YAML front-matter (`id`, `name`, `description`, `output_format`)
- system prompt body
- optional `---examples---` section with `user:` / `assistant:` few-shot pairs

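
A skill file with this layout could be parsed roughly as follows. This is a simplified sketch: `Skill` and `parseSkill` are hypothetical names, the front-matter is read as flat `key: value` lines instead of full YAML, and each example is assumed to fit on one line.

```swift
import Foundation

// Hypothetical parsed form; the project's real skill type may differ.
struct Skill {
    var meta: [String: String] = [:]
    var systemPrompt = ""
    var examples: [(user: String, assistant: String)] = []
}

// Expects "---", front-matter lines, "---", a prompt body, then an optional
// "---examples---" section. Returns nil when the front-matter is not delimited.
func parseSkill(_ text: String) -> Skill? {
    let lines = text.components(separatedBy: "\n")
    guard lines.first == "---",
          let end = lines.dropFirst().firstIndex(of: "---") else { return nil }

    var skill = Skill()
    for line in lines[1..<end] {
        let parts = line.split(separator: ":", maxSplits: 1)
        guard parts.count == 2 else { continue }
        let key = parts[0].trimmingCharacters(in: .whitespaces)
        skill.meta[key] = parts[1].trimmingCharacters(in: .whitespaces)
    }

    let rest = Array(lines[(end + 1)...])
    let marker = rest.firstIndex(of: "---examples---") ?? rest.count
    skill.systemPrompt = rest[..<marker]
        .joined(separator: "\n")
        .trimmingCharacters(in: .whitespacesAndNewlines)

    // Pair each "user:" line with the next "assistant:" line.
    var pendingUser: String?
    for line in rest.dropFirst(marker + 1) {
        if line.hasPrefix("user:") {
            pendingUser = String(line.dropFirst(5)).trimmingCharacters(in: .whitespaces)
        } else if line.hasPrefix("assistant:"), let user = pendingUser {
            let reply = String(line.dropFirst(10)).trimmingCharacters(in: .whitespaces)
            skill.examples.append((user: user, assistant: reply))
            pendingUser = nil
        }
    }
    return skill
}
```
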
At runtime, skills are applied differently by routing mode:

- `intent` mode (small-model path):
  - `LLMIntentRouter` receives only skill metadata (id + description)
  - `SkillExecutor` lazily loads only the selected skill file
  - the skill runs in an isolated worker session (system prompt + examples + current request)
- `tool-calling` mode (large-model path):
  - skills are injected into the system prompt as a single consolidated block
  - injection is bounded (global character cap + per-skill prompt/example truncation)
  - this avoids extra routing hops and keeps behavior simple for stronger models

This gives small models stricter context isolation, while keeping the large-model path straightforward.
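
For the tool-calling path, bounded injection might look like the sketch below. The function name, the entry layout, and both cap values are assumptions; the README does not state the actual limits.

```swift
// Hypothetical caps and layout; the project's actual limits are not documented here.
func buildSkillBlock(
    _ skills: [(id: String, prompt: String)],
    perSkillCap: Int = 400,
    globalCap: Int = 1200
) -> String {
    var block = ""
    for skill in skills {
        // Truncate each prompt first, then check it against the remaining global budget.
        let entry = "[\(skill.id)]\n\(skill.prompt.prefix(perSkillCap))\n"
        if block.count + entry.count > globalCap { break }
        block += entry
    }
    return block
}
```

Dropping whole skills once the budget is spent, instead of cutting one mid-prompt, is a design choice of this sketch; the project may truncate differently.
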

- `conventional-commit`
- `pr-description`
- `todo-breakdown`
- `regex`


| Tool | Description |
|---|---|
| `read_file` | Read the full contents of a file at a given path |
| `bash` | Execute a bash command and return stdout + stderr |

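
To illustrate how a tool and a name-based registry might fit together, here is a minimal sketch. The protocol members follow the names this README lists (`name`, `description`, `parametersSchema`, `execute`), but their types, `ToolError`, and the `run` method are assumptions.

```swift
import Foundation

// Hypothetical shapes based on the names in this README; real signatures may differ.
protocol Tool {
    var name: String { get }
    var description: String { get }
    var parametersSchema: [String: String] { get }   // simplified: parameter name -> JSON type
    func execute(arguments: [String: String]) throws -> String
}

struct ToolError: Error {
    let message: String
}

struct ReadFileTool: Tool {
    let name = "read_file"
    let description = "Read the full contents of a file at a given path"
    let parametersSchema = ["path": "string"]

    func execute(arguments: [String: String]) throws -> String {
        guard let path = arguments["path"] else { throw ToolError(message: "missing path") }
        return try String(contentsOfFile: path, encoding: .utf8)
    }
}

struct ToolRegistry {
    var tools: [String: any Tool] = [:]

    mutating func register(_ tool: any Tool) {
        tools[tool.name] = tool
    }

    // Dispatches by name and throws for tools the registry does not know.
    func run(_ name: String, arguments: [String: String]) throws -> String {
        guard let tool = tools[name] else { throw ToolError(message: "unknown tool \(name)") }
        return try tool.execute(arguments: arguments)
    }
}

var registry = ToolRegistry()
registry.register(ReadFileTool())
```
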
Message is an enum — each case carries exactly the data it needs:

    enum Message {
        case system(content: String)
        case user(content: String)
        case assistant(content: String, toolCalls: [ToolCall])
        case tool(content: String, toolCallId: String)
    }

Run `swift run` to start the app. When started, the app prints the active `baseURL`, `model`, and `routing`, then enters interactive chat:

- prompt: `>>>`
- assistant output: `🤖 Bot: ...`
- Tool calling mode: `🔨 Tool: <name>` (tool arguments printed)
- Intent Router mode: `🔍 Intent: <label>`; `⚡ auto: <label>` when the next step was chosen by `DeterministicIntentResolver` without the LLM
- exit: `exit`, `quit`, `q`

toy-bot defaults to Ollama with a local model. For tool-calling mode with Ollama, prefer models that support function calling well (e.g. llama3.2, qwen2.5). For Intent Router mode, smaller models are more usable because they are not asked to emit OpenAI tool_calls.
Official download page: ollama.com/download

    # macOS (Homebrew)
    brew install ollama
    # Linux
    curl -fsSL https://ollama.com/install.sh | sh

Start Ollama and pull a model:

    ollama serve
    ollama pull llama3.2

If you do not pass any config, the app uses:

- provider: `ollama`
- base URL: `http://localhost:11434`
- model: `llama3.2`
- token: not required
- routing: `intent` (Intent Router)


| Variable | Description |
|---|---|
| `TOYBOT_PROVIDER` | `ollama` or `openai` |
| `TOYBOT_BASE_URL` | Override base URL |
| `TOYBOT_MODEL` | Model name |
| `TOYBOT_API_TOKEN` | API token (optional for Ollama) |
| `OLLAMA_HOST` | Fallback base URL for Ollama |
| `TOYBOT_ROUTING` | `intent` (Intent Router) or `tool-calling` (OpenAI-style tools) |


    # Configure through environment variables
    export TOYBOT_PROVIDER=ollama
    export TOYBOT_MODEL=llama3.2
    export TOYBOT_ROUTING=intent
    swift run

    # Or configure through CLI flags
    swift run ToyBot --provider ollama --base-url http://localhost:11434 --model llama3.2 --routing intent
    swift run ToyBot --provider openai --token sk-... --model gpt-4o-mini --routing tool-calling

Supported flags: `--provider`, `--base-url`, `--model`, `--token`, `--ollama-host`, `--routing`

Use one-shot mode to run a single prompt and print only the final model answer.

    swift run ToyBot --routing intent -c "Briefly explain what the file Sources/toy-bot/ToyBot.swift does"

Prompt flags:

- `-c "prompt"`
- `--prompt "prompt"`

When one-shot mode is used, the CLI exits after one response. Errors are written to stderr, and the process exits with a non-zero code.

Configuration is resolved in this order, with earlier sources taking precedence:

1. CLI arguments
2. Environment variables
3. Built-in defaults

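
Assuming the three sources above are listed from highest to lowest priority, resolution of a single setting can be sketched as a first-non-empty lookup. `resolve` is a hypothetical helper, and treating an empty string as unset is an assumption of this sketch.

```swift
import Foundation

// Returns the first non-empty value: CLI flag, then environment variable, then default.
// Treating an empty string as "unset" is an assumption of this sketch.
func resolve(cli: String?, env: String?, fallback: String) -> String {
    for candidate in [cli, env] {
        if let value = candidate, !value.isEmpty { return value }
    }
    return fallback
}

let environment = ProcessInfo.processInfo.environment
let provider = resolve(cli: nil, env: environment["TOYBOT_PROVIDER"], fallback: "ollama")
// The routing default depends on the resolved provider: intent for ollama, tool-calling for openai.
let routing = resolve(cli: nil,
                      env: environment["TOYBOT_ROUTING"],
                      fallback: provider == "openai" ? "tool-calling" : "intent")
print("provider: \(provider), routing: \(routing)")
```
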

- Token is not required for local Ollama; the `Authorization` header is omitted when no token is set.
- Request timeout is set to 5 minutes to accommodate slow local inference.
- If you run Ollama remotely and need auth, pass `--token` or `TOYBOT_API_TOKEN`.
- Structured outputs for the intent router require a recent Ollama that supports `response_format` / JSON schema on `/v1/chat/completions`.

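
The token and timeout notes above could translate into request construction along these lines. `makeChatRequest` is a hypothetical helper, and the `Bearer` scheme is assumed from common OpenAI-compatible usage; the project's `OpenAIClient` may build requests differently.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking   // URLRequest lives here on Linux
#endif

// Builds a chat-completions request; the Authorization header is only set when a token exists.
func makeChatRequest(baseURL: URL, token: String?, body: Data) -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("v1/chat/completions"))
    request.httpMethod = "POST"
    request.httpBody = body
    request.timeoutInterval = 300   // 5 minutes, to tolerate slow local inference
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    if let token = token, !token.isEmpty {
        request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    }
    return request
}

let local = makeChatRequest(baseURL: URL(string: "http://localhost:11434")!, token: nil, body: Data())
let remote = makeChatRequest(baseURL: URL(string: "http://localhost:11434")!, token: "abc", body: Data())
```
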
Skill contributions are welcome. Please keep skills focused and small-model friendly.
- One skill = one concrete job (avoid broad "developer assistant" prompts)
- Keep prompts short and specific
- Add 1-3 high-quality few-shot examples
- Define a strict output shape in the prompt (format first, style second)
- Prefer tasks with constrained input/output over open-ended reasoning

Skill file template:

    ---
    id: your-skill-id
    name: Human Friendly Name
    description: One-line router-facing description
    output_format: free_text
    ---
    System prompt text here.
    ---examples---
    user: Example input
    assistant: Example output

- Place the file under `skills/` and use a lowercase kebab-case filename
- Ensure `id` matches the filename (without `.md`)
- Validate that examples are realistic and deterministic
- Run `swift build` before opening a PR
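
Two of the checklist items, the kebab-case filename and the `id` match, are mechanical enough to check in code. The helper below is a hypothetical sketch, not part of the project; allowing digits in the name is an assumption.

```swift
import Foundation

// Checks that the filename is lowercase kebab-case and that id equals it without ".md".
func isValidSkillFile(filename: String, id: String) -> Bool {
    guard filename.hasSuffix(".md") else { return false }
    let stem = String(filename.dropLast(3))
    let kebab = "^[a-z0-9]+(-[a-z0-9]+)*$"   // assumed definition of kebab-case
    return stem == id && stem.range(of: kebab, options: .regularExpression) != nil
}

print(isValidSkillFile(filename: "pr-description.md", id: "pr-description"))
```
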
