candle-cli

Rust-first agentic CLI with multi-agent coordination, layered memory, sandboxed execution, and observability. Inspired by candle's lightweight AI runtime philosophy.

Highlights

Agentic tool loop — bounded multi-step execution with sub-agent task delegation
Streaming output — real-time token-by-token display as the model generates
Layered memory — session memory + project-level persistent memory
Sandboxed shell — optional Docker container isolation with network cutoff
Multi-model — DeepSeek, Ollama, vLLM, OpenAI via persistent Python bridge
Permission control — four modes with path boundary enforcement and interactive confirmation
Observability — /tools, /status (with token estimation), /trace (with millisecond timing and JSON export)
Fault tolerance — API retry with exponential backoff (4xx not retried), shell timeout with kill
Rust core + Python bridge — Rust owns CLI, agent loop, tools, permissions; Python bridges model backends with persistent worker

Quickstart

git clone https://github.com/DuangZ-GR/candle-cli.git
cd candle-cli
cargo build

Recommended: DeepSeek API

export CANDLE_CLI_RUNTIME="bridge"
export CANDLE_CLI_API_BASE_URL="https://api.deepseek.com/v1"
export CANDLE_CLI_API_KEY="YOUR_DEEPSEEK_API_KEY"
export CANDLE_CLI_MODEL_ID="deepseek-v4-flash"

cargo run -- prompt "Read README.md and summarize this project"
cargo run --

Local fallback: Ollama

ollama pull qwen2:0.5b

export CANDLE_CLI_RUNTIME="bridge"
export CANDLE_CLI_API_BASE_URL="http://localhost:11434/v1"
export CANDLE_CLI_API_KEY="ollama"
export CANDLE_CLI_MODEL_ID="qwen2:0.5b"

cargo run -- prompt "Hello, introduce yourself"

Small models (0.5B–3B) may not reliably follow the tool-call protocol. Use 7B+ or API models for agentic tasks.

Usage

Command	Purpose
`cargo run -- prompt "..."`	One-shot prompt and exit
`cargo run --`	Interactive REPL with readline editing
`cargo run -- harness`	Run automated scenario benchmark
`cargo run -- doctor`	Print runtime status

REPL commands

Command	Alias	Purpose
`/help`	`/h`	Show available commands
`/exit`	`/quit`, `/q`	Exit and save session
`/session`	`/info`	Show session metadata
`/status`		Show runtime, model, permission status
`/tools`		List registered tools
`/trace`		Show execution trace with timing; `--json` for structured export
`/system`		Show active system prompt
`/name <label>`		Name current session
`/memory`		Manage project memory (file/cmd/note subcommands)
`/clear`		Clear current session
`/list`	`/ls`	List saved sessions
`/resume <id>`		Resume a saved session
`/save`		Save current session

Agentic system

Tool call protocol

Models request tools via text JSON blocks:

<tool_call>{"id":"call-1","name":"read","input":{"file_path":"README.md"}}</tool_call>

Rust parses the block, executes the tool, records the result in the session, and feeds it back to the model. The loop continues until the model produces a final answer or reaches the maximum step count (8). A fallback parser also accepts function-style calls such as read({"file_path":"README.md"}).
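The loop described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: `extract_tool_call`, `run_loop`, and the `<tool_result>` wrapper are hypothetical names, the model and tool executor are passed in as closures, and the JSON payload is handed on as a raw string rather than parsed.

```rust
/// Maximum number of tool steps per turn, as stated in the text.
const MAX_STEPS: usize = 8;

/// Return the raw JSON payload of the first <tool_call>...</tool_call> block, if any.
fn extract_tool_call(output: &str) -> Option<&str> {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";
    let start = output.find(OPEN)? + OPEN.len();
    let end = start + output[start..].find(CLOSE)?;
    Some(output[start..end].trim())
}

/// Bounded loop: ask the model, run a tool when one is requested, feed the
/// result back, and stop on a final answer or after MAX_STEPS iterations.
fn run_loop<M, T>(mut model: M, mut run_tool: T, prompt: &str) -> Option<String>
where
    M: FnMut(&str) -> String,
    T: FnMut(&str) -> String,
{
    let mut context = prompt.to_string();
    for _ in 0..MAX_STEPS {
        let output = model(context.as_str());
        match extract_tool_call(&output).map(String::from) {
            Some(call) => {
                let result = run_tool(call.as_str());
                // Hypothetical framing for the tool result; the real format may differ.
                context.push_str("\n<tool_result>");
                context.push_str(&result);
                context.push_str("</tool_result>");
            }
            // No tool request in the output: treat it as the final answer.
            None => return Some(output),
        }
    }
    // Step budget exhausted without a final answer.
    None
}
```

Returning `None` when the budget runs out keeps the "no final answer" case explicit for the caller; a real implementation might instead return the last model output with a warning.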

Available tools

Tool	Input	Purpose	Mutates
`pwd`	`{}`	Show workspace directory	No
`read`	`{"file_path":"README.md"}`	Read a UTF-8 file (path boundary enforced)	No
`glob`	`{"pattern":"src/**/*.rs"}`	Find files by pattern	No
`grep`	`{"pattern":"fn main","path":"src"}`	Search file contents recursively	No
`web_search`	`{"query":"today weather"}`	Web search via DuckDuckGo/Sogou fallback	No
`task`	`{"description":"analyze this code"}`	Delegate to read-only sub-agent (3-step loop)	No
`edit`	`{"file_path":"Cargo.toml","old_string":"0.1.0","new_string":"0.3.0"}`	Replace exactly one text occurrence	Yes
`shell`	`{"command":"cargo test"}`	Run shell command with timeout	Possible
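The `edit` row says the tool replaces exactly one occurrence. One way to enforce that rule is to count matches before replacing, as in the sketch below. `replace_exactly_once` and its error messages are hypothetical; the project's real behavior on zero or multiple matches is not documented here.

```rust
/// Replace `old` with `new` only when `old` occurs exactly once in `text`.
/// Zero matches and ambiguous (multiple) matches are both reported as errors.
fn replace_exactly_once(text: &str, old: &str, new: &str) -> Result<String, String> {
    if old.is_empty() {
        return Err("old_string must not be empty".to_string());
    }
    // `matches` counts non-overlapping occurrences.
    match text.matches(old).count() {
        0 => Err("old_string not found".to_string()),
        1 => Ok(text.replacen(old, new, 1)),
        n => Err(format!("old_string is ambiguous: {n} occurrences")),
    }
}
```

Rejecting ambiguous matches forces the model to supply enough surrounding context to identify a single location, which avoids silent edits in the wrong place.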

Permission modes

Mode	Behavior
`read-only`	Allow `pwd`, `read`, `glob`, `grep` only
`workspace-write` (default)	Allow all tools without confirmation
`prompt`	Auto-allow read tools; confirm `edit`, `write`, `shell` interactively
`danger-full-access`	Allow all tools without confirmation
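The table above maps naturally onto a small decision function. The following is a sketch under assumptions: the type and function names are hypothetical, and in `prompt` mode the handling of tools the table does not name (such as `web_search` and `task`) is a guess, here treated as allowed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    ReadOnly,
    WorkspaceWrite,
    Prompt,
    DangerFullAccess,
}

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Allow,
    Confirm,
    Deny,
}

/// Parse the mode names used by CANDLE_CLI_PERMISSION.
fn parse_mode(value: &str) -> Option<Mode> {
    match value {
        "read-only" => Some(Mode::ReadOnly),
        "workspace-write" => Some(Mode::WorkspaceWrite),
        "prompt" => Some(Mode::Prompt),
        "danger-full-access" => Some(Mode::DangerFullAccess),
        _ => None,
    }
}

/// Decide what to do with a tool request under the given mode.
fn decide(mode: Mode, tool: &str) -> Decision {
    let read_tool = matches!(tool, "pwd" | "read" | "glob" | "grep");
    let needs_confirmation = matches!(tool, "edit" | "write" | "shell");
    match mode {
        Mode::ReadOnly if read_tool => Decision::Allow,
        Mode::ReadOnly => Decision::Deny,
        Mode::WorkspaceWrite | Mode::DangerFullAccess => Decision::Allow,
        Mode::Prompt if needs_confirmation => Decision::Confirm,
        // Assumption: tools not named in the table are allowed in prompt mode.
        Mode::Prompt => Decision::Allow,
    }
}
```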

Multi-agent coordination

The task tool spawns a sub-agent with read-only permission and a 3-step bounded loop. The main agent can delegate code analysis, research, or verification subtasks to isolated sub-agents and receive structured results.

Layered memory

Session memory: dialogue history persisted as JSON; supports list, resume, and clear
Project memory: .candle-cli/memory.json stores key files, common commands, and free-form notes, and is automatically injected into the system prompt

/memory file src/main.rs
/memory cmd cargo test
/memory note build=takes ~5s on 4090
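To illustrate how the three subcommands might update project memory, here is a small sketch. The `ProjectMemory` struct, its field names, and the `key=value` reading of notes are assumptions inferred from the examples above, not the actual layout of memory.json.

```rust
/// Hypothetical in-memory view of project memory.
#[derive(Debug, Default, PartialEq)]
struct ProjectMemory {
    files: Vec<String>,
    commands: Vec<String>,
    notes: Vec<(String, String)>,
}

/// Apply one `/memory <file|cmd|note> <argument>` line to the memory.
fn apply_memory_command(mem: &mut ProjectMemory, line: &str) -> Result<(), String> {
    let rest = line
        .trim()
        .strip_prefix("/memory")
        .ok_or("not a /memory command")?
        .trim();
    // Split the subcommand from its argument at the first whitespace.
    let (sub, arg) = rest.split_once(char::is_whitespace).unwrap_or((rest, ""));
    let arg = arg.trim();
    if arg.is_empty() {
        return Err(format!("missing argument for '{sub}'"));
    }
    match sub {
        "file" => mem.files.push(arg.to_string()),
        "cmd" => mem.commands.push(arg.to_string()),
        "note" => {
            // Assumption: notes are written as key=value.
            let (key, value) = arg.split_once('=').ok_or("note expects key=value")?;
            mem.notes.push((key.trim().to_string(), value.trim().to_string()));
        }
        other => return Err(format!("unknown subcommand '{other}'")),
    }
    Ok(())
}
```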

Sandboxed execution

Set CANDLE_CLI_SANDBOX=docker to run shell commands in an isolated Alpine container with read-only workspace mount and network disabled.
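As a rough picture of what such an invocation could look like, the sketch below assembles `docker run` arguments for a read-only mount with networking disabled. The function name, the `/workspace` mount point, and the bare `alpine` image tag are assumptions; the flags themselves (`--rm`, `--network none`, `-v ...:ro`, `-w`) are standard Docker CLI options.

```rust
/// Build the argument list for running `command` inside an isolated container.
/// The result is meant to be passed to `std::process::Command::new("docker")`.
fn docker_args(workspace: &str, command: &str) -> Vec<String> {
    // Mount the host workspace read-only at a fixed path (assumed: /workspace).
    let mount = format!("{workspace}:/workspace:ro");
    [
        "run",
        "--rm",              // remove the container when the command exits
        "--network", "none", // no network access inside the container
        "-v", mount.as_str(),
        "-w", "/workspace",  // run the command from the mounted workspace
        "alpine",
        "sh", "-c", command,
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}
```

A caller would then run something like `std::process::Command::new("docker").args(docker_args(".", "ls src")).output()`. Note that with a read-only mount, commands that write into the workspace will fail inside the container.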

Fault tolerance

API retry with exponential backoff (3 attempts, 1s/2s/4s, 4xx not retried)
Shell timeout with SIGKILL (configurable via CANDLE_CLI_SHELL_TIMEOUT_SECS)
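The retry policy can be sketched as a generic helper. This is an illustration only: `ApiError` and `retry_with_backoff` are hypothetical names, the sleep function is injected so the logic can be exercised without waiting, and the attempt count is a parameter because the text does not make clear whether the 4s delay implies a fourth try.

```rust
use std::time::Duration;

/// Hypothetical error type: an HTTP status or a transport-level failure.
#[derive(Debug, PartialEq)]
enum ApiError {
    Status(u16),
    Transport(String),
}

/// Client errors (4xx) are not retried; everything else is.
fn is_retryable(err: &ApiError) -> bool {
    match err {
        ApiError::Status(code) => !(400..500).contains(code),
        ApiError::Transport(_) => true,
    }
}

/// Call `call` up to `max_attempts` times, doubling the delay after each failure.
fn retry_with_backoff<T, F, S>(
    max_attempts: u32,
    base: Duration,
    mut call: F,
    mut sleep: S,
) -> Result<T, ApiError>
where
    F: FnMut() -> Result<T, ApiError>,
    S: FnMut(Duration),
{
    let mut delay = base;
    let mut attempt = 1;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            // Give up on the last attempt or on a non-retryable error.
            Err(e) if attempt >= max_attempts || !is_retryable(&e) => return Err(e),
            Err(_) => {
                sleep(delay);
                delay *= 2; // 1s, 2s, 4s, ... when base is one second
                attempt += 1;
            }
        }
    }
}
```

In real use the last argument would be `std::thread::sleep`, for example `retry_with_backoff(3, Duration::from_secs(1), send_request, std::thread::sleep)`, where `send_request` stands in for whatever performs the API call.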

RAG pre-search

Before each turn, the context builder extracts keywords from the user message, runs grep against src/, and injects matching code snippets into the prompt. Greetings and chat messages are automatically detected and skipped.
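The pre-search step could look roughly like the sketch below. Everything here is an assumption made for illustration: the stopword and greeting lists, the function names, and the choice to search in-memory `(path, contents)` pairs instead of invoking grep on disk.

```rust
const STOPWORDS: &[&str] = &[
    "the", "and", "for", "how", "what", "does", "this", "that", "show", "with",
];
const GREETINGS: &[&str] = &["hi", "hello", "hey", "thanks", "thank you"];

/// Detect greetings and chat messages that should skip the search entirely.
fn is_small_talk(message: &str) -> bool {
    let m = message
        .trim()
        .trim_end_matches(|c: char| c.is_ascii_punctuation())
        .to_lowercase();
    GREETINGS.iter().any(|g| *g == m.as_str())
}

/// Pick up to `limit` distinct keywords, skipping short words and stopwords.
fn extract_keywords(message: &str, limit: usize) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for word in message.split(|c: char| !(c.is_alphanumeric() || c == '_')) {
        let w = word.to_lowercase();
        if w.len() < 3 || STOPWORDS.iter().any(|s| *s == w.as_str()) || out.contains(&w) {
            continue;
        }
        out.push(w);
        if out.len() == limit {
            break;
        }
    }
    out
}

/// Return `path:line: text` for every line that contains any keyword.
fn search_snippets(keywords: &[String], files: &[(&str, &str)]) -> Vec<String> {
    let mut hits = Vec::new();
    for (path, contents) in files {
        for (idx, line) in contents.lines().enumerate() {
            let lower = line.to_lowercase();
            if keywords.iter().any(|k| lower.contains(k.as_str())) {
                hits.push(format!("{path}:{}: {}", idx + 1, line.trim()));
            }
        }
    }
    hits
}
```

A context builder would call `is_small_talk` first and only run `extract_keywords` and `search_snippets` when it returns false, then prepend the hits to the prompt.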

Observability

/tools — system capability boundary
/status — runtime snapshot (session, model, permission, configuration)
/trace — execution chain with per-step millisecond timing; --json for structured analysis

Harness

cargo run -- harness

Runs four predefined scenarios (read, glob, grep, shell) and produces a pass/fail report with timing and tool step counts, plus a harness_report.json file.

Model backends

Set CANDLE_CLI_RUNTIME=bridge for real model calls. The default runtime, mock, is intended for testing.

Backend	`CANDLE_CLI_API_BASE_URL`	`CANDLE_CLI_API_KEY`	`CANDLE_CLI_MODEL_ID`
DeepSeek	`https://api.deepseek.com/v1`	`YOUR_DEEPSEEK_API_KEY`	`deepseek-v4-flash`
Ollama	`http://localhost:11434/v1`	`ollama`	`qwen2:0.5b`
vLLM	`http://localhost:8000/v1`	`not-needed`	`Qwen/Qwen2-0.5B-Instruct`
OpenAI	`https://api.openai.com/v1`	`sk-xxx`	`gpt-4o-mini`

Local transformers model

python3 -m pip install -r requirements.txt

export CANDLE_CLI_RUNTIME="bridge"
export CANDLE_CLI_MODEL_ID="Qwen/Qwen2-0.5B-Instruct"
export CANDLE_CLI_MODEL_DEVICE="cpu"
export CANDLE_CLI_LOCAL_FILES_ONLY="false"

cargo run -- prompt "Hello"

Verbose diagnostics

Set CANDLE_CLI_VERBOSE=1 for API request details, token usage, timing, and GPU memory diagnostics on stderr.

Configuration

Variable	Default	Purpose
`CANDLE_CLI_RUNTIME`	`mock`	`mock` or `bridge`
`CANDLE_CLI_MODEL_ID`	`Qwen/Qwen2-0.5B-Instruct`	Model ID or local path
`CANDLE_CLI_MODEL_DEVICE`	auto	`cpu`, `cuda`, or `auto`
`CANDLE_CLI_LOCAL_FILES_ONLY`	`true`	Use only local files (no download)
`CANDLE_CLI_API_BASE_URL`	(empty)	OpenAI-compatible API base URL
`CANDLE_CLI_API_KEY`	(empty)	API key
`CANDLE_CLI_MAX_NEW_TOKENS`	`512`	Max generated tokens per turn
`CANDLE_CLI_TEMPERATURE`	`0.7`	Sampling temperature
`CANDLE_CLI_TOP_P`	`0.9`	Top-p sampling
`CANDLE_CLI_SYSTEM_PROMPT`	built-in	Override system prompt
`CANDLE_CLI_MAX_TURNS`	`20`	Max retained conversation turns
`CANDLE_CLI_PERMISSION`	`workspace-write`	Permission mode
`CANDLE_CLI_PERMISSION_RESPONSE`	(empty)	Pre-set prompt responses (`y`/`allow`/`deny`)
`CANDLE_CLI_SHELL_TIMEOUT_SECS`	`30`	Shell command timeout (seconds)
`CANDLE_CLI_SANDBOX`	(empty)	Set to `docker` for container isolation
`CANDLE_CLI_VERBOSE`	`false`	Print diagnostics to stderr
`CANDLE_CLI_MODEL_CONFIG`	(empty)	Optional JSON config file path
`CANDLE_CLI_SESSION_DIR`	system temp dir	Session storage directory
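A few of the variables above, read with their documented defaults, might be loaded as in this sketch. The `Settings` struct and `load_settings` function are hypothetical, the lookup is passed in as a closure so it can be tested without touching the process environment, and falling back to the default on an unparsable value is a design assumption.

```rust
use std::time::Duration;

/// Hypothetical subset of the configuration table.
#[derive(Debug, PartialEq)]
struct Settings {
    runtime: String,
    max_new_tokens: u32,
    temperature: f32,
    shell_timeout: Duration,
    sandbox_docker: bool,
}

/// Build settings from a lookup function, using the documented defaults
/// when a variable is unset, empty, or cannot be parsed.
fn load_settings<F: Fn(&str) -> Option<String>>(get: F) -> Settings {
    let value = |name: &str, default: &str| -> String {
        get(name)
            .filter(|v| !v.is_empty())
            .unwrap_or_else(|| default.to_string())
    };
    Settings {
        runtime: value("CANDLE_CLI_RUNTIME", "mock"),
        max_new_tokens: value("CANDLE_CLI_MAX_NEW_TOKENS", "512")
            .parse()
            .unwrap_or(512),
        temperature: value("CANDLE_CLI_TEMPERATURE", "0.7")
            .parse()
            .unwrap_or(0.7),
        shell_timeout: Duration::from_secs(
            value("CANDLE_CLI_SHELL_TIMEOUT_SECS", "30")
                .parse()
                .unwrap_or(30),
        ),
        sandbox_docker: value("CANDLE_CLI_SANDBOX", "") == "docker",
    }
}
```

To read from the real environment, pass `|name| std::env::var(name).ok()` as the lookup.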

Examples

# Standalone inference tests
python3 examples/api_inference.py
python3 examples/qwen3_local_inference.py

# Multi-step agentic task
export CANDLE_CLI_RUNTIME="bridge"
export CANDLE_CLI_API_BASE_URL="https://api.deepseek.com/v1"
export CANDLE_CLI_API_KEY="YOUR_KEY"
export CANDLE_CLI_MODEL_ID="deepseek-v4-flash"

cargo run -- prompt "Read src/tools/registry.rs and summarize the tool dispatch logic"

# Harness benchmark
cargo run -- harness

Development

cargo fmt --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test
python3 -m pytest python/test_bridge_runtime.py -q

License

MIT
