agent-local — Reusable Local LLM Agent Platform

A business-agnostic, multi-tier local LLM agent core that teams can adopt across domains. The valuable logic — grammar-constrained routing, an adaptive reasoning loop, objective escalation and a deterministic policy gate — lives in a reusable core/. A new domain is a thin usecases/<name>/ folder, never a fork of the core (see ADR-001).

The shipped example use-case, tienda, is a WhatsApp store assistant.

🧬 Part of a lineage — this is chapter three, not a standalone repo

This repository is the LLM plane of a deliberately connected ecosystem, and the third step of a single evolution:

1. ML-MLOps Portfolio: three production ML services; the lessons were paid for here.

2. ML-MLOps Production Template: those lessons encoded as a reusable, governed scaffold for tabular ML on Kubernetes.

3. agent-local (this repo): the same governance philosophy (AUTO/CONSULT/STOP, eval-gated autonomy, policy-as-data, no fine-tuning yet) generalized to a new domain: local LLM agents.

agent-local and the ML-MLOps Production Template are siblings with an explicit, bidirectional contract, not copies: agent-local reuses the template's Terraform/Kustomize when it needs cloud, and runs the template's ADR-028 day-2 maintenance lanes on its local tiers. The shared plan, ACTION_PLAN_LLM_AGENT.md, governs both planes. See the template's "Local model plane" section and this repo's ADR-001.

Status: Phase 1 (read-only, fixtures). Routing quality gate PASSED (19/20) on the Tier-0 router. Code is structured for the full multi-tier stack.

Why this exists

Most "LLM agent" code couples the loop, prompts and business rules into one app. That doesn't scale to multiple use-cases: the safety-critical logic diverges across copies. Here, that logic is centralized and consumed by configuration:

core/                 # business-agnostic engine — single source of truth
  config.py           #   UsecaseConfig loader
  schemas.py          #   typed Pydantic contracts
  router.py           #   Tier-0 router (GBNF-constrained JSON)
  tiers.py            #   tier clients (endpoints injected from config)
  tools.py            #   ToolRegistry (per-use-case namespaces)
  retrieval.py        #   BM25 index + semantic_retrieval factory
  policy.py           #   deterministic policy gate (rules are data)
  agent.py            #   the 7-station loop
  __init__.py         #   load_agent(name)
usecases/
  tienda/             # example use-case (config + tools + data + prompts + evals)
    config.yaml       #   endpoints, allowed_intents, policy rules, prompt templates
    tools.py          #   build_registry(config) -> ToolRegistry
    prompts/ grammars/ data/ policies/ budgets.yaml evals/sets/
app/
  main.py             # FastAPI surface; loads a use-case via AGENT_USECASE

Architecture: the loop

Customer ─▶ FastAPI ─▶ Agent.handle()
                          │
   1. route    (Tier 0, GBNF)        → intent / tier / risk / confidence
   2. plan     (Tier N)              → list of tool calls
   3. tools    (APP executes)        → observations
   4. reflect  (conditional)         → only on tool-failure or risk ≥ medium
   5. generate (Tier N)              → draft answer
   6. critic   (Tier N/N+1)          → verify against observations (risk ≥ medium)
   7. policy   (deterministic)       → MANDATORY gate; no response bypasses it
   8. finalize                       → answer + metrics

Adaptive depth: simple smalltalk goes plan → tools → policy → final without paying for reflection/critique.
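As a rough illustration of adaptive depth, the conditions in the diagram can be expressed as a small station selector. This is a sketch, not the code in core/agent.py: `Risk` and `stations_for` are hypothetical names, and it assumes `generate` runs on every path because the diagram lists it without a condition.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Ordered risk levels (hypothetical; the diagram only names 'medium')."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def stations_for(risk: Risk, tool_failed: bool) -> list[str]:
    """Return the stations to run after routing, in order."""
    stations = ["plan", "tools"]
    # Reflection is conditional: only on tool failure or risk >= medium.
    if tool_failed or risk >= Risk.MEDIUM:
        stations.append("reflect")
    stations.append("generate")
    # The critic checks the draft against observations when risk >= medium.
    if risk >= Risk.MEDIUM:
        stations.append("critic")
    # The policy gate is mandatory, so every path ends with it.
    stations += ["policy", "finalize"]
    return stations


print(stations_for(Risk.LOW, tool_failed=False))
print(stations_for(Risk.HIGH, tool_failed=False))
```

The point of the design is that the cheap path is the default and the expensive stations are opted into by objective signals, while the policy gate is never optional.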

Objective escalation (in code, never in the prompt): confidence < 0.70 bumps a tier; a critic rejection bumps once; Tier-3 requires explicit budget permission.
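The escalation rules above are plain arithmetic, which is why they can live in code rather than in a prompt. A minimal sketch follows, assuming tiers are integers 0 to 3; `EscalationState` and the function names are illustrative and not taken from the repository.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.70  # threshold stated above
MAX_TIER = 3             # assumption: Tier 3 is the top tier


@dataclass
class EscalationState:
    tier: int
    critic_bumped: bool = False  # a critic rejection may bump only once


def _bump(tier: int, allow_tier3: bool) -> int:
    """Raise the tier by one, honouring the Tier-3 budget permission."""
    target = min(tier + 1, MAX_TIER)
    if target == MAX_TIER and not allow_tier3:
        return tier  # stay put: Tier 3 needs explicit budget permission
    return target


def escalate_on_confidence(state: EscalationState, confidence: float,
                           allow_tier3: bool = False) -> EscalationState:
    """Bump one tier when router confidence is below the floor."""
    if confidence < CONFIDENCE_FLOOR:
        state.tier = _bump(state.tier, allow_tier3)
    return state


def escalate_on_critic_rejection(state: EscalationState,
                                 allow_tier3: bool = False) -> EscalationState:
    """Bump one tier on the first critic rejection; later rejections are ignored."""
    if not state.critic_bumped:
        state.tier = _bump(state.tier, allow_tier3)
        state.critic_bumped = True
    return state
```

Because the thresholds are constants and the state is explicit, the same input always produces the same tier, which keeps escalation auditable.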

Quickstart

Prerequisites

Python 3.11+
A llama.cpp llama-server build and a GGUF router model (Tier 0).

Install

python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"     # or: pip install -r requirements-dev.txt

Run the Tier-0 router

llama-server -m /path/to/router-model.gguf --port 8091 -ngl 99 -c 8192 --host 127.0.0.1
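With the server running, a grammar-constrained routing call could look roughly like the sketch below. It uses llama-server's `/completion` endpoint with its `grammar` field. The grammar itself, the intent names, and the prompt wording are assumptions for illustration; the shipped grammar lives in the use-case's grammars/route.gbnf and may differ.

```python
import json
import urllib.request

# Hypothetical GBNF grammar: forces a JSON object with the four routing fields.
ROUTE_GRAMMAR = r'''
root   ::= "{" ws "\"intent\":" ws intent "," ws "\"tier\":" ws tier "," ws "\"risk\":" ws risk "," ws "\"confidence\":" ws conf ws "}"
intent ::= "\"smalltalk\"" | "\"product_query\"" | "\"order_status\""
tier   ::= [0-3]
risk   ::= "\"low\"" | "\"medium\"" | "\"high\""
conf   ::= "0." [0-9] [0-9] | "1.00"
ws     ::= [ \t\n]*
'''.strip()

REQUIRED_KEYS = {"intent", "tier", "risk", "confidence"}


def build_route_request(message: str) -> dict:
    """Build the JSON body for POST /completion (prompt text is illustrative)."""
    return {
        "prompt": f"Classify the customer message.\nMessage: {message}\nJSON:",
        "grammar": ROUTE_GRAMMAR,
        "n_predict": 64,
        "temperature": 0.0,
    }


def parse_route(content: str) -> dict:
    """Parse the model output and check that all routing fields are present."""
    route = json.loads(content)
    missing = REQUIRED_KEYS - route.keys()
    if missing:
        raise ValueError(f"router output missing fields: {sorted(missing)}")
    return route


def route(message: str, base_url: str = "http://127.0.0.1:8091") -> dict:
    """Send the request to the local Tier-0 server and return the parsed route."""
    body = json.dumps(build_route_request(message)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_route(json.loads(resp.read())["content"])
```

Constraining the output with a grammar means the application never has to recover from malformed JSON; validation is still kept in `parse_route` as a second line of defence.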

Use it

# Tests (no model required)
pytest

# Routing eval (gate: >= 18/20 intent accuracy)
python evals/run.py 01_intent.jsonl --usecase tienda

# Dev API
AGENT_USECASE=tienda python -m app.main
curl -X POST http://localhost:8000/dev/message \
  -H "Content-Type: application/json" \
  -d '{"text": "tienen coca de 600 fria?"}'
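For orientation, the routing gate (at least 18 of 20 intents correct) reduces to a count and a comparison. The sketch below is not evals/run.py: the JSONL field names `text` and `expected_intent` and both function names are assumptions, and `predict` stands in for a call to the router.

```python
import json
from typing import Callable, Iterable


def intent_accuracy(jsonl_lines: Iterable[str],
                    predict: Callable[[str], str]) -> tuple[int, int]:
    """Return (correct, total) over one eval set, skipping blank lines."""
    correct = total = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        case = json.loads(line)
        total += 1
        if predict(case["text"]) == case["expected_intent"]:
            correct += 1
    return correct, total


def passes_gate(correct: int, total: int,
                min_correct: int = 18, out_of: int = 20) -> bool:
    """True when correct/total >= min_correct/out_of, using integer math."""
    return total > 0 and correct * out_of >= min_correct * total
```

Comparing with integers avoids floating-point edge cases right at the threshold, and an empty eval set fails the gate instead of passing vacuously.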

Docker (app + llama.cpp Tier-0)

cp .env.example .env       # set MODELS_DIR to your host model directory
docker compose up --build

Models are mounted as a read-only volume — never baked into the image.

Add your own use-case

usecases/<name>/
├── __init__.py        # from .tools import build_registry
├── config.yaml        # endpoints, allowed_intents, policy rules, prompts
├── prompts/router.md
├── grammars/route.gbnf
├── tools.py           # build_registry(config) -> ToolRegistry
├── data/              # fixtures (Phase 1) or API clients (Phase 2)
├── policies/*.md      # BM25-indexed docs
├── budgets.yaml
└── evals/sets/*.jsonl

Then run AGENT_USECASE=<name> python -m app.main. For the contract, consumption modes and bring-your-own-models, see the full authoring guide in docs/usecases.md, and CONTRIBUTING.md.
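To make the tools.py contract concrete, here is a self-contained sketch of a `build_registry(config)` function backed by fixtures. The real `ToolRegistry` in core/tools.py is not shown in this README, so `SketchRegistry`, its methods, the config keys, and the `stock_lookup` tool are stand-ins invented for illustration.

```python
from typing import Any, Callable


class SketchRegistry:
    """Stand-in for core.tools.ToolRegistry (the real API may differ)."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._tools: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        # Tools are stored under "<namespace>.<name>" so use-cases cannot collide.
        self._tools[f"{self.namespace}.{name}"] = fn

    def call(self, qualified_name: str, **kwargs: Any) -> Any:
        if qualified_name not in self._tools:
            raise KeyError(f"unknown tool: {qualified_name}")
        return self._tools[qualified_name](**kwargs)


def build_registry(config: dict) -> SketchRegistry:
    """Build a read-only registry from fixture data (Phase 1 style)."""
    registry = SketchRegistry(namespace=config["name"])
    stock = config.get("fixtures", {}).get("stock", {})

    def stock_lookup(sku: str) -> dict:
        # Reads the fixture only; nothing is written anywhere.
        return {"sku": sku, "in_stock": stock.get(sku, 0) > 0}

    registry.register("stock_lookup", stock_lookup)
    return registry


registry = build_registry({"name": "demo", "fixtures": {"stock": {"coke-600": 4}}})
print(registry.call("demo.stock_lookup", sku="coke-600"))
```

Swapping fixtures for API clients in Phase 2 would then only change the body of each tool function, not the registry or the core.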

Acceptance gates

| Phase | Gate | Status |
|-------|------|--------|
| F0 | Tier-0 router speed ≥ 25 tok/s | ✅ (see `bench/RESULTS.md`) |
| F1 | Routing intent accuracy ≥ 18/20 | ✅ 20/20 |
| F1 | All tools read-only (`order_create` dry-run) | ✅ |
| F1 | Deterministic policy gate enforced | ✅ |
| F2.0 | ExecutiveController + per-tier circuit breaker | ✅ |
| F2.0 | Tier-client retry/backoff (transient blips ≠ tier failure) | ✅ |
| F1.6 | Latency-budget enforced (safe degrade past deadline) | ✅ |
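The per-tier circuit breaker named in the F2.0 gate follows a well-known pattern. A minimal version is sketched below, assuming one breaker instance per tier; the class name, thresholds, and injected clock are illustrative and do not describe the ExecutiveController's actual implementation.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures, then allows a trial call after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call to this tier may be attempted now."""
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block until the cooldown elapses, then permit a trial call.
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        """A successful call closes the breaker and clears the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure; open (or re-open) once the threshold is reached."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

Retries with backoff would sit inside a single call attempt, so a transient blip is absorbed there and only exhausted retries are reported to the breaker as a failure.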

Non-negotiable principles

No fine-tuning at this stage — routing + prompts + retrieval.
The model never mutates critical state without the policy gate — enforced structurally by the fail-closed tool capability contract (ADR-006).
Every lane needs an eval harness before increasing autonomy.
The simplest loop that works.
Inventory/price/stock are never held in model memory — always live tools.
Local-first; cloud only as explicit, budgeted overflow.
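The second principle can be pictured as a capability check that fails closed: a tool is treated as state-mutating unless it is explicitly declared read-only, and mutating tools cannot run without policy approval. This sketch is not ADR-006 itself; `Tool`, `PolicyDenied`, and `execute` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable


class PolicyDenied(Exception):
    """Raised when a mutating tool is called without policy approval."""


@dataclass(frozen=True)
class Tool:
    name: str
    fn: Callable[..., Any]
    # Fail closed: a tool that does not declare itself read-only is
    # assumed to mutate state and therefore needs the policy gate.
    mutates_state: bool = True


def execute(tool: Tool, policy_approved: bool, **kwargs: Any) -> Any:
    """Run a tool only if it is read-only or the policy gate approved it."""
    if tool.mutates_state and not policy_approved:
        raise PolicyDenied(f"{tool.name} requires policy approval")
    return tool.fn(**kwargs)


price = Tool("price_lookup", lambda sku: {"sku": sku, "price": 18.0}, mutates_state=False)
order = Tool("order_create", lambda sku: {"sku": sku, "created": True})

print(execute(price, policy_approved=False, sku="coke-600"))  # read-only: allowed
try:
    execute(order, policy_approved=False, sku="coke-600")
except PolicyDenied as exc:
    print("blocked:", exc)
```

Making the unsafe value the default means forgetting to annotate a new tool results in a blocked call rather than an unguarded write.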

Roadmap

Phase 1 — Skeleton ✅ (this): core + use-case, routing gate, policy gate, Docker.
Phase 2 — executive controller, versioned YAML policies, verifier pass, 10 eval sets, SQLite queue + sagas for multi-day flows.
Phase 3 — telemetry (PII-redacted), shadow mode, retrieval growth loop.
Phase 4 — QLoRA (strategic gate; requires ≥4 weeks of logs + a new ADR).

Documentation

ADR-001 — reusable platform, not a copy template
ADR-002 — calibrated infrastructure
ADR-003 — policy rules as versioned data
ADR-004 — cross-tier verification
ADR-005 — decision telemetry as a contract
ADR-006 — fail-closed tool capability contract
ADR-007 — structured tool-calling contract
CHANGELOG.md — version history
CONTRIBUTING.md — dev setup, adding use-cases, quality gates
SECURITY.md — security model and reporting
bench/RESULTS.md — benchmark + routing gate evidence

License

Apache-2.0.
