A minimal, transparent multi-step (ReAct) document-analyst agent — one Python file, pure stdlib, no framework. Point it at a folder of documents, give it a task, and it iterates think → tool → observe until it can write a report in which every claim carries file:line provenance.
Built by Marcin J. Ołdak to show what an agentic workflow looks like when you can read all of it in ten minutes: planning loop, tool dispatch, grounding rules, human-approval gate, and a test harness that runs the whole loop with a mock LLM — no API key needed.
        ┌────────────────────────────────────────────┐
        │ LLM (any CLI: `claude -p` by default)      │
        │ returns one JSON action per step           │
        └────────────┬───────────────────────────────┘
                     │ {"thought", "action", "args"}
        ┌────────────▼───────────────┐
        │ agent loop (agent.py)      │◄─── optional human
        │ parse → approve? → run     │     approval gate
        └────┬───────────┬───────────┘
             │           │
    ┌────────▼──┐  ┌─────▼──────────────────────┐
    │ tools     │  │ finish(report)             │
    └─────┬─────┘  │ → markdown with ## Sources │
          │        └────────────────────────────┘
          ├─ list_files
          ├─ search_docs (BM25 + file:line provenance)
          └─ read_file   (sandboxed to the corpus folder)
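
The loop in the diagram can be sketched in a few lines. This is a minimal illustration, not the code in agent.py: the function name `run_agent`, the transcript format, the `report` key inside `args`, and the default step budget are all assumptions made for the example.

```python
import json


def run_agent(task, llm, tools, max_steps=8):
    """Iterate think -> tool -> observe until the model calls finish.

    llm:   callable that takes the transcript (str) and returns one JSON action (str).
    tools: dict mapping action names to callables that accept keyword arguments.
    All names here are hypothetical, chosen for the sketch.
    """
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):
        raw = llm("\n".join(transcript))
        try:
            action = json.loads(raw)
        except json.JSONDecodeError:
            action = None
        if not isinstance(action, dict):
            # Malformed output still costs one step of the budget.
            transcript.append("OBSERVATION: invalid JSON, reply with one JSON object")
            continue
        name = action.get("action")
        args = action.get("args", {})
        if name == "finish":
            return args.get("report", "")
        tool = tools.get(name)
        observation = tool(**args) if tool else f"unknown action: {name}"
        transcript.append(f"ACTION: {raw}")
        transcript.append(f"OBSERVATION: {observation}")
    return "Step budget exhausted before a report was written."


# Usage: a scripted stand-in for the LLM that lists files, then finishes.
scripted = iter([
    '{"thought": "look around", "action": "list_files", "args": {}}',
    '{"thought": "done", "action": "finish", "args": {"report": "## Sources\\n- a.md:1-3"}}',
])
report = run_agent("demo", lambda prompt: next(scripted), {"list_files": lambda: ["a.md"]})
```

Passing the LLM in as a plain callable keeps the loop independent of any particular backend, which is also what lets a scripted stand-in drive it.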
Requires Python 3.9+ and any LLM CLI. Default backend is the Claude Code CLI (claude -p).
python3 agent.py \
--docs examples/sample-docs \
--task "What is the invoice approval process and who signs off on a 60,000 PLN invoice?" \
--out report.md --verbose

Real output of exactly this command: examples/sample-report.md (the full step trace is in examples/sample-trace.json). The agent listed the corpus, ran one search, and wrote a report citing both policy files. It combined the approval path (from invoicing-policy.md) with the threshold tiers (from delegation-matrix.md) to answer who signs off on a 60,000 PLN invoice.
python3 agent.py --docs <folder> --task "..." --approve

`--approve` stops before every tool call: approve it, or reject it with feedback. The feedback is injected into the agent's context as an observation, steering the next step. This is the same pattern (AI proposes, human signs off) that I use in production tools for accounting workflows, where unsupervised automation is a non-starter.
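
One way such a gate could look, as a sketch under stated assumptions: the function name `approval_gate`, the prompt wording, and the shape of the returned tuple are hypothetical and not taken from agent.py. The point is only that a rejection produces an observation instead of a tool result.

```python
def approval_gate(action, ask=input):
    """Return (approved, observation).

    On rejection, the reviewer's feedback becomes the observation the agent
    sees on its next step. `ask` is injectable so the gate can be tested
    without a terminal.
    """
    args = action.get("args", {})
    answer = ask(f"Run {action['action']}({args})? [y / feedback to reject] ").strip()
    if answer.lower() in ("y", "yes"):
        return True, None
    feedback = answer or "no reason given"
    return False, f"Human rejected this action. Feedback: {feedback}"


# Usage: simulate a reviewer who rejects the call and steers the agent.
proposed = {"action": "read_file", "args": {"path": "delegation-matrix.md"}}
approved, observation = approval_gate(proposed, ask=lambda _: "search before reading whole files")
```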
The agent shells out to a CLI that reads a prompt on stdin and prints a completion:
AGENTFLOW_LLM_CMD="claude -p" # default
AGENTFLOW_LLM_CMD="ollama run llama3" # local model
AGENTFLOW_LLM_CMD="python3 tests/mock_llm.py" # deterministic mock (tests)

- No framework. The point is to show the mechanics. A ReAct loop is ~150 lines; reading them beats trusting a black box.
- Grounding is enforced by structure, not vibes. The only knowledge source is the tool output; search results carry `file:start-end` spans; the system prompt requires citations and an explicit "the corpus doesn't say" when retrieval comes up empty.
- Lexical retrieval (BM25), not embeddings. For small project/client corpora, terminology-heavy search with zero dependencies and zero index maintenance wins. The scorer includes a cheap bidirectional-prefix match that handles inflection in morphology-rich languages (built with Polish in mind: faktura / faktury / fakturze).
- Sandboxed tools. `read_file` resolves paths and refuses anything outside the corpus folder.
- Bounded loop. A step budget forces a final report ("here's what I found and what remains unknown") instead of an infinite research spiral; a one-shot JSON-repair retry handles malformed model output.
- Testable without an API. `tests/mock_llm.py` replays a fixed action sequence, so CI can exercise the full loop (parsing, dispatch, sandbox, report writing) deterministically.
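
The sandboxing idea from the list above fits in a few lines of stdlib. This is a sketch of the general resolve-then-check technique, not the repo's `read_file`; the signature and the choice of `PermissionError` are assumptions.

```python
import tempfile
from pathlib import Path


def read_file(corpus_dir, relative_path):
    """Read a file only if its resolved path stays inside the corpus folder."""
    root = Path(corpus_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (root / relative_path).resolve()
    if root not in target.parents:
        raise PermissionError(f"{relative_path} is outside the corpus folder")
    return target.read_text(encoding="utf-8")


# Usage: one file inside the corpus, one next to it that must stay unreachable.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "corpus"
    corpus.mkdir()
    (corpus / "policy.md").write_text("Invoices need two approvals.\n", encoding="utf-8")
    (Path(tmp) / "secret.txt").write_text("not part of the corpus\n", encoding="utf-8")

    content = read_file(corpus, "policy.md")
    try:
        read_file(corpus, "../secret.txt")
        escaped = True
    except PermissionError:
        escaped = False
```

Checking the resolved path rather than the raw string matters: a string test for `..` misses absolute paths and symlinks, while comparing resolved parents covers all three.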
python3 -m unittest discover tests -v

Five tests: retrieval provenance, no-match honesty, path-escape denial, in-corpus read, and the full agent loop with the mock LLM.
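
What makes the mock a drop-in replacement is the backend contract described earlier: prompt on stdin, completion on stdout. A minimal sketch of that call follows, assuming a helper named `call_llm` and a timeout parameter (both hypothetical). The stand-in backend is a Python one-liner invented for this example, not the repo's tests/mock_llm.py.

```python
import shlex
import subprocess
import sys


def call_llm(prompt, cmd, timeout=120):
    """Send the prompt on stdin and return whatever the command prints on stdout."""
    result = subprocess.run(
        shlex.split(cmd),      # e.g. "claude -p" -> ["claude", "-p"]
        input=prompt,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,            # a non-zero exit raises CalledProcessError
    )
    return result.stdout.strip()


# In the real tool the command would come from the environment, for example:
#   cmd = os.environ.get("AGENTFLOW_LLM_CMD", "claude -p")
# Here a stand-in backend reads stdin and reports how much it received.
stand_in = 'import sys; n = len(sys.stdin.read()); print(f"got {n} chars")'
cmd = f"{shlex.quote(sys.executable)} -c {shlex.quote(stand_in)}"
reply = call_llm("hello", cmd)
```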
- BM25 is lexical — conceptual paraphrases may need a reworded query (the agent usually does this itself; the prompt tells it to).
- One agent, sequential tools. Orchestration of multiple agents is intentionally out of scope — for that I use purpose-built orchestrators; this repo is about making one agent's reasoning legible.
- Text formats only (`.md`, `.txt`, `.rst`, `.csv`, `.org`, `.adoc`). Convert PDFs first.
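
To make the lexical-retrieval trade-off concrete, here is a small BM25 scorer with a loose prefix match. It is an illustration only: the repo's exact matching rule is not shown in this README, so the `related` rule below (a shared prefix of at least 5 characters that covers all but the last two characters of the shorter token) is an assumption, as are the function names and the `k1`/`b` defaults.

```python
import math
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def related(a, b, min_prefix=5):
    """Assumed rule: tokens match if they share a long enough common prefix."""
    if a == b:
        return True
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared >= min_prefix and shared >= min(len(a), len(b)) - 2


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 where term frequency counts prefix-related tokens."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    scores = [0.0] * n_docs
    for term in set(tokenize(query)):
        freqs = [sum(related(term, tok) for tok in toks) for toks in tokenized]
        df = sum(1 for f in freqs if f)
        if df == 0:
            continue  # nothing in the corpus matches this term
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for i, f in enumerate(freqs):
            if f:
                norm = k1 * (1 - b + b * len(tokenized[i]) / avg_len)
                scores[i] += idf * f * (k1 + 1) / (f + norm)
    return scores


docs = [
    "Faktury powyżej progu zatwierdza dyrektor finansowy.",
    "Urlopy zatwierdza przełożony.",
]
scores = bm25_scores("faktura", docs)
```

The query form "faktura" never appears in the first document, yet it scores above zero because "faktury" shares the stem. A paraphrase with no shared stem would still score zero, which is the limitation noted in the list above.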
MIT — see LICENSE.