Phase-specific, budget-aware context compilation for tool-using AI agents.
600+ tests passing · minimal core dependencies · deterministic by default · Python ≥ 3.10
Even with 200K-token context windows, dumping everything into the prompt is expensive, slow, and degrades output quality. More context ≠ better answers.
Imagine a tool-using agent with a 100-tool catalog and a 50-turn conversation history. At each step the agent must answer four questions:
- Route — which tool should I call?
- Call — what arguments?
- Interpret — what did it return?
- Answer — how do I respond to the user?
Naive approach A — concatenate everything:
100 tool schemas (≈50k tokens) + 50 turns (≈30k tokens) = 80k tokens
Cost: $0.48/request at GPT-4o rates · Latency: 3–5s TTFT
Quality: LLM loses focus — needle-in-haystack accuracy drops with context size
Token limit: 8k → 10× overflow
Naive approach B — cherry-pick manually:
Pick 10 tools, last 5 turns → lose dependency chains
Agent hallucinates tool calls, repeats questions, forgets context
contextweaver approach — phase-specific budgeted compilation:
Route phase: 5 tool cards (≈500 tokens), no full schemas
Answer phase: 3 relevant turns + dependency closure (≈2k tokens)
Result: 2.5k tokens, complete context, deterministic
Cost: 70% lower · Latency: sub-second · Quality: relevant context only
See examples/before_after.py for a runnable side-by-side comparison.
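The arithmetic behind those numbers is easy to check. The per-item sizes below are the approximations quoted above, not measurements:

```python
# Back-of-envelope token arithmetic for the comparison above.
SCHEMA_TOKENS = 500   # ≈50k tokens / 100 tool schemas
TURN_TOKENS = 600     # ≈30k tokens / 50 turns

naive = 100 * SCHEMA_TOKENS + 50 * TURN_TOKENS  # concatenate everything: 80k
weaver = 500 + 2_000                            # route cards + answer context: 2.5k

print(f"naive:  {naive} tokens")
print(f"weaver: {weaver} tokens")
print(f"token reduction: {1 - weaver / naive:.0%}")
```

The token reduction is much larger than the quoted 70% cost saving; the cost figure also accounts for output tokens and per-request overhead that budgeting does not touch.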
contextweaver provides two cooperating engines:
┌────────────────────────────┐
Events ──────>│ Context Engine │──> ContextPack (prompt)
│ candidates → closure → │
│ sensitivity → firewall → │
│ score → dedup → select → │
│ render │
└────────────────────────────┘
▲ facts / episodes
┌──────────┴─────────────────┐
Tools ───────>│ Routing Engine │──> ChoiceCards
│ Catalog → TreeBuilder → │
│ ChoiceGraph → Router │
└────────────────────────────┘
Context Engine — eight-stage pipeline:
- generate_candidates — pull phase-relevant events from the log for this request.
- dependency_closure — if a selected item has a parent_id, include the parent automatically.
- sensitivity_filter — drop or redact items at or above the configured sensitivity floor.
- apply_firewall — tool results are stored out-of-band; large outputs are summarized/truncated before prompt assembly.
- score_candidates — rank by recency, tag match, kind priority, and token cost.
- deduplicate_candidates — remove near-duplicates using Jaccard similarity.
- select_and_pack — greedily pack highest-scoring items into the phase token budget.
- render_context — assemble the final prompt string with BuildStats metadata.
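To make the middle stages concrete, here is a minimal, self-contained sketch of token-set Jaccard deduplication followed by greedy budget packing. It is illustrative only (the real pipeline also weighs recency, tags, and kind priority); the item dicts, field names, and threshold are assumptions, not the library's internals.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity used for near-duplicate detection."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def dedup(items: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep an item only if it is not near-duplicate of anything already kept."""
    kept: list[dict] = []
    for item in items:
        if all(jaccard(item["text"], k["text"]) < threshold for k in kept):
            kept.append(item)
    return kept

def pack(items: list[dict], budget: int) -> list[dict]:
    """Greedy selection: highest score first, skip items that overflow the budget.
    The (-score, id) sort key makes tie-breaking deterministic."""
    selected, used = [], 0
    for item in sorted(items, key=lambda i: (-i["score"], i["id"])):
        if used + item["tokens"] <= budget:
            selected.append(item)
            used += item["tokens"]
    return selected
```

A near-duplicate is dropped before packing, so the budget is spent on distinct information rather than repeats.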
Routing Engine — four-stage pipeline:
- Catalog — register and manage SelectableItem objects.
- TreeBuilder — convert a flat catalog into a bounded ChoiceGraph DAG.
- Router — beam-search over the graph; deterministic tie-breaking by ID.
- ChoiceCards — compact, LLM-friendly cards (never includes full schemas).
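The Router's core idea, a deterministic beam search with ties broken lexicographically by ID, can be sketched as follows. This is an illustration under assumed data shapes, not contextweaver's implementation; the plain `children`/`score` dicts stand in for a ChoiceGraph.

```python
def beam_search(children: dict[str, list[str]], score: dict[str, float],
                root: str = "root", beam_width: int = 3, top_k: int = 5) -> list[str]:
    """Deterministic beam search over a bounded choice tree.

    `children` maps node id -> child ids; `score` maps node id -> relevance.
    The (-score, id) sort key breaks ties by id, so identical inputs always
    produce identical output.
    """
    frontier, leaves = [root], []
    while frontier:
        expanded = []
        for node in frontier:
            kids = children.get(node, [])
            if kids:
                expanded.extend(kids)
            else:
                leaves.append(node)  # leaf reached: a concrete tool candidate
        # keep only the best beam_width nodes at each level
        frontier = sorted(expanded, key=lambda n: (-score[n], n))[:beam_width]
    return sorted(leaves, key=lambda n: (-score[n], n))[:top_k]
```

Because ties never depend on dict iteration order, re-running the same query over the same catalog yields the same shortlist, which is what makes routing reproducible.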
pip install contextweaver

contextweaver ships with a minimal, opinionated core: tiktoken,
PyYAML, and rank-bm25. These power accurate token budgeting, YAML
catalog/config files, and the default lexical retrieval backend.
Optional capabilities are gated behind extras so the core install stays small:
| Extra | What it adds |
|---|---|
| contextweaver[cli] | Rich-formatted CLI rendering (rich) |
| contextweaver[retrieval] | Fuzzy lexical matching backend (rapidfuzz) |
| contextweaver[otel] | OpenTelemetry tracing + metrics export |
| contextweaver[ann] | Approximate-nearest-neighbour backend (reserved) |
| contextweaver[graph] | NetworkX-backed graph ops (reserved) |
| contextweaver[fastmcp] | FastMCP catalog adapter |
| contextweaver[langchain] | LangChain integration helpers |
| contextweaver[all] | All optional capabilities |
Or from source:
git clone https://github.com/dgenio/contextweaver.git
cd contextweaver
pip install -e ".[dev]"

For a guided setup with prerequisites, three runnable examples, expected output, and next steps, see docs/quickstart.md.
from contextweaver.context.manager import ContextManager
from contextweaver.types import ContextItem, ItemKind, Phase
mgr = ContextManager()
mgr.ingest(ContextItem(id="u1", kind=ItemKind.user_turn, text="How many users?"))
mgr.ingest(ContextItem(id="tc1", kind=ItemKind.tool_call,
text="db_query('SELECT COUNT(*) FROM users')", parent_id="u1"))
mgr.ingest(ContextItem(id="tr1", kind=ItemKind.tool_result,
text="count: 1042", parent_id="tc1"))
pack = mgr.build_sync(phase=Phase.answer, query="user count")
print(pack.prompt) # budget-aware compiled context
print(pack.stats) # what was kept, dropped, deduplicated

from contextweaver.routing.catalog import Catalog, load_catalog_json
from contextweaver.routing.tree import TreeBuilder
from contextweaver.routing.router import Router
catalog = Catalog()
for item in load_catalog_json("catalog.json"):
catalog.register(item)
graph = TreeBuilder(max_children=10).build(catalog.all())
router = Router(graph, items=catalog.all(), beam_width=3, top_k=5)
result = router.route("send a reminder email about unpaid invoices")
print(result.candidate_ids)

For a complete route → call → interpret → answer reference flow, see:
- examples/full_agent_loop.py for a runnable end-to-end script.
- docs/guide_agent_loop.md for the flow diagram, pseudo-code, and module map.
The runtime loop example demonstrates:
- Route-phase prompt assembly with ChoiceCards.
- Call-phase prompt assembly with selected tool schema hydration.
- Interpret-phase firewall behavior (large tool output summarized into context).
- Answer-phase context composition with accumulated history and result envelopes.
Looking for "where does contextweaver fit alongside my runtime?" — start with the How contextweaver Fits positioning page, then jump into the Cookbook for copy-paste recipes.
| Framework | Guide | Use Case |
|---|---|---|
| MCP | Guide | Tool conversion, session loading, firewall · Security note |
| A2A | Guide | Agent cards, multi-agent sessions |
| FastMCP | Cookbook recipe | Composed MCP servers → bounded-choice routing |
| LlamaIndex | Guide | RAG + tools with budget control |
| OpenAI Agents SDK | Guide | Swarm hand-offs with unified context |
| Google ADK / Vertex AI | Guide | Gemini tool-use with context budgets |
| LangChain + LangGraph | Guide | Chain + graph agents with firewall |
| Pipecat | Guide | Real-time voice agents with async context build |
| Concept | Description |
|---|---|
| ContextItem | Atomic event log entry: user turn, agent message, tool call, tool result, fact, plan state. |
| Phase | route / call / interpret / answer — each with its own token budget. |
| ContextFirewall | Intercepts tool results: stores raw bytes out-of-band, injects compact summary (with truncation for large outputs). |
| ChoiceGraph | Bounded DAG over the tool catalog. Router beam-searches it; LLM sees only a focused shortlist. |
| ResultEnvelope | Structured tool output: summary + extracted facts + artifact handles + views. |
| BuildStats | Per-build diagnostics: candidate count, included/dropped counts, token usage, drop reasons. |
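The ContextFirewall concept can be made concrete with a toy in-memory sketch: raw output goes to an artifact store under an `artifact:<item_id>` handle (the handle shape the FAQ uses), and only a compact summary reaches prompt assembly. The class and function names here are illustrative stand-ins, not the library's API.

```python
class ToyArtifactStore:
    """Minimal in-memory stand-in for an out-of-band artifact store."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}
    def put(self, key: str, raw: str) -> None:
        self._data[key] = raw
    def get(self, key: str) -> str:
        return self._data[key]

def firewall(item_id: str, raw_output: str, store: ToyArtifactStore,
             max_prompt_chars: int = 200) -> str:
    """Store the raw result out-of-band; return only a compact summary
    for the prompt (illustrative, not the library's implementation)."""
    handle = f"artifact:{item_id}"
    store.put(handle, raw_output)  # raw bytes always preserved out-of-band
    if len(raw_output) <= max_prompt_chars:
        return raw_output
    return f"[{len(raw_output)} chars stored at {handle}] {raw_output[:max_prompt_chars]}…"
```

The prompt stays small no matter how large the tool output is, while the full payload remains retrievable through the artifact handle.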
See docs/concepts.md for the full glossary,
docs/architecture.md for pipeline detail and design rationale,
and docs/troubleshooting.md for common issues, debugging
techniques, and performance optimisation tips.
contextweaver is built for production use with comprehensive quality gates:
- 600+ passing tests across all modules — context pipeline, routing engine, firewall, adapters, stores, CLI, sensitivity enforcement
- mypy strict type checking — zero errors across all source files
- ruff clean linting — zero warnings
- CI pipeline on every pull request and on pushes to main (see workflows)
- Deterministic by default — tie-break by ID, sorted keys; identical inputs always produce identical outputs. Configurable retrieval backends (TF-IDF, BM25, fuzzy) preserve determinism within each mode.
Run the full suite yourself:
git clone https://github.com/dgenio/contextweaver.git
cd contextweaver
pip install -e ".[dev]"
make ci # fmt + lint + type + test + example + demo (all pass)

Most agent libraries fail unpredictably when context exceeds token limits. contextweaver's deterministic design and comprehensive test coverage ensure your agent behaves the same way every time — critical for debugging, testing, and production deployment.
Every architectural choice was made for a reason:
| Decision | Reason |
|---|---|
| Minimal core dependencies | Only tiktoken, PyYAML, and rank-bm25 in the core — no heavy frameworks, no version-conflict sprawl. Works in any Python 3.10+ environment. |
| Protocol-based interfaces | EventLog, ArtifactStore, EpisodicStore, FactStore are typing.Protocol — swap backends without forking. |
| Async-first context engine | Async-compatible compilation API for real-time integrations; build_sync() wrappers for synchronous callers, with room for future non-blocking execution. |
| Phase-specific token budgets | Route / call / interpret / answer phases each get their own budget — no one-size-fits-all truncation. |
| Context firewall | Large tool outputs stored out-of-band; only compact summaries reach the prompt. |
| Dependency closure | parent_id chains keep tool results coherent — tool calls are never separated from their results. |
These aren't accidental features. They are design decisions optimized for reliability, extensibility, and production use. A minimal dependency footprint means you can adopt contextweaver without disrupting your existing stack.
See docs/architecture.md for full pipeline detail and design rationale.
contextweaver supports both emerging agentic protocols out of the box:
MCP (Model Context Protocol) — convert tool definitions and results into native contextweaver types:
- Compatible with any MCP server (Claude Desktop, VS Code, custom servers)
- Structured content, output schemas, binary artifacts, and per-part annotations all handled
- ingest_mcp_result() for one-call result ingestion with automatic artifact persistence
A2A (Agent-to-Agent) — multi-agent session management with unified context:
- Agent cards converted to SelectableItem for routing
- Cross-agent session loading via load_a2a_session_jsonl()
- A2A results stored in ResultEnvelope with facts and artifact handles
weaver-spec — canonical contracts for the Weaver Stack (contextweaver, ChainWeaver, agent-kernel):
- Lossless to_weaver_* / from_weaver_* round-trips for SelectableItem, ChoiceCard, RoutingDecision, and Frame (via ResultEnvelope)
- weaver_contracts is an opt-in dependency — pip install 'contextweaver[weaver-spec]'
- Validated in CI on every PR against the JSON Schemas at raw.githubusercontent.com/dgenio/weaver-spec/main/contracts/json/ (the source the gate fetches; the same documents are also published at https://weaver-spec.dev/contracts/v0/)
contextweaver is positioned to become the standard context management layer for AI agents. Supporting MCP, A2A, and weaver-spec now means your codebase is future-proof as these protocols mature and gain wider adoption.
contextweaver works with any LLM provider and any agent framework:
- LLM providers: OpenAI, Anthropic, Google, open-source models — no API keys required by contextweaver itself
- Agent frameworks: LlamaIndex, LangChain, LangGraph, OpenAI Agents SDK, Google ADK, Pipecat, custom loops
- No vendor lock-in: minimal core dependencies; no cloud dependencies; runs anywhere Python 3.10+ runs
You are not locked into a specific framework or LLM provider. contextweaver is a layer beneath frameworks — context management as a composable primitive.
contextweaver follows Semantic Versioning:
- Breaking changes to public APIs only in major versions
- Deprecation policy: deprecated public APIs are warned for at least one minor version and removed only in a later major release
- API stability: public APIs in contextweaver.* are stable; internal _* modules may change
- Python support: 3.10+ (aligned with Python's active security support lifecycle)
| Version | Status | Notes |
|---|---|---|
| 0.1.x | ✅ Current | Foundation engines (context + routing), MCP/A2A adapters, CLI, sensitivity |
| 0.2.0 | 🚧 In progress (Q2 2026) | Framework integration guides, benchmark suite, distributed stores |
| 0.3.0 | 📋 Planned (Q3 2026) | DAG visualization, merge compression, LLM-assisted labeler |
| 1.0.0 | 📋 Planned (Q4 2026) | API freeze, production benchmarks, enterprise features |
Adopting a library is a long-term commitment. contextweaver's versioning policy ensures you can upgrade safely, and the roadmap shows where it's headed.
contextweaver implements weaver_contracts >= 0.2.0, < 1.0 (canonical
contracts for the Weaver Stack — see
weaver-spec).
| Invariant | Status | Where enforced |
|---|---|---|
| I-03 — Routing presents bounded choices, not full schema catalogs | ✅ Satisfied | ChoiceCard strips args_schema; routing returns ≤ top_k cards. See src/contextweaver/routing/cards.py and docs/gateway_spec.md. |
| I-05 — contextweaver receives Frames, not raw output | ✅ Satisfied (canonical path) | Tool outputs above firewall_threshold are stored out-of-band as ArtifactRefs, and the LLM-facing ResultEnvelope maps to a spec Frame via adapters/weaver_contracts.py. The legacy raw-output ingestion APIs (ContextManager.ingest_tool_result(raw_output=...), ingest_mcp_result(...)) still exist for backwards compatibility; treat them as non-canonical for spec compliance. |
Contract adapters (pip install 'contextweaver[weaver-spec]'):
from contextweaver.adapters.weaver_contracts import (
to_weaver_routing_decision,
from_weaver_routing_decision,
to_weaver_frame,
from_weaver_frame,
)

Round-trips are lossless via a reserved metadata["_contextweaver"] payload; see docs/weaver_spec_mapping.md for the full mapping table.
CI conformance — every PR runs scripts/weaver_spec_conformance.py,
which does both a Python round-trip (cw → spec → cw == cw) and JSON-Schema
validation. CI fetches the schemas from
raw.githubusercontent.com/dgenio/weaver-spec/main/contracts/json/, which
mirrors the published documents at https://weaver-spec.dev/contracts/v0/
(same content, different host). Run locally with make weaver-conformance.
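The lossless round-trip trick is simple to sketch: any field the spec document has no slot for rides along under the reserved metadata["_contextweaver"] key and is merged back on the return trip. The field names below are hypothetical illustrations; docs/weaver_spec_mapping.md has the real mapping.

```python
def to_spec(card: dict) -> dict:
    """Map a (hypothetical) contextweaver card onto a spec-shaped dict,
    stashing fields the spec has no slot for under the reserved key."""
    spec_doc = {"id": card["id"], "label": card["label"]}
    extras = {k: v for k, v in card.items() if k not in spec_doc}
    spec_doc["metadata"] = {"_contextweaver": extras}
    return spec_doc

def from_spec(doc: dict) -> dict:
    """Invert to_spec: merge the reserved payload back, restoring the original."""
    card = {k: v for k, v in doc.items() if k != "metadata"}
    card.update(doc["metadata"]["_contextweaver"])
    return card
```

Because nothing is discarded on the way out, `from_spec(to_spec(card)) == card` holds for any card, which is exactly the property the CI conformance gate checks.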
v0.1 (✅ Complete)
- Context Engine: 8-stage pipeline (candidates → closure → sensitivity → firewall → score → dedup → select → render)
- Routing Engine: Catalog, DAG builder, beam-search router, choice cards
- Protocol adapters: MCP (full content types, structured content, output schemas) and A2A
- Stores: EventLog, ArtifactStore, EpisodicStore, FactStore with protocol-based interfaces
- 600+ passing tests, mypy strict, ruff clean, minimal core dependencies
v0.2 (🚧 In Progress — Q2 2026)
- Framework integration guides: LlamaIndex, LangChain, LangGraph, OpenAI Agents SDK, Google ADK, Pipecat
- Benchmark suite: token reduction, latency, and accuracy vs. naive concatenation
- Distributed stores: Redis-backed EventLog, S3-backed ArtifactStore
v0.3 (📋 Planned — Q3 2026)
- DAG visualization: interactive routing graph inspector
- Merge compression: deduplicate similar tool results across turns
- LLM-based labeler: auto-generate namespace labels for tool catalogs
- LLM-based extractor: structured fact extraction with prompt-based schema
v1.0 (📋 Planned — Q4 2026)
- API freeze: no breaking changes in 1.x releases
- Production benchmarks: 1M+ turn deployments
- Enterprise features: audit logging, compliance tags, PII redaction
Community:
- GitHub Discussions — ask questions, share patterns
- GitHub Issues — report bugs, request features
- CHANGELOG — track every release
contextweaver is under active development with a clear roadmap. v0.1 is feature-complete for basic use cases; v0.2 adds production-ready integrations; v1.0 is the API stability milestone.
| Approach | Token Control | Tool Routing | Firewall | Framework Agnostic | Dependencies |
|---|---|---|---|---|---|
| Naive concatenation | ❌ No | ❌ No | ❌ No | ✅ Yes | None |
| LangChain ConversationBufferMemory | ❌ No | ❌ No | ❌ No | ❌ No (LangChain only) | Many |
| LangChain ConversationSummaryMemory | ❌ No | ❌ No | ❌ No | ❌ No (LangChain only) | Many |
| LlamaIndex ContextManager | ❌ No | ❌ No | ❌ No | ❌ No (LlamaIndex only) | Many |
| contextweaver | ✅ Yes (phase-specific budgets) | ✅ Yes (bounded DAG) | ✅ Yes (out-of-band storage) | ✅ Yes | Minimal (tiktoken, PyYAML, rank-bm25) |
Most frameworks offer memory classes, but they don't enforce token budgets, route tools, or handle large outputs. contextweaver provides all three as a composable, framework-agnostic layer.
contextweaver ships with a CLI for quick experimentation:
contextweaver demo # end-to-end demonstration
contextweaver init # scaffold config + sample catalog
contextweaver build --catalog c.json --out g.json # build routing graph
contextweaver route --graph g.json --query "send email"
contextweaver print-tree --graph g.json
contextweaver ingest --events session.jsonl --out session.json
contextweaver replay --session session.json --phase answer

| Script | Description |
|---|---|
| minimal_loop.py | Basic event ingestion → context build |
| full_agent_loop.py | End-to-end route → call → interpret → answer runtime loop |
| tool_wrapping.py | Context firewall in action |
| routing_demo.py | Build catalog → route queries → choice cards |
| before_after.py | Side-by-side token comparison: WITHOUT vs WITH contextweaver |
| mcp_adapter_demo.py | MCP adapter: tool conversion, session loading, firewall |
| a2a_adapter_demo.py | A2A adapter: agent cards, multi-agent sessions |
| langchain_memory_demo.py | LangChain memory replacement: InMemoryChatMessageHistory vs contextweaver |
| cookbook/byot_recipe.py | Bring-your-own-tools cookbook recipe — wrap plain Python callables and route |
| cookbook/firewall_drilldown_recipe.py | Cookbook recipe: firewall a large tool result, then drill into the artifact |
make example # run all examples

Q: What token budgets should I use?
Start with the defaults (route=2000, call=3000, interpret=4000, answer=6000).
Inspect pack.stats after each build and increase any phase that drops too many items.
Q: My tool result was summarized. Why?
The context firewall intercepts every tool_result item (not just large ones).
Raw data is stored out-of-band; access it via mgr.artifact_store.get("artifact:<item_id>").
Provide a custom Summarizer to control how the summary is generated.
Q: How do I debug what was kept or dropped?
Inspect pack.stats (a BuildStats object) after every build_sync() / build() call:
included_count, dropped_count, dropped_reasons, dedup_removed.
Q: Does this work with [framework X]?
Yes, contextweaver is framework-agnostic — it compiles context; you send pack.prompt
to any LLM or framework. See dedicated guides for
MCP,
A2A,
LlamaIndex,
LangChain + LangGraph,
OpenAI Agents SDK,
Google ADK / Vertex AI, and
Pipecat. If your runtime isn't listed, the
bring-your-own-tools cookbook recipe
is the canonical starting point.
Q: What's the performance overhead?
Typically 10–50 ms for a context build (depends on event log size and deduplication).
For real-time / async agents, run build_sync() in a worker thread (e.g.
await asyncio.to_thread(mgr.build_sync, phase, query)) so the synchronous
pipeline does not block the event loop.
See docs/troubleshooting.md for the full troubleshooting guide, debugging techniques, optimisation tips, and 10+ common issues with solutions.
make fmt # format (ruff)
make lint # lint (ruff)
make type # type-check (mypy)
make test # run tests (pytest)
make example # run all examples
make demo # run the built-in demo
make ci # all of the above

See CONTRIBUTING.md for setup instructions.
| Milestone | Status | Highlights |
|---|---|---|
| v0.1 — Foundation | ✅ complete | Context Engine, Routing Engine, MCP + A2A adapters, CLI, sensitivity enforcement, logging |
| v0.2 — Integrations | 🚧 in progress | Framework integration guides (LlamaIndex, OpenAI Agents SDK, Google ADK, LangChain) |
| v0.3 — Tooling | 📋 planned | DAG visualization, merge compression, LLM-assisted labeler |
| Future | 📋 planned | Context versioning, distributed stores, multi-agent coordination |
See CHANGELOG.md for the detailed release history.
Apache-2.0