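From the *Classic of Mountains and Seas*, "Classic of the Central Mountains" (《山海经·中山经》): "The auspicious god Taifeng presides over it. It has the form of a man with a tiger's tail and likes to dwell on the sunny southern slope of Mount Fu. Its comings and goings are accompanied by light. The god Taifeng moves the qi of heaven and earth." (Original: 吉神泰逢司之,其状如人而虎尾,是好居于萯山之阳,出入有光,泰逢神动天地气也。)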
Taifeng keeps its Chinese name because the source metaphor matters. The mythic Taifeng "moves the qi of heaven and earth"; the engineering Taifeng moves the invisible flows that make an LLM agent runtime work: tokens, events, cache anchors, tool calls, cancellation, and persisted turns.
Taifeng is a business-decoupled Python LLM agent microkernel / OS scheduler. It is designed for Python server-side systems that need an embeddable agent runtime with explicit control over skills, tools, context, persistence, permissions, and observability.
It follows the CLI-agent paradigm represented by codex, Claude Code, claw-code, and openclaw, but ports the core ideas into a Python infrastructure package. Taifeng learns the pattern; it does not copy implementation code.
Taifeng is not:
- a LangGraph / AutoGen / Letta replacement;
- a business framework or weaving layer;
- tied to any tenant model, product domain, or LLM provider;
- a memory-first agent platform.
Taifeng is:
- skill-native: skills are documented in `SKILL.md` files rather than hidden behind function-tool wrappers;
- scheduler-oriented: the LLM decides, while the engine owns concurrency, cancellation, cache safety, and persistence;
- cache-aware: compaction preserves cached prefixes whenever possible;
- observable and resumable: every important runtime path emits events, and the default transcript store is append-only JSONL;
- provider-flexible: OpenAI-compatible, Anthropic, Gemini, DeepSeek, and LiteLLM-backed models share one event stream shape.
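Provider prompt caches (for example OpenAI and Anthropic prompt caching) only reuse work for an exact leading prefix of the request, so compaction that drops items strictly after that prefix keeps the cache warm. The sketch below illustrates the idea with a plain sliding window. It is not Taifeng's `HandoffCompactionStrategy`; the `Item` type, the `cached_prefix_len` parameter, and the 4-characters-per-token estimate are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    role: str
    text: str


def estimate_tokens(item: Item) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(item.text) // 4)


def sliding_window_compact(
    items: list[Item], cached_prefix_len: int, budget: int
) -> tuple[list[Item], bool]:
    """Drop the oldest items after the cached prefix until the history fits.

    Returns (compacted_items, prefix_preserved). The prefix is only touched
    when it alone exceeds the budget, which would invalidate the cache.
    """
    prefix = items[:cached_prefix_len]
    tail = items[cached_prefix_len:]
    prefix_cost = sum(estimate_tokens(i) for i in prefix)
    if prefix_cost > budget:
        # Fallback: the cached prefix itself is too large, so the cache is lost.
        # Keep the newest contiguous run of items that fits.
        kept: list[Item] = []
        total = 0
        for item in reversed(items):
            cost = estimate_tokens(item)
            if total + cost > budget:
                break
            kept.append(item)
            total += cost
        return list(reversed(kept)), False
    # Keep the newest contiguous tail items that fit in the remaining budget.
    remaining = budget - prefix_cost
    kept_tail: list[Item] = []
    for item in reversed(tail):
        cost = estimate_tokens(item)
        if cost > remaining:
            break
        kept_tail.append(item)
        remaining -= cost
    return prefix + list(reversed(kept_tail)), True
```

Stopping at the first tail item that does not fit keeps the retained window contiguous, and the `prefix_preserved` flag is the kind of signal a kernel could report as cache impact.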
Core capabilities:
- Markdown skills: `SKILL.md` files are loaded as first-class runtime capabilities. The LLM can lazily `read_skill` and recursively `call_skill`.
- Composite dispatch: atomic and composite skills support depth guards, cycle detection, permission checks, and hook gates.
- Declarative orchestration: `parallel`, `serial`, and `when` plans run deterministically without extra LLM sampling.
- Detached spawn and join: long-running child skills can run independently, suspend for HITL, resume, and join through barriers.
- Built-in tools: skill IO, file IO, shell execution, patching, background tasks, script execution, HTTP requests, user input, peer messaging, and todo state.
- HITL permissions: typed permission requests support Claude Code-style rules such as `Bash(...)`, `Network(...)`, and `Skill(...)`.
- Hooks: pre/post tool, skill, script, turn, and compaction hooks give integrators policy injection points without business concepts in the kernel.
- Context compression: handoff, sliding-window, reactive overflow recovery, and surgical trim strategies keep turns inside budget while reporting cache impact.
- Persistence and resume: JSONL transcripts are the source of truth, with SQLite side indexes and thread-level resume.
- LLM conformance simulator: tests use `SimClient` and golden shape fixtures instead of calling real APIs in CI.
- MCP integration: Taifeng can consume external MCP tools and expose skills as an MCP server.
- Telemetry: Console, JSONL, and OpenTelemetry sinks cover the critical runtime path.
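The `Bash(...)`-style rules in the HITL permissions bullet pair a tool name with an argument pattern. A minimal matcher, assuming glob patterns, deny-over-allow precedence, and an `"ask"` fallback that would trigger a HITL prompt. `Rule`, `evaluate`, and this precedence order are hypothetical, not the API of Taifeng's `permission/` package.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Literal

Decision = Literal["allow", "deny", "ask"]

# Parses "Tool(pattern)", e.g. "Bash(git *)" or "Network(*.example.com)".
_RULE_RE = re.compile(r"^(?P<tool>\w+)\((?P<pattern>.*)\)$")


@dataclass(frozen=True)
class Rule:
    tool: str
    pattern: str
    decision: Decision

    @classmethod
    def parse(cls, spec: str, decision: Decision) -> "Rule":
        match = _RULE_RE.match(spec)
        if match is None:
            raise ValueError(f"invalid rule: {spec!r}")
        return cls(match["tool"], match["pattern"], decision)

    def matches(self, tool: str, argument: str) -> bool:
        # Glob matching (assumption): "*" matches any run of characters.
        return self.tool == tool and fnmatchcase(argument, self.pattern)


def evaluate(rules: list[Rule], tool: str, argument: str) -> Decision:
    """Deny beats allow; anything unmatched falls back to asking a human."""
    hits = {rule.decision for rule in rules if rule.matches(tool, argument)}
    if "deny" in hits:
        return "deny"
    if "allow" in hits:
        return "allow"
    return "ask"
```

With `Bash(git *)` allowed and `Bash(git push *)` denied, `git status` is allowed, `git push origin main` is denied, and an unlisted command such as `rm -rf /` falls through to a HITL prompt.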
Quick start:

```bash
# Install with uv
uv venv
uv pip install -e ".[dev,litellm]"
# Run the test suite
PYTHONPATH=src uv run pytest tests/
# Run API-key-free examples backed by the simulator
PYTHONPATH=src uv run python examples/basic/minimal_chat.py
PYTHONPATH=src uv run python examples/basic/composite_skill.py
# Run a real-LLM example when provider credentials are configured
PYTHONPATH=src uv run python examples/real_llm/e2e.py
```

Create a skill at `./skills/hello/SKILL.md`, then wire the engine from the host application:
```python
import asyncio

import taifeng


async def main() -> None:
    pool = await taifeng.EnginePool.create(
        skills_dir="./skills",
        storage_dir="./threads",
        model_client=taifeng.LiteLLMClient(model="gpt-4o-mini"),
        compressors=[taifeng.HandoffCompactionStrategy()],
    )
    try:
        # session_id routes to an in-process engine; "hello" is the skill created above.
        engine = await pool.get_or_create(session_id="s1", entry_skill_id="hello")
        sub_id = await engine.submit(taifeng.UserMessage(text="Hello"))
        # Stream this submission's events until the turn ends.
        async for ev in engine.subscribe(sub_id):
            if ev.msg.kind == "assistant_text":
                print(ev.msg.data["delta"], end="", flush=True)
            elif ev.msg.kind in ("turn_completed", "turn_failed"):
                break
    finally:
        await pool.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Terminology is intentionally strict:
- `session_id` is an in-process routing key used by `EnginePool`.
- `thread_id` is the persistence and resume unit.
- `conversation/` is the persistence module name, not a runtime `conversation_id`.

| Area | Implemented capabilities |
|---|---|
| Skill system | Markdown skills, lazy skill reading, recursive dispatch, declarative orchestration, runtime eligibility, script execution |
| Loop and tools | actor-style submission/event bus, cancellation, mid-turn steering, turn rewind, HITL suspend/resume, detached spawn/join, peer messaging |
| Context | cache-aware compaction, handoff summaries, sliding windows, reactive overflow recovery, surgical trim, pinned-state reinjection |
| LLM | OpenAI-compatible, Anthropic, Gemini, DeepSeek, LiteLLM fallback, structured output, retry/error classification, prompt-cache accounting |
| Persistence | JSONL transcript store, SQLite thread directory, rollback, pluggable directory/index hooks |
| Observability | EventMsg bus, console sink, JSONL sink, OpenTelemetry sink |
| MCP | stdio client integration and server mode |
For the complete integrator-facing matrix, see docs/capability-matrix.md.
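The declarative orchestration listed under the skill system runs `parallel`, `serial`, and `when` plans without asking the model what to do next. A small interpreter, assuming a nested-dict plan schema, an async `run_skill(skill_id, inputs)` callable, and a `when` node that branches on an earlier result. These names are hypothetical and Taifeng's real plan format may differ.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical skill runner: (skill_id, inputs) -> awaitable result.
RunSkill = Callable[[str, dict[str, Any]], Awaitable[Any]]


async def run_plan(plan: dict[str, Any], run_skill: RunSkill, results: dict[str, Any]) -> None:
    """Interpret a plan tree; the order comes from the plan, not from a model call."""
    kind = plan["kind"]
    if kind == "skill":
        # Leaf: run one skill with a copy of the results gathered so far.
        results[plan["id"]] = await run_skill(plan["id"], dict(results))
    elif kind == "serial":
        # Children run one after another in declaration order.
        for child in plan["steps"]:
            await run_plan(child, run_skill, results)
    elif kind == "parallel":
        # Every branch starts from the same snapshot and writes into its own dict.
        branches = [dict(results) for _ in plan["steps"]]
        await asyncio.gather(
            *(run_plan(child, run_skill, branch) for child, branch in zip(plan["steps"], branches))
        )
        # Merge in declaration order, so completion order cannot change the outcome.
        for branch in branches:
            results.update(branch)
    elif kind == "when":
        # Branch on a value produced by an earlier step; "else" is optional.
        chosen = plan["then"] if results.get(plan["key"]) else plan.get("else")
        if chosen is not None:
            await run_plan(chosen, run_skill, results)
    else:
        raise ValueError(f"unknown plan node: {kind!r}")
```

Giving every parallel branch the same snapshot and merging outputs in declaration order keeps the result independent of which branch happens to finish first.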
Source layout:

```text
src/taifeng/
├── skill/          # SkillDefinition, loader, registry, dispatch, orchestration
├── tool/           # ToolSpec, runtime scheduling, built-in tools
├── conversation/   # ResponseItem, JSONL store, SQLite side index, rebuild
├── context/        # budgets, compression strategies, cache stats, memory
├── llm/            # ModelClient protocol, events, retry, providers, simulator
├── loop/           # AgentEngine, TurnRunner, EnginePool, cancellation, events
├── hooks/          # lifecycle hooks
├── permission/     # HITL approval and rules
├── mcp/            # MCP client/server integration
└── telemetry/      # console, JSONL, OTel sinks
```
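Recursive dispatch in `skill/` relies on the depth guards and cycle detection mentioned in the feature list. A synchronous sketch of both checks, assuming a dispatcher that tracks the chain of active skills; `Dispatcher`, `max_depth`, and the exception classes are hypothetical.

```python
from __future__ import annotations

from collections.abc import Callable


class SkillCycleError(RuntimeError):
    """A skill tried to call itself, directly or through other skills."""


class SkillDepthError(RuntimeError):
    """The composite call chain grew deeper than allowed."""


class Dispatcher:
    def __init__(self, skills: dict[str, Callable[[Dispatcher], str]], max_depth: int = 4) -> None:
        self.skills = skills
        self.max_depth = max_depth
        self._stack: list[str] = []  # skills currently executing, outermost first

    def call_skill(self, skill_id: str) -> str:
        if skill_id in self._stack:
            chain = " -> ".join([*self._stack, skill_id])
            raise SkillCycleError(f"cycle detected: {chain}")
        if len(self._stack) >= self.max_depth:
            raise SkillDepthError(f"depth limit {self.max_depth} reached before {skill_id!r}")
        self._stack.append(skill_id)
        try:
            return self.skills[skill_id](self)
        finally:
            self._stack.pop()  # unwind even when a nested call raises
```

Because this sketch shares one stack, it only suits sequential calls; an async kernel running children concurrently would carry the chain per call, for example in a `contextvars.ContextVar`.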
The core turn flow is:
```text
Submission(UserMessage)
  -> AgentEngine queue
  -> TurnRunner
  -> pre-sampling compaction
  -> prompt build
  -> ModelClient event stream
  -> tool / skill dispatch
  -> mid-turn cache-aware compaction
  -> JSONL append
  -> EventMsg turn completion
```
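The turn flow above can be approximated in a few dozen lines. This sketch replays a scripted model stream in the spirit of `SimClient` (whose real interface is not shown here), runs a hypothetical `echo` tool, loops until the model stops calling tools, and appends the turn to a JSONL transcript. The `assistant_text` and `turn_completed` event kinds come from the quick-start example; `ScriptedModel`, `run_turn`, `tool_call`, and `tool_result` are assumptions, and compaction and prompt building are omitted.

```python
import asyncio
import json
import tempfile
from collections.abc import AsyncIterator
from pathlib import Path
from typing import Any

Event = dict[str, Any]

# Hypothetical tool registry: tool name -> function of the call arguments.
TOOLS = {"echo": lambda args: args["text"]}


class ScriptedModel:
    """Replays canned event batches, one batch per sampling request."""

    def __init__(self, script: list[list[Event]]) -> None:
        self._script = iter(script)

    async def stream(self, history: list[Event]) -> AsyncIterator[Event]:
        # A real client would send `history` to a provider; this one ignores it.
        for event in next(self._script):
            await asyncio.sleep(0)  # yield control, like a network stream would
            yield event


async def run_turn(model: ScriptedModel, transcript: Path, user_text: str) -> list[Event]:
    history: list[Event] = [{"kind": "user_message", "text": user_text}]
    emitted: list[Event] = []
    while True:
        tool_results: list[Event] = []
        async for event in model.stream(history):
            history.append(event)
            if event["kind"] == "assistant_text":
                # Collected here; a real engine would publish it to subscribers at once.
                emitted.append(event)
            elif event["kind"] == "tool_call":
                output = TOOLS[event["name"]](event["args"])
                tool_results.append({"kind": "tool_result", "name": event["name"], "output": output})
        if not tool_results:
            break  # no tool calls in this sample: the turn is finished
        history.extend(tool_results)  # feed results back into the next sample
    # Append-only persistence: new lines are added, existing lines are never rewritten.
    with transcript.open("a", encoding="utf-8") as fh:
        for item in history:
            fh.write(json.dumps(item) + "\n")
    emitted.append({"kind": "turn_completed"})
    return emitted


if __name__ == "__main__":
    demo_script = [
        [{"kind": "tool_call", "name": "echo", "args": {"text": "pong"}}],
        [{"kind": "assistant_text", "delta": "Got pong"}],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "thread.jsonl"
        for ev in asyncio.run(run_turn(ScriptedModel(demo_script), path, "ping")):
            print(ev)
```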
The current branch has closed the main P0/P1/P2 kernel gaps tracked in the architecture docs. Remaining work is intentionally demand-driven:
- `web_search` protocol: define a search capability that is not bound to any provider, so business systems can back it with their own.
- Memory backends: decide the R1 boundary for long-term memory before adding richer default implementations.
- Explicit multi-agent handoff API: design a stable API only after the ownership boundary is clear.
- Capability contract translation: this pass translates indexes; individual contract bodies can be translated later.
See docs/architecture/hermes-gap-roadmap.md and docs/architecture/kernel-gap-analysis.md for the detailed gap history.
| Entry | Purpose |
|---|---|
| docs/readme/ | Full README variants by language |
| docs/README.md | Documentation index and reading order |
| docs/capability-matrix.md | Integrator-facing capability matrix |
| docs/usage.md | Installation and usage guide |
| docs/configurable-knobs.md | Runtime and construction-time configuration |
| docs/architecture/overview.md | Architecture overview |
| docs/architecture/capabilities/ | Stable capability contracts |
| docs/decisions/ | ADR decision records |
| examples/ | End-to-end examples, mostly simulator-backed |
| docs/assets/brand/ | Logo, avatar, and favicon source assets |
| CLAUDE.md / AGENTS.md | Engineering collaboration rules |
Development conventions:
- Python 3.12+.
- Tests use `pytest` with simulator-backed LLM clients.
- New core behavior should update the matching architecture live doc.
- Capability changes should update both the contract and docs/capability-matrix.md.
- Real-LLM regressions are tracked in docs/real-llm-ledger.md.
Status:
- Pre-alpha infrastructure package.
- Current recorded suite: 622 tests passing.
- Source size: roughly 14k LOC under `src/`.
- The first production user is a host business system, but Taifeng itself must remain domain-free.
Apache License 2.0. See LICENSE.