Reliable, trustworthy, trackable AI workflows for science.
Wheeler is a thinking partner for scientists, built natively on Claude Code. It gives you slash commands for each stage of research: discuss the question, plan the investigation, execute analyses, write up results. Every action is wrapped in a knowledge graph that tracks how research artifacts (papers, code, data, findings, drafts) depend on each other, making every AI-produced result traceable back to the exact script, data, and parameters that produced it.
Runs 100% locally. No API keys, no cloud services. Your data never leaves your machine.
Named after the great physicist John Archibald Wheeler, Niels Bohr's longtime collaborator. Wheeler and Bohr worked by talking. Bohr would pace, thinking out loud. Wheeler would push back, sharpen the question, sketch the math. The best ideas emerged from the conversation, not from either person alone. That's the model here.
uvx wheeler init my-research-project
cd my-research-project && claude
/wh:start
That's it. The first command scaffolds the project (.plans/, .wheeler/, wheeler.yaml, .mcp.json) and installs slash commands and agents to ~/.claude/. The second drops you into Claude Code with Wheeler's MCP servers wired up. The third routes you to the right /wh:* command for what you want to do.
For long-lived use, install Wheeler globally (faster startup, stable paths in .mcp.json):
uv tool install wheeler
wheeler init my-research-project
Run wheeler doctor any time to verify your setup (Python version, deps, Claude Code, Neo4j connectivity).
Prerequisites: Python 3.11+, uv, Claude Code (Max subscription), and Neo4j Desktop (free). New to all this? Walk through the Getting Started Guide.
git clone https://github.com/maxwellsdm1867/wheeler.git
cd wheeler
uv sync --extra dev # editable install + tests + ruff + mypy + build
uv run wheeler init ~/my-research-project
bin/setup.sh is still around for the full bootstrap (Neo4j in Docker, schema init, git hooks, zsh completions).
Science requires reproducibility. As AI gets embedded in research workflows, the gap between "AI helped me" and "here's the auditable chain of how this result was produced" becomes a credibility problem.
Wheeler is built on four pillars:
Traceable results. When Wheeler creates a finding, it automatically records what script ran, what data it consumed, what papers informed the approach, and when it happened. One tool call builds the full provenance chain. The agent focuses on science; infrastructure handles bookkeeping.
Change propagation. When a script changes or data is updated, Wheeler flags every downstream finding as stale and reduces its stability score. You always know what to trust and what needs re-verification.
Context management. All components read from and write to the same graph, so a finding from data analysis immediately informs subsequent literature searches, experimental design, and manuscript preparation. Information is progressively disclosed and retrieved only when relevant.
Executable research artifact. The knowledge graph moves beyond the static PDF. It is an executable map of discovery: any scientist can inherit the full experimental context of a project, explore how results connect, and build directly on top of prior work.
Wheeler gives you a fluid cycle, not a rigid pipeline. Enter at any point, skip stages, repeat them.
TOGETHER you + wheeler, thinking out loud
discuss plan chat pair write note ask
|
v remaining work is grinding
HANDOFF propose independent tasks
handoff you approve, modify, or keep talking
|
v
INDEPENDENT wheeler works alone
wh queue "..." logged, stops at decision points
|
v
RECONVENE results + flags + surprises
reconvene back to TOGETHER
Every plan and execution renders a self-contained visual brief: the question and sub-questions, figure mockups (pre-registered sketches) paired with the real result figures, a pipeline flow chart, and the data sources. /wh:discuss reads that brief to interpret the results with you like a colleague, referencing figures by number and running quick checks against the data to strengthen or disprove a point.
The flow we design for, end to end:
1. /wh:discuss: talk through the question until it is sharp. Wheeler asks like a colleague, grounds the conversation in what the graph already knows, and locks the decisions.
2. /wh:plan: Wheeler structures the investigation into waves of tasks and, before any data is touched, pre-registers the figures: what each one plots and how competing hypotheses would look different in it. On approval it renders a visual brief (question, mockups, pipeline, data sources) so you react to a picture, not prose. Seeing the mockup often sends one more round of sharpening back into the plan.
3. /wh:execute: Wheeler runs the WHEELER-assigned tasks, logs findings with full provenance, then regenerates the brief as a report: each pre-registered mockup now sits beside its real result figure, success criteria are marked, and result tables tuck into dropdowns.
4. /wh:discuss (again, on the results): hand Wheeler the brief and interpret together: what holds, what is fragile, what the next question is. Wheeler references figures by number, pulls related findings from the graph, and can run a quick check against the data to settle a contested point, registering whatever you endorse back into the graph.
5. /wh:write drafts from the endorsed findings with strict citations, or /wh:plan opens the follow-up investigation. /wh:close sweeps the session into a synthesis.
You can enter at any step, skip stages, or loop steps 2 to 4 as the work demands.
| Command | What it does |
|---|---|
| /wh:start | Route to the right command (or type your task) |
| /wh:discuss | Think like a colleague: sharpen the question, or interpret a plan's results from its brief (runs checks against the data, cites figures by number) |
| /wh:plan | Structure tasks with waves, assignees, checkpoints; render a visual brief with figure mockups |
| /wh:execute | Run analyses, log findings with provenance; pair mockups with the real result figures in a report |
| /wh:write | Draft text with strict citation enforcement |
| /wh:ingest | Bootstrap graph from existing code, data, papers |
| /wh:add | General-purpose ingest: text, DOI, file, URL |
| /wh:note | Quick-capture an insight, observation, or idea |
| /wh:compile | Compile graph into synthesis documents with citations |
| /wh:dream | Consolidate: promote tiers, detect communities, link orphans |
| /wh:pair | Live co-work: scientist drives, Wheeler assists |
| /wh:ask | Query the graph, trace provenance chains |
| /wh:status | Show progress, suggest next action |
| /wh:handoff | Propose tasks for independent execution |
| /wh:reconvene | Review results from independent work |
More commands
| Command | What it does |
|---|---|
| /wh:chat | Quick discussion, no execution |
| /wh:triage | Triage GitHub issues against planned work |
| /wh:report | Generate work log from graph (time period) |
| /wh:close | End-of-session provenance sweep |
| /wh:pause / /wh:resume | Save and restore investigation state |
| /wh:update | Check for Wheeler updates |
| /wh:dev-feedback | File bugs from inside your session |
Wheeler can run tasks without you present:
wh queue "search for papers on SRM models" # sonnet, 10 turns, logged
wh quick "check graph status" # haiku, 3 turns, fast
wh dream # graph consolidation
The wh launcher is a bash script in bin/wh that ships only with the source tree, not the PyPI wheel. To enable it after a uv tool install, clone the repo and symlink it: sudo ln -sf $PWD/bin/wh /usr/local/bin/wh. A native wheeler queue / quick / dream is on the roadmap.
Wheeler never does your thinking. Every task gets tagged: SCIENTIST (judgment calls), WHEELER (grinding), or PAIR (collaborative). Decision points are flagged as checkpoints, not guessed at.
The core primitive: one tool call creates a finding AND its full W3C PROV-DM provenance chain. You never write this directly; slash commands handle it. But under the hood, this is what happens:
add_finding(
description="Midget and parasol cells have similar clusters of fitted SRM parameters",
confidence=0.85,
execution_kind="script", # auto-creates Execution activity
used_entities="D-abc123,S-def456", # auto-links inputs
)
Wheeler internally creates the Finding, an Execution activity node, links inputs (Dataset, Script) via USED, links the output via WAS_GENERATED_BY, sets a stability score, and dual-writes to Neo4j and JSON. The provenance chain is always complete because the agent never had to remember to create it.
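To make the shape of that chain concrete, here is a minimal in-memory sketch of the nodes and edges described above. It is not Wheeler's implementation: `ProvGraph`, `sketch_add_finding`, the `F-` and `X-` ID prefixes, and the node fields are all hypothetical, and the Neo4j/JSON dual-write and stability scoring are left out. Only the relationship names and their PROV direction (an activity USED its inputs; an entity WAS_GENERATED_BY an activity) come from the text.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Toy stand-in for the knowledge graph (hypothetical, in-memory only)."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)


def sketch_add_finding(graph, description, confidence, used_entities):
    """Create a Finding plus the Execution and edges that explain it."""
    # Hypothetical ID scheme: a one-letter prefix plus a short random suffix.
    finding_id = f"F-{uuid.uuid4().hex[:6]}"
    execution_id = f"X-{uuid.uuid4().hex[:6]}"

    graph.nodes[finding_id] = {
        "type": "Finding",
        "description": description,
        "confidence": confidence,
    }
    graph.nodes[execution_id] = {"type": "Execution", "kind": "script"}

    # The execution USED each input (dataset, script, ...).
    for entity_id in used_entities.split(","):
        graph.edges.append((execution_id, "USED", entity_id.strip()))

    # The finding WAS_GENERATED_BY the execution.
    graph.edges.append((finding_id, "WAS_GENERATED_BY", execution_id))
    return finding_id


graph = ProvGraph()
fid = sketch_add_finding(
    graph,
    description="Midget and parasol cells have similar clusters of fitted SRM parameters",
    confidence=0.85,
    used_entities="D-abc123,S-def456",
)
```

After the call, the toy graph holds two new nodes and three edges, so walking back from the finding reaches the execution and, from there, both inputs.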
Every entity carries a stability score (0.0-1.0) encoding epistemic trust: primary data = 1.0, published papers = 0.9, validated scripts = 0.7, LLM-generated findings = 0.3. When an upstream entity changes, stability decays downstream: new = source * (0.8 ^ hops). Changed scripts propagate stale flags through the entire dependency chain.
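The decay rule can be sketched in a few lines. This is an illustration under stated assumptions, not Wheeler's provenance module: the function names and the dict-based graph are hypothetical, "source" is read as the stability of the entity that changed, and hops are counted along the shortest dependency path. Whether Wheeler caps the result at an entity's previous score is not stated here, so the sketch applies the formula literally.

```python
from collections import deque

DECAY_PER_HOP = 0.8  # from the rule above: new = source * (0.8 ^ hops)


def decayed_stability(source_stability, hops):
    """Stability of an entity `hops` steps downstream of a changed source."""
    return source_stability * (DECAY_PER_HOP ** hops)


def propagate_change(downstream, source_stability, changed_id):
    """Breadth-first walk from the changed entity.

    `downstream` maps an entity ID to the IDs that depend on it directly.
    Returns {stale_entity_id: new_stability} for everything reachable.
    """
    stale = {}
    seen = {changed_id}
    queue = deque([(changed_id, 0)])
    while queue:
        node, hops = queue.popleft()
        for child in downstream.get(node, []):
            if child in seen:
                continue  # already reached by a path that is no longer
            seen.add(child)
            stale[child] = decayed_stability(source_stability, hops + 1)
            queue.append((child, hops + 1))
    return stale


# Hypothetical IDs: a validated script (0.7) feeds F-1, which feeds F-2.
deps = {"S-def456": ["F-1"], "F-1": ["F-2"]}
stale = propagate_change(deps, source_stability=0.7, changed_id="S-def456")
```

With these inputs, F-1 sits one hop away and lands near 0.7 * 0.8 = 0.56, and F-2 sits two hops away and lands near 0.7 * 0.64 = 0.448. Both would be flagged stale.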
The graph is an index over files, not a document store. Each node stores an ID, type, tier, title, path, and timestamps. Full content lives in knowledge/{id}.json. Human-browsable rendering lives in synthesis/{id}.md (Obsidian-compatible with YAML frontmatter and [[backlinks]]). When you need connections, ask the graph. When you need content, read the file.
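The split between index and content suggests a simple access pattern: get an ID from the graph, then read the file. A minimal sketch, assuming the knowledge/ and synthesis/ directories sit directly under the project root; `load_content` and `synthesis_path` are hypothetical helper names, and the fields inside each JSON file are not specified here.

```python
import json
from pathlib import Path


def load_content(project_root: Path, node_id: str) -> dict:
    """Read the full record for a node from knowledge/{id}.json."""
    path = project_root / "knowledge" / f"{node_id}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def synthesis_path(project_root: Path, node_id: str) -> Path:
    """Location of the human-browsable rendering, synthesis/{id}.md."""
    return project_root / "synthesis" / f"{node_id}.md"
```

The graph query answers "which nodes are connected to this one"; helpers like these answer "what does that node actually say".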
11 entity types: Finding, Hypothesis, OpenQuestion, Dataset, Paper, Script, Execution, Document, ResearchNote, Plan, Ledger.
14 relationship types: 6 W3C PROV standard (USED, WAS_GENERATED_BY, WAS_DERIVED_FROM, WAS_INFORMED_BY, WAS_ATTRIBUTED_TO, WAS_ASSOCIATED_WITH) + 8 Wheeler semantic (SUPPORTS, CONTRADICTS, CITES, APPEARS_IN, RELEVANT_TO, AROSE_FROM, DEPENDS_ON, CONTAINS).
50 MCP tools across 5 servers (core, query, mutations, ops, legacy monolith).
See ARCHITECTURE.md for the complete technical spec: module dependency map, PROV schema, MCP tool listing, hardening patterns, design decisions.
v0.9.11 (2026-06-11): badge composes with any statusline
- Update badge composes with custom statuslines: a pre-existing statusLine (e.g. GSD's) is wrapped rather than skipped; the wrapper runs the original command unchanged and prepends the yellow /wh:update badge only when an update is pending. Reinstall never double-wraps, and uninstall restores the original command verbatim.
- Test suite at 1734 (was 1733 in v0.9.10).
v0.9.10 (2026-06-11): update chain hardened
- Updates apply fully on the first run: wheeler update reinstalls files by re-executing the freshly upgraded wheeler, so registrations that are new in the version just installed (like the status bar badge) take effect immediately and the manifest records the correct version.
- Version check works without pip: the PyPI check uses the JSON API via urllib instead of shelling out to pip (uv tool venvs have no pip); the GitHub check sends a proper User-Agent.
- The badge never lies: the session hook probes known install locations when run with a minimal PATH, never claims an update when the installed version cannot be determined, and the CLI now accepts the hook-written cache instead of re-checking the network on every invocation.
- Test suite at 1733 (was 1730 in v0.9.9).
v0.9.9 (2026-06-11): update path fixed end to end
- uv tool installs can update: wheeler update now detects uv-managed installs and upgrades via uv tool upgrade wheeler instead of failing on the missing pip (#69).
- Update badge actually appears: wheeler install now registers the statusline hook as the top-level statusLine settings key, so the yellow ⬆ /wh:update badge renders when an update is available; a custom statusLine is never overwritten (#70).
- Offline checks keep the badge: a failed network check no longer overwrites a cached update_available: true, so the badge survives offline session starts.
- Test suite at 1730 (was 1713 in v0.9.8).
Claude Code (interactive)
├── /wh:* slash commands (.claude/commands/wh/*.md)
│ ├── /wh:start: intent router (invokes other commands)
│ ├── YAML frontmatter: tool restrictions per mode
│ └── System prompt: workflow + provenance protocol
│
├── MCP Servers (50 tools)
│ ├── wheeler_core (12): health, status, context, search, cypher
│ ├── wheeler_query (10): read-only query_* tools
│ ├── wheeler_mutations (18): add_*, link, delete, update, merge
│ ├── wheeler_ops (10): staleness, citations, consistency
│ └── wheeler (legacy monolith): same 50 tools, one server
│
bin/wh (headless)
└── claude -p with structured logging → .logs/*.json
Code structure
wheeler/
├── models.py # Pydantic v2: 11 node types, prefix mappings
├── config.py # YAML loader, Pydantic config models
├── provenance.py # Stability scoring, invalidation propagation
├── consistency.py # Cross-layer drift detection and repair
├── mcp_server.py # Legacy monolith: all 50 tools
├── mcp_core.py # Split server: health, context, search (12)
├── mcp_query.py # Split server: query_* read-only (10)
├── mcp_mutations.py # Split server: add_*, link, delete, update (18)
├── mcp_ops.py # Split server: staleness, citations (10)
├── mcp_shared.py # Shared: trace IDs, decorators, config
├── knowledge/ # File I/O: read, write, list, render, migrate
├── graph/ # Neo4j backend, circuit breaker, schema, context
├── search/ # Embeddings, RRF fusion, graph-expanded search
├── validation/ # Citation validation, ledger quality metrics
├── tools/graph_tools/ # Provenance-completing mutations + queries
└── workspace.py # Project file scanner
tests/ # 1734 tests
docs/ # Getting started, architecture, project spec
Bug reports: Use /wh:dev-feedback from inside a session to file structured issues, or report at GitHub Issues.
Tests: python -m pytest tests/ -v (1734 tests). E2E tests require a running Neo4j: python -m pytest tests/e2e/ -v.
Architecture: See ARCHITECTURE.md for the full technical spec (module dependency map, PROV schema, MCP tool listing, hardening patterns).
Project docs:
- Mission — four pillars, target audience, design north star
- Tech stack — components, infrastructure patterns, current gaps
- Roadmap — shipped versions, v0.9.0 phases, v1.0 criteria
- Getting started — install walkthrough with Neo4j Desktop
- Project spec — original design specification
If you use Wheeler in your research, please cite it:
@software{hong_wheeler_2026,
author = {Hong, Arthur and Rieke, Fred},
title = {{Wheeler: Reliable, trustworthy, trackable AI workflows for science}},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.20498885},
url = {https://doi.org/10.5281/zenodo.20498885}
}