A CLI tool that uses LLMs to build and maintain personal knowledge bases. Drop in source documents, and the LLM compiles them into a structured, interlinked wiki of markdown files — summaries, entity pages, concept pages, cross-references, and a running index. Ask questions against the wiki and get synthesized answers. Run health checks to keep everything consistent.
The wiki is a directory of markdown files you can open in Obsidian, VS Code, or any text editor. The LLM writes all of it. You never touch the wiki directly.
Most LLM document tools (RAG, NotebookLM, ChatGPT file uploads) re-derive knowledge from scratch on every query. Nothing accumulates.
This tool does something different: when you add a source, the LLM reads it and integrates it into a persistent wiki — updating entity pages, writing concept summaries, noting where new data contradicts old claims, and maintaining cross-references. By the time you ask a question, the synthesis is already done. The wiki compounds with every source you add and every question you ask.
raw/ ← you drop source documents here (immutable)
├── paper.md
├── article.md
└── assets/
wiki/ ← LLM writes and maintains everything here
├── index.md ← master catalog of all pages (auto-updated)
├── log.md ← append-only operation log
├── sources/ ← one summary page per source document
├── concepts/ ← concept and topic pages
└── entities/ ← people, organizations, products, events
schema.md ← conventions and workflows the LLM follows
The LLM runs as an agent with five file operation tools (read_file, write_file, append_to_file, list_directory, search_wiki). It uses these tools to read the schema, explore the existing wiki, and write or update pages — typically touching 5–15 files per ingest. You watch the tool calls stream in real time.
Requires Python 3.10+.
git clone https://github.com/Turi-Labs/wiki-builder
cd wiki-builder
pip install -e .This installs the wiki command globally.
Dependencies: anthropic, openai, typer, rich
Set your API key as an environment variable:
# For Anthropic (default)
export ANTHROPIC_API_KEY=sk-ant-...
# For OpenAI
export OPENAI_API_KEY=sk-...Or save it permanently:
wiki config --provider anthropic --api-key sk-ant-...
wiki config --provider openai --api-key sk-...# 1. Create a new wiki
wiki init my-research/ --name "AI Research"
# 2. Add source documents to raw/
cp ~/papers/attention-is-all-you-need.md my-research/raw/
# 3. Ingest a source (LLM reads it and updates the wiki)
cd my-research
wiki ingest raw/attention-is-all-you-need.md
# 4. Ask questions
wiki query "what is self-attention and why does it matter?"
# 5. Check the wiki's health
wiki lint
# 6. See stats
wiki statusScaffolds a new wiki project.
wiki init my-research/
wiki init my-research/ --name "ML Paper Notes"Creates:
my-research/
├── schema.md # LLM conventions (edit this to customize behavior)
├── .gitignore
├── raw/
│ └── assets/
└── wiki/
├── index.md
├── log.md
├── concepts/
├── entities/
└── sources/
Reads a source document and integrates it into the wiki. The LLM:
- Writes a summary page in
wiki/sources/ - Creates or updates entity pages in
wiki/entities/ - Creates or updates concept pages in
wiki/concepts/ - Updates
wiki/index.md - Appends an entry to
wiki/log.md
wiki ingest raw/paper.md
wiki ingest raw/article.md --provider openai --model gpt-4o
wiki ingest raw/notes.md --thinking # show Claude's reasoning
wiki ingest raw/paper.pdf --delete-pdf # convert to text, then delete the PDFSupported source formats: any text file (.md, .txt, .rst, plain text). PDFs are converted to sibling .txt files before ingestion when pdftotext from Poppler is installed.
Asks a question against the wiki. The LLM reads the index, finds relevant pages, and synthesizes an answer.
wiki query "what is the transformer architecture?"
wiki query "compare attention mechanisms across these papers" --format table
wiki query "summarize everything I know about RLHF" --save
wiki query "what are the key open problems?" --output open-problems.mdOptions:
| Flag | Description |
|---|---|
--format markdown |
Default. Well-structured markdown answer. |
--format table |
Answer as a comparison table. |
--format marp |
Marp slide deck (viewable in Obsidian with the Marp plugin). |
--save |
Files the answer back into the wiki as a new concept page. |
--output <file> |
Saves the answer to a file instead of printing. |
Tip: Use --save to make your queries compound in the wiki. Every good answer becomes a permanent page, enriching future queries.
Runs a health check over the entire wiki and produces a report.
wiki lint
wiki lint --fix # automatically fix issues foundChecks for:
- Orphan pages (no inbound links from other pages)
- Missing cross-references (entity mentioned but not linked)
- Contradictions between pages
- Concepts mentioned across multiple pages but lacking their own page
- Stale or vague claims
index.mdentries missing or out of date
Shows a summary of the wiki without calling the LLM.
wiki status╭─────────────── Wiki Status ───────────────╮
│ Wiki: /home/user/my-research │
│ │
│ 📄 Sources: 12 pages │
│ 💡 Concepts: 34 pages │
│ 👤 Entities: 18 pages │
│ 📁 Raw files: 12 │
│ 📝 Total wiki words: 47,203 │
╰───────────────────────────────────────────╯
Manages global settings stored in ~/.wiki-builder/config.json.
wiki config --provider anthropic --api-key sk-ant-...
wiki config --provider openai --api-key sk-...
wiki config --model gpt-4o-mini # set a default model
wiki config --show # print current configBoth Anthropic and OpenAI are supported. The provider can be set globally or overridden per command.
| Provider | Default model | Notes |
|---|---|---|
anthropic |
claude-opus-4-6 |
Default. Supports --thinking for extended reasoning. |
openai |
gpt-4o |
Full tool use support. --thinking flag has no effect. |
Resolution order (highest to lowest priority):
--provider/--modelCLI flagsWIKI_PROVIDER/ANTHROPIC_API_KEY/OPENAI_API_KEYenvironment variables- Saved config (
~/.wiki-builder/config.json) - Default:
anthropic/claude-opus-4-6
Examples:
# Use OpenAI for a single command
wiki ingest raw/paper.md --provider openai
# Use a cheaper model for quick queries
wiki query "what is X?" --provider openai --model gpt-4o-mini
# Switch default provider
wiki config --provider openai
# Per-session override via env var
WIKI_PROVIDER=openai wiki ingest raw/paper.mdschema.md is the configuration file that tells the LLM how to behave. It defines:
- Directory structure and what goes where
- Page format and frontmatter conventions
- Cross-linking rules
- The ingest, query, and lint workflows
- File naming conventions
The LLM reads schema.md at the start of every operation. You can edit it to change how the wiki is structured — for example, adding a timeline/ directory, requiring specific frontmatter fields, or changing how entities are categorized.
This is the key file for customizing behavior. The defaults work well for research wikis, but different domains (a book, a codebase, a company's internal knowledge) may want different structures.
wiki-builder/
├── pyproject.toml # package metadata and dependencies
└── wiki_builder/
├── __init__.py
├── cli.py # typer CLI — all commands live here
├── agent.py # agentic loop + provider backends + prompt builders
├── tools.py # file operation tools the LLM calls
├── config.py # API key and provider resolution
└── templates.py # default schema.md, index.md, log.md content
Typer commands (init, ingest, query, lint, status, config). Resolves the wiki root, builds prompts, calls run_agent, and prints results.
The core of the tool. Contains:
run_agent()— unified entry point that dispatches to the right provider backend_run_anthropic()— Anthropic agentic loop using the Messages API with tool use_run_openai()— OpenAI agentic loop using the Chat Completions API with function callingbuild_ingest_prompt(),build_query_prompt(),build_lint_prompt()— prompt builders
The loop runs until the model stops calling tools (end_turn / finish_reason: stop), up to a configurable max_turns (default: 40).
Five tools the LLM can call to interact with the filesystem:
| Tool | Description |
|---|---|
read_file(path) |
Read any file relative to wiki root |
write_file(path, content) |
Write or overwrite a file (creates parent dirs) |
append_to_file(path, content) |
Append to a file (used for log.md) |
list_directory(path) |
List files and subdirectories |
search_wiki(query) |
Grep all wiki .md files for a term, returns matching paths |
All paths are validated to prevent directory traversal outside the wiki root.
Tool definitions are stored in Anthropic format and converted to OpenAI's function-calling format automatically when using the OpenAI backend.
Handles provider and API key resolution. Saves config to ~/.wiki-builder/config.json. Supports legacy ANTHROPIC_API_KEY as well as the newer per-provider key names.
The default content written by wiki init: schema.md (the LLM's behavioral spec), and empty stubs for index.md and log.md.
Use Obsidian as your frontend. Open the wiki root as an Obsidian vault. The graph view shows the shape of your knowledge — what's connected, what's isolated. Use the Dataview plugin to query frontmatter. Use Marp for slide output.
Ingest sources one at a time. The LLM does a better job when you ingest sources one at a time and stay involved — reading the summaries, checking the updates, noting what to emphasize next time.
Use --save on queries you care about. Every good answer filed back into the wiki is available for future queries and makes the synthesis richer.
Edit schema.md as you go. If the LLM is creating pages in a format you don't like, edit schema.md to change the convention. The LLM will follow the updated schema on the next operation.
Keep raw/ immutable. The LLM is instructed never to modify raw/. It's your source of truth. If you want to re-ingest a source with different instructions, update schema.md and run wiki ingest again.
The wiki is a git repo. Run git init in the wiki root. Every ingest, query, and lint pass produces a natural commit point. You get full history, easy rollback, and the ability to branch experiments.
cd my-research
git init
git add .
git commit -m "init wiki"
wiki ingest raw/paper.md
git add -A && git commit -m "ingest: attention is all you need"- Source format: Only plain text sources are supported natively. Convert PDFs, DOCX, etc. to markdown before ingesting.
- Scale: The index-based navigation works well up to ~100–200 sources. Beyond that, consider adding a search tool (e.g. qmd) and pointing the LLM at it.
- Images: The LLM reads markdown but won't automatically process inline images from sources. Download images locally (Obsidian's "Download attachments" hotkey helps) and reference them by path. The LLM can be asked to view specific images separately.
- Cost: Each ingest touches 5–15 files and may run 10–20 tool call turns. With Claude Opus 4.6, a typical ingest costs $0.05–0.20 depending on source length and wiki size. Use
--model claude-sonnet-4-6or--model gpt-4o-minito reduce cost.