Local-first experiment for turning AI chat history into a living memory graph. ArcRift can be used as an optional web-chat capture source, but Sporepath is the memory layer: it digests chats into notes, metabolic focus paths, latent paths, and weird-but-bridged inspiration prompts.
Sporepath grows the paths you use, and keeps forgotten thoughts ready to wake. The goal is not another note archive. The experiment is whether old chat fragments can become two useful layers:
- Focus paths: active, frequently touched ideas become thicker and easier to continue.
- Latent paths: quiet, low-activation ideas sink out of the way but can be resurfaced when a new problem creates a useful bridge.
Small local models are treated as scouts, not as the brain. They propose candidate memory atoms. Local rules and later usage signals decide which paths thicken, fade, or sink into archive.
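As a rough illustration of how usage signals could thicken or fade a path, here is a minimal sketch. The names (Atom, touch, decay, classify) and the boost, half-life, and threshold values are assumptions made up for this example; they are not Sporepath's actual scoring rules.

```python
from dataclasses import dataclass

# All constants are illustrative assumptions, not Sporepath's real values.
TOUCH_BOOST = 1.0        # added each time an atom is used
HALF_LIFE_DAYS = 30.0    # activation halves after this many idle days
FOCUS_THRESHOLD = 1.5    # at or above: focus path; below: latent path


@dataclass
class Atom:
    atom_id: str
    activation: float = 0.0


def touch(atom: Atom) -> None:
    """Thicken a path: a usage signal raises activation."""
    atom.activation += TOUCH_BOOST


def decay(atom: Atom, idle_days: float) -> None:
    """Fade a path: activation decays exponentially while the atom is idle."""
    atom.activation *= 0.5 ** (idle_days / HALF_LIFE_DAYS)


def classify(atom: Atom) -> str:
    """Label an atom as a focus path or a latent path."""
    return "focus" if atom.activation >= FOCUS_THRESHOLD else "latent"


atom = Atom("a1")
touch(atom)
touch(atom)                    # two uses: activation is 2.0
state_before = classify(atom)
decay(atom, idle_days=30)      # one half-life of idleness: activation is 1.0
state_after = classify(atom)
```

Here the atom starts as a focus path and sinks to latent after one idle half-life; a later touch would lift it back over the threshold.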
- Imports ChatGPT-style conversations.json, generic JSONL chat logs, allowlisted local Codex/Claude conversation stores, and ArcRift SQLite full_chats.
- Queues manually written Markdown/txt notes from a notes inbox, so old saved ideas can enter the same digestion path as chats.
- Extracts thought atoms with either:
  - a deterministic rules baseline, or
  - a local Ollama model such as qwen3:1.7b.
- Stores atoms and shared-tag edges in SQLite.
- Builds readable digested notes from atoms, so you can review old chats without opening the full conversation log.
- Exports digested notes to a local Obsidian-compatible Markdown vault.
- Tracks activation, a rough path-strength score.
- Produces a local interactive HTML graph.
- Uses codex exec only for optional inspiration bridging, so an existing ChatGPT/Codex subscription can be used instead of an API key.
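To make "atoms and shared-tag edges in SQLite" concrete, the following sketch links every pair of atoms that shares at least one tag. The table and column names are hypothetical; the real Sporepath schema may look quite different.

```python
import sqlite3
from itertools import combinations

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE atoms (id TEXT PRIMARY KEY, text TEXT NOT NULL);
CREATE TABLE atom_tags (atom_id TEXT NOT NULL, tag TEXT NOT NULL);
CREATE TABLE edges (
    src TEXT NOT NULL, dst TEXT NOT NULL, shared_tags INTEGER NOT NULL,
    PRIMARY KEY (src, dst)
);
"""


def build_shared_tag_edges(conn: sqlite3.Connection) -> int:
    """Rebuild edges: one row per atom pair sharing at least one tag."""
    tags_by_atom = {}
    for atom_id, tag in conn.execute("SELECT atom_id, tag FROM atom_tags"):
        tags_by_atom.setdefault(atom_id, set()).add(tag)

    conn.execute("DELETE FROM edges")
    count = 0
    for a, b in combinations(sorted(tags_by_atom), 2):
        shared = tags_by_atom[a] & tags_by_atom[b]
        if shared:
            conn.execute("INSERT INTO edges VALUES (?, ?, ?)", (a, b, len(shared)))
            count += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO atoms VALUES (?, ?)",
    [("a1", "retry logic"), ("a2", "queue design"), ("a3", "taste note")],
)
conn.executemany(
    "INSERT INTO atom_tags VALUES (?, ?)",
    [("a1", "queue"), ("a1", "debug"), ("a2", "queue"), ("a3", "writing")],
)
edge_count = build_shared_tag_edges(conn)
edges = conn.execute("SELECT src, dst, shared_tags FROM edges").fetchall()
```

Only a1 and a2 share a tag, so the example produces a single edge between them.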
git clone https://github.com/shihchengwei-lab/sporepath.git
cd sporepath
python -m pip install -e .
Or run from the checkout without installing:
$env:PYTHONPATH = "src"
python -m sporepath doctor
On Windows, double-click:
Sporepath.vbs
This starts the ArcRift backend if needed, starts the local source watcher and off-peak digestion queue worker in the background, and opens the small local Sporepath window.
Only the Sporepath window is meant to stay visible. ArcRift, source watching, and queue digestion run hidden; the main window shows small status indicators for ArcRift, Sources, and Queue.
Sporepath.bat remains as a compatibility launcher, but Windows may briefly
open a terminal for .bat files. Use Sporepath.vbs when you want a truly
headless launch.
- Local Codex/Claude/jsonl sources are watched directly. When those files change, Sporepath refreshes the SQLite memory, digested notes, Obsidian vault, and graph.
- The background digestion queue worker is started hidden. It collects new local-source, ArcRift, and notes-inbox fragments, then only digests them during the configured off-peak window.
- Put old .md, .markdown, or .txt notes in %USERPROFILE%\Documents\Sporepath Inbox; they will be queued for scout digestion. Sporepath does not scan arbitrary ArcRift folders for notes.
- Web chats are intentionally not scraped in the background. After a ChatGPT or Claude web conversation, press ArcRift's popup Save Chat button. Once ArcRift writes the chat into ArcRift.db, Sporepath queues it as another digest source. It is not rebuilt every 20 seconds.
The main window keeps the daily surface small:
- Sync Vault: treat edited Obsidian notes as usage feedback and thicken their source atoms.
- Inspire: enter a stuck question and ask Codex for weird-but-bridged next moves.
- Rate Suggestions: appears only after an Inspire run returns suggestion ids; mark each suggestion 👍, 👎, or leave it unselected before submitting structured feedback.
- Debug: opens the maintenance panel.
The Debug panel holds the manual recovery and setup actions:
- Auto-detect Sources
- Import ArcRift
- Refresh Now
- Open Vault
- Queue Status
- Run Queue Batch
The batch launcher uses real_memory.sqlite in this checkout by default. The
Debug panel lets you edit the DB, chat export, ArcRift DB, vault, and graph
paths before running maintenance actions.
If you only want the backend without opening Sporepath:
Start-ArcRift.bat
If you only want the off-peak queue worker:
Run-Sporepath-Queue-Worker.bat
It defaults to qwen3:1.7b, 00:00-07:00, batch size 5, auto-feeds
allowlisted local sources with --source all, includes the neighboring
..\ArcRift\backend\ArcRift.db when present, creates/uses
%USERPROFILE%\Documents\Sporepath Inbox for Markdown/txt notes, refreshes the
Obsidian vault and HTML graph after new atoms are created, and checks that
Ollama and the model exist before starting.
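The off-peak check can be pictured with the sketch below. It parses a window written like 00:00-07:00 and tests whether a given time falls inside it, including windows that wrap past midnight. The function names are hypothetical, and treating the end of the window as exclusive is an assumption.

```python
from datetime import time


def parse_window(spec: str) -> tuple[time, time]:
    """Parse 'HH:MM-HH:MM' into (start, end) times."""
    start_s, end_s = spec.split("-")
    return time.fromisoformat(start_s), time.fromisoformat(end_s)


def in_off_peak(now: time, spec: str) -> bool:
    """Return True when `now` is inside the window (end is exclusive)."""
    start, end = parse_window(spec)
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, for example 22:00-06:00.
    return now >= start or now < end


early = in_off_peak(time(3, 0), "00:00-07:00")
at_end = in_off_peak(time(7, 0), "00:00-07:00")
wrapped = in_off_peak(time(23, 30), "22:00-06:00")
midday = in_off_peak(time(12, 0), "22:00-06:00")
```

A worker built this way would keep collecting fragments at any hour and only call the extractor when the check returns True.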
To start that worker automatically when Windows logs in, run:
Install-Sporepath-Queue-Worker-Task.bat
To remove the scheduled task:
Uninstall-Sporepath-Queue-Worker-Task.bat
If Chrome's extension manager is inconvenient, there are best-effort launchers:
Launch-ArcRift-Chrome.bat
It tries to open a separate Chrome profile at
%LOCALAPPDATA%\Sporepath\ArcRift Chrome Profile with the local ArcRift
extension loaded from ..\ArcRift\extension\dist\chrome. Google Chrome can
ignore --load-extension in some installs, so the reliable setup is still to
load the unpacked ArcRift extension manually once from chrome://extensions.
You may need to sign in to ChatGPT/Claude once in that dedicated profile.
If you want to reuse your existing logged-in Chrome profile instead, use:
Launch-ArcRift-Logged-In-Chrome.bat
This closes Chrome, then tries to reopen the Default profile with the ArcRift
extension loaded. It is useful when ChatGPT/Claude are already signed in in your
normal browser, but it will close any current Chrome windows first. If Chrome
ignores the extension flag, manually install the unpacked ArcRift extension once
and keep using your normal browser.
Use Auto-detect Sources to find local Codex and Claude conversation stores, or let the source watcher do it automatically. Sporepath only uses an allowlist of likely conversation sources:
{home}/.codex/history.jsonl
{home}/.codex/sessions/
{home}/.codex/archived_sessions/
{home}/.claude/history.jsonl
{home}/.claude/projects/
{home}/.claude/sessions/
It deliberately ignores credentials, auth files, settings, logs, caches, and other non-conversation files.
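The allowlist approach can be sketched as follows: only the six listed locations are ever checked, so a neighboring auth or settings file is never returned. The function name detect_sources is hypothetical, and the demo builds a throwaway home directory rather than touching the real one.

```python
import tempfile
from pathlib import Path

# Relative paths copied from the allowlist above.
ALLOWLIST = (
    ".codex/history.jsonl",
    ".codex/sessions",
    ".codex/archived_sessions",
    ".claude/history.jsonl",
    ".claude/projects",
    ".claude/sessions",
)


def detect_sources(home: Path) -> list[Path]:
    """Return only the allowlisted locations that exist under `home`."""
    return [home / rel for rel in ALLOWLIST if (home / rel).exists()]


with tempfile.TemporaryDirectory() as tmp:
    fake_home = Path(tmp)
    (fake_home / ".codex").mkdir()
    (fake_home / ".codex" / "history.jsonl").write_text("{}\n", encoding="utf-8")
    # Not on the allowlist, so it is never returned.
    (fake_home / ".codex" / "auth.json").write_text("{}", encoding="utf-8")
    found = [p.relative_to(fake_home).as_posix() for p in detect_sources(fake_home)]
```

Because the code never walks a directory tree, adding a new source means editing the allowlist explicitly.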
Try the included sample first:
$env:PYTHONPATH = "src"
python -m sporepath --db sample_memory.sqlite ingest examples\sample_chat.jsonl
python -m sporepath --db sample_memory.sqlite digest
python -m sporepath --db sample_memory.sqlite notes
python -m sporepath --db sample_memory.sqlite focus
python -m sporepath --db sample_memory.sqlite graph --out graph.html
Open graph.html in your browser.
For a ChatGPT export saved in your Downloads folder:
$chat = "$env:USERPROFILE\Downloads\conversations.json"
python -m sporepath --db my_memory.sqlite ingest $chat
python -m sporepath --db my_memory.sqlite digest
python -m sporepath --db my_memory.sqlite notes
python -m sporepath --db my_memory.sqlite stats
python -m sporepath --db my_memory.sqlite focus
Use the local model extractor on a small slice first:
ollama pull qwen3:1.7b
$chat = "$env:USERPROFILE\Downloads\chat.jsonl"
python -m sporepath --db qwen_trial.sqlite ingest $chat --extractor ollama --model qwen3:1.7b --max-turns 50
python -m sporepath --db qwen_trial.sqlite focus --limit 20
The small model is expected to be noisy. It is a scout. Use show to inspect
why it kept an atom:
python -m sporepath --db qwen_trial.sqlite show <atom-id>
Slow scout models do not need to run while you are working. Queue new chat fragments first, then digest them later during idle/off-peak time:
python -m sporepath --db real_memory.sqlite queue-build --source all --min-chars 80
python -m sporepath --db real_memory.sqlite queue-build `
--notes-inbox "$env:USERPROFILE\Documents\Sporepath Inbox" `
--min-chars 80
python -m sporepath --db real_memory.sqlite queue-stats
Queue collection is intentionally conservative. It skips near-duplicate
fragments and disposable command/recap noise before the local scout sees them.
Use --no-dedupe only when you intentionally want to test repeated cases.
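One way to picture conservative queue collection is the sketch below, which drops short fragments and skips any fragment whose word trigrams overlap heavily with one already kept. The Jaccard-on-shingles technique, the 0.8 threshold, and all function names are assumptions chosen for illustration; the source does not say how Sporepath detects near-duplicates.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Lowercased word n-grams used as a cheap fingerprint."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Overlap between two shingle sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def collect(fragments: list, min_chars: int = 80, threshold: float = 0.8) -> list:
    """Keep fragments that are long enough and not near-duplicates of kept ones."""
    kept, kept_shingles = [], []
    for fragment in fragments:
        if len(fragment) < min_chars:
            continue
        current = shingles(fragment)
        if any(jaccard(current, seen) >= threshold for seen in kept_shingles):
            continue
        kept.append(fragment)
        kept_shingles.append(current)
    return kept


base = ("The queue worker should checkpoint every fragment so an interrupted "
        "run can resume without repeating finished work.")
repeat = base.upper() + "  "   # same words, different case and spacing
short = "ok thanks"            # below min_chars
other = ("Latent paths sink out of the way until a new problem creates a "
         "bridge back to an old and half forgotten idea.")
kept = collect([base, repeat, short, other])
```

The pairwise comparison is quadratic in the number of kept fragments, which is acceptable for small batches but would need an index for large archives.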
Process a small batch with the rules baseline:
python -m sporepath --db real_memory.sqlite digest-queue --extractor rules --limit 25
Process with the current validated local scout:
python -m sporepath --db real_memory.sqlite digest-queue `
--extractor ollama `
--model qwen3:1.7b `
--ollama-timeout-s 120 `
--ollama-num-predict 260 `
--limit 10
Each fragment is checkpointed as done, skipped, or error, so an interrupted
run can continue later without reprocessing finished items.
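The checkpointing idea can be sketched with a small status column that is committed after every fragment, so a later run only selects rows that are still pending. The queue table, the pending status, and the stand-in extractor are hypothetical; only the done, skipped, and error statuses come from the description above.

```python
import sqlite3


def make_queue(conn: sqlite3.Connection, fragments: list) -> None:
    """Create a hypothetical queue table and fill it with pending fragments."""
    conn.execute(
        "CREATE TABLE queue (id INTEGER PRIMARY KEY, text TEXT, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.executemany("INSERT INTO queue (text) VALUES (?)", [(f,) for f in fragments])
    conn.commit()


def digest_batch(conn: sqlite3.Connection, extract, limit: int) -> int:
    """Process up to `limit` pending rows, committing a status after each one."""
    rows = conn.execute(
        "SELECT id, text FROM queue WHERE status = 'pending' ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    for row_id, text in rows:
        try:
            status = "done" if extract(text) else "skipped"
        except Exception:
            status = "error"
        conn.execute("UPDATE queue SET status = ? WHERE id = ?", (status, row_id))
        conn.commit()  # checkpoint: an interruption after this keeps the result
    return len(rows)


def fake_extract(text: str) -> bool:
    """Stand-in for a scout model: fails on 'boom', rejects very short text."""
    if "boom" in text:
        raise RuntimeError("model call failed")
    return len(text) > 5


conn = sqlite3.connect(":memory:")
make_queue(conn, ["useful idea about queues", "ok", "boom goes the model", "another useful idea"])
first_run = digest_batch(conn, fake_extract, limit=2)
second_run = digest_batch(conn, fake_extract, limit=10)  # resumes at row 3
statuses = [row[0] for row in conn.execute("SELECT status FROM queue ORDER BY id")]
```

The second call never revisits the first two rows, which is the property that lets an interrupted run continue.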
If a model call failed, inspect and retry errors without opening SQLite:
python -m sporepath --db real_memory.sqlite queue-errors
python -m sporepath --db real_memory.sqlite queue-retry
To leave a worker running and only process the queue during off-peak hours:
python -m sporepath --db real_memory.sqlite queue-worker `
--source all `
--arcrift-db "$env:USERPROFILE\Desktop\GH_repos\ArcRift\backend\ArcRift.db" `
--notes-inbox "$env:USERPROFILE\Documents\Sporepath Inbox" `
--off-peak 00:00-07:00 `
--batch-size 5 `
--interval-s 300 `
--vault "$env:USERPROFILE\Documents\Sporepath Vault" `
--graph real_graph.html `
--extractor ollama `
--model qwen3:1.7b `
--ollama-timeout-s 120 `
--ollama-num-predict 260
Use --once --run-now to run one batch immediately for testing.
Run-Sporepath-Queue-Worker.bat is the copy-paste version of this flow.
ArcRift is optional here. Sporepath can import ArcRift's saved web chats, but it
does not depend on ArcRift for the core memory loop. The core loop remains:
local chat digestion, readable notes, path metabolism, Obsidian export, and
inspire.
Point Sporepath at an ArcRift SQLite database:
$arc = Read-Host "Paste the full path to ArcRift.db"
python -m sporepath --db my_memory.sqlite import-arcrift $arc
python -m sporepath --db my_memory.sqlite digest
python -m sporepath --db my_memory.sqlite export-vault "$env:USERPROFILE\Documents\Sporepath Vault"
python -m sporepath --db my_memory.sqlite inspire "I am stuck on what to do next"
If you run ArcRift from its repo, the default SQLite file is usually
ArcRift.db in the backend working directory unless SQLITE_DB_PATH is set.
Sporepath opens the ArcRift DB in read-only mode and imports from
full_chats.rawText; it does not modify ArcRift's database.
Filter to one ArcRift project or session id:
python -m sporepath --db my_memory.sqlite import-arcrift $arc --project "My Project"
To queue ArcRift chats for the same off-peak scout digestion path:
python -m sporepath --db real_memory.sqlite queue-build `
--arcrift-db "$env:USERPROFILE\Desktop\GH_repos\ArcRift\backend\ArcRift.db" `
--min-chars 80
The normal Sporepath.bat flow starts the queue worker for you, so saved
ArcRift chats are picked up there. watch-arcrift still exists as a debug
escape hatch for immediate rules-based import, but it is no longer part of the
default launcher.
To open a Chrome profile with the ArcRift extension already loaded:
Launch-ArcRift-Chrome.bat
To reuse the already signed-in Chrome Default profile, close/reopen Chrome and
load the extension in one step:
Launch-ArcRift-Logged-In-Chrome.bat
Raw conversations are too long to review. Thought atoms are useful for scoring and linking, but too small to read as notes. Digested notes are the middle layer:
raw chat / JSONL
-> thought atoms
-> digested notes
-> focus and latent graph
Build notes from the atoms already in your database:
python -m sporepath --db my_memory.sqlite digest
python -m sporepath --db my_memory.sqlite notes
python -m sporepath --db my_memory.sqlite show-note <note-id>
Current note generation is deliberately simple and local. It groups atoms by topic, keeps source atom ids and source spans, and produces rough note types:
- concept_note
- decision_note
- friction_note
These notes are not treated as permanent truth. They are readable byproducts of the memory metabolism layer, and can be rebuilt as extraction improves.
refresh is the one-step pipeline behind the desktop button:
python -m sporepath --db my_memory.sqlite refresh `
--input "$env:USERPROFILE\Downloads\conversations.json" `
--vault "$env:USERPROFILE\Documents\Sporepath Vault" `
--graph sporepath_graph.html
If your database already has atoms, --input is optional. Without it, refresh
rebuilds edges, notes, vault export, and graph from the existing database.
You can also ask Sporepath to detect Codex/Claude sources:
python -m sporepath sources
python -m sporepath --db my_memory.sqlite refresh --source codex --source claude `
--vault "$env:USERPROFILE\Documents\Sporepath Vault" `
--graph sporepath_graph.html
--source is explicit on purpose. A plain refresh does not scan your
home directory unless you ask for sources.
To keep local Codex/Claude/jsonl sources synced without pressing Refresh Now:
python -m sporepath --db real_memory.sqlite watch-sources --source all `
--vault "$env:USERPROFILE\Documents\Sporepath Vault" `
--graph real_graph.html
On Windows, Run-Sporepath-Sources-Watcher.bat runs that command, and
Sporepath.bat starts it for you.
Sporepath does not need to become a note-taking app. It can export digested notes into a plain Markdown vault that Obsidian can open directly:
python -m sporepath --db my_memory.sqlite export-vault "$env:USERPROFILE\Documents\Sporepath Vault"
The export writes:
Sporepath Vault/
Digested Notes/
concept-note-memory-metabolism-abc1234.md
.sporepath/
manifest.json
Each note includes YAML frontmatter with sporepath_id, type, state,
activation, tags, source_atoms, and source_spans. Obsidian is the human
reading/editing surface; SQLite remains the source of truth for activation,
focus/latent scoring, and future inspire behavior.
If you edit generated notes in Obsidian, sync that activity back into the metabolism layer:
python -m sporepath --db my_memory.sqlite sync-vault "$env:USERPROFILE\Documents\Sporepath Vault"
sync-vault compares the exported manifest with current Markdown files. Modified
notes touch their source atoms, so Obsidian edits become path-strength feedback.
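The manifest comparison can be sketched as a content-hash check: any note whose current hash differs from the hash recorded at export time contributes its source atoms to the list to be touched. The manifest layout used here (a notes mapping with sha256 and source_atoms keys) is an assumption; the real manifest.json format is not documented above.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's bytes so edits can be detected without diffing text."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def modified_source_atoms(vault: Path) -> list:
    """Return source atom ids of notes whose content changed since export."""
    manifest_path = vault / ".sporepath" / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    touched = []
    for rel_path, entry in manifest["notes"].items():
        note = vault / rel_path
        if note.exists() and sha256_of(note) != entry["sha256"]:
            touched.extend(entry["source_atoms"])
    return touched


with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    notes_dir = vault / "Digested Notes"
    notes_dir.mkdir()
    (vault / ".sporepath").mkdir()

    # Simulate an export: write two notes and record their hashes.
    manifest = {"notes": {}}
    for name, atoms in (("a.md", ["atom-1", "atom-2"]), ("b.md", ["atom-3"])):
        note = notes_dir / name
        note.write_text("original body\n", encoding="utf-8")
        manifest["notes"][f"Digested Notes/{name}"] = {
            "sha256": sha256_of(note),
            "source_atoms": atoms,
        }
    (vault / ".sporepath" / "manifest.json").write_text(
        json.dumps(manifest), encoding="utf-8"
    )

    # Simulate an Obsidian edit to one note, then sync.
    (notes_dir / "a.md").write_text("original body\nmy edit\n", encoding="utf-8")
    touched = modified_source_atoms(vault)
```

Only the edited note's atoms are returned, so untouched notes contribute no path-strength feedback.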
inspire sends a compact prompt to codex exec. The adapter removes
CODEX_API_KEY and OPENAI_API_KEY from the child process environment, uses
stdin for the prompt, runs read-only, and lowers reasoning effort for this PoC.
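The environment scrubbing and stdin handoff described above can be sketched like this. The helper names are hypothetical, and a small Python child process stands in for codex exec so the example runs without Codex installed; the real adapter's command line and flags are not shown here.

```python
import os
import subprocess
import sys

# The two variables named in the description above.
SCRUBBED_KEYS = ("CODEX_API_KEY", "OPENAI_API_KEY")


def child_env(base: dict) -> dict:
    """Copy the environment without API keys so the child cannot use them."""
    return {k: v for k, v in base.items() if k not in SCRUBBED_KEYS}


def run_with_prompt(command: list, prompt: str, timeout_s: float = 120.0) -> str:
    """Run `command`, send the prompt on stdin, and return its stdout."""
    result = subprocess.run(
        command,
        input=prompt,
        capture_output=True,
        text=True,
        env=child_env(dict(os.environ)),
        timeout=timeout_s,
        check=True,
    )
    return result.stdout


# Stand-in child: echoes stdin and reports whether it can see the API key.
probe = [
    sys.executable,
    "-c",
    "import os, sys; print(sys.stdin.read().strip(), 'OPENAI_API_KEY' in os.environ)",
]
os.environ["OPENAI_API_KEY"] = "dummy-value-for-the-demo"
output = run_with_prompt(probe, "hello scout")
```

Sending the prompt on stdin keeps it out of the process argument list, and dropping the keys from the child environment means the child has to fall back on whatever login it already has.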
Check auth first:
python -m sporepath doctor
You want Codex to report ChatGPT login if you intend to use subscription usage instead of API-key billing.
Dry run:
python -m sporepath --db my_memory.sqlite inspire "I am stuck on how to validate this project" --dry-run
Real run:
python -m sporepath --db my_memory.sqlite inspire "I am stuck on how to validate this project" --focus-limit 5 --latent-limit 10
Successful runs print an inspire_run=<id> line. When the generated text
includes suggestion_id and cited_atom_ids, Sporepath stores that mapping so
you can mark a useful idea without retyping atom ids:
python -m sporepath --db my_memory.sqlite inspire-feedback latest `
--status useful `
--suggestion 1
Use an explicit run id when you are marking an older run:
python -m sporepath --db my_memory.sqlite inspire-feedback <run-id> `
--status useful `
--suggestion 1
You can still mark a bridge manually by passing the cited atoms:
python -m sporepath --db my_memory.sqlite inspire-feedback <run-id> `
--status useful `
--atoms <atom-id-1> <atom-id-2>
Positive feedback statuses are selected, useful, and applied. They thicken
the selected atoms and add or strengthen an inspire_feedback bridge between
them. boring, wrong, and ignored are recorded but do not thicken the path.
The desktop app uses 👍 as useful, 👎 as wrong, and an unselected submitted
suggestion as ignored. Inspire runs and feedback also write local
usage_events rows as an auditable trail.
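The feedback rule can be summarized in a short sketch: every status is recorded, but only the positive ones change activation or bridges. The status names come from the text above; the function name, the in-memory dictionaries, and the boost value are assumptions for illustration.

```python
from itertools import combinations

POSITIVE = {"selected", "useful", "applied"}
RECORDED_ONLY = {"boring", "wrong", "ignored"}


def apply_feedback(activations: dict, bridges: dict, events: list,
                   atom_ids: list, status: str, boost: float = 1.0) -> None:
    """Record feedback; only positive statuses thicken atoms and bridges."""
    if status not in POSITIVE | RECORDED_ONLY:
        raise ValueError(f"unknown feedback status: {status}")
    # Every submission leaves an audit row, whatever the status.
    events.append({"status": status, "atoms": list(atom_ids)})
    if status not in POSITIVE:
        return
    for atom_id in atom_ids:
        activations[atom_id] = activations.get(atom_id, 0.0) + boost
    # Add or strengthen a bridge between each pair of cited atoms.
    for pair in combinations(sorted(atom_ids), 2):
        bridges[pair] = bridges.get(pair, 0) + 1


activations, bridges, events = {}, {}, []
apply_feedback(activations, bridges, events, ["a2", "a1"], "useful")
apply_feedback(activations, bridges, events, ["a1", "a3"], "wrong")
```

After these two calls, a1 and a2 are thicker and bridged, a3 is unchanged, and both submissions appear in the event trail.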
python -m sporepath --db my_memory.sqlite graph --out graph.html --limit 160
In the graph:
- circle = thought atom
- line = shared-tag path
- larger/brighter circle = stronger focus path
- faded amber circle = latent path
- click a node = inspect source, tags, activation, and original text
The graph is a standalone local HTML file. It embeds excerpts from your memory database, so treat it as private data.
This project is designed for local-first experimentation, but your imported chat logs can contain sensitive personal or work data.
Do not commit:
- *.sqlite
- generated graph HTML files
- real chat exports
- Codex/Claude/ChatGPT auth files
The .gitignore is set up to ignore the common generated files, but review
git status before publishing.
Use this before trusting a small local model as a scout. It builds a review sheet from real chat fragments, runs the rules baseline or Ollama extractor, and leaves blank human fields for scoring.
$env:PYTHONPATH = "src"
python -m sporepath eval-extract --source codex --limit 20 `
--contains debug --contains bug --contains error `
--max-chars 1200 `
--out eval\codex_eval.jsonl `
--report eval\codex_eval.md
To test a local model:
python -m sporepath eval-extract --source codex --limit 20 `
--contains debug --contains bug --contains error `
--extractor ollama --model qwen3:1.7b `
--max-chars 1200 `
--out eval\qwen_eval.jsonl `
--report eval\qwen_eval.md
Ollama eval runs a small JSON canary before sampling. If the model returns
degenerate output or invalid JSON, Sporepath stops before spending time on a
noisy sheet. Use --skip-model-check only when you intentionally want to debug
raw model failures.
For the current validated scout, run:
Run-Sporepath-Qwen17-Eval.bat
That samples 35 allowlisted local sources with qwen3:1.7b, caps the sample at
one case per file, skips near-duplicates, checkpoints after every case, and writes
eval\qwen17_eval.jsonl plus eval\qwen17_eval.md. It then runs
eval-clean and writes eval\qwen17_eval.clean.jsonl plus
eval\qwen17_eval.clean.md; review and score the clean sheet first.
After reviewing the Markdown, fill the human fields in the JSONL file, then
summarize:
python -m sporepath eval-score eval\qwen_eval.jsonl
If the sheet contains repeated fragments, clean it without losing the human
review fields:
python -m sporepath eval-clean eval\qwen_eval.jsonl `
--out eval\qwen_eval.clean.jsonl `
--report eval\qwen_eval.clean.md
The human part should stay narrow. Do not judge whether the model wrote a good note. Judge whether it behaved like a useful scout:
- keep: should this fragment be kept?
- route: debug, product, preference, idea, decision, research, writing, ops, or other.
- signal_found: did it catch the reusable signal?
- noise_marked: did it mark obvious disposable noise?
- handoff_sufficient: is the handoff enough for a cloud model to think with later?
The command reports pass rate, keep agreement, route agreement, signal-found rate, noise-marked rate, and handoff-sufficient rate.
Sporepath's goal is not "store every chat forever." A useful build should pass three narrower checks:
- Scout quality: the local scout keeps reusable fragments, rejects tool noise, and writes a handoff that is good enough for a cloud model later.
- Note usability: digested notes are not empty, keep their source anchors, and do not collapse into duplicate titles.
- Inspire feedback: inspire runs produce suggestions that you actually mark as useful often enough to justify the workflow.
Run the checks separately:
python -m sporepath validate-scout eval\qwen_eval.jsonl --out eval\validation_scout.md
python -m sporepath --db my_memory.sqlite validate-notes --out eval\validation_notes.md
python -m sporepath --db my_memory.sqlite validate-inspire --out eval\validation_inspire.md
Use the cleaned sheet for validate-scout when eval-clean drops duplicates:
python -m sporepath validate-scout eval\qwen_eval.clean.jsonl --precleaned --out eval\validation_scout.md
Or write one combined report:
python -m sporepath --db my_memory.sqlite validate-report `
--scout-eval eval\qwen_eval.jsonl `
--out eval\sporepath_validation.md
Verdicts are deliberately conservative:
- pass: the measured health checks are above the current target.
- fail: the data exists, but at least one target is below the line.
- needs_data: the validator cannot judge yet because the repo has not collected enough scored eval rows, generated notes, or inspire feedback.
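The three verdicts can be read as a small decision rule, sketched below. The metric names, target values, and minimum sample size are hypothetical, and treating a value exactly at the target as passing is an assumption; the real validators may draw the line differently.

```python
def verdict(measured: dict, targets: dict, sample_size: int, min_samples: int) -> str:
    """Return 'needs_data', 'pass', or 'fail' for one set of health checks."""
    if sample_size < min_samples:
        # Too little data to judge, so do not report pass or fail.
        return "needs_data"
    for name, target in targets.items():
        # A missing metric counts as below target.
        if measured.get(name, 0.0) < target:
            return "fail"
    return "pass"


# Hypothetical targets for a scout eval sheet.
targets = {"keep_agreement": 0.8, "route_agreement": 0.7}

too_few = verdict({"keep_agreement": 0.9, "route_agreement": 0.9}, targets, 5, 20)
healthy = verdict({"keep_agreement": 0.9, "route_agreement": 0.75}, targets, 31, 20)
weak = verdict({"keep_agreement": 0.9, "route_agreement": 0.5}, targets, 31, 20)
```

Checking the sample size first is what keeps the rule conservative: a handful of lucky rows cannot produce a pass.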
These validators are meant to catch structural problems. They still do not replace the human judgment step: only you can say whether a note feels worth opening again or whether an inspired move changed the next action.
- Edges currently include shared-tag evidence and confidence metadata, but they are still not true semantic embeddings.
- qwen3:1.7b passed the scout validator on one 31-case clean eval, but that sheet's human column was scored by Codex, not a human, and the LLM judge is not yet calibrated against human spot-checks. It still creates false positives and needs continued sampling.
- On this machine, qwen3.5:4b was slower and produced invalid JSON on some scout prompts; keep it experimental until it passes validate-scout.
- ArcRift import currently reads full_chats.rawText only; it does not import ArcRift facts, vector chunks, or retrieval scores yet.
- digest is currently rules-based grouping, not high-quality editorial summarization.
- sync-vault only uses generated-note file edits as feedback; this is not a full Obsidian plugin or bidirectional sync engine.
- The desktop window is a local tkinter launcher, not a packaged Windows installer yet.
- Extraction eval exists as CLI-generated JSONL/Markdown sheets, but there is no graphical eval UI yet.
- Archive/deep-archive budgets are design goals, not complete product features.
- The graph is a static HTML export, not a full app.
$env:PYTHONPATH = "src"
python -m unittest discover -s tests
The next important question is not whether a 1B model can summarize chat logs. It is whether a small local model can extract reusable structures that the memory graph can later validate through use:
- friction structures
- state machines
- decision questions
- taste and judgment patterns
- recurring technical pitfalls
If you try this on your own logs, issues with real examples of noisy extraction, missed useful atoms, or graph behavior are the most valuable feedback.
