The forge that forges itself.
Canonical repo: unbrowse-ai/foundry
Foundry turns repeated workflows into reusable skill bundles. Point it at your local session history and it will:
- scan for recurring workflow shapes across all sessions
- emit candidate skills backed by hard evidence (session counts, tool frequencies, co-occurrence patterns)
- fabricate bundle/share/index/memory artifacts from one preset
Discovered candidate skills auto-install into the local host skill dir by default:
- Codex: `$CODEX_HOME/skills` or `~/.codex/skills`
- Claude: `~/.claude/skills`
- Each mined skill now gets `references/runtime-pointers.md` with sanitized env-var names and secret/config locations only; values, emails, and user-specific paths stay out of `SKILL.md`.
It also preps the local Phase 2 routing layer:
- ingest tool trace sessions
- build an explicit action DAG
- emit next-action training examples from real tool sequences
    npm run discover
    npm run mine-history
    npm run prepare-router
    npm run fabricate

Publish the public share manifest in the same pass:
    node scripts/publish-bundle.mjs \
      --preset presets/unbrowse-workflows.json \
      --out dist \
      --public-root ../unbrowse/frontend/public \
      --site-url https://www.unbrowse.ai

Watch for new recurring workflows:
    npm run discover:watch

Skip local auto-install:
    node scripts/discover-skill-candidates.mjs --preset presets/unbrowse-workflows.json --no-install

Write host memory in the same pass:
    node scripts/fabricate-bundle.mjs \
      --preset presets/unbrowse-workflows.json \
      --out dist \
      --host claude \
      --scope agent

We ran Foundry against the Unbrowse monorepo -- 319 sessions, 16,000+ tool calls, 200 commits -- and it surfaced 6 candidate skills that the team was executing by hand dozens of times per sprint.
Sessions analyzed: 319
Tool calls mined: 16,000+
Commits scanned: 200
Existing skills: 5 (already codified)
Candidates found: 6 (recurring but not yet automated)
The single most repeated workflow across the entire history: kill stale processes, rebuild from source, test resolve/execute, check output shape.
Evidence:
Sessions: 21 (highest intensity cluster)
Avg tool calls: 117 per session
pkill calls: 435 total across all sessions
resolve calls: 105 total
execute calls: 55 total
Signature: pkill -> sleep 2 -> bun src/cli.ts resolve -> validate
Foundry detected that pkill -9 -f 'unbrowse|kuri' followed by bun src/cli.ts resolve appeared 47+ times in a canonical sequence. The stale-server problem was the #1 source of false negatives.
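To picture how a canonical sequence like this might be counted, the sketch below walks each session's ordered commands and counts every time a `pkill` is later followed by a `resolve`. The input shape (an array of command strings per session) and the function name `countKillThenResolve` are assumptions for illustration; this is not Foundry's actual mining code.

```javascript
// Hypothetical sketch: count "kill, then resolve" occurrences per session.
// The session shape (array of command strings) is an assumption.
function countKillThenResolve(sessions) {
  let count = 0;
  for (const commands of sessions) {
    let killPending = false;
    for (const cmd of commands) {
      if (/\bpkill\b/.test(cmd)) {
        killPending = true; // a kill opens a candidate sequence
      } else if (killPending && /\bcli\.ts resolve\b/.test(cmd)) {
        count += 1; // the first resolve after a kill completes one occurrence
        killPending = false;
      }
    }
  }
  return count;
}

const sessions = [
  ["pkill -9 -f 'unbrowse|kuri'", "sleep 2", 'bun src/cli.ts resolve "x" --url "y"'],
  ["git status", 'bun src/cli.ts resolve "x" --url "y"'], // no kill first: not counted
  ["pkill -9 -f 'unbrowse|kuri'", "pkill -9 -f 'unbrowse|kuri'", "sleep 2", 'bun src/cli.ts resolve "x" --url "y"'],
];
console.log(countKillThenResolve(sessions)); // 2
```

Repeated kills before a single resolve count once, so the tally reflects completed kill-then-test cycles rather than raw `pkill` calls.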
Generated SKILL.md
---
name: unbrowse-dev-loop
description: Kill stale processes, rebuild, test resolve/execute against a URL, report pass/fail.
user-invocable: true
---

Workflow:
- Kill stale processes: `pkill -9 -f 'unbrowse|kuri'; sleep 2`
- Run resolve: `bun src/cli.ts resolve "<intent>" --url "<url>"`
- Validate output (HTTP response, JSON structure, no `[object Object]`, endpoints non-empty)
- If resolve passed: `bun src/cli.ts execute <endpoint_id>`
- Report: PASS with key metrics or FAIL with error excerpt
Load-bearing rules:
- Always kill before testing -- stale servers are the #1 false-negative source
- Never skip the sleep after pkill -- kuri needs 2s to release ports
- If resolve returns 0 endpoints, that's a FAIL even if no error was thrown
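The validation step and the zero-endpoints rule can be sketched as a small checker. The output shape (`{ status, body }`) and the `endpoints` field name are assumptions made for this example; the real resolve output may differ.

```javascript
// Hypothetical validator for a resolve result.
// Assumes `output` looks like { status: number, body: string }.
function judgeResolve(output) {
  if (output.status < 200 || output.status >= 300) {
    return { pass: false, reason: `HTTP ${output.status}` };
  }
  if (output.body.includes("[object Object]")) {
    return { pass: false, reason: "stringified object leaked into output" };
  }
  let parsed;
  try {
    parsed = JSON.parse(output.body);
  } catch {
    return { pass: false, reason: "body is not valid JSON" };
  }
  if (!Array.isArray(parsed?.endpoints) || parsed.endpoints.length === 0) {
    // Zero endpoints is a FAIL even though nothing threw.
    return { pass: false, reason: "0 endpoints" };
  }
  return { pass: true, reason: `${parsed.endpoints.length} endpoint(s)` };
}

console.log(judgeResolve({ status: 200, body: '{"endpoints":[]}' }));
// { pass: false, reason: '0 endpoints' }
```

Returning a reason alongside the verdict gives the final "Report" step an error excerpt to print on failure.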
Checks SAFE status, scans investor comms across Gmail and Telegram, flags stale threads, drafts follow-ups per cadence rules.
Evidence:
Sessions: 14
Avg tool calls: 60 per session
telegram-query: 4 invocations
gws-gmail-send: 3 invocations
granola (notes): 2 invocations
Memory reads: Heavy (investor contacts, SAFE status, fundraise state)
Foundry found that every fundraise session started with the same 7-step memory read sequence (load contacts, check BoldSign, scan Gmail, query Telegram, cross-reference, draft, update). The cadence rules were encoded in memory but executed manually every time.
Generated SKILL.md
---
name: unbrowse-investor-pipeline
description: Check SAFE status, scan investor comms across Gmail and Telegram, flag stale threads, draft follow-ups.
user-invocable: true
---

Workflow:
- Load state from memory (fundraise status, investor contacts, comp plan)
- Poll BoldSign API for SAFE envelope changes
- Scan Gmail for investor replies (last 48h)
- Query Telegram for DM activity from key investor chat IDs
- Cross-reference: flag silent >5d, unsigned SAFEs >3d, owed responses
- Draft follow-up messages per cadence rules
- Update memory with new signals
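The cross-reference step reduces to a few threshold checks. A minimal sketch, assuming a hypothetical investor record with millisecond timestamps (`lastContactAt`, `safeSentAt`) and booleans (`safeSigned`, `weOweReply`); none of these field names come from Foundry or the generated skill.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical record shape:
// { name, lastContactAt, safeSentAt (or null), safeSigned, weOweReply }
function flagInvestor(investor, now) {
  const flags = [];
  if (now - investor.lastContactAt > 5 * DAY_MS) {
    flags.push("silent>5d");
  }
  if (
    investor.safeSentAt !== null &&
    !investor.safeSigned &&
    now - investor.safeSentAt > 3 * DAY_MS
  ) {
    flags.push("unsigned-safe>3d");
  }
  if (investor.weOweReply) {
    flags.push("owed-response");
  }
  return flags;
}

const now = Date.UTC(2025, 0, 10);
const example = {
  name: "Example Fund",
  lastContactAt: Date.UTC(2025, 0, 3), // 7 days ago
  safeSentAt: Date.UTC(2025, 0, 8), // 2 days ago, still inside the 3-day window
  safeSigned: false,
  weOweReply: true,
};
console.log(flagInvestor(example, now)); // [ 'silent>5d', 'owed-response' ]
```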
Polls traction and stats APIs, compares against sprint targets, computes deltas, flags at-risk metrics.
Evidence:
Sessions: 27 (pure context-lookup cluster)
Avg tool calls: 43 per session
Pattern: 60%+ muonry reads, <20% Bash
API endpoints: 2 (traction + stats/summary)
Sprint targets: Tracked in memory, compared manually every session
27 sessions were almost entirely context-gathering: read memory, curl API, mental math, report to user. The same two API endpoints were called in the same order every time.
Generated SKILL.md
---
name: unbrowse-traction-watchdog
description: Poll traction and stats APIs, compare against sprint targets, compute deltas, flag at-risk metrics.
user-invocable: true
---

Workflow:
- Fetch `https://launch.unbrowse.ai/api/traction`
- Fetch `https://beta-api.unbrowse.ai/v1/stats/summary`
- Load sprint targets from memory
- Compute deltas: current vs baseline, current vs target, daily run rate
- Flag: declining metrics, targets at risk (<50% toward goal), blockers (zero growth >2 days)
- Format dashboard table with status indicators
- Update memory with snapshot + timestamp
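The delta and at-risk arithmetic might look like the sketch below. It assumes each metric carries `baseline`, `current`, `target`, and `daysElapsed`, and it reads "<50% toward goal" as progress measured from the baseline; both are assumptions, since the skill does not spell out the formula.

```javascript
// Hypothetical metric shape: { baseline, current, target, daysElapsed }
function computeDeltas({ baseline, current, target, daysElapsed }) {
  const gained = current - baseline; // current vs baseline
  const needed = target - baseline; // total growth the target requires
  const progress = needed > 0 ? gained / needed : 1; // fraction of the way to target
  return {
    vsBaseline: gained,
    vsTarget: current - target, // negative while below target
    dailyRunRate: daysElapsed > 0 ? gained / daysElapsed : 0,
    progress,
    atRisk: progress < 0.5,
    declining: gained < 0,
  };
}

const result = computeDeltas({ baseline: 100, current: 130, target: 200, daysElapsed: 6 });
console.log(result.vsBaseline, result.vsTarget, result.dailyRunRate, result.atRisk);
// 30 -70 5 true
```

Here the metric gained 30 of the 100 needed, so it sits at 30% toward the goal and is flagged at risk.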
Fetches issues, triages by area, spawns parallel devswarm/codex agents, collects PRs, batch merges.
Evidence:
Sessions: 25
Avg tool calls: 28 per session
devswarm calls: get_issue(99), run_swarm(67), run_task(50), merge_pr(31), create_branch(25)
Git branches: 15+ codex/ prefixed branches
Batch merges: "Codex/merge last 5h branches (#286)" style commits
The devswarm MCP server does the heavy lifting, but the orchestration layer (which issues, what order, merge strategy) was re-decided every session. Foundry captured the triage-spawn-collect-merge pattern.
Generated SKILL.md
---
name: unbrowse-issue-swarm
description: Fetch P0/P1 issues, triage by area, spawn parallel agents, collect PRs, batch merge.
user-invocable: true
---

Workflow:
- Fetch open P0/P1 issues from GitHub
- Triage by area: kuri, orchestrator, backend, auth, runtime
- Spawn parallel devswarm agents per issue (or codex worktree)
- Collect PRs, run CI checks
- Batch merge passing PRs
- Close linked issues on merge
Checks content queue, optimizes drafts for platform, schedules via Typefully, tracks engagement.
Evidence:
Sessions: 14
Avg tool calls: 60 per session
typefully skill: 3 invocations
x-virality skill: 2 invocations
Content messages: 207 across all sessions
docs: commits: 20 in last 100 commits
Generated SKILL.md
---
name: unbrowse-content-loop
description: Check content queue, optimize drafts for platform, schedule via Typefully, track engagement.
user-invocable: true
---

Workflow:
- Check `.content-queue/` for new drafts
- Pick highest-priority unposted draft
- Optimize for target platform (X thread vs tweet vs LinkedIn vs HN)
- Schedule via Typefully (never x-cli)
- After publish: check engagement metrics
- Update Linear issue status
Runs eval harness, judges results, links artifacts to GitHub/Linear issues, auto-closes passing ones.
Evidence:
Sessions: 17
Avg tool calls: 85 per session
Eval harnesses: 7 distinct modes (codex, product-success, stress, autonomous, auth, campaign, regression)
Eval files: 47 in evals/ directory
Gap: Results written to JSON artifacts but never linked back to issues
The eval infrastructure is deep (7 harnesses, 47 files), but the last-mile step -- judging a result and closing the issue -- was always manual.
Generated SKILL.md
---
name: unbrowse-eval-close
description: Run eval harness, judge results, auto-close passing issues, create regression tickets for failures.
user-invocable: true
---

Workflow:
- Run eval: `bun run eval:codex --intent "<intent>" --url "<url>"`
- Read artifact: `evals/codex-harness-last-run.json`
- Judge each case: PASS / PARTIAL / FAIL
- For PASS: close GitHub issue + update Linear
- For FAIL: create regression issue with artifact excerpt
- Update eval tracking
Tool co-occurrence bigrams (consecutive tool pairs across 16K+ calls):
| Count | Sequence | Meaning |
|---|---|---|
| 6,898 | Bash -> Bash | Long shell chains (build-test-fix) |
| 1,653 | read -> read | Deep context gathering |
| 694 | search -> search | Exploration sweeps |
| 449 | read -> edit | Read-then-modify (the default working mode) |
| 413 | search -> read | Find-then-inspect |
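A bigram table like this one comes from counting consecutive tool pairs within each session. A minimal sketch, assuming each session is simply an ordered array of tool names (the real history format is not shown here):

```javascript
// Count consecutive tool pairs (bigrams) within each session.
// Pairs never span a session boundary.
function countBigrams(sessions) {
  const counts = new Map();
  for (const tools of sessions) {
    for (let i = 0; i + 1 < tools.length; i++) {
      const key = `${tools[i]} -> ${tools[i + 1]}`;
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  // Most frequent pair first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const toolSessions = [
  ["Bash", "Bash", "read", "edit"],
  ["read", "edit"],
];
console.log(countBigrams(toolSessions));
// [ [ 'read -> edit', 2 ], [ 'Bash -> Bash', 1 ], [ 'Bash -> read', 1 ] ]
```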
The 7 workflow clusters (by session count):
| Cluster | Sessions | Avg Calls | Dominant Pattern |
|---|---|---|---|
| Code-with-Context | 74 | 78 | read -> edit -> bash interleave |
| Build-Test-Fix | 21 | 117 | bash chains, pkill-rebuild-test |
| Package-Publish | 27 | 141 | git push, npm pack, sync-skill |
| Memory-Research | 27 | 43 | pure context reads, no execution |
| Issue-Triage | 25 | 28 | devswarm orchestration |
| Linear-Planning | 12 | 75 | Linear API + memory |
| Fundraise-Ops | 14 | 60 | telegram + gmail + memory |
- Release flow -- fully automated via `release-it` + CI (8 releases in 200 commits)
- Skill sync -- `bash scripts/sync-skill.sh` handles monorepo-to-public-repo sync
- Smart pre-commit -- file-pattern-aware test gating in `scripts/precommit.sh`
- Kuri packaging -- `scripts/check-packaged-kuri.sh` validates the Zig binary ships correctly
Foundry only surfaces what's repeated AND not yet automated.

Pin the install target explicitly:

    node scripts/fabricate-bundle.mjs \
      --preset presets/unbrowse-workflows.json \
      --out dist \
      --install-host codex

Foundry writes:
- `dist/unbrowse-workflows/history-report.json`
- `dist/unbrowse-workflows/candidate-skills.json`
- `dist/unbrowse-workflows/candidates/<slug>/SKILL.md`
- `dist/unbrowse-workflows/tool-routing-report.json`
- `dist/unbrowse-workflows/action-dag.json`
- `dist/unbrowse-workflows/next-action-dataset.json`
- `dist/unbrowse-workflows/bundle.json`
- `dist/unbrowse-workflows/share.json`
- `dist/unbrowse-workflows/registry-entry.json`
- `dist/unbrowse-workflows/hosts/<host>/<file>`
Foundry also installs discovered candidate skills into the active local host by default.
Host targets:
- Codex: `AGENTS.md`
- Claude: `CLAUDE.md`
- OpenClaw: `MEMORY.md`
- `skills/foundry/SKILL.md` -- installable skill contract
- `presets/unbrowse-workflows.json` -- source-of-truth bundle preset
- `scripts/discover-skill-candidates.mjs` -- periodic candidate discovery
- `scripts/mine-history.mjs` -- inspect known history matches
- `scripts/prepare-tool-routing.mjs` -- build explicit action-DAG and next-action prep artifacts from tool traces
- `scripts/fabricate-bundle.mjs` -- one-pass bundle/share/index/memory generation
- `scripts/publish-bundle.mjs` -- fabricate bundle artifacts and copy public `share.json` into a site root
- `scripts/foundry-lib.mjs` -- shared derivation logic
- `tests/` -- regression coverage
The preset carries:
- bundle id and title
- entry skill
- child skills
- dependency graph
- routes
- history matchers
- tool trace sources for DAG/training prep
- share metadata
- registry metadata
Everything else is derived from that one file.
Current default bundle:
- `foundry`
- `history-skill-miner`
- `docs-release-sync`
- `skill-surface-ship`
- `main-actions-triage`
Routing rule:
- call `foundry` for discovery, fabrication, sharing, indexing, and memory routing
- call the narrower child skill when the request is clearly about that child workflow
Repo install:
    npx skills add https://github.com/unbrowse-ai/foundry --skill foundry --yes

Run the tests, then a fabricate pass against a throwaway `HOME`:

    npm test
    HOME=$(mktemp -d) node scripts/fabricate-bundle.mjs --preset presets/unbrowse-workflows.json --out dist --threshold 2

This repo supports the local non-ML prep path from the routing-layer paper:
- collect tool traces
- build an explicit action DAG
- build next-action examples from real sessions
- use the DAG as reachability constraints before any trained router exists
Accepted tool trace shape:
{
"session_id": "sess-1",
"goal": "deploy to staging",
"actions": [
{ "tool": "git_status", "status": "success", "domain": "git" },
{ "tool": "build", "status": "success", "domain": "ci" },
{ "tool": "deploy_staging", "status": "success", "domain": "deploy" }
]
}
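Given traces in that shape, the prep steps above can be sketched in a few functions: derive transition edges from consecutive successful actions, emit (history, next) training pairs, and look up which tools are reachable from the current one. The function names and the choice to skip non-success actions are assumptions for illustration, not the behavior of `scripts/prepare-tool-routing.mjs`. The sketch also does not check for cycles, so it yields a transition graph that is a DAG only when the traces are acyclic.

```javascript
// Keep only the tool names of actions that succeeded (an assumption).
function successfulTools(trace) {
  return trace.actions.filter((a) => a.status === "success").map((a) => a.tool);
}

// Edges: tool -> set of tools observed immediately after it.
function buildActionGraph(traces) {
  const edges = new Map();
  for (const trace of traces) {
    const tools = successfulTools(trace);
    for (let i = 0; i + 1 < tools.length; i++) {
      if (!edges.has(tools[i])) edges.set(tools[i], new Set());
      edges.get(tools[i]).add(tools[i + 1]);
    }
  }
  return edges;
}

// One training example per step: the goal, the tools so far, and the next tool.
function nextActionExamples(traces) {
  const examples = [];
  for (const trace of traces) {
    const tools = successfulTools(trace);
    for (let i = 1; i < tools.length; i++) {
      examples.push({ goal: trace.goal, history: tools.slice(0, i), next: tools[i] });
    }
  }
  return examples;
}

// Reachability constraint: only tools seen directly after `tool` are allowed.
function allowedNext(edges, tool) {
  return [...(edges.get(tool) ?? [])];
}

const traces = [
  {
    session_id: "sess-1",
    goal: "deploy to staging",
    actions: [
      { tool: "git_status", status: "success", domain: "git" },
      { tool: "build", status: "success", domain: "ci" },
      { tool: "deploy_staging", status: "success", domain: "deploy" },
    ],
  },
];
const graph = buildActionGraph(traces);
console.log(allowedNext(graph, "git_status")); // [ 'build' ]
console.log(nextActionExamples(traces).length); // 2
```

Before any trained router exists, `allowedNext` can act as a filter: a proposed next tool outside the returned list has never followed the current tool in recorded sessions.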