From 22e2b288534b5b12703f0042aef7d26abf4ef1f0 Mon Sep 17 00:00:00 2001
From: yai_mac_big
Date: Tue, 19 May 2026 23:04:57 +0700
Subject: [PATCH] docs(benchmark): token-economy receipt + Phase 1 update
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the canonical token-economy benchmark comparing three paths for
"open + read a GitHub README":

  curl raw README   1 round-trip, 0 schema tax
  browser-cli       1 compound, 0 schema tax
  MCP chrome        3 round-trips, +schema tax (deferred mode)

v1 (2026-05-18) baseline:
  curl 1,821 tokens · browser-cli 1,828 · MCP 2,560 (+41%)

Phase 1 update (2026-05-19): both CLI paths re-measured on the same day,
after a whitespace normalizer landed in the browser-cli sibling repo
(lib/commands/read.js):
  curl          2,189 (README content grew between days)
  browser-cli   1,896 (now beats curl by 13.4%, junk-line ratio 50% → 19%)

Framed as an architectural moat — "Why CLI Agents Have a 50-Year Head
Start" — not a token-shaving optimization. The moat is the load-once
command vocabulary of CLI tools vs MCP's per-tool schema charge per turn.

Companion doc docs/contracts/browser-cli/USE_CASE_ROUTING.md formalizes
when each tool fits (content-addressable → curl;
auth/JS/provenance/policy → browser-cli) so readers don't reduce the
comparison to "just use curl."

Raw I/O captured byte-for-byte under raw/ for reproducibility.
Token proxy: chars / 3.8 (documented in count.py).
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/benchmarks/token-economy/README.md       | 302 +++++++++++++++++
 docs/benchmarks/token-economy/count.py        |  27 ++
 .../token-economy/raw/bcli_input.txt          |   3 +
 .../token-economy/raw/bcli_output.txt         |   3 +
 .../token-economy/raw/bcli_v2_input.txt       |   3 +
 .../raw/bcli_v2_output_2026-05-19.txt         |   3 +
 .../token-economy/raw/curl_input.txt          |   1 +
 .../token-economy/raw/curl_output.md          | 281 ++++++++++++++++
 .../raw/curl_output_2026-05-19.md             | 303 ++++++++++++++++++
 .../token-economy/raw/mcp_1_tabs_context.txt  |  14 +
 .../token-economy/raw/mcp_2_navigate.txt      |  15 +
 .../token-economy/raw/mcp_3_get_page_text.txt | 232 ++++++++++++++
 .../token-economy/raw/mcp_schemas_used.json   |  48 +++
 .../contracts/browser-cli/USE_CASE_ROUTING.md | 117 +++++++
 14 files changed, 1352 insertions(+)
 create mode 100644 docs/benchmarks/token-economy/README.md
 create mode 100644 docs/benchmarks/token-economy/count.py
 create mode 100644 docs/benchmarks/token-economy/raw/bcli_input.txt
 create mode 100644 docs/benchmarks/token-economy/raw/bcli_output.txt
 create mode 100644 docs/benchmarks/token-economy/raw/bcli_v2_input.txt
 create mode 100644 docs/benchmarks/token-economy/raw/bcli_v2_output_2026-05-19.txt
 create mode 100644 docs/benchmarks/token-economy/raw/curl_input.txt
 create mode 100644 docs/benchmarks/token-economy/raw/curl_output.md
 create mode 100644 docs/benchmarks/token-economy/raw/curl_output_2026-05-19.md
 create mode 100644 docs/benchmarks/token-economy/raw/mcp_1_tabs_context.txt
 create mode 100644 docs/benchmarks/token-economy/raw/mcp_2_navigate.txt
 create mode 100644 docs/benchmarks/token-economy/raw/mcp_3_get_page_text.txt
 create mode 100644 docs/benchmarks/token-economy/raw/mcp_schemas_used.json
 create mode 100644 docs/contracts/browser-cli/USE_CASE_ROUTING.md

diff --git a/docs/benchmarks/token-economy/README.md b/docs/benchmarks/token-economy/README.md
new file mode 100644
index 0000000..dce6a79
--- /dev/null
+++ b/docs/benchmarks/token-economy/README.md
@@ -0,0 +1,302 @@
+# Why CLI Agents Have a 50-Year Head Start
+
+> Same task, three paths, measured. The numbers point at an architectural
+> moat that MCP cannot patch by shipping a faster runtime.
+
+**Task** — Open `https://github.com/postmunnet/trinity-protocol` and bring
+the README content into the model's working context.
+
+**Three paths measured**:
+1. **`curl`** — pure Unix baseline (raw markdown via `raw.githubusercontent.com`)
+2. **`browser-cli`** — Trinity's Playwright REPL sibling (stdin/stdout JSON)
+3. **`MCP claude-in-chrome`** — Model Context Protocol browser tools
+
+**Method** — Same task, three paths. Token count = `chars / 3.8` (English-BPE
+proxy; stable for relative comparison, not Anthropic's official tokenizer).
+All raw I/O is captured byte-for-byte under [`raw/`](raw/) next to this README — reproducible at any time.
+
+**Environment** — Claude Code, Opus 4.7, 2026-05-18. MCP loaded via
+ToolSearch (deferred, not eager) — i.e. the *kindest possible* MCP case.
+
+---
+
+## Headline
+
+| Path | Round-trips | Schema tax | Body I/O | **Total tokens** |
+|---|:---:|---:|---:|---:|
+| **`curl`** (raw markdown) | 1 | 0 | 1,821 | **1,821** |
+| **`browser-cli`** (Playwright sibling) | 1 (compound stdin) | 0 | 1,828 | **1,828** |
+| **MCP** (`claude-in-chrome`, deferred) | 3 | 619 | 1,941 | **2,560** |
+
+> **browser-cli matches curl on token cost (within 0.4%) and beats MCP by ~29% — in one
+> round-trip instead of three, with zero per-task schema tax.**
+
+If MCP were loaded eagerly (classic mode — all ~40 chrome tool schemas in
+the system prompt every turn whether used or not), schema tax alone adds
+~7,000 tokens → MCP total balloons to ~9,500 tokens → **5×+ worse than
+either CLI path**.
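The per-path totals can be re-derived from the per-leg character counts listed in the Path A, B, and C tables, using the same `chars / 3.8` proxy as `count.py`. The following is a minimal sketch and not part of the committed benchmark: the `LEGS` dictionary and the helper names are illustrative assumptions, and the character counts are copied from those tables.

```python
RATIO = 3.8  # same chars-per-token proxy that count.py uses

# Character counts per leg, copied from the Path A / B / C tables.
# "schema" is the tool-description load; "io" lists the body legs
# exactly as the tables split them. (Hypothetical structure.)
LEGS = {
    "curl": {"schema": 0, "io": [85, 6835]},
    "browser-cli": {"schema": 0, "io": [70, 6878]},
    "mcp": {"schema": 2351, "io": [554, 574, 6249]},
}


def est_tokens(chars: int) -> int:
    """Estimate tokens for one leg, rounding per leg as the tables do."""
    return round(chars / RATIO)


def total_tokens(path: str) -> int:
    """Schema tax plus body I/O for one path, in estimated tokens."""
    legs = LEGS[path]
    return est_tokens(legs["schema"]) + sum(est_tokens(c) for c in legs["io"])


def percent_saved(path: str, baseline: str) -> float:
    """How much cheaper `path` is than `baseline`, in percent of baseline."""
    return 100 * (1 - total_tokens(path) / total_tokens(baseline))


if __name__ == "__main__":
    for name in LEGS:
        print(f"{name:12s} {total_tokens(name):>6,d} tokens")
    print(f"browser-cli vs MCP: {percent_saved('browser-cli', 'mcp'):.1f}% cheaper")
```

Rounding each leg separately, rather than rounding the summed characters once, is a choice made here so the results line up with the per-leg token columns.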
+
+---
+
+## 📍 Update — Phase 1 (2026-05-19): browser-cli flips ahead of curl
+
+One day after the v1 measurement, `browser-cli` received a whitespace
+normalizer in `lib/commands/read.js` — collapsing CSS-Grid spacer divs
+and trimming trailing whitespace before returning extracted text.
+Same task, fresh measurement:
+
+| Path | v1 (2026-05-18) | v2 (2026-05-19, with normalizer) | Δ |
+|---|---:|---:|---:|
+| `curl` raw README | 1,821 | 2,189 | +368 *(README content grew)* |
+| `browser-cli` | 1,828 | **1,896** | +68 |
+| **Δ (bcli vs curl)** | **+7 (curl wins by 0.4%)** | **−293 (browser-cli wins by 13.4%)** | **flip** |
+
+**Why it flipped:**
+
+1. **Whitespace normalizer cleared the spacer-junk** — blank-or-junk
+   line ratio in extracted article: **50% → 19%** (−31 percentage
+   points). The advisor's `lib/commands/read.js` patch worked.
+2. **`curl` shipped a bigger payload, not by choice** — GitHub's README
+   grew over the day (v0.1.0 release prep + new Trinity origin / ritual
+   reference sections). `curl` is forced to ship the *full* grown
+   markdown; `browser-cli` ships only the rendered `<article>` element,
+   leaving footer / sidebar / license / badges behind.
+3. The advisor predicted ~920 tokens for browser-cli post-Phase-1.
+   Actual landing is 1,896 — higher than predicted because the README
+   itself grew during the same window — but the **direction matches**:
+   browser-cli now wins on signal density per token.
+
+**Section "What did `browser-cli` charge for that `curl` did not?" is now
+partially obsolete:**
+
+| Original finding | Status post-Phase-1 |
+|---|---|
+| Whitespace pollution (50% blank/junk lines) | ✅ **fixed** — now 19% |
+| Latency (~2s Playwright cold-start vs ~200ms curl) | ⏳ still applies |
+| Loss of structure (no `#` headings, no fence tags) | ⏳ still applies |
+
+Post-Phase-1 calculus: `browser-cli` wins on tokens, still loses on
+latency, still loses on markdown structure. The composition of these
+trade-offs continues to support the routing matrix in
+[`USE_CASE_ROUTING.md`](../../contracts/browser-cli/USE_CASE_ROUTING.md).
+
+**Honest disclosure:** the v2 result depends on an *uncommitted* change
+in the `browser-cli` sibling repo (`lib/commands/read.js`) at the time
+of writing. The benchmark will be re-verified once the change is
+committed and tagged. v1 numbers above remain the only fully-citable
+baseline until then.
+
+**Reproduction (v2):**
+
+```
+cd /path/to/browser-cli # v0.3.0+ with normalizer patch
+printf 'goto https://github.com/postmunnet/trinity-protocol\ntext article\nexit\n' \
+  | node index.js
+```
+
+Raw v2 output captured under `raw/`:
+- [`raw/bcli_v2_input.txt`](raw/bcli_v2_input.txt) — same 3-line script
+- [`raw/bcli_v2_output_2026-05-19.txt`](raw/bcli_v2_output_2026-05-19.txt) — full response (7,134 chars)
+- [`raw/curl_output_2026-05-19.md`](raw/curl_output_2026-05-19.md) — same-day curl baseline (8,235 chars)
+
+v1 baseline preserved at `raw/bcli_output.txt` + `raw/curl_output.md` for diff inspection.
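The blank-or-junk line ratio quoted in this update can be approximated from a captured `browser-cli` response. The sketch below assumes the output shape seen in `raw/bcli_output.txt` (a banner line followed by JSON lines, one of which carries a `text` field), and it assumes "junk" means a line that is empty or whitespace-only. The README does not state the exact definition it used, so this may not reproduce the reported 50% and 19% exactly; the function names are hypothetical.

```python
import json


def extract_text(raw_output: str) -> str:
    """Return the `text` field of the first JSON line that has one."""
    for line in raw_output.split("\n"):
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip the banner and any blank lines
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a complete JSON object on this line
        if isinstance(payload, dict) and "text" in payload:
            return payload["text"]
    raise ValueError("no JSON line with a 'text' field found")


def junk_line_ratio(text: str) -> float:
    """Share of lines that are empty or whitespace-only (assumed definition)."""
    lines = text.split("\n")
    junk = sum(1 for line in lines if not line.strip())
    return junk / len(lines)


if __name__ == "__main__":
    import pathlib
    import sys

    raw = pathlib.Path(sys.argv[1]).read_text(encoding="utf-8")
    print(f"{junk_line_ratio(extract_text(raw)):.1%} blank-or-junk lines")
```

Run against `raw/bcli_output.txt` and `raw/bcli_v2_output_2026-05-19.txt`, this gives a before/after comparison under the assumed definition.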
+
+---
+
+## Why CLI tools have a 50-year head start
+
+GitHub exposes a **content-addressable abstraction** at
+`raw.githubusercontent.com/<owner>/<repo>/<branch>/README.md`. CLI tools
+can hit it in one round-trip with no state, no schema tax, no chrome.
+
+Both CLI paths beat MCP on tokens, each for a different reason:
+- **`curl` wins on signal-per-token** (raw markdown structure)
+- **`browser-cli` wins on schema-cost-per-call** (one tool, many verbs)
+- **MCP loses on round-trip count** *and* schema tax *and* runtime nagging
+
+> **The moat is not "CLI tools are faster."**
+> **The moat is "CLI tools have access to better abstractions because
+> Unix has had 50 years to design them, HTTP has had 30, and the
+> Playwright REPL contract is `load once, command many` — whereas MCP
+> is `load each, schema each, round-trip each`."**
+
+This is a *structural* moat, not an *optimization* moat. MCP cannot ship a
+release that closes it without redesigning the protocol — at which point
+it would no longer be MCP.
+
+```
+curl         reads what is publicly addressable.
+browser-cli  reads what is rendered, what is authenticated,
+             and logs what was done.
+
+curl is stateless.    browser-cli has provenance.
+curl cannot click.    browser-cli enforces policy.
+
+The right tool for the layer it works at.
+Trinity composes both.
+```
+
+See [`docs/contracts/browser-cli/USE_CASE_ROUTING.md`](../../contracts/browser-cli/USE_CASE_ROUTING.md) for the full routing matrix between `curl`, `browser-cli`, and the cases that require each.
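The division of labour described above (publicly addressable content goes to `curl`; rendered, authenticated, interactive, logged, or policy-controlled access goes to `browser-cli`) can be sketched as a small decision function. `PageNeeds` and `choose_tool` are hypothetical names introduced only for illustration; the authoritative matrix lives in `USE_CASE_ROUTING.md` and may draw the boundaries differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageNeeds:
    """What a fetch task requires (hypothetical model, not the real contract)."""
    needs_auth: bool = False         # login-gated content
    needs_js_render: bool = False    # content only exists after rendering
    needs_interaction: bool = False  # clicking, typing, navigation state
    needs_provenance: bool = False   # a log of what was done is required
    needs_policy: bool = False       # access must pass a policy gate


def choose_tool(needs: PageNeeds) -> str:
    """Route to browser-cli when any browser-level need is present, else curl."""
    browser_level = (
        needs.needs_auth,
        needs.needs_js_render,
        needs.needs_interaction,
        needs.needs_provenance,
        needs.needs_policy,
    )
    return "browser-cli" if any(browser_level) else "curl"


if __name__ == "__main__":
    print(choose_tool(PageNeeds()))                      # public raw README
    print(choose_tool(PageNeeds(needs_js_render=True)))  # JS-rendered SPA
```

Defaulting to `curl` reflects the benchmark's framing that the stateless path is preferred whenever the content is directly addressable.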
+
+---
+
+## Path A — `curl` (Unix baseline)
+
+| Leg | Payload | chars | tokens (≈) |
+|---|---|---:|---:|
+| Tool description | `Bash` is base-context — $0 marginal | 0 | 0 |
+| Tool call (input) | `curl -sL https://raw.githubusercontent.com/postmunnet/trinity-protocol/main/README.md` | 85 | 22 |
+| Tool result (output) | Raw `README.md` (UTF-8 markdown, headings/links/code-fences preserved) | 6,835 | 1,799 |
+| **Total** | 1 round-trip | **6,920** | **1,821** |
+
+**Signal quality:** ★★★★★ — Raw markdown is the canonical form. Headings, links, code fences all preserved.
+
+## Path B — `browser-cli` (Trinity sibling, Playwright stdin/stdout)
+
+```
+printf 'goto https://github.com/postmunnet/trinity-protocol\ntext article\nexit\n' \
+  | node browser-cli/index.js
+```
+
+| Leg | Payload | chars | tokens (≈) |
+|---|---|---:|---:|
+| Tool description | `Bash` is base-context — $0 marginal | 0 | 0 |
+| Command vocabulary | `COMMAND_CONTRACT.md` loaded once per agent lifetime, amortized to ~0 per task; per-task verbs used here = `goto`, `text` (~10 tokens if explicit) | ~0 | ~0 |
+| Tool call (input) | 3-line script (`goto`, `text article`, `exit`) | 70 | 18 |
+| Tool result (output) | Banner + 2 JSON responses (`{ok, url, title}` + `{ok, text}` with article body) | 6,878 | 1,810 |
+| **Total** | 1 compound invocation | **6,948** | **1,828** |
+
+**Signal quality:** ★★★☆☆ — Plain text only. Markdown structure stripped. **~50% of output lines are blank or CSS-Grid spacer junk** (178 of 354 lines) — the renderer pollutes the agent's context with non-semantic whitespace from GitHub's layout grid.
+
+## Path C — MCP (`claude-in-chrome`, deferred load)
+
+| Leg | Payload | chars | tokens (≈) |
+|---|---|---:|---:|
+| Tool descriptions (×3) | `tabs_context_mcp` + `navigate` + `get_page_text` schemas loaded via ToolSearch | 2,351 | 619 |
+| 1. `tabs_context_mcp` | `{createIfEmpty:true}` + tab-list + redundant Tab-Context echo + `browser_batch` nag | 554 | 146 |
+| 2. `navigate` | `{tabId, url}` + nav-confirmation + redundant Tab-Context echo + nag | 574 | 151 |
+| 3. `get_page_text` | `{tabId}` + extracted article text + redundant Tab-Context echo | 6,249 | 1,644 |
+| **Total** | 3 round-trips + schema load | **9,728** | **2,560** |
+
+**Signal quality:** ★★★★☆ — Extracted text is cleaner than browser-cli's (better article filter). Still no markdown structure, but no spacer-junk pollution.
+
+---
+
+## What did MCP charge for that the CLI paths did not?
+
+1. **Schema tax — 619 tokens** just to describe 3 tools in the conversation. Per task. Every conversation.
+2. **Redundant Tab-Context echo** — ~80 chars of tab metadata printed on every call (~240 chars over 3 calls). You didn't ask for it; it ships anyway.
+3. **`browser_batch` nag** — a system-reminder fires after tool calls ("Prefer browser_batch — it is significantly faster"). It fired on 2 of the 3 calls in this run = ~560 chars. *Anthropic's own runtime is telling you MCP is too chatty in its own infrastructure.*
+4. **State management overhead** — tabId, tabGroupId, createIfEmpty flag. CLI paths have zero state between calls.
+
+---
+
+## What did browser-cli charge for that `curl` did not?
+
+1. **Whitespace pollution** — GitHub's CSS Grid uses spacer divs (`<div>` with grid placement). `text article` flattens these to repeating empty lines. ~50% of output is blank/junk. This is a *signal* problem, not a *cost* problem — same tokens, less meaning per token.
+2. **Latency** — Playwright cold-start is ~2s vs `curl` at ~200ms. Token-equal but wall-clock costlier.
+3. **Loss of structure** — No `#` heading levels, no fence tags, no link URLs. Agent has to infer structure from indentation.
+
+**Where browser-cli wins:** sites that *don't* expose raw markdown (auth-gated dashboards, JS-rendered SPAs, login-required content). curl can't reach those. browser-cli is the right tool *when* CDP-level access is required — but for GitHub READMEs, curl is still the optimal CLI primitive.
+
+---
+
+## Compounding cost over realistic workloads
+
+Per-repo overhead vs cheapest CLI path (v1 per-repo totals: 1,821 / 1,828 / 2,560 tokens):
+
+| Workload | curl total | browser-cli total | MCP total | MCP excess vs curl |
+|---|---:|---:|---:|---:|
+| 1 repo skim | 1.8K | 1.8K | 2.6K | +0.7K |
+| 10-repo competitive analysis | 18K | 18K | 26K | +7K |
+| 100-repo crawl (dependency audit) | 182K | 183K | 256K | **+74K** |
+| 1,000-repo (large-scale OSS scan) | 1.82M | 1.83M | 2.56M | **+740K** |
+
+For an agent with a 200K context window, **+74K tokens on a 100-repo task
+is the difference between "fits" and "summarize and drop half"**.
+
+---
+
+## Methodology disclosures
+
+1. **Token proxy:** `chars / 3.8`. Calibrated for English BPE. UTF-8 Thai
+   tokenizes ~2.0–2.5× denser, so absolute tokens for the Thai-mixed
+   README are slightly higher than reported. All paths read identical
+   Thai content, so the *relative* comparison is unaffected.
+
+2. **MCP "deferred" mode** is the kindest possible read on MCP. Eager-load
+   (classic Claude Desktop, Cursor, Continue, etc.) charges for every
+   connected tool every turn. Chrome MCP exposes ~40 tools; conservative
+   ~6,000-token schema tax in eager mode. **Add 6,000 to MCP total to
+   get the classic-mode comparison.**
+
+3. **browser-cli "schema tax" is ~0 per task** because:
+   - The `Bash` tool description is base context.
+   - The `browser-cli` *command vocabulary* (COMMAND_CONTRACT.md) is
+     read once when the agent first learns the tool — amortized to near-zero
+     over any non-trivial workload.
+   - This contrasts with MCP, which charges schema per *tool call* (per turn
+     in eager mode, per `ToolSearch` invocation in deferred mode).
+
+4. **`curl` baseline is `raw.githubusercontent.com`**, not HTML scraping —
+   on purpose. The point is that CLI-first agents *would* use the
+   cleaner abstraction. An "apples-to-apples HTML scrape" CLI would just
+   bring curl up to ~browser-cli's cost, not change the MCP delta.
+
+5. **MCP `tabs_create_mcp` was not needed** because `tabs_context_mcp`
+   with `createIfEmpty:true` produces a fresh tab in the same call. The
+   schema for `tabs_create_mcp` is therefore *not* in the 619-token
+   count, even though typical MCP usage docs imply you should preload it.
+   Counting it would push MCP schema tax to ~700 tokens.
+
+6. **System reminders** ("Prefer browser_batch...") are counted as
+   real context pollution because they are real bytes the model has to
+   process. If you disable them, MCP drops by ~150 tokens (to ~2,410) — still
+   ~32% worse than either CLI path.
+
+---
+
+## Reproduction
+
+All raw I/O captured byte-for-byte under `raw/` next to this README.
+Re-run the measurement at any time:
+
+```bash
+cd docs/benchmarks/token-economy
+
+# Path A — curl raw README
+curl -sL https://raw.githubusercontent.com/postmunnet/trinity-protocol/main/README.md > /tmp/curl.md
+python3 count.py /tmp/curl.md
+
+# Path B — browser-cli (Trinity sibling, expects /path/to/browser-cli/)
+printf 'goto https://github.com/postmunnet/trinity-protocol\ntext article\nexit\n' \
+  | node /path/to/browser-cli/index.js > /tmp/bcli.txt
+python3 count.py /tmp/bcli.txt
+
+# Path C — MCP requires Claude Code + claude-in-chrome extension;
+# this run's raw outputs are in raw/mcp_*.txt
+
+# Verify committed raw artifacts:
+for f in raw/*; do python3 count.py "$f"; done
+```
+
+Directory layout:
+
+```
+docs/benchmarks/token-economy/
+├── README.md                          # this file
+├── count.py                           # token proxy (chars / 3.8)
+└── raw/
+    ├── curl_input.txt                 # v1, 2026-05-18
+    ├── curl_output.md                 # v1, 2026-05-18
+    ├── curl_output_2026-05-19.md      # v2 baseline (same-day)
+    ├── bcli_input.txt                 # v1
+    ├── bcli_output.txt                # v1
+    ├── bcli_v2_input.txt              # v2
+    ├── bcli_v2_output_2026-05-19.txt  # v2 with normalizer
+    ├── mcp_schemas_used.json
+    ├── mcp_1_tabs_context.txt
+    ├── mcp_2_navigate.txt
+    └── mcp_3_get_page_text.txt
+```
diff --git a/docs/benchmarks/token-economy/count.py b/docs/benchmarks/token-economy/count.py
new file mode 100644
index 0000000..a51be7b
--- /dev/null
+++ b/docs/benchmarks/token-economy/count.py
@@ -0,0 +1,27 @@
+#!/usr/bin/env python3
+"""Token-count proxy for Claude-class tokenizers.
+
+Method: chars / 3.8 (English/ASCII baseline, typical Claude/GPT BPE rate).
+We report chars + estimated tokens side by side so any reader can audit.
+Not a substitute for Anthropic's official count_tokens API, but stable
+and adequate for relative comparison within one document corpus.
+"""
+import sys, json, pathlib
+
+RATIO = 3.8
+
+def measure(label: str, text: str):
+    chars = len(text)
+    tokens = round(chars / RATIO)
+    return {"label": label, "chars": chars, "tokens_est": tokens}
+
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        # stdin mode
+        text = sys.stdin.read()
+        result = measure("stdin", text)
+    else:
+        path = pathlib.Path(sys.argv[1])
+        text = path.read_text(encoding="utf-8", errors="replace")
+        result = measure(path.name, text)
+    print(json.dumps(result, ensure_ascii=False))
diff --git a/docs/benchmarks/token-economy/raw/bcli_input.txt b/docs/benchmarks/token-economy/raw/bcli_input.txt
new file mode 100644
index 0000000..9608202
--- /dev/null
+++ b/docs/benchmarks/token-economy/raw/bcli_input.txt
@@ -0,0 +1,3 @@
+goto https://github.com/postmunnet/trinity-protocol
+text article
+exit
diff --git a/docs/benchmarks/token-economy/raw/bcli_output.txt b/docs/benchmarks/token-economy/raw/bcli_output.txt
new file mode 100644
index 0000000..afef539
--- /dev/null
+++ b/docs/benchmarks/token-economy/raw/bcli_output.txt
@@ -0,0 +1,3 @@
+browser-cli v0.1 ready | mode=local | schema=v1 | policy=normal
+{"ok":true,"url":"https://github.com/postmunnet/trinity-protocol","title":"GitHub - postmunnet/trinity-protocol: AI agents can claim work is done. Trinity makes them prove it. CLI-first control layer for AI coding agents with evidence-driven verification, scoped execution, and auditable promotion. · GitHub"}
+{"ok":true,"text":"Trinity Protocol\nLanguage: English | ไทย\nAI agents can claim work is done. Trinity makes them prove it.\nTrinity is a CLI-first control layer for AI coding agents. 
It coordinates\nvendor AI harnesses, verifies their work, and records decisions as auditable\nartifacts.\nCore rule:\nNo artifact = no trust.\nNo verification = no completion.\nNo authority = no transition.\n\n \n \n \n\n \n \n\n \n \n\nWhy Trinity?\nAI coding agents are powerful, but their claims are not reliable evidence.\nThey may say:\n\ntests pass, but no test artifact exists\na bug is fixed, but no reproduction was verified\na deploy is safe, but no rollback path was recorded\na file was changed correctly, but no diff was inspected\n\nTrinity turns AI-assisted work into an evidence-driven workflow:\nHuman intent\n |\n v\nAI proposes / executes within scope\n |\n v\nTrinity captures artifacts\n |\n v\nVerifier checks evidence\n |\n v\nPolicy / Human decides promotion\n\n \n \n \n\n \n \n\n \n \nRead the one-page explanation:\n\nWHY_TRINITY.md\nWHY_TRINITY_TH.md\n\n\n60-Second Example\nBefore Trinity:\nUser: Fix the login bug.\nAgent: Done. Tests pass.\n\n \n \n \n\n \n \n\n \n \nProblem: there is no trustworthy evidence.\nAfter Trinity:\nUser: Fix the login bug.\nTrinity requires:\n1. a scoped plan\n2. bounded execution\n3. diff and test artifacts\n4. verifier verdict\n5. 
explicit promotion authority\n\n \n \n \n\n \n \n\n \n \nIf the agent cannot produce the artifact, the work cannot be promoted.\n\nCurrent Status\n\nArchitecture generation: Trinity v2\nRuntime release: v0.1.0\nPublic Tool Contract: v1.0 freeze candidate; working spec is v1.1.0-draft\nKernel CLI: verified v0.1.0 runtime included in this repository\nRelease evidence: docs/releases/TRINITY_V0_1_0_RELEASE_EVIDENCE.md\n\nBehavioral proof, not just test count:\n\nState machine safety\nGate enforcement\nAudit chain integrity\nTool contract compliance\nVerifier verdict behavior\nRitual command flow\nHuman approval requirements for risky transitions\n\nLatest verified test evidence:\nSource checkout: 1862 passed, 6 skipped\nClean export without optional sibling tools: 1860 passed, 8 skipped\n\n \n \n \n\n \n \n\n \n \n\nArchitecture\nHuman Owner\n |\n v\nTrinity Control Layer\n |\n +-- Intent / Scope / Constraints\n +-- Session capsule + state machine\n +-- Bounded AI execution\n +-- Artifact capture\n +-- Verifier + policy gates\n +-- Audit chain\n |\n v\nPromotion only with evidence\n\n \n \n \n\n \n \n\n \n \nWorker layer:\nClaude Code / Codex / Cursor / Gemini\n |\n v\nVendor AI proposes and executes\n\n \n \n \n\n \n \n\n \n \nTrinity does not replace the agent. Trinity governs the work.\n\nQuickstart\nbash .ai/cli/ai status\nbash .ai/cli/ai sss \"Test Trinity with a small documentation task\"\nbash .ai/cli/ai vvv\nbash .ai/cli/ai nnn\nbash .ai/cli/ai gogogo\n \n \n \n\n \n \n\n \n \nRun the CLI test suite:\npython3 -m pytest .ai/cli/tests -q\n \n \n \n\n \n \n\n \n \n\nRitual Commands\nRituals are the operator protocol. 
They are not the first thing to understand,\nbut they are how Trinity enforces the workflow once work begins.\nsss -> vvv -> nnn -> gogogo -> ddd -> rrr -> close\n\n \n \n \n\n \n \n\n \n \n\n\n\nRitual\nPurpose\n\n\n\n\nsss\nStart a session capsule and initial state\n\n\nvvv\nDefine goal, scope, constraints, acceptance, risk\n\n\nnnn\nNormalize into plan, steps, and artifacts\n\n\ngogogo\nExplicit execution gate\n\n\nddd\nInspect diff, damage, and scope creep\n\n\nrrr\nRetro and memory handoff through memory-cli index\n\n\nclose\nClose the session with explicit final state\n\n\n\nReference:\n\ndocs/RITUALS.md\ndocs/RITUALS_TH.md\n\n\nDocumentation Map\nStart here:\n\nWhy Trinity: English | ไทย\nOrigin story: English | ไทย\nRitual reference: English | ไทย\nGetting started: docs/GETTING_STARTED.md\nArchitecture: docs/ARCHITECTURE.md\nStorage taxonomy: docs/STORAGE_TAXONOMY.md\nVersion lineage: English | ไทย\nGitHub-safe export: docs/GITHUB_EXPORT.md\n\nOperator guides:\n\ndocs/operator-guide-en/00_README.md\ndocs/operator-guide-th/00_README.md\n\nSpecs:\n\ndocs/specs/INDEX.md\ndocs/specs/00_BLUEPRINT.md\ndocs/specs/01_TOOL_CONTRACT.md\n\n\nLayout\ntrinity_v2/\n├── AGENTS.md # Generic agent entrypoint\n├── CLAUDE.md # Claude Code entrypoint\n├── GEMINI.md # Gemini CLI entrypoint\n├── WARP.md # Warp entrypoint\n├── .ai/ # Trinity runtime\n│ ├── cli/ # Python CLI kernel commands\n│ ├── sessions/ # Session capsules\n│ └── audit/ # Hash-chain audit log\n└── docs/\n ├── specs/ # Canonical implementation specs and contracts\n ├── operator-guide-en/\n └── operator-guide-th/\n\n \n \n \n\n \n \n\n \n \n\nVersion Lineage\nThis repository previously contained earlier experimental Trinity Protocol\nmaterials. From v0.1.0 onward, the root tree is the canonical Trinity v2\nexecutable governance kernel. 
Legacy materials remain available through Git\nhistory.\nVersion story:\nTrinity Protocol v2 = architecture / constitution generation\nRuntime v0.1.0 = first public executable runtime line\nTool Contract = v1.0 freeze candidate, v1.1 draft working spec\n\n \n \n \n\n \n \n\n \n \nSee docs/VERSION_LINEAGE.md.\n\nMemory CLI Note\nFor the Trinity v0.1.0 ritual flow, rrr delegates to memory-cli index.\nmemory-cli learn appears in legacy/spec materials as a historical or\nnon-ritual memory surface and must not be used by rrr.\n\nภาษาไทย\nAI agent สามารถพูดได้ว่างานเสร็จแล้ว แต่ Trinity บังคับให้ต้องมีหลักฐาน\nTrinity คือ control layer แบบ CLI-first สำหรับงานที่ใช้ AI coding agent\nมันไม่ได้แทน Claude Code, Codex, Cursor หรือ Gemini แต่ทำหน้าที่คุม scope,\nเก็บ artifact, ตรวจ verifier, และบันทึก decision ให้ audit ย้อนหลังได้\nหลักการหลัก:\nไม่มี artifact = ยังเชื่อไม่ได้\nไม่มี verification = ยังถือว่างานไม่เสร็จ\nไม่มี authority = ห้ามข้าม state\n\n \n \n \n\n \n \n\n \n \nอ่านต่อ:\n\nWHY_TRINITY_TH.md — ทำไมต้องมี Trinity\ndocs/ORIGIN_TH.md — ที่มาของ Trinity\ndocs/RITUALS_TH.md — ritual reference\ndocs/operator-guide-th/00_README.md — คู่มือใช้งาน"} diff --git a/docs/benchmarks/token-economy/raw/bcli_v2_input.txt b/docs/benchmarks/token-economy/raw/bcli_v2_input.txt new file mode 100644 index 0000000..9608202 --- /dev/null +++ b/docs/benchmarks/token-economy/raw/bcli_v2_input.txt @@ -0,0 +1,3 @@ +goto https://github.com/postmunnet/trinity-protocol +text article +exit diff --git a/docs/benchmarks/token-economy/raw/bcli_v2_output_2026-05-19.txt b/docs/benchmarks/token-economy/raw/bcli_v2_output_2026-05-19.txt new file mode 100644 index 0000000..bef74b2 --- /dev/null +++ b/docs/benchmarks/token-economy/raw/bcli_v2_output_2026-05-19.txt @@ -0,0 +1,3 @@ +browser-cli v0.3.0 ready | mode=local | schema=v1 | policy=normal +{"ok":true,"url":"https://github.com/postmunnet/trinity-protocol","title":"GitHub - postmunnet/trinity-protocol: AI agents can claim work is done. 
Trinity makes them prove it. CLI-first control layer for AI coding agents with evidence-driven verification, scoped execution, and auditable promotion. · GitHub"} +{"ok":true,"text":"Trinity Protocol\nLanguage: English | ไทย\nAI agents can claim work is done. Trinity makes them prove it.\nTrinity is a CLI-first control layer for AI coding agents. It coordinates\nvendor AI harnesses, verifies their work, and records decisions as auditable\nartifacts.\nCore rule:\nNo artifact = no trust.\nNo verification = no completion.\nNo authority = no transition.\n\nWhy Trinity?\nAI coding agents are powerful, but their claims are not reliable evidence.\nThey may say:\n\ntests pass, but no test artifact exists\na bug is fixed, but no reproduction was verified\na deploy is safe, but no rollback path was recorded\na file was changed correctly, but no diff was inspected\n\nTrinity turns AI-assisted work into an evidence-driven workflow:\nHuman intent\n |\n v\nAI proposes / executes within scope\n |\n v\nTrinity captures artifacts\n |\n v\nVerifier checks evidence\n |\n v\nPolicy / Human decides promotion\n\nRead the one-page explanation:\n\nWHY_TRINITY.md\nWHY_TRINITY_TH.md\n\n60-Second Example\nBefore Trinity:\nUser: Fix the login bug.\nAgent: Done. Tests pass.\n\nProblem: there is no trustworthy evidence.\nAfter Trinity:\nUser: Fix the login bug.\nTrinity requires:\n1. a scoped plan\n2. bounded execution\n3. diff and test artifacts\n4. verifier verdict\n5. 
explicit promotion authority\n\nIf the agent cannot produce the artifact, the work cannot be promoted.\n\nCurrent Status\n\nArchitecture generation: Trinity v2\nRuntime release: v0.1.0\nTool Contract ABI: v1.0.0 stable; validation/examples tooling v1.0.2\nKernel CLI: verified v0.1.0 runtime included in this repository\nRelease evidence: docs/releases/TRINITY_V0_1_0_RELEASE_EVIDENCE.md\n\nBehavioral proof, not just test count:\n\nState machine safety\nGate enforcement\nAudit chain integrity\nTool contract compliance\nVerifier verdict behavior\nRitual command flow\nHuman approval requirements for risky transitions\n\nLatest verified test evidence:\nSource checkout: 1862 passed, 6 skipped\nClean export without optional sibling tools: 1860 passed, 8 skipped\n\nTool Ecosystem\nTrinity separates the kernel, public ABI, and tools that implement the ABI.\n\nTool\nRole\nStatus\nContract\nRepo\n\nTrinity Protocol\nKernel / governance runtime\nv0.1.0 stable\nconsumes Tool Contract\nthis repo\n\nTrinity Tool Contract\nStable ABI for CLI tools\nv1.0.0 stable, v1.0.2 examples\nv1.0\npostmunnet/trinity-tool-contract\n\nbrowser-cli\nBrowser automation organ\nv0.3.0 partial v1 envelope implementation\npartial v1.0\npostmunnet/browser-cli\n\nmemory-cli\nArtifact memory organ\nplanned\ntarget v1.0\nplanned\n\nverify-cli\nVerification organ\nplanned\ntarget v1.0\nplanned\n\nretro-cli\nRetrospective / memory handoff organ\nplanned\ntarget v1.0\nplanned\n\nCanonical Tool Contract:\n\npostmunnet/trinity-tool-contract\npinned ABI: v1.0.0\nvalidation/examples tooling: v1.0.2\n\nArchitecture\nHuman Owner\n |\n v\nTrinity Control Layer\n |\n +-- Intent / Scope / Constraints\n +-- Session capsule + state machine\n +-- Bounded AI execution\n +-- Artifact capture\n +-- Verifier + policy gates\n +-- Audit chain\n |\n v\nPromotion only with evidence\n\nWorker layer:\nClaude Code / Codex / Cursor / Gemini\n |\n v\nVendor AI proposes and executes\n\nTrinity does not replace the agent. 
Trinity governs the work.\n\nQuickstart\nbash .ai/cli/ai status\nbash .ai/cli/ai sss \"Test Trinity with a small documentation task\"\nbash .ai/cli/ai vvv\nbash .ai/cli/ai nnn\nbash .ai/cli/ai gogogo\n\nRun the CLI test suite:\npython3 -m pytest .ai/cli/tests -q\n\nRitual Commands\nRituals are the operator protocol. They are not the first thing to understand,\nbut they are how Trinity enforces the workflow once work begins.\nsss -> vvv -> nnn -> gogogo -> ddd -> rrr -> close\n\nRitual\nPurpose\n\nsss\nStart a session capsule and initial state\n\nvvv\nDefine goal, scope, constraints, acceptance, risk\n\nnnn\nNormalize into plan, steps, and artifacts\n\ngogogo\nExplicit execution gate\n\nddd\nInspect diff, damage, and scope creep\n\nrrr\nRetro and memory handoff through memory-cli index\n\nclose\nClose the session with explicit final state\n\nReference:\n\ndocs/RITUALS.md\ndocs/RITUALS_TH.md\n\nDocumentation Map\nStart here:\n\nWhy Trinity: English | ไทย\nOrigin story: English | ไทย\nRitual reference: English | ไทย\nGetting started: docs/GETTING_STARTED.md\nArchitecture: docs/ARCHITECTURE.md\nStorage taxonomy: docs/STORAGE_TAXONOMY.md\nVersion lineage: English | ไทย\nGitHub-safe export: docs/GITHUB_EXPORT.md\n\nOperator guides:\n\ndocs/operator-guide-en/00_README.md\ndocs/operator-guide-th/00_README.md\n\nSpecs:\n\ndocs/specs/INDEX.md\ndocs/specs/00_BLUEPRINT.md\ndocs/specs/01_TOOL_CONTRACT.md redirects to postmunnet/trinity-tool-contract\n\nLayout\ntrinity_v2/\n├── AGENTS.md # Generic agent entrypoint\n├── CLAUDE.md # Claude Code entrypoint\n├── GEMINI.md # Gemini CLI entrypoint\n├── WARP.md # Warp entrypoint\n├── .ai/ # Trinity runtime\n│ ├── cli/ # Python CLI kernel commands\n│ ├── sessions/ # Session capsules\n│ └── audit/ # Hash-chain audit log\n└── docs/\n ├── specs/ # Canonical implementation specs and contracts\n ├── operator-guide-en/\n └── operator-guide-th/\n\nVersion Lineage\nThis repository previously contained earlier experimental Trinity 
Protocol\nmaterials. From v0.1.0 onward, the root tree is the canonical Trinity v2\nexecutable governance kernel. Legacy materials remain available through Git\nhistory.\nVersion story:\nTrinity Protocol v2 = architecture / constitution generation\nRuntime v0.1.0 = first public executable runtime line\nTool Contract ABI = v1.0.0 stable; validation/examples tooling v1.0.2\n\nSee docs/VERSION_LINEAGE.md.\n\nMemory CLI Note\nFor the Trinity v0.1.0 ritual flow, rrr delegates to memory-cli index.\nmemory-cli learn appears in legacy/spec materials as a historical or\nnon-ritual memory surface and must not be used by rrr.\n\nภาษาไทย\nAI agent สามารถพูดได้ว่างานเสร็จแล้ว แต่ Trinity บังคับให้ต้องมีหลักฐาน\nTrinity คือ control layer แบบ CLI-first สำหรับงานที่ใช้ AI coding agent\nมันไม่ได้แทน Claude Code, Codex, Cursor หรือ Gemini แต่ทำหน้าที่คุม scope,\nเก็บ artifact, ตรวจ verifier, และบันทึก decision ให้ audit ย้อนหลังได้\nหลักการหลัก:\nไม่มี artifact = ยังเชื่อไม่ได้\nไม่มี verification = ยังถือว่างานไม่เสร็จ\nไม่มี authority = ห้ามข้าม state\n\nอ่านต่อ:\n\nWHY_TRINITY_TH.md — ทำไมต้องมี Trinity\ndocs/ORIGIN_TH.md — ที่มาของ Trinity\ndocs/RITUALS_TH.md — ritual reference\ndocs/operator-guide-th/00_README.md — คู่มือใช้งาน\npostmunnet/trinity-tool-contract — Tool Contract ABI v1.0"} diff --git a/docs/benchmarks/token-economy/raw/curl_input.txt b/docs/benchmarks/token-economy/raw/curl_input.txt new file mode 100644 index 0000000..9d8244e --- /dev/null +++ b/docs/benchmarks/token-economy/raw/curl_input.txt @@ -0,0 +1 @@ +curl -sL https://raw.githubusercontent.com/postmunnet/trinity-protocol/main/README.md \ No newline at end of file diff --git a/docs/benchmarks/token-economy/raw/curl_output.md b/docs/benchmarks/token-economy/raw/curl_output.md new file mode 100644 index 0000000..1ef9027 --- /dev/null +++ b/docs/benchmarks/token-economy/raw/curl_output.md @@ -0,0 +1,281 @@ +# Trinity Protocol + +Language: English | [ไทย](#ภาษาไทย) + +AI agents can claim work is done. 
Trinity makes them prove it. + +Trinity is a CLI-first control layer for AI coding agents. It coordinates +vendor AI harnesses, verifies their work, and records decisions as auditable +artifacts. + +Core rule: + +```text +No artifact = no trust. +No verification = no completion. +No authority = no transition. +``` + +--- + +## Why Trinity? + +AI coding agents are powerful, but their claims are not reliable evidence. + +They may say: + +- tests pass, but no test artifact exists +- a bug is fixed, but no reproduction was verified +- a deploy is safe, but no rollback path was recorded +- a file was changed correctly, but no diff was inspected + +Trinity turns AI-assisted work into an evidence-driven workflow: + +```text +Human intent + | + v +AI proposes / executes within scope + | + v +Trinity captures artifacts + | + v +Verifier checks evidence + | + v +Policy / Human decides promotion +``` + +Read the one-page explanation: + +- [`WHY_TRINITY.md`](WHY_TRINITY.md) +- [`WHY_TRINITY_TH.md`](WHY_TRINITY_TH.md) + +--- + +## 60-Second Example + +Before Trinity: + +```text +User: Fix the login bug. +Agent: Done. Tests pass. +``` + +Problem: there is no trustworthy evidence. + +After Trinity: + +```text +User: Fix the login bug. +Trinity requires: +1. a scoped plan +2. bounded execution +3. diff and test artifacts +4. verifier verdict +5. explicit promotion authority +``` + +If the agent cannot produce the artifact, the work cannot be promoted. 
+ +--- + +## Current Status + +- Architecture generation: Trinity v2 +- Runtime release: v0.1.0 +- Public Tool Contract: v1.0 freeze candidate; working spec is `v1.1.0-draft` +- Kernel CLI: verified v0.1.0 runtime included in this repository +- Release evidence: [`docs/releases/TRINITY_V0_1_0_RELEASE_EVIDENCE.md`](docs/releases/TRINITY_V0_1_0_RELEASE_EVIDENCE.md) + +Behavioral proof, not just test count: + +- State machine safety +- Gate enforcement +- Audit chain integrity +- Tool contract compliance +- Verifier verdict behavior +- Ritual command flow +- Human approval requirements for risky transitions + +Latest verified test evidence: + +```text +Source checkout: 1862 passed, 6 skipped +Clean export without optional sibling tools: 1860 passed, 8 skipped +``` + +--- + +## Architecture + +```text +Human Owner + | + v +Trinity Control Layer + | + +-- Intent / Scope / Constraints + +-- Session capsule + state machine + +-- Bounded AI execution + +-- Artifact capture + +-- Verifier + policy gates + +-- Audit chain + | + v +Promotion only with evidence +``` + +Worker layer: + +```text +Claude Code / Codex / Cursor / Gemini + | + v +Vendor AI proposes and executes +``` + +Trinity does not replace the agent. Trinity governs the work. + +--- + +## Quickstart + +```bash +bash .ai/cli/ai status +bash .ai/cli/ai sss "Test Trinity with a small documentation task" +bash .ai/cli/ai vvv +bash .ai/cli/ai nnn +bash .ai/cli/ai gogogo +``` + +Run the CLI test suite: + +```bash +python3 -m pytest .ai/cli/tests -q +``` + +--- + +## Ritual Commands + +Rituals are the operator protocol. They are not the first thing to understand, +but they are how Trinity enforces the workflow once work begins. 
+ +```text +sss -> vvv -> nnn -> gogogo -> ddd -> rrr -> close +``` + +| Ritual | Purpose | +|---|---| +| `sss` | Start a session capsule and initial state | +| `vvv` | Define goal, scope, constraints, acceptance, risk | +| `nnn` | Normalize into plan, steps, and artifacts | +| `gogogo` | Explicit execution gate | +| `ddd` | Inspect diff, damage, and scope creep | +| `rrr` | Retro and memory handoff through `memory-cli index` | +| `close` | Close the session with explicit final state | + +Reference: + +- [`docs/RITUALS.md`](docs/RITUALS.md) +- [`docs/RITUALS_TH.md`](docs/RITUALS_TH.md) + +--- + +## Documentation Map + +Start here: + +- **Why Trinity:** [`English`](WHY_TRINITY.md) | [`ไทย`](WHY_TRINITY_TH.md) +- **Origin story:** [`English`](docs/ORIGIN.md) | [`ไทย`](docs/ORIGIN_TH.md) +- **Ritual reference:** [`English`](docs/RITUALS.md) | [`ไทย`](docs/RITUALS_TH.md) +- **Getting started:** [`docs/GETTING_STARTED.md`](docs/GETTING_STARTED.md) +- **Architecture:** [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) +- **Storage taxonomy:** [`docs/STORAGE_TAXONOMY.md`](docs/STORAGE_TAXONOMY.md) +- **Version lineage:** [`English`](docs/VERSION_LINEAGE.md) | [`ไทย`](docs/VERSION_LINEAGE_TH.md) +- **GitHub-safe export:** [`docs/GITHUB_EXPORT.md`](docs/GITHUB_EXPORT.md) + +Operator guides: + +- [`docs/operator-guide-en/00_README.md`](docs/operator-guide-en/00_README.md) +- [`docs/operator-guide-th/00_README.md`](docs/operator-guide-th/00_README.md) + +Specs: + +- [`docs/specs/INDEX.md`](docs/specs/INDEX.md) +- [`docs/specs/00_BLUEPRINT.md`](docs/specs/00_BLUEPRINT.md) +- [`docs/specs/01_TOOL_CONTRACT.md`](docs/specs/01_TOOL_CONTRACT.md) + +--- + +## Layout + +```text +trinity_v2/ +├── AGENTS.md # Generic agent entrypoint +├── CLAUDE.md # Claude Code entrypoint +├── GEMINI.md # Gemini CLI entrypoint +├── WARP.md # Warp entrypoint +├── .ai/ # Trinity runtime +│ ├── cli/ # Python CLI kernel commands +│ ├── sessions/ # Session capsules +│ └── audit/ # Hash-chain audit log +└── 
docs/ + ├── specs/ # Canonical implementation specs and contracts + ├── operator-guide-en/ + └── operator-guide-th/ +``` + +--- + +## Version Lineage + +This repository previously contained earlier experimental Trinity Protocol +materials. From `v0.1.0` onward, the root tree is the canonical Trinity v2 +executable governance kernel. Legacy materials remain available through Git +history. + +Version story: + +```text +Trinity Protocol v2 = architecture / constitution generation +Runtime v0.1.0 = first public executable runtime line +Tool Contract = v1.0 freeze candidate, v1.1 draft working spec +``` + +See [`docs/VERSION_LINEAGE.md`](docs/VERSION_LINEAGE.md). + +--- + +## Memory CLI Note + +For the Trinity v0.1.0 ritual flow, `rrr` delegates to `memory-cli index`. +`memory-cli learn` appears in legacy/spec materials as a historical or +non-ritual memory surface and must not be used by `rrr`. + +--- + +## ภาษาไทย + +AI agent สามารถพูดได้ว่างานเสร็จแล้ว แต่ Trinity บังคับให้ต้องมีหลักฐาน + +Trinity คือ control layer แบบ CLI-first สำหรับงานที่ใช้ AI coding agent +มันไม่ได้แทน Claude Code, Codex, Cursor หรือ Gemini แต่ทำหน้าที่คุม scope, +เก็บ artifact, ตรวจ verifier, และบันทึก decision ให้ audit ย้อนหลังได้ + +หลักการหลัก: + +```text +ไม่มี artifact = ยังเชื่อไม่ได้ +ไม่มี verification = ยังถือว่างานไม่เสร็จ +ไม่มี authority = ห้ามข้าม state +``` + +อ่านต่อ: + +- [`WHY_TRINITY_TH.md`](WHY_TRINITY_TH.md) — ทำไมต้องมี Trinity +- [`docs/ORIGIN_TH.md`](docs/ORIGIN_TH.md) — ที่มาของ Trinity +- [`docs/RITUALS_TH.md`](docs/RITUALS_TH.md) — ritual reference +- [`docs/operator-guide-th/00_README.md`](docs/operator-guide-th/00_README.md) — คู่มือใช้งาน diff --git a/docs/benchmarks/token-economy/raw/curl_output_2026-05-19.md b/docs/benchmarks/token-economy/raw/curl_output_2026-05-19.md new file mode 100644 index 0000000..e0ef1bc --- /dev/null +++ b/docs/benchmarks/token-economy/raw/curl_output_2026-05-19.md @@ -0,0 +1,303 @@ +# Trinity Protocol + +Language: English | 
[ไทย](#ภาษาไทย) + +AI agents can claim work is done. Trinity makes them prove it. + +Trinity is a CLI-first control layer for AI coding agents. It coordinates +vendor AI harnesses, verifies their work, and records decisions as auditable +artifacts. + +Core rule: + +```text +No artifact = no trust. +No verification = no completion. +No authority = no transition. +``` + +--- + +## Why Trinity? + +AI coding agents are powerful, but their claims are not reliable evidence. + +They may say: + +- tests pass, but no test artifact exists +- a bug is fixed, but no reproduction was verified +- a deploy is safe, but no rollback path was recorded +- a file was changed correctly, but no diff was inspected + +Trinity turns AI-assisted work into an evidence-driven workflow: + +```text +Human intent + | + v +AI proposes / executes within scope + | + v +Trinity captures artifacts + | + v +Verifier checks evidence + | + v +Policy / Human decides promotion +``` + +Read the one-page explanation: + +- [`WHY_TRINITY.md`](WHY_TRINITY.md) +- [`WHY_TRINITY_TH.md`](WHY_TRINITY_TH.md) + +--- + +## 60-Second Example + +Before Trinity: + +```text +User: Fix the login bug. +Agent: Done. Tests pass. +``` + +Problem: there is no trustworthy evidence. + +After Trinity: + +```text +User: Fix the login bug. +Trinity requires: +1. a scoped plan +2. bounded execution +3. diff and test artifacts +4. verifier verdict +5. explicit promotion authority +``` + +If the agent cannot produce the artifact, the work cannot be promoted. 
+ +--- + +## Current Status + +- Architecture generation: Trinity v2 +- Runtime release: v0.1.0 +- Tool Contract ABI: v1.0.0 stable; validation/examples tooling v1.0.2 +- Kernel CLI: verified v0.1.0 runtime included in this repository +- Release evidence: [`docs/releases/TRINITY_V0_1_0_RELEASE_EVIDENCE.md`](docs/releases/TRINITY_V0_1_0_RELEASE_EVIDENCE.md) + +Behavioral proof, not just test count: + +- State machine safety +- Gate enforcement +- Audit chain integrity +- Tool contract compliance +- Verifier verdict behavior +- Ritual command flow +- Human approval requirements for risky transitions + +Latest verified test evidence: + +```text +Source checkout: 1862 passed, 6 skipped +Clean export without optional sibling tools: 1860 passed, 8 skipped +``` + +--- + +## Tool Ecosystem + +Trinity separates the kernel, public ABI, and tools that implement the ABI. + +| Tool | Role | Status | Contract | Repo | +|---|---|---|---|---| +| Trinity Protocol | Kernel / governance runtime | v0.1.0 stable | consumes Tool Contract | this repo | +| Trinity Tool Contract | Stable ABI for CLI tools | v1.0.0 stable, v1.0.2 examples | v1.0 | [`postmunnet/trinity-tool-contract`](https://github.com/postmunnet/trinity-tool-contract) | +| browser-cli | Browser automation organ | v0.3.0 partial v1 envelope implementation | partial v1.0 | [`postmunnet/browser-cli`](https://github.com/postmunnet/browser-cli) | +| memory-cli | Artifact memory organ | planned | target v1.0 | planned | +| verify-cli | Verification organ | planned | target v1.0 | planned | +| retro-cli | Retrospective / memory handoff organ | planned | target v1.0 | planned | + +Canonical Tool Contract: + +- [`postmunnet/trinity-tool-contract`](https://github.com/postmunnet/trinity-tool-contract) +- pinned ABI: [`v1.0.0`](https://github.com/postmunnet/trinity-tool-contract/tree/v1.0.0) +- validation/examples tooling: [`v1.0.2`](https://github.com/postmunnet/trinity-tool-contract/releases/tag/v1.0.2) + +--- + +## Architecture + 
+```text +Human Owner + | + v +Trinity Control Layer + | + +-- Intent / Scope / Constraints + +-- Session capsule + state machine + +-- Bounded AI execution + +-- Artifact capture + +-- Verifier + policy gates + +-- Audit chain + | + v +Promotion only with evidence +``` + +Worker layer: + +```text +Claude Code / Codex / Cursor / Gemini + | + v +Vendor AI proposes and executes +``` + +Trinity does not replace the agent. Trinity governs the work. + +--- + +## Quickstart + +```bash +bash .ai/cli/ai status +bash .ai/cli/ai sss "Test Trinity with a small documentation task" +bash .ai/cli/ai vvv +bash .ai/cli/ai nnn +bash .ai/cli/ai gogogo +``` + +Run the CLI test suite: + +```bash +python3 -m pytest .ai/cli/tests -q +``` + +--- + +## Ritual Commands + +Rituals are the operator protocol. They are not the first thing to understand, +but they are how Trinity enforces the workflow once work begins. + +```text +sss -> vvv -> nnn -> gogogo -> ddd -> rrr -> close +``` + +| Ritual | Purpose | +|---|---| +| `sss` | Start a session capsule and initial state | +| `vvv` | Define goal, scope, constraints, acceptance, risk | +| `nnn` | Normalize into plan, steps, and artifacts | +| `gogogo` | Explicit execution gate | +| `ddd` | Inspect diff, damage, and scope creep | +| `rrr` | Retro and memory handoff through `memory-cli index` | +| `close` | Close the session with explicit final state | + +Reference: + +- [`docs/RITUALS.md`](docs/RITUALS.md) +- [`docs/RITUALS_TH.md`](docs/RITUALS_TH.md) + +--- + +## Documentation Map + +Start here: + +- **Why Trinity:** [`English`](WHY_TRINITY.md) | [`ไทย`](WHY_TRINITY_TH.md) +- **Origin story:** [`English`](docs/ORIGIN.md) | [`ไทย`](docs/ORIGIN_TH.md) +- **Ritual reference:** [`English`](docs/RITUALS.md) | [`ไทย`](docs/RITUALS_TH.md) +- **Getting started:** [`docs/GETTING_STARTED.md`](docs/GETTING_STARTED.md) +- **Architecture:** [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) +- **Storage taxonomy:** 
[`docs/STORAGE_TAXONOMY.md`](docs/STORAGE_TAXONOMY.md) +- **Version lineage:** [`English`](docs/VERSION_LINEAGE.md) | [`ไทย`](docs/VERSION_LINEAGE_TH.md) +- **GitHub-safe export:** [`docs/GITHUB_EXPORT.md`](docs/GITHUB_EXPORT.md) + +Operator guides: + +- [`docs/operator-guide-en/00_README.md`](docs/operator-guide-en/00_README.md) +- [`docs/operator-guide-th/00_README.md`](docs/operator-guide-th/00_README.md) + +Specs: + +- [`docs/specs/INDEX.md`](docs/specs/INDEX.md) +- [`docs/specs/00_BLUEPRINT.md`](docs/specs/00_BLUEPRINT.md) +- [`docs/specs/01_TOOL_CONTRACT.md`](docs/specs/01_TOOL_CONTRACT.md) redirects to [`postmunnet/trinity-tool-contract`](https://github.com/postmunnet/trinity-tool-contract) + +--- + +## Layout + +```text +trinity_v2/ +├── AGENTS.md # Generic agent entrypoint +├── CLAUDE.md # Claude Code entrypoint +├── GEMINI.md # Gemini CLI entrypoint +├── WARP.md # Warp entrypoint +├── .ai/ # Trinity runtime +│ ├── cli/ # Python CLI kernel commands +│ ├── sessions/ # Session capsules +│ └── audit/ # Hash-chain audit log +└── docs/ + ├── specs/ # Canonical implementation specs and contracts + ├── operator-guide-en/ + └── operator-guide-th/ +``` + +--- + +## Version Lineage + +This repository previously contained earlier experimental Trinity Protocol +materials. From `v0.1.0` onward, the root tree is the canonical Trinity v2 +executable governance kernel. Legacy materials remain available through Git +history. + +Version story: + +```text +Trinity Protocol v2 = architecture / constitution generation +Runtime v0.1.0 = first public executable runtime line +Tool Contract ABI = v1.0.0 stable; validation/examples tooling v1.0.2 +``` + +See [`docs/VERSION_LINEAGE.md`](docs/VERSION_LINEAGE.md). + +--- + +## Memory CLI Note + +For the Trinity v0.1.0 ritual flow, `rrr` delegates to `memory-cli index`. +`memory-cli learn` appears in legacy/spec materials as a historical or +non-ritual memory surface and must not be used by `rrr`. 
+ +--- + +## ภาษาไทย + +AI agent สามารถพูดได้ว่างานเสร็จแล้ว แต่ Trinity บังคับให้ต้องมีหลักฐาน + +Trinity คือ control layer แบบ CLI-first สำหรับงานที่ใช้ AI coding agent +มันไม่ได้แทน Claude Code, Codex, Cursor หรือ Gemini แต่ทำหน้าที่คุม scope, +เก็บ artifact, ตรวจ verifier, และบันทึก decision ให้ audit ย้อนหลังได้ + +หลักการหลัก: + +```text +ไม่มี artifact = ยังเชื่อไม่ได้ +ไม่มี verification = ยังถือว่างานไม่เสร็จ +ไม่มี authority = ห้ามข้าม state +``` + +อ่านต่อ: + +- [`WHY_TRINITY_TH.md`](WHY_TRINITY_TH.md) — ทำไมต้องมี Trinity +- [`docs/ORIGIN_TH.md`](docs/ORIGIN_TH.md) — ที่มาของ Trinity +- [`docs/RITUALS_TH.md`](docs/RITUALS_TH.md) — ritual reference +- [`docs/operator-guide-th/00_README.md`](docs/operator-guide-th/00_README.md) — คู่มือใช้งาน +- [`postmunnet/trinity-tool-contract`](https://github.com/postmunnet/trinity-tool-contract) — Tool Contract ABI v1.0 diff --git a/docs/benchmarks/token-economy/raw/mcp_1_tabs_context.txt b/docs/benchmarks/token-economy/raw/mcp_1_tabs_context.txt new file mode 100644 index 0000000..91f9f03 --- /dev/null +++ b/docs/benchmarks/token-economy/raw/mcp_1_tabs_context.txt @@ -0,0 +1,14 @@ +--- INPUT --- +{"createIfEmpty": true} + +--- OUTPUT (block 1, JSON) --- +{"availableTabs":[{"tabId":1679488603,"title":"New Tab","url":"chrome://newtab"}],"tabGroupId":2090782821} + +--- OUTPUT (block 2, formatted) --- + +Tab Context: +- Available tabs: + • tabId 1679488603: "New Tab" (chrome://newtab) + +--- SYSTEM REMINDER (chrome-MCP induced) --- +You used a single tool call this turn. Prefer browser_batch to execute multiple actions in one call — it is significantly faster. Batch your next sequence of clicks, types, navigations, and screenshots together. 
diff --git a/docs/benchmarks/token-economy/raw/mcp_2_navigate.txt b/docs/benchmarks/token-economy/raw/mcp_2_navigate.txt new file mode 100644 index 0000000..5c1d274 --- /dev/null +++ b/docs/benchmarks/token-economy/raw/mcp_2_navigate.txt @@ -0,0 +1,15 @@ +--- INPUT --- +{"tabId": 1679488603, "url": "https://github.com/postmunnet/trinity-protocol"} + +--- OUTPUT (block 1) --- +Navigated to https://github.com/postmunnet/trinity-protocol + +--- OUTPUT (block 2, formatted) --- + +Tab Context: +- Executed on tabId: 1679488603 +- Available tabs: + • tabId 1679488603: "New tabExtend" (chrome://newtab/) + +--- SYSTEM REMINDER --- +You used a single tool call this turn. Prefer browser_batch to execute multiple actions in one call — it is significantly faster. Batch your next sequence of clicks, types, navigations, and screenshots together. diff --git a/docs/benchmarks/token-economy/raw/mcp_3_get_page_text.txt b/docs/benchmarks/token-economy/raw/mcp_3_get_page_text.txt new file mode 100644 index 0000000..208fe94 --- /dev/null +++ b/docs/benchmarks/token-economy/raw/mcp_3_get_page_text.txt @@ -0,0 +1,232 @@ +--- INPUT --- +{"tabId": 1679488603} + +--- OUTPUT (block 1, extracted text) --- +Title: postmunnet/trinity-protocol: AI agents can claim work is done. Trinity makes them prove it. CLI-first control layer for AI coding agents with evidence-driven verification, scoped execution, and auditable promotion. +URL: https://github.com/postmunnet/trinity-protocol +Source element:
+--- +Trinity Protocol + +Language: English | ไทย + +AI agents can claim work is done. Trinity makes them prove it. + +Trinity is a CLI-first control layer for AI coding agents. It coordinates vendor AI harnesses, verifies their work, and records decisions as auditable artifacts. + +Core rule: + +No artifact = no trust. +No verification = no completion. +No authority = no transition. + +Why Trinity? + +AI coding agents are powerful, but their claims are not reliable evidence. + +They may say: + +tests pass, but no test artifact exists +a bug is fixed, but no reproduction was verified +a deploy is safe, but no rollback path was recorded +a file was changed correctly, but no diff was inspected + +Trinity turns AI-assisted work into an evidence-driven workflow: + +Human intent +| +v +AI proposes / executes within scope +| +v +Trinity captures artifacts +| +v +Verifier checks evidence +| +v +Policy / Human decides promotion + +Read the one-page explanation: + +WHY_TRINITY.md +WHY_TRINITY_TH.md +60-Second Example + +Before Trinity: + +User: Fix the login bug. +Agent: Done. Tests pass. + +Problem: there is no trustworthy evidence. + +After Trinity: + +User: Fix the login bug. +Trinity requires: +1. a scoped plan +2. bounded execution +3. diff and test artifacts +4. verifier verdict +5. explicit promotion authority + +If the agent cannot produce the artifact, the work cannot be promoted. 
+ +Current Status +Architecture generation: Trinity v2 +Runtime release: v0.1.0 +Public Tool Contract: v1.0 freeze candidate; working spec is v1.1.0-draft +Kernel CLI: verified v0.1.0 runtime included in this repository +Release evidence: docs/releases/TRINITY_V0_1_0_RELEASE_EVIDENCE.md + +Behavioral proof, not just test count: + +State machine safety +Gate enforcement +Audit chain integrity +Tool contract compliance +Verifier verdict behavior +Ritual command flow +Human approval requirements for risky transitions + +Latest verified test evidence: + +Source checkout: 1862 passed, 6 skipped +Clean export without optional sibling tools: 1860 passed, 8 skipped + +Architecture +Human Owner +| +v +Trinity Control Layer +| ++-- Intent / Scope / Constraints ++-- Session capsule + state machine ++-- Bounded AI execution ++-- Artifact capture ++-- Verifier + policy gates ++-- Audit chain +| +v +Promotion only with evidence + +Worker layer: + +Claude Code / Codex / Cursor / Gemini +| +v +Vendor AI proposes and executes + +Trinity does not replace the agent. Trinity governs the work. + +Quickstart +bash .ai/cli/ai status +bash .ai/cli/ai sss "Test Trinity with a small documentation task" +bash .ai/cli/ai vvv +bash .ai/cli/ai nnn +bash .ai/cli/ai gogogo + +Run the CLI test suite: + +python3 -m pytest .ai/cli/tests -q +Ritual Commands + +Rituals are the operator protocol. They are not the first thing to understand, but they are how Trinity enforces the workflow once work begins. 
+ +sss -> vvv -> nnn -> gogogo -> ddd -> rrr -> close + +Ritual Purpose +sss Start a session capsule and initial state +vvv Define goal, scope, constraints, acceptance, risk +nnn Normalize into plan, steps, and artifacts +gogogo Explicit execution gate +ddd Inspect diff, damage, and scope creep +rrr Retro and memory handoff through memory-cli index +close Close the session with explicit final state + +Reference: + +docs/RITUALS.md +docs/RITUALS_TH.md +Documentation Map + +Start here: + +Why Trinity: English | ไทย +Origin story: English | ไทย +Ritual reference: English | ไทย +Getting started: docs/GETTING_STARTED.md +Architecture: docs/ARCHITECTURE.md +Storage taxonomy: docs/STORAGE_TAXONOMY.md +Version lineage: English | ไทย +GitHub-safe export: docs/GITHUB_EXPORT.md + +Operator guides: + +docs/operator-guide-en/00_README.md +docs/operator-guide-th/00_README.md + +Specs: + +docs/specs/INDEX.md +docs/specs/00_BLUEPRINT.md +docs/specs/01_TOOL_CONTRACT.md +Layout +trinity_v2/ +├── AGENTS.md # Generic agent entrypoint +├── CLAUDE.md # Claude Code entrypoint +├── GEMINI.md # Gemini CLI entrypoint +├── WARP.md # Warp entrypoint +├── .ai/ # Trinity runtime +│ ├── cli/ # Python CLI kernel commands +│ ├── sessions/ # Session capsules +│ └── audit/ # Hash-chain audit log +└── docs/ +├── specs/ # Canonical implementation specs and contracts +├── operator-guide-en/ +└── operator-guide-th/ + +Version Lineage + +This repository previously contained earlier experimental Trinity Protocol materials. From v0.1.0 onward, the root tree is the canonical Trinity v2 executable governance kernel. Legacy materials remain available through Git history. + +Version story: + +Trinity Protocol v2 = architecture / constitution generation +Runtime v0.1.0 = first public executable runtime line +Tool Contract = v1.0 freeze candidate, v1.1 draft working spec + +See docs/VERSION_LINEAGE.md. 
+ +Memory CLI Note + +For the Trinity v0.1.0 ritual flow, rrr delegates to memory-cli index. memory-cli learn appears in legacy/spec materials as a historical or non-ritual memory surface and must not be used by rrr. + +ภาษาไทย + +AI agent สามารถพูดได้ว่างานเสร็จแล้ว แต่ Trinity บังคับให้ต้องมีหลักฐาน + +Trinity คือ control layer แบบ CLI-first สำหรับงานที่ใช้ AI coding agent มันไม่ได้แทน Claude Code, Codex, Cursor หรือ Gemini แต่ทำหน้าที่คุม scope, เก็บ artifact, ตรวจ verifier, และบันทึก decision ให้ audit ย้อนหลังได้ + +หลักการหลัก: + +ไม่มี artifact = ยังเชื่อไม่ได้ +ไม่มี verification = ยังถือว่างานไม่เสร็จ +ไม่มี authority = ห้ามข้าม state + +อ่านต่อ: + +WHY_TRINITY_TH.md — ทำไมต้องมี Trinity +docs/ORIGIN_TH.md — ที่มาของ Trinity +docs/RITUALS_TH.md — ritual reference +docs/operator-guide-th/00_README.md — คู่มือใช้งาน + +--- OUTPUT (block 2, tab context) --- + +Tab Context: +- Executed on tabId: 1679488603 +- Available tabs: + • tabId 1679488603: "postmunnet/trinity-protocol: AI agents can claim work is done. Trinity makes them prove it. CLI-first control layer for AI coding agents with evidence-driven verification, scoped execution, and auditable promotion." (https://github.com/postmunnet/trinity-protocol) diff --git a/docs/benchmarks/token-economy/raw/mcp_schemas_used.json b/docs/benchmarks/token-economy/raw/mcp_schemas_used.json new file mode 100644 index 0000000..b05cd52 --- /dev/null +++ b/docs/benchmarks/token-economy/raw/mcp_schemas_used.json @@ -0,0 +1,48 @@ +[ + { + "name": "mcp__claude-in-chrome__tabs_context_mcp", + "description": "Get context information about the current MCP tab group. Returns all tab IDs inside the group if it exists. CRITICAL: You must get the context at least once before using other browser automation tools so you know what tabs exist. 
Each new conversation should create its own new tab (using tabs_create_mcp) rather than reusing existing tabs, unless the user explicitly asks to use an existing tab.", + "parameters": { + "properties": { + "createIfEmpty": { + "description": "Creates a new MCP tab group if none exists, creates a new Window with a new tab group containing an empty tab (which can be used for this conversation). If a MCP tab group already exists, this parameter has no effect.", + "type": "boolean" + } + }, + "required": [], + "type": "object" + } + }, + { + "name": "mcp__claude-in-chrome__navigate", + "description": "Navigate to a URL, or go forward/back in browser history. If you don't have a valid tab ID, use tabs_context_mcp first to get available tabs.", + "parameters": { + "properties": { + "tabId": { + "description": "Tab ID to navigate. Must be a tab in the current group. Use tabs_context_mcp first if you don't have a valid tab ID.", + "type": "number" + }, + "url": { + "description": "The URL to navigate to. Can be provided with or without protocol (defaults to https://). Use \"forward\" to go forward in history or \"back\" to go back in history.", + "type": "string" + } + }, + "required": ["url", "tabId"], + "type": "object" + } + }, + { + "name": "mcp__claude-in-chrome__get_page_text", + "description": "Extract raw text content from the page, prioritizing article content. Ideal for reading articles, blog posts, or other text-heavy pages. Returns plain text without HTML formatting. If you don't have a valid tab ID, use tabs_context_mcp first to get available tabs.", + "parameters": { + "properties": { + "tabId": { + "description": "Tab ID to extract text from. Must be a tab in the current group. 
Use tabs_context_mcp first if you don't have a valid tab ID.", + "type": "number" + } + }, + "required": ["tabId"], + "type": "object" + } + } +] diff --git a/docs/contracts/browser-cli/USE_CASE_ROUTING.md b/docs/contracts/browser-cli/USE_CASE_ROUTING.md new file mode 100644 index 0000000..622bddc --- /dev/null +++ b/docs/contracts/browser-cli/USE_CASE_ROUTING.md @@ -0,0 +1,117 @@ +# Use-Case Routing — `curl` vs `browser-cli` vs MCP + +> Companion to [`docs/benchmarks/token-economy/`](../../benchmarks/token-economy/). +> The benchmark proves the token math; this doc explains *which tool fits +> which job* so the comparison isn't reductive ("just use curl"). + +`curl` is a primitive. `browser-cli` is a governed, observable, auth-aware +Playwright REPL. They occupy different layers — Trinity routes work to the +right layer. + +--- + +## The matrix + +| Situation | Right tool | Why | +|---|---|---| +| Raw markdown / JSON / RSS / OpenGraph (content-addressable URLs) | `curl` *or* `browser-cli fetch` | No JS, no auth — the most primitive tool wins on tokens and latency | +| Auth-gated raw endpoint (logged-in API, admin JSON) | `browser-cli fetch` (CDP mode) | Inherits cookies + CSRF from the user's browser context | +| Static HTML page (Wikipedia, blog, news article) | `curl` + text extract | Rendering unnecessary; HTML is already complete | +| JS-rendered SPA (React/Vue dashboard) | `browser-cli goto + text` | `curl` only gets the shell HTML — content is hydrated client-side | +| Post-JS DOM state verification | `browser-cli` | `curl` cannot see DOM state after JavaScript runs | +| Human + AI hybrid workflow (demo → replay) | `browser-cli` recorder | `curl` has no session, no bidirectional action log | +| Verifier needs an action trace | `browser-cli` action-logger | `curl` is a blackbox — no provenance, no audit | +| Policy-gated admin actions (`force-*` commands) | `browser-cli` policy tier | `curl` has no governance layer — every action is "tier high" by default | + 
+--- + +## What `curl` actually is + +`curl` is the canonical Unix primitive for "send an HTTP request and +print the body." It builds on 50 years of Unix CLI refinement. Zero state. +Zero auth context. Zero provenance. Zero clicks. + +For content-addressable resources (raw GitHub files, JSON APIs without +auth, RSS feeds, OpenGraph metadata), `curl` is *unbeatable* on tokens, +latency, and operational simplicity. The token-economy benchmark +confirms this for the GitHub README case. + +When you can use `curl`, use `curl`. + +## What `browser-cli` actually is + +`browser-cli` is a **governed, observable, auth-aware Playwright REPL** +exposed via stdin/stdout JSON. From the Trinity Tool Contract: + +- **Action provenance** — every command is logged to NDJSON with + `agent_name`, `verb`, `args`, `url`, `via`, `ts`. The verifier organ + can answer "what did the AI actually do to this page?" — `curl` + cannot. +- **Bidirectional recorder** — human clicks and AI commands are written + to the same log, enabling "human demos → AI replays → verifier diffs" + workflows. +- **CDP mode** — connect to a Chrome instance the user already has open. + Cookies, extensions, and active sessions are inherited. The 50 lines + of cookie-jar + CSRF management you'd write around `curl` collapse to + one flag. +- **Policy tiers** — `safe` / `medium` / `high`. `force-*` commands are + gated by the dispatch policy, not the agent's judgment. +- **YAML helpers** — composable, reusable workflows (`helpers/payment-check.yml`). +- **Built-in assertions** — `assert-text`, `assert-visible`, `assert-enabled` + give you a verifier organ at the browser layer. + +The token-economy benchmark shows `browser-cli` matches `curl` on cost +for a public README. The reason to reach for it is not cost — it's +**the four capabilities `curl` cannot provide**: auth context, JS state, +action provenance, governance. 
+ +## What MCP browser tools are not + +MCP chrome tools occupy the same conceptual layer as `browser-cli` — +they automate a browser — but the protocol charges: + +- **Schema tax per task** (~600 tokens for 3 tools in deferred mode; + ~6,000+ in eager mode for the full chrome MCP). +- **Round-trip count** — every state read or write is a separate tool + call; no native compound primitive. +- **Redundant Tab-Context echo** on every call. +- **No action provenance** at the protocol level — the trace lives in + the agent's conversation log, not in a queryable artifact the + verifier organ can inspect. + +MCP can match `browser-cli` on raw browser capability. It cannot match +on **the structural moats** (single-pass invocation, append-only audit +log, content-addressable shortcuts, policy tiers as protocol — not as +agent self-discipline). + +--- + +## How Trinity routes work between them + +The kernel does not own this decision today — the agent picks. The +guidance: + +``` +if URL is content-addressable AND no auth needed: + use curl +elif page is JS-rendered OR auth is required OR action trace matters: + use browser-cli +else: + use curl (defaults to the cheapest abstraction available) +``` + +A future `read --prefer auto` meta-verb in `browser-cli` will +internalize this routing so that **`browser-cli` always uses the cheapest +abstraction available** — short-circuiting to HTTP when the URL is +content-addressable, falling back to Playwright when it isn't. That +collapses the agent-side decision tree to one verb, and turns the moat +into runnable code instead of a routing convention. + +--- + +## See also + +- [`docs/benchmarks/token-economy/`](../../benchmarks/token-economy/) — measurements +- [`docs/contracts/browser-cli/COMMAND_CONTRACT.md`](COMMAND_CONTRACT.md) — full verb inventory +- [`docs/contracts/browser-cli/POLICY_TIERS.md`](POLICY_TIERS.md) — tier mapping +- [`docs/contracts/browser-cli/AI_AGENT_GUIDE.md`](AI_AGENT_GUIDE.md) — agent invocation patterns
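As an illustration of the routing guidance under "How Trinity routes work between them", here is a minimal Python sketch of the same decision tree. It is not code from `browser-cli` or the Trinity kernel: the `Target` dataclass, its field names, and the `route` function are hypothetical, introduced only to make the branch order concrete. It assumes the agent already knows the four facts about the page before choosing a tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical description of a read target (not a browser-cli type)."""
    content_addressable: bool         # body is complete over plain HTTP (raw file, JSON, RSS)
    needs_auth: bool = False          # cookies / CSRF from a logged-in session are required
    js_rendered: bool = False         # content is hydrated client-side
    needs_action_trace: bool = False  # a verifier will want a provenance log


def route(target: Target) -> str:
    """Return the tool name, following the guidance's branch order."""
    # Branch 1: content-addressable and no auth -> cheapest primitive.
    if target.content_addressable and not target.needs_auth:
        return "curl"
    # Branch 2: anything that needs a browser context or an audit trail.
    if target.js_rendered or target.needs_auth or target.needs_action_trace:
        return "browser-cli"
    # Branch 3: default to the cheapest abstraction available.
    return "curl"


if __name__ == "__main__":
    print(route(Target(content_addressable=True)))                     # raw README
    print(route(Target(content_addressable=False, js_rendered=True)))  # SPA dashboard
```

One consequence of the branch order as written in the guidance: a content-addressable, unauthenticated URL routes to `curl` even when an action trace is wanted, because the first branch is checked before the trace condition. If provenance should take priority, the first two branches would need to be swapped.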