Tracking issue for scaling Studio's eval-run display. Inspired by entireio/cli's "single ref + tree-reads" architecture.
Today's bottleneck
listResultFilesFromRunsDir (apps/cli/src/commands/inspect/utils.ts:572) walks the runs directory with readdir + statSync + loadResultFile for every run, called via /api/runs every 5s by Studio. Cost is O(N runs × per-manifest read) per refresh. Stalls at hundreds of runs; falls over at thousands.
Sub-tasks
P1: Append-only run index in results repo (~2d)
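Write index/runs.jsonl on every push in directPushResults (packages/core/src/evaluation/results-repo.ts:407).
Entry fields: {run_id, timestamp, experiment, target, test_count, passed, pass_rate, avg_score, tags, sha}.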
Studio list view reads ONE file instead of N manifest reads.
Ship an agentv results reindex CLI to backfill existing repos.
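A minimal sketch of what one index line could look like, assuming one JSON object per line with the fields run_id, timestamp, experiment, target, test_count, passed, pass_rate, avg_score, tags and sha. The field types and the helper names `serializeIndexEntry` and `parseRunIndex` are illustrative assumptions, not existing agentv code.

```typescript
// Hypothetical entry shape: field names follow the proposed index, types are assumed.
interface RunIndexEntry {
  run_id: string;
  timestamp: string; // assumed to be an ISO 8601 string
  experiment: string;
  target: string;
  test_count: number;
  passed: number;
  pass_rate: number;
  avg_score: number;
  tags: string[];
  sha: string;
}

// JSON.stringify never emits raw newlines, so each entry stays on exactly one line.
function serializeIndexEntry(entry: RunIndexEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Parses the whole index in one pass. Blank lines and lines that are not
// valid JSON (for example a partially written final append) are skipped.
function parseRunIndex(text: string): RunIndexEntry[] {
  const entries: RunIndexEntry[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line) as RunIndexEntry);
    } catch {
      // Ignore torn or corrupt lines rather than failing the whole list view.
    }
  }
  return entries;
}
```

On the write side, the push path would append `serializeIndexEntry(entry)` to index/runs.jsonl (for example with `fs.appendFileSync`), so the list view needs a single file read plus `parseRunIndex`.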
P2: Cache remote runs server-side, invalidate on manual sync (~0.5d)
Keep /api/runs merged (clean URL — source stays per-row metadata for the badge, not a URL concern).
Server: cache the remote portion of listMergedResultFiles in memory; invalidate only on POST /api/remote/sync.
Local portion stays computed fresh per request (in-flight runs need freshness).
Stops per-poll readdir/git ls-tree of the remote cache (currently happens 12x/min for no reason).
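The caching rule above can be sketched as follows. `RemoteRunsCache` and `listMerged` are hypothetical names, and the loader stands in for whatever currently computes the remote portion of listMergedResultFiles.

```typescript
// Hypothetical in-memory cache for the remote portion of the merged run list.
class RemoteRunsCache<T> {
  private cached: T[] | null = null;
  private readonly load: () => T[];

  constructor(load: () => T[]) {
    this.load = load;
  }

  // Returns the cached list, loading it on first use or after an invalidation.
  get(): T[] {
    if (this.cached === null) {
      this.cached = this.load();
    }
    return this.cached;
  }

  // Intended to be called only from the POST /api/remote/sync handler.
  invalidate(): void {
    this.cached = null;
  }
}

// Local runs are recomputed on every request; remote runs come from the cache.
function listMerged<T>(listLocal: () => T[], remote: RemoteRunsCache<T>): T[] {
  return [...listLocal(), ...remote.get()];
}
```

With this shape, a 5s poll only pays for the local scan. The remote loader runs once per process start and once per manual sync.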
P3: Read remote runs via git ls-tree + git cat-file, not working-tree readdir (~2d)
Drop git checkout + git pull --ff-only from updateCacheRepo (results-repo.ts:167) — just git fetch origin --prune.
New listResultFilesFromGitTree(repoDir, treePath) sibling to listResultFilesFromRunsDir.
File-content endpoints in apps/cli/src/commands/results/serve.ts swap readFileSync for git cat-file -p when source is remote.
Pairs naturally with the existing --filter=blob:none clone (results-repo.ts:191) — blobs are fetched only when a detail view opens.
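A rough sketch of the object-DB read path, assuming the git invocation is injected as a function so the parsing logic can be exercised without a repository. `GitRunner`, `parseLsTree`, `listTree` and `readBlob` are hypothetical names. The `git ls-tree -z` record format (`<mode> <type> <object>`, a tab, then the path, NUL-terminated) is git's documented output.

```typescript
// Runs a git command and returns stdout. Injected so the sketch stays testable.
type GitRunner = (args: string[]) => string;

interface TreeEntry {
  mode: string;
  type: string; // "blob", "tree" or "commit"
  sha: string;
  path: string;
}

// Parses `git ls-tree -z` output: "<mode> <type> <sha>\t<path>" records separated by NUL.
function parseLsTree(output: string): TreeEntry[] {
  const entries: TreeEntry[] = [];
  for (const record of output.split("\0")) {
    if (record === "") continue;
    const tab = record.indexOf("\t");
    if (tab === -1) continue; // not a well-formed record
    const [mode, type, sha] = record.slice(0, tab).split(" ");
    entries.push({ mode, type, sha, path: record.slice(tab + 1) });
  }
  return entries;
}

// Lists the entries under treePath at a ref without touching the working tree.
function listTree(git: GitRunner, ref: string, treePath: string): TreeEntry[] {
  return parseLsTree(git(["ls-tree", "-z", `${ref}:${treePath}`]));
}

// Reads one blob's content by object id.
function readBlob(git: GitRunner, sha: string): string {
  return git(["cat-file", "-p", sha]);
}
```

In a real server the runner could wrap `execFileSync("git", ["-C", repoDir, ...args], { encoding: "utf8" })` from `node:child_process`. The ref to read (for example a remote-tracking branch updated by git fetch) is an assumption left to the caller.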
P4: Pagination/cursor on /api/runs (~1d)
Plumb existing limit? param through /api/runs?limit=50&cursor=<run_id>.
Studio switches to useInfiniteQuery (apps/studio/src/lib/api.ts:62).
Sentinel-row infinite scroll in RunList.tsx.
Trivial after P1 (cursor = byte offset into index file or last run_id seen).
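One way the last-run_id cursor could work over an already ordered list of index entries is sketched below. `paginateRuns` and `Page` are hypothetical names, and restarting from the beginning on an unknown cursor is a choice made only for this sketch.

```typescript
interface Page<T> {
  items: T[];
  nextCursor: string | null; // null when there are no more runs
}

// Cursor pagination over an ordered run list. The cursor is the last run_id the client saw.
function paginateRuns<T extends { run_id: string }>(
  runs: T[],
  limit: number,
  cursor?: string,
): Page<T> {
  let start = 0;
  if (cursor !== undefined) {
    const idx = runs.findIndex((r) => r.run_id === cursor);
    // An unknown cursor restarts from the beginning (sketch-only behavior).
    start = idx === -1 ? 0 : idx + 1;
  }
  const items = runs.slice(start, start + limit);
  const hasMore = start + items.length < runs.length;
  const nextCursor = hasMore && items.length > 0 ? items[items.length - 1].run_id : null;
  return { items, nextCursor };
}
```

On the Studio side, useInfiniteQuery's `getNextPageParam` could return `page.nextCursor ?? undefined`, since returning `undefined` signals that there are no more pages.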
P5: Zero-config same-repo mode (~5d) — strategic, defer until P1-P3 land
When results is not configured, write run artifacts as commits on refs/agentv/runs/v1 in the source repo (not under refs/heads/ — keeps the ref out of default git fetch/git push, git log, git branch, and clone bloat).
Staying out of the refs/heads/ namespace fixes entire's latent clone-bloat problem.
Studio reads from the local ref via a go-git equivalent.
Promotion path: agentv results promote --to <org/repo> copies the local ref history to a new separate repo when users outgrow solo mode.
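Writing run artifacts as commits on a custom ref can be done with git plumbing alone, without touching the index or working tree. The sketch below assumes an injected runner that throws on a non-zero exit code, and a flat list of files (git mktree takes one tree level at a time, so nested directories would need one mktree call per level). `GitRunner`, `RunFile` and `commitRunToRef` are hypothetical names.

```typescript
// Runs a git command, optionally feeding stdin, and returns stdout.
// Assumed to throw when git exits with a non-zero status.
type GitRunner = (args: string[], stdin?: string) => string;

interface RunFile {
  name: string; // flat file name, no slashes
  content: string;
}

// Appends one commit holding `files` to `ref` and returns the new commit id.
function commitRunToRef(
  git: GitRunner,
  ref: string,
  files: RunFile[],
  message: string,
): string {
  // 1. Store each file as a blob and format it as an ls-tree style line.
  const lines = files.map((f) => {
    const blob = git(["hash-object", "-w", "--stdin"], f.content).trim();
    return `100644 blob ${blob}\t${f.name}\n`;
  });
  // 2. Build a tree object from those lines.
  const tree = git(["mktree"], lines.join("")).trim();
  // 3. Look up the current tip of the ref, if the ref exists yet.
  let parent: string | null = null;
  try {
    parent = git(["rev-parse", "--verify", "--quiet", ref]).trim() || null;
  } catch {
    parent = null; // first run: the ref does not exist
  }
  // 4. Create the commit, then move the ref. Passing the old value makes
  //    update-ref fail if another writer moved the ref in the meantime.
  const parentArgs = parent ? ["-p", parent] : [];
  const commit = git(["commit-tree", tree, ...parentArgs, "-m", message]).trim();
  git(["update-ref", ref, commit, ...(parent ? [parent] : [])]);
  return commit;
}
```

Called as `commitRunToRef(git, "refs/agentv/runs/v1", files, "run <run-id>")`, this leaves the checked-out branch and working tree untouched.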
P6: Agentv-Run: <run-id> commit trailer (~2d)
Mirrors entire's Entire-Checkpoint: trailer.
At run start, record git rev-parse HEAD into the manifest.
agentv results link adds the trailer post-hoc on the source commit.
Studio RunDetail deep-links to the source commit via results.repo.
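A small sketch of how the trailer could be written and read back, assuming the usual git convention that trailers are `Key: value` lines in the last paragraph of the commit message. `addRunTrailer` and `parseRunTrailers` are hypothetical helpers. The parser scans every line rather than only the trailer block, which is a simplification.

```typescript
const TRAILER_KEY = "Agentv-Run";

// Appends the trailer to a commit message. If the message already ends with a
// trailer block, the new trailer joins it. Otherwise a blank line is inserted first.
function addRunTrailer(message: string, runId: string): string {
  const trimmed = message.replace(/\s+$/, "");
  const paragraphs = trimmed.split("\n\n");
  const last = paragraphs[paragraphs.length - 1];
  const endsWithTrailers =
    paragraphs.length > 1 &&
    last.split("\n").every((line) => /^[A-Za-z0-9-]+: /.test(line));
  const separator = endsWithTrailers ? "\n" : "\n\n";
  return `${trimmed}${separator}${TRAILER_KEY}: ${runId}\n`;
}

// Extracts every run id recorded in a commit message.
function parseRunTrailers(message: string): string[] {
  const ids: string[] = [];
  for (const line of message.split("\n")) {
    const match = /^Agentv-Run: (\S+)\s*$/.exec(line);
    if (match) ids.push(match[1]);
  }
  return ids;
}
```

git itself ships `git interpret-trailers --trailer "Agentv-Run: <run-id>"`, which handles trailer-block detection more carefully than this sketch and may be the better building block.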
Recommended sequence
P1 + P2 + P4 first (~3.5d total) → unlocks "thousands of runs without UI lag", which is the original goal.
Then P3 (~2d, cleaner internals, pairs with P1 to make all reads object-DB-only).
Then P6 (~2d, low priority but well-bounded).
P5 last (~5d) — it's a UX revolution, not a scale fix.
Background
The premise question: "should Studio read from a cloned copy or from GitHub directly?" Answer: a cloned copy is correct, and agentv already does this. The fix is to make local reads cheap, not to change where the data lives. Reading from the GitHub Contents API would hit rate limits and slow Monaco file-tree views to a crawl.
References
entire/checkpoints/v1 ref, sharded <id[:2]>/<id[2:]>/ paths (entireio/cli docs/architecture/sessions-and-checkpoints.md).
apps/cli/src/commands/inspect/utils.ts:572, apps/cli/src/commands/results/remote.ts:166.
packages/core/src/evaluation/results-repo.ts:407.
apps/studio/src/lib/api.ts:62.