Scale Studio run-list/detail to thousands of runs (entire-style indexed reads) #1259

@christso

Tracking issue for scaling Studio's eval-run display. Inspired by entireio/cli's "single ref + tree-reads" architecture.

Today's bottleneck

listResultFilesFromRunsDir (apps/cli/src/commands/inspect/utils.ts:572) walks the runs directory with readdir + statSync + loadResultFile for every run, called via /api/runs every 5s by Studio. Cost is O(N runs × per-manifest read) per refresh. Stalls at hundreds of runs; falls over at thousands.

Sub-tasks

  • P1: Append-only run index in results repo (~2d)

    • Write index/runs.jsonl on every push in directPushResults (packages/core/src/evaluation/results-repo.ts:407).
    • Each row: {run_id, timestamp, experiment, target, test_count, passed, pass_rate, avg_score, tags, sha}.
    • Studio list view reads ONE file instead of N manifest reads.
    • Ship an agentv results reindex CLI command to backfill existing repos.
  • P2: Cache remote runs server-side, invalidate on manual sync (~0.5d)

    • Keep /api/runs merged (clean URL — source stays per-row metadata for the badge, not a URL concern).
    • Server: cache the remote portion of listMergedResultFiles in memory; invalidate only on POST /api/remote/sync.
    • Local portion stays computed fresh per request (in-flight runs need freshness).
    • Stops per-poll readdir/git ls-tree of the remote cache (currently happens 12x/min for no reason).
  • P3: Read remote runs via git ls-tree + git cat-file, not working-tree readdir (~2d)

    • Drop git checkout + git pull --ff-only from updateCacheRepo (results-repo.ts:167) — just git fetch origin --prune.
    • New listResultFilesFromGitTree(repoDir, treePath) sibling to listResultFilesFromRunsDir.
    • File-content endpoints in apps/cli/src/commands/results/serve.ts swap readFileSync for git cat-file -p when source is remote.
    • Pairs naturally with the existing --filter=blob:none clone (results-repo.ts:191) — blobs are fetched only when a detail view opens.
  • P4: Pagination/cursor on /api/runs (~1d)

    • Plumb existing limit? param through /api/runs?limit=50&cursor=<run_id>.
    • Studio switches to useInfiniteQuery (apps/studio/src/lib/api.ts:62).
    • Sentinel-row infinite scroll in RunList.tsx.
    • Trivial after P1 (cursor = byte offset into index file or last run_id seen).
  • P5: Zero-config same-repo mode (~5d) — strategic, defer until P1-P3 land

    • When results is not configured, write run artifacts as commits on refs/agentv/runs/v1 in the source repo (not under refs/heads/ — keeps the ref out of default git fetch/git push, git log, git branch, and clone bloat).
    • Borrows entire's pattern; the non-refs/heads/ namespace fixes entire's latent clone-bloat problem.
    • Studio reads from the local ref via a go-git equivalent.
    • Promotion path: agentv results promote --to <org/repo> copies the local ref history to a new separate repo when users outgrow solo mode.
  • P6: Agentv-Run: <run-id> commit trailer (~2d)

    • Mirrors entire's Entire-Checkpoint: trailer.
    • At run start, record git rev-parse HEAD into the manifest.
    • agentv results link adds the trailer post-hoc on the source commit.
    • Studio RunDetail deep-links to the source commit via results.repo.
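
A minimal sketch of the P1 index, to make the "one file instead of N manifest reads" idea concrete. The row fields come from the list above; the field types, the function names (appendRunIndexRow, readRunIndex) and the scratch path are assumptions for illustration, not the existing agentv code.

```typescript
import { appendFileSync, existsSync, mkdirSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical row shape: field names are from the P1 bullet, types are assumed.
interface RunIndexRow {
  run_id: string;
  timestamp: string; // assumed ISO 8601
  experiment: string;
  target: string;
  test_count: number;
  passed: number;
  pass_rate: number;
  avg_score: number;
  tags: string[];
  sha: string;
}

// Append one row as a single JSON line; existing lines are never rewritten.
function appendRunIndexRow(indexPath: string, row: RunIndexRow): void {
  mkdirSync(dirname(indexPath), { recursive: true });
  appendFileSync(indexPath, JSON.stringify(row) + "\n", "utf8");
}

// Load every row with one file read; blank or unparsable lines are skipped.
function readRunIndex(indexPath: string): RunIndexRow[] {
  if (!existsSync(indexPath)) return [];
  const parsed: RunIndexRow[] = [];
  for (const line of readFileSync(indexPath, "utf8").split("\n")) {
    if (line.trim() === "") continue;
    try {
      parsed.push(JSON.parse(line) as RunIndexRow);
    } catch {
      // For example a truncated trailing line from an interrupted write.
    }
  }
  return parsed;
}

// Usage: write two rows to a scratch index and read them back.
const scratchIndex = join(mkdtempSync(join(tmpdir(), "agentv-index-")), "index", "runs.jsonl");
const baseRow = {
  timestamp: "2024-01-01T00:00:00Z",
  experiment: "baseline",
  target: "demo-target",
  test_count: 10,
  passed: 9,
  pass_rate: 0.9,
  avg_score: 0.87,
  tags: ["demo"],
  sha: "0000000",
};
appendRunIndexRow(scratchIndex, { ...baseRow, run_id: "run-001" });
appendRunIndexRow(scratchIndex, { ...baseRow, run_id: "run-002" });
const indexedRows = readRunIndex(scratchIndex);
console.log(indexedRows.map((r) => r.run_id));
```

The reader tolerates bad lines on purpose: with an append-only file, the most likely corruption is a partial last line, and dropping it is cheaper than failing the whole list view.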

Recommended sequence

P1 + P2 + P4 first (~3.5d total) → unlocks "thousands of runs without UI lag", which is the original goal.
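
Once the P1 index rows are in memory, the P4 cursor can be a plain slice. The sketch below uses the "last run_id seen" variant; paginateRuns, RunPage and next_cursor are hypothetical names, and it assumes the rows are already sorted newest-first.

```typescript
// Hypothetical response shape for /api/runs?limit=…&cursor=<run_id>.
interface RunPage<T> {
  runs: T[];
  next_cursor: string | null; // null when there is nothing left to load
}

// cursor is the run_id of the last row on the previous page.
// An unknown cursor falls back to the first page (findIndex returns -1).
function paginateRuns<T extends { run_id: string }>(
  sortedRows: T[],
  limit: number,
  cursor?: string,
): RunPage<T> {
  const size = Math.max(1, Math.floor(limit));
  const start = cursor === undefined ? 0 : sortedRows.findIndex((r) => r.run_id === cursor) + 1;
  const runs = sortedRows.slice(start, start + size);
  const last = runs[runs.length - 1];
  const hasMore = start + runs.length < sortedRows.length;
  return { runs, next_cursor: hasMore && last ? last.run_id : null };
}

// Usage: walk five rows two at a time.
const demoRows = ["r5", "r4", "r3", "r2", "r1"].map((run_id) => ({ run_id }));
const firstPage = paginateRuns(demoRows, 2);
const secondPage = paginateRuns(demoRows, 2, firstPage.next_cursor ?? undefined);
const thirdPage = paginateRuns(demoRows, 2, secondPage.next_cursor ?? undefined);
console.log(firstPage, secondPage, thirdPage);
```

A run_id cursor stays valid when new rows are appended to the index, which a numeric page offset would not; a byte-offset cursor would avoid the findIndex scan at the cost of tying the API to the file layout.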

Then P3 (~2d, cleaner internals, pairs with P1 to make all reads object-DB-only).
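
For P3, a rough sketch of what object-DB-only reads could look like. parseLsTree, listTreeEntries and readBlob are hypothetical helpers (the issue only names listResultFilesFromGitTree), and the ref passed in (for example origin/main) is an assumption about how the cache repo tracks the remote. The git calls shell out to git ls-tree -r -z and git cat-file -p.

```typescript
import { execFileSync } from "node:child_process";

interface GitTreeEntry {
  mode: string;
  type: string; // "blob", "tree" or "commit"
  oid: string;
  path: string;
}

// Parse `git ls-tree -r -z` output, where each NUL-terminated record is
// "<mode> <type> <oid>\t<path>".
function parseLsTree(output: string): GitTreeEntry[] {
  const entries: GitTreeEntry[] = [];
  for (const record of output.split("\0")) {
    const tab = record.indexOf("\t");
    if (tab === -1) continue; // empty trailing record or unexpected line
    const [mode, type, oid] = record.slice(0, tab).split(" ");
    if (!mode || !type || !oid) continue;
    entries.push({ mode, type, oid, path: record.slice(tab + 1) });
  }
  return entries;
}

// List files under treePath at ref without touching the working tree.
function listTreeEntries(repoDir: string, ref: string, treePath: string): GitTreeEntry[] {
  const out = execFileSync("git", ["ls-tree", "-r", "-z", ref, treePath], {
    cwd: repoDir,
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024,
  });
  return parseLsTree(out);
}

// Read one blob by object id; assumes the blob is UTF-8 text.
function readBlob(repoDir: string, oid: string): string {
  return execFileSync("git", ["cat-file", "-p", oid], {
    cwd: repoDir,
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024,
  });
}

// Usage: parse a canned ls-tree payload (the object ids are placeholders).
const sampleLsTree =
  "100644 blob 1111111111111111111111111111111111111111\truns/run-001/manifest.json\0" +
  "100644 blob 2222222222222222222222222222222222222222\truns/run-002/results with space.jsonl\0";
const sampleEntries = parseLsTree(sampleLsTree);
console.log(sampleEntries.map((e) => e.path));
```

The -z flag keeps paths with spaces or unusual characters intact, since git otherwise quotes them. Listing a tree only needs tree objects, so in a --filter=blob:none clone the blob download is deferred until readBlob is called for a detail view.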

Then P6 (~2d, low priority but well-bounded).
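
To illustrate the P6 trailer, here is a small string-level sketch. addRunTrailer and readRunTrailers are hypothetical helpers, not the planned agentv results link implementation; git's own git interpret-trailers command may be a better fit in practice, and the trailer-block detection below is deliberately simplified.

```typescript
const TRAILER_KEY = "Agentv-Run";

// Append "Agentv-Run: <run-id>" to a commit message.
// Trailers belong in the last paragraph: extend it if it already looks like
// a trailer block, otherwise start a new paragraph. Adding twice is a no-op.
function addRunTrailer(message: string, runId: string): string {
  const trailer = `${TRAILER_KEY}: ${runId}`;
  const body = message.replace(/\s+$/, "");
  if (body.split("\n").includes(trailer)) return body + "\n";
  const paragraphs = body.split(/\n[ \t]*\n/);
  const lastParagraph = paragraphs[paragraphs.length - 1] ?? "";
  const isTrailerBlock =
    paragraphs.length > 1 &&
    lastParagraph.split("\n").every((line) => /^[A-Za-z0-9-]+: \S/.test(line));
  return body + (isTrailerBlock ? "\n" : "\n\n") + trailer + "\n";
}

// Collect run ids from a message. Simplification: scans every line,
// not only the final trailer paragraph.
function readRunTrailers(message: string): string[] {
  const ids: string[] = [];
  for (const line of message.split("\n")) {
    const match = /^Agentv-Run: (\S+)\s*$/.exec(line);
    if (match && match[1]) ids.push(match[1]);
  }
  return ids;
}

// Usage
const plainMessage = addRunTrailer("Fix scoring bug\n", "run-123");
const signedMessage = addRunTrailer(
  "Fix scoring bug\n\nSigned-off-by: A Dev <a@example.com>\n",
  "run-123",
);
console.log(plainMessage);
console.log(signedMessage);
console.log(readRunTrailers(signedMessage));
```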

P5 last (~5d) — it's a UX revolution, not a scale fix.

Background

The premise question: "should Studio read from a cloned copy or from GitHub directly?" Answer: a cloned copy is correct, and agentv already does this. The fix is to make the local reads cheap, not to change where the data lives. Reading from the GitHub Contents API would hit rate limits and slow Monaco file-tree views to a crawl.

References

  • entire pattern: entire/checkpoints/v1 ref, sharded <id[:2]>/<id[2:]>/ paths (entireio/cli docs/architecture/sessions-and-checkpoints.md).
  • Today's list path: apps/cli/src/commands/inspect/utils.ts:572, apps/cli/src/commands/results/remote.ts:166.
  • Today's write path: packages/core/src/evaluation/results-repo.ts:407.
  • Studio API surface: apps/studio/src/lib/api.ts:62.

    Status

    In progress
