Refresh harness leaderboard snapshots by evanatpizzarobot · Pull Request #40 · RipperMercs/tensorfeed

evanatpizzarobot · 2026-06-15T16:21:23Z

Summary

lastUpdated bumped to 2026-06-15 in both data/harnesses.json and worker/src/harnesses.ts (in sync, byte-for-byte on the data fields).
One-line TODO appended to the note field: Roo Code shut down on 2026-05-15 and Kilo Code (kilocode.ai) is the official successor. A human needs to decide whether to add Kilo Code and update src/lib/harness-directory.ts.
No benchmark scores were changed this run (see Access Issues below).
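The "in sync" claim above could be checked mechanically with something like the following. This is a minimal sketch, not the repo's actual tooling: the `HarnessSnapshot` shape and the `snapshotsInSync` helper are hypothetical, and the real fields in data/harnesses.json and worker/src/harnesses.ts may differ.

```typescript
// Hypothetical snapshot shape; the real data fields may differ.
interface HarnessSnapshot {
  lastUpdated: string;
  note: string;
  entries: Array<{ id: string; scores: Record<string, number | null> }>;
}

// Serialize a value with object keys sorted, so that key order
// does not affect the comparison between the two copies.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// True when both copies carry identical data, regardless of key order.
function snapshotsInSync(a: HarnessSnapshot, b: HarnessSnapshot): boolean {
  return canonicalize(a) === canonicalize(b);
}
```

Comparing a canonical serialization rather than raw file bytes is a deliberate choice here, since one copy lives in a JSON file and the other in a TypeScript module.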

Access Issues Encountered

The three canonical upstream sources blocked automated fetching:

swebench.com (SWE-bench Verified): HTTP 403
tbench.ai (Terminal-Bench): HTTP 403
aider.chat/docs/leaderboards/ (Aider Polyglot): HTTP 403

More than 15 third-party aggregator and mirror sites also returned 403. The one source that was accessible was the Aider Polyglot YAML on GitHub (Aider-AI/aider raw file), but its most recent entries date to October 2025 and do not include the model names tracked in our data (Claude Opus 4.7, GPT-5.5, DeepSeek V4 Pro). Scores for those entries cannot be updated until the Aider team publishes results for those models.
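The coverage check described above (whether an upstream leaderboard includes the tracked model names) could look roughly like this. It is a sketch under assumptions: `LeaderboardEntry` and `missingModels` are hypothetical names, the entries are assumed to have already been parsed from the YAML, and the loose substring matching is an illustrative choice rather than the routine's actual logic.

```typescript
// Hypothetical minimal shape of one parsed leaderboard entry.
interface LeaderboardEntry {
  model: string;
}

// Lowercase and drop punctuation/whitespace so that, for example,
// "GPT-5.5" and "gpt 5.5" compare equal.
function normalizeModelName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "");
}

// Return the tracked names that no leaderboard entry mentions.
// Substring matching is deliberately loose and can over-match
// (a short tracked name may match a longer, different model),
// so a human should review any hits before updating scores.
function missingModels(
  tracked: string[],
  entries: LeaderboardEntry[],
): string[] {
  const seen = entries.map((entry) => normalizeModelName(entry.model));
  return tracked.filter((name) => {
    const needle = normalizeModelName(name);
    return !seen.some((candidate) => candidate.includes(needle));
  });
}
```

For this run, calling `missingModels(["Claude Opus 4.7", "GPT-5.5", "DeepSeek V4 Pro"], entries)` on the October 2025 data would be expected to return all three names, which is the condition under which scores are left unchanged.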

SWE-Lancer: The original benchmark repository (openai/SWELancer-Benchmark) has been archived and merged into openai/preparedness. No new public leaderboard with per-agent scores was found.
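The access checks in this section could be sketched as a small probe that records each source's HTTP status and refuses to refresh scores unless every source responded successfully. The names `probeSources`, `canRefreshScores`, and the `Fetcher` type are hypothetical, and the fetcher is injected so the logic can be exercised without network access; this is not the routine's actual implementation.

```typescript
// Result of probing one upstream source; status is null on network failure.
interface ProbeResult {
  url: string;
  status: number | null;
}

// Injected fetch-like function (hypothetical type), so tests can stub it.
type Fetcher = (url: string) => Promise<{ status: number }>;

// Probe every URL and record its HTTP status without throwing.
async function probeSources(
  urls: string[],
  fetcher: Fetcher,
): Promise<ProbeResult[]> {
  return Promise.all(
    urls.map(async (url): Promise<ProbeResult> => {
      try {
        const response = await fetcher(url);
        return { url, status: response.status };
      } catch {
        return { url, status: null };
      }
    }),
  );
}

// Scores are only safe to refresh when every source returned a 2xx status.
function canRefreshScores(results: ProbeResult[]): boolean {
  return (
    results.length > 0 &&
    results.every(
      (r) => r.status !== null && r.status >= 200 && r.status < 300,
    )
  );
}
```

In a runtime with a global `fetch` (recent Node.js or Cloudflare Workers), the fetcher could simply be `(url) => fetch(url)`. With the three sources listed above all returning 403, `canRefreshScores` would return `false`, matching the decision to leave scores unchanged.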

Scores Changed

None. Per-entry scores are unchanged from the 2026-04-30 snapshot.

TODOs Added to Note

roo-code shut down 2026-05-15; Kilo Code (kilocode.ai) is the official successor and should be evaluated for addition to harness-directory.ts after editorial review.

Test Plan

npx tsc --noEmit passes at repo root (after npm install)
cd worker && npx tsc --noEmit passes (after npm install)
Human: confirm Kilo Code editorial decision and re-run this routine once upstream leaderboard sites become accessible again

https://claude.ai/code/session_01SVZWBi9UTPJ1VrrBUnKda9

Generated by Claude Code

Updated lastUpdated to 2026-06-15. Benchmark scores unchanged: the official upstream sites (swebench.com, tbench.ai, aider.chat) returned HTTP 403 during this run, so per-entry scores could not be re-verified. The Aider Polyglot YAML on GitHub is current only through October 2025 and does not yet include our tracked model names (Claude Opus 4.7, GPT-5.5, DeepSeek V4 Pro). Added a one-line TODO in note for Kilo Code, the official successor to roo-code, which shut down 2026-05-15. https://claude.ai/code/session_01SVZWBi9UTPJ1VrrBUnKda9

cloudflare-workers-and-pages · 2026-06-15T16:28:39Z

Deploying tensorfeed with Cloudflare Pages

Latest commit:	`5a67f9a`
Status:	✅ Deploy successful!
Preview URL:	https://c3cd488e.tensorfeed.pages.dev
Branch Preview URL:	https://harness-refresh-2026-06-15.tensorfeed.pages.dev

evanatpizzarobot wants to merge 1 commit into main from harness-refresh-2026-06-15
