Consolidate duplicate benchmark directories (benchmark/ -> benchmarks/) #261

Merged
liana313 merged 1 commit into main from consolidate-benchmark-dirs on Jun 13, 2026

@liana313 (Collaborator)

Problem

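After #219 merged, the repo had two benchmark directories:

  • benchmark/ (singular, added by #219): biodex, reranking
  • benchmarks/ (plural, pre-existing): failure_mode_discovery, llm_as_judge, rag_pubmedqa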

Fix

Move the two new suites into the pre-existing benchmarks/:

  • benchmark/biodex -> benchmarks/biodex
  • benchmark/reranking -> benchmarks/reranking

benchmark/ is removed. All 10 files move as git renames (history preserved).

Safety

  • No subdirectory name collisions between the two folders.
  • No code or docs reference the singular benchmark/ path.
  • The biodex/reranking scripts use sibling imports (e.g. from metrics import ...) that resolve within their own folder, unaffected by the parent move.
  • benchmarks/main.py only registers its existing subcommands; the moved suites are run standalone per their READMEs.
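
The first two checks above could be reproduced with something like the following. This is a minimal sketch, not the script used for this PR: the helper names `colliding_subdirs` and `singular_path_references`, the scanned file suffixes, and the regex are all assumptions made for illustration.

```python
import re
from pathlib import Path


def colliding_subdirs(src: Path, dst: Path) -> set[str]:
    """Return subdirectory names present in both src and dst (hypothetical helper)."""
    src_names = {p.name for p in src.iterdir() if p.is_dir()}
    dst_names = {p.name for p in dst.iterdir() if p.is_dir()}
    return src_names & dst_names


# Matches "benchmark/" only when it is not the tail of a longer word,
# so "benchmarks/" and "my_benchmark/" are not reported.
SINGULAR_PATH = re.compile(r"(?<!\w)benchmark/")


def singular_path_references(
    root: Path, suffixes: tuple[str, ...] = (".py", ".md", ".toml")
) -> list[tuple[Path, int]]:
    """Return (file, line number) pairs that mention the singular benchmark/ path.

    The suffix list is an assumption; extend it to cover whatever file types
    the repository actually uses for code and docs.
    """
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if SINGULAR_PATH.search(line):
                hits.append((path, lineno))
    return hits
```

Run from the repository root before the move, `colliding_subdirs(Path("benchmark"), Path("benchmarks"))` should return an empty set. `singular_path_references(Path("."))` should return an empty list once any hits that are only illustrative (such as this description) are excluded.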

🤖 Generated with Claude Code

PR #219 added a singular benchmark/ folder (biodex, reranking) alongside the
pre-existing plural benchmarks/ (failure_mode_discovery, llm_as_judge,
rag_pubmedqa). Move the two new suites under benchmarks/ so there is a single
benchmark directory. No subdir name collisions; the biodex/reranking scripts
use sibling imports unaffected by the parent move, and benchmarks/main.py only
registers its existing subcommands.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
liana313 merged commit 228a1a5 into main on Jun 13, 2026
9 checks passed
liana313 deleted the consolidate-benchmark-dirs branch on June 13, 2026 19:16
liana313 mentioned this pull request on Jun 13, 2026
liana313 added a commit that referenced this pull request on Jun 13, 2026
Release **v1.2.2**. Bumps `pyproject.toml` 1.2.1 → 1.2.2 and regenerates
`uv.lock` to match (so the locked-constraints CI step stays green).

Notable changes shipping in this release since 1.2.1:
- **#260** — gpt-5 / reasoning-model accuracy fix: model-aware
`max_tokens` default + truncation warning (closes #255)
- **#262** — fix flaky `test_pairwise_judge`
- **#261** — consolidate duplicate benchmark directories
- **#219** — Biodex + reranking benchmark suites (resolves #227)

On merge I'll tag `v1.2.2` off `main`, which triggers `publish.yml` to
build and publish to PyPI.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>