docs(evaluate): Benchmarks page — selecting and running benchmarks #1761

Description

@sephmard

Part of #1436 — Evaluate section restructure.

The evaluation section currently has a Browse Environments page (environment-list.mdx) that lists all 90+ environments, but it does not tell readers how to select a benchmark for evaluation, what makes a benchmark trustworthy, or how to interpret benchmark results.

Tasks

  • Create, or significantly revise, fern/versions/latest/pages/evaluation/benchmarks.mdx
  • Cover:
    • What a benchmark is vs a training environment (evaluation-only vs dual-use)
    • How to select a benchmark for a target capability
    • Brief overview of benchmark categories (math, coding, reasoning, agentic, safety)
    • Link to the Browse Environments page for the full list
    • Link to Add a Benchmark (Contribute) for contributors
  • Assess whether environment-list.mdx should remain as a sibling page or be merged/linked
  • Add a navigation card to the evaluation index
  • fern check passes

Metadata

Labels

documentation: Improvements to documentation
