Part of #1436 — Evaluate section restructure.
The current evaluation section has a Browse Environments page (environment-list.mdx) that lists all 90+ environments, but it does not help readers choose a benchmark for evaluation, judge what makes a benchmark trustworthy, or interpret benchmark results.
Tasks
- Create, or significantly revise, `fern/versions/latest/pages/evaluation/benchmarks.mdx`
- Cover:
  - What a benchmark is vs. a training environment (evaluation-only vs. dual-use)
  - How to select a benchmark for a target capability
  - A brief overview of benchmark categories (math, coding, reasoning, agentic, safety)
  - A link to the Browse Environments page for the full list
  - A link to Add a Benchmark (Contribute) for contributors
- Assess whether `environment-list.mdx` should remain a sibling page or be merged/linked
- Add a navigation card to the evaluation index
- `fern check` passes