docs(evaluate): Benchmarks page — selecting and running benchmarks

Part of #1436 — Evaluate section restructure.

The evaluation section currently has a Browse Environments page (`environment-list.mdx`) that lists all 90+ environments, but it does not explain how to select a benchmark for evaluation, what makes a benchmark trustworthy, or how to interpret benchmark results.

## Tasks

- Create (or, if it already exists, significantly revise) `fern/versions/latest/pages/evaluation/benchmarks.mdx`
- Cover:
  - What a benchmark is vs a training environment (evaluation-only vs dual-use)
  - How to select a benchmark for a target capability
  - Brief overview of benchmark categories (math, coding, reasoning, agentic, safety)
  - Link to the Browse Environments page for the full list
  - Link to Add a Benchmark (Contribute) for contributors
- Assess whether `environment-list.mdx` should remain a sibling page, be merged into the new page, or simply be linked from it
- Add a navigation card for the new page to the evaluation index
- `fern check` passes
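
A possible starting outline for `benchmarks.mdx`, offered as a sketch only. The heading names and both `href` paths are placeholders that have not been checked against the repo, and the example assumes Fern's built-in `Card` and `CardGroup` components are available in this docs site.

```mdx
---
title: Benchmarks
---

{/* Placeholder outline: headings and link paths are assumptions, adjust to the real site structure. */}

## Benchmarks vs. training environments

## Selecting a benchmark for a target capability

## Benchmark categories

## Next steps

<CardGroup cols={2}>
  <Card title="Browse Environments" href="/evaluation/environment-list">
    Full list of available environments.
  </Card>
  <Card title="Add a Benchmark" href="/contribute/add-a-benchmark">
    Contribute a new benchmark.
  </Card>
</CardGroup>
```

A similar `Card` entry pointing at the new page could serve as the navigation card in the evaluation index.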

docs(evaluate): Benchmarks page — selecting and running benchmarks #1761

Tasks

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

docs(evaluate): Benchmarks page — selecting and running benchmarks #1761

Description

Tasks

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions