docs(evaluate): Ablations and iterations page — comparing runs systematically #1762

Description

@sephmard

Part of #1436 — Evaluate section restructure.

There is currently no page explaining how to structure ablation studies or model-iteration comparisons with NeMo Gym, even though this is a key use case for the Frontier Eval workflow.

Tasks

  • Create fern/versions/latest/pages/evaluation/ablations.mdx
  • Cover:
    • What an ablation is in the Gym context (vary one dimension, such as the model, harness, prompt, or verifier, while holding the others fixed)
    • Recommended run structure for a clean comparison (fixed seed, repeat count, sampling settings)
    • Naming and organizing runs for later analysis
    • How ablation outputs feed into BLADE analysis (link to Benchmark Analysis / BLADE)
    • Iteration pattern: baseline → intervention → re-run → compare aggregate metrics
  • Add a navigation card to the evaluation index
  • Ensure fern check passes
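A minimal Python sketch of the structure the page could describe: vary exactly one dimension from a baseline, keep the seed, repeat count, and sampling settings fixed, give each run a deterministic name, and compare aggregate metrics across repeats. It deliberately uses no NeMo Gym or BLADE API. `RunConfig`, `ablate`, `run_name`, `compare`, the field names, and the example scores are all hypothetical placeholders, not the project's actual schema.

```python
from dataclasses import dataclass, replace
from math import sqrt
from statistics import mean, stdev

# Hypothetical: the dimensions an ablation may vary (not NeMo Gym's config schema).
ABLATION_DIMENSIONS = {"model", "harness", "prompt", "verifier"}


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical fields, for illustration only.
    model: str
    harness: str
    prompt: str
    verifier: str
    seed: int = 0            # held fixed across every run in a study
    repeats: int = 4         # same repeat count for baseline and intervention
    temperature: float = 1.0 # sampling settings also held fixed
    top_p: float = 1.0


def ablate(baseline: RunConfig, dimension: str, values: list[str]) -> list[RunConfig]:
    """Return one config per value, changing only `dimension`; all other fields are copied."""
    if dimension not in ABLATION_DIMENSIONS:
        raise ValueError(f"unsupported ablation dimension: {dimension}")
    return [replace(baseline, **{dimension: v}) for v in values]


def run_name(cfg: RunConfig, study: str) -> str:
    """Deterministic, sortable name that records the study and every dimension."""
    return (
        f"{study}__model={cfg.model}__harness={cfg.harness}"
        f"__prompt={cfg.prompt}__verifier={cfg.verifier}__seed={cfg.seed}"
    )


def summarize(scores: list[float]) -> tuple[float, float]:
    """Mean and standard error of one run's per-repeat scores."""
    m = mean(scores)
    se = stdev(scores) / sqrt(len(scores)) if len(scores) > 1 else 0.0
    return m, se


def compare(baseline_scores: list[float], intervention_scores: list[float]) -> dict[str, float]:
    """Compare aggregate metrics for baseline vs. intervention."""
    if len(baseline_scores) != len(intervention_scores):
        raise ValueError("baseline and intervention must use the same repeat count")
    b_mean, b_se = summarize(baseline_scores)
    i_mean, i_se = summarize(intervention_scores)
    return {
        "baseline": b_mean,
        "intervention": i_mean,
        "delta": i_mean - b_mean,
        # Rough standard error of the difference; assumes independent runs.
        "delta_se": sqrt(b_se**2 + i_se**2),
    }


if __name__ == "__main__":
    base = RunConfig(model="model-a", harness="harness-x", prompt="prompt-v1", verifier="verifier-1")
    for cfg in ablate(base, "prompt", ["prompt-v1", "prompt-v2"]):
        print(run_name(cfg, "prompt-ablation"))
    # Hypothetical per-repeat scores for the baseline and the re-run intervention.
    print(compare([0.60, 0.62, 0.58, 0.61], [0.66, 0.64, 0.67, 0.65]))
```

Encoding every dimension in the run name keeps runs sortable and self-describing when they are later collected for analysis. The standard error of the delta is only a rough guide: it assumes independent runs with equal repeat counts and is not a substitute for the analysis the BLADE page would cover.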

Metadata

Assignees

Labels

documentation (Improvements to documentation)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests