Part of #1436 — Evaluate section restructure.
There is no page explaining how to structure ablation studies or model iteration comparisons using NeMo Gym — a key use case for the Frontier Eval workflow.
Tasks
- Create `fern/versions/latest/pages/evaluation/ablations.mdx`
- Cover:
  - What an ablation is in the Gym context (vary one dimension, such as model, harness, prompt, or verifier, while holding the others fixed)
  - Recommended run structure for a clean comparison (fixed seed, repeat count, sampling settings)
  - Naming and organizing runs for later analysis
  - How ablation outputs feed into BLADE analysis (link to Benchmark Analysis / BLADE)
  - Iteration pattern: baseline → intervention → re-run → compare aggregate metrics
- Add navigation card in evaluation index
- `fern check` passes
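
To make the "Cover" items concrete for whoever drafts the page, here is a minimal sketch of the one-dimension-at-a-time pattern in plain Python. It does not use any NeMo Gym API: `RunConfig`, `make_ablation`, `changed_fields`, `mean_delta`, and the run-name format are all hypothetical names chosen for illustration, and the real page should substitute whatever config and naming conventions Gym actually provides.

```python
from dataclasses import dataclass, fields, replace
from statistics import mean


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical run description; not a NeMo Gym type."""
    model: str
    harness: str
    prompt: str
    verifier: str
    seed: int = 0             # fixed across all runs in a study
    repeats: int = 4          # repeat count per task
    temperature: float = 0.0  # sampling settings held constant

    def run_name(self, study: str, label: str) -> str:
        # Assumed naming scheme: study, variant label, seed, repeat count.
        return f"{study}__{label}__seed{self.seed}__n{self.repeats}"


def make_ablation(baseline: RunConfig, dimension: str, values: list[str]) -> dict[str, RunConfig]:
    """Return labelled configs that differ from the baseline in one field only."""
    runs = {"baseline": baseline}
    for value in values:
        runs[f"{dimension}-{value}"] = replace(baseline, **{dimension: value})
    return runs


def changed_fields(a: RunConfig, b: RunConfig) -> list[str]:
    """List the field names on which two configs differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def mean_delta(baseline_scores: list[float], intervention_scores: list[float]) -> float:
    """Difference in mean score: intervention minus baseline."""
    return mean(intervention_scores) - mean(baseline_scores)
```

A study would build the baseline once, call `make_ablation(baseline, "prompt", ["v2", "v3"])`, check with `changed_fields` that each variant differs in exactly one field, run every config, and then compare aggregate metrics with something like `mean_delta`. Keeping the seed, repeat count, and sampling settings inside the config makes it harder for them to drift between runs.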