diff --git a/fern/versions/latest/pages/reference/cli-commands.mdx b/fern/versions/latest/pages/reference/cli-commands.mdx
index 0cd1e5a5b..3d753bd7e 100644
--- a/fern/versions/latest/pages/reference/cli-commands.mdx
+++ b/fern/versions/latest/pages/reference/cli-commands.mdx
@@ -252,23 +252,6 @@ gym dataset migrate \
--revision 0.0.1
```
-### `gym dataset render`
-
-Generate a dataset preview by materializing prompts from a raw input file and a prompt template.
-
-| Option | Description |
-| --- | --- |
-| `--input`, `-i` | Raw input JSONL file. |
-| `--prompt-config` | Prompt template YAML to apply. |
-| `--output`, `-o` | Output JSONL file. |
-
-```bash
-gym dataset render \
- --input raw.jsonl \
- --prompt-config prompt.yaml \
- --output preview.jsonl
-```
-
### `gym dataset collate`
Validate and collate a dataset, generating metrics and statistics.
@@ -289,6 +272,33 @@ gym dataset collate \
--mode example_validation
```
+### `gym dataset render`
+
+Generate a dataset preview by materializing prompts from a raw input file and a prompt template, producing JSONL with populated `responses_create_params.input` for RL training.
+
+Each input row must **not** already have a populated `responses_create_params.input`; the command applies the prompt template from `--prompt-config` to each row, fills in the input, and preserves the row's other fields.
+
+| Option | Description |
+| --- | --- |
+| `--input`, `-i` | Raw input JSONL file (rows without `responses_create_params.input`). |
+| `--prompt-config` | Prompt template YAML to apply. |
+| `--output`, `-o` | Output JSONL file. |
+
+```bash
+gym dataset render \
+ --input raw.jsonl \
+ --prompt-config prompt.yaml \
+ --output preview.jsonl
+```
+
+
+**Which data-preparation command should I use?**
+
+- **`gym dataset render`** — a focused, standalone step that applies a prompt template to raw rows to populate `responses_create_params.input`. No servers are started. Use it when you have raw data and just need to turn it into prompt-ready rows.
+- **`gym dataset collate`** — the full preparation pipeline for training: it can download missing datasets, validate data, and compute dataset metrics, writing train/validation splits and metrics artifacts. Use it to prepare and validate datasets for training or PR submission.
+
+
+
---
## Environments
@@ -444,7 +454,7 @@ Collate data, start the servers, and collect rollouts. This is the main evaluati
| `--model-type NAME` | Load the named model server type config. |
| `--search-dir DIR` | Extra root directory to search for named components. Repeatable. |
| `--no-serve` | Collect against already-running servers instead of starting them. |
-| `--resume` | Resume from cached rollouts instead of recollecting. |
+| `--resume` | Resume an interrupted run, skipping rows already completed. See [Resume Interrupted Runs](#resume-interrupted-runs). |
| `--agent`, `-a` | Agent to collect rollouts with. |
| `--input`, `-i` | Input tasks JSONL file. |
| `--output`, `-o` | Output rollouts JSONL file. |
@@ -488,6 +498,40 @@ gym eval run --no-serve \
--num-repeats '{agent_alpha: 4, agent_beta: 1, _default: 1}'
```
+#### Generation Parameters
+
+The most common sampling parameters have dedicated flags on `gym eval run`: `--temperature`, `--top-p`, and `--max-output-tokens`. These map onto `responses_create_params.temperature`, `.top_p`, and `.max_output_tokens` respectively.
+
+Any other `responses_create_params` field that has no dedicated flag can be set with a raw Hydra override using the `++responses_create_params.` syntax (see [Hydra overrides](#hydra-overrides-escape-hatch)). Overrides are merged into each input row's existing `responses_create_params` with a **shallow** merge (top-level keys only):
+
+```bash
+gym eval run --no-serve \
+ --agent example_single_tool_call_simple_agent \
+ --input weather_query.jsonl \
+ --output weather_rollouts.jsonl \
+ --temperature 1.0 \
+ --top-p 1.0 \
+ --max-output-tokens 4096 \
+ ++responses_create_params.reasoning.effort=low
+```
+
+Because the merge is shallow, setting a field inside a nested object (for example, `++responses_create_params.reasoning.effort=low`) replaces the row's entire nested dict at that key — other fields under the same nested object are not preserved.
+
+#### Resume Interrupted Runs
+
+Passing `--resume` to `gym eval run` lets you restart the **same command** after a crash or interruption and pick up only the rows that have not finished yet. It works whether or not you pass `--no-serve`.
+
+How it works:
+
+- **Materialized inputs.** On the first run, the fully expanded input rows (after `--num-repeats`, `--limit`, `--prompt-config`, and any overrides) are written to a sidecar file next to your output. The path is derived from the output path by appending `_materialized_inputs` to the stem — so `rollouts.jsonl` produces `rollouts_materialized_inputs.jsonl`.
+- **Incremental output.** Results are flushed to the output file after each completed rollout, so partial output survives a crash.
+- **Matching.** On resume, completed work is matched by `(task_index, rollout_index)` against the materialized inputs, and already-completed rows are skipped. The run prints a summary such as the number of original input rows, rows already done, and rows that still need to be run.
+- **Fallback.** If either the materialized inputs or the output file is missing, resume is skipped and the run starts fresh. Without `--resume`, existing output is cleared before the run.
+
+
+If you change the config, schema, or data between runs, the materialized inputs become stale and resume will diff against the old expansion. Delete the `*_materialized_inputs.jsonl` file (and the output file) to start fresh.
+
+
### `gym eval aggregate`
Merge sharded rollout results into a single rollouts file with aggregate metrics.