Problem
Dataset shape is easy to misunderstand. Raw GSM8K (question/answer) needs different formatting per algorithm: RL math uses prompt/solutions; SFT expects messages, prompt+response, or text. "GSM8K works with AReno" does not mean it works for every algorithm.
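The mismatch can be illustrated with a minimal sketch. It assumes the raw GSM8K convention of ending each answer with `#### <final answer>`. The helper names are hypothetical, and treating `solutions` as a list of final-answer strings is an assumption, not a confirmed AReno contract.

```python
from __future__ import annotations


def split_gsm8k_answer(answer: str) -> tuple[str, str]:
    """Split a raw GSM8K answer into (reasoning, final_answer)."""
    reasoning, sep, final = answer.rpartition("####")
    if not sep:
        raise ValueError("expected '####' before the final answer")
    return reasoning.strip(), final.strip()


def to_rl_math(row: dict) -> dict:
    # Assumption: `solutions` holds reference answers used only for reward checking.
    _, final = split_gsm8k_answer(row["answer"])
    return {"prompt": row["question"], "solutions": [final]}


def to_sft_prompt_response(row: dict) -> dict:
    # Assumption: the full worked answer is the supervised target.
    return {"prompt": row["question"], "response": row["answer"]}


raw = {
    "question": "Tom has 3 apples and buys 4 more. How many apples does he have?",
    "answer": "3 + 4 = 7\n#### 7",
}
rl_row = to_rl_math(raw)
sft_row = to_sft_prompt_response(raw)
```

The same raw row yields two different records: the RL row keeps only the final answer as reward metadata, while the SFT row keeps the whole worked answer as the target.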
Scope
Add a Dataset Formats page answering: "What columns must my dataset have for each AReno algorithm?"
- Mental model: raw datasets vs AReno training schemas vs loader functions.
- SFT schemas (prompt+response, messages, text) and their loss behavior.
- RL math schema (prompt+solutions as reward metadata, not an SFT target).
- DPO preference schema (prompt/chosen/rejected).
- Dataset loader function contract and when to use one.
- GSM8K examples for both RL and SFT; state plainly that raw GSM8K is not SFT-ready.
- Working CLI examples per shape.
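The column question the page is meant to answer could be sketched as a small lookup. The column sets come from the schemas listed above. The schema labels and the `matching_schemas` helper are hypothetical names for illustration, not AReno identifiers, and the sketch checks column names only, not value types.

```python
# Required columns per training schema, as listed in the scope above.
# The keys are illustrative labels, not names used by AReno.
SCHEMAS = {
    "sft/prompt+response": {"prompt", "response"},
    "sft/messages": {"messages"},
    "sft/text": {"text"},
    "rl-math": {"prompt", "solutions"},
    "dpo": {"prompt", "chosen", "rejected"},
}


def matching_schemas(columns):
    """Return the schema labels whose required columns are all present."""
    cols = set(columns)
    return sorted(name for name, required in SCHEMAS.items() if required <= cols)


raw_gsm8k = matching_schemas(["question", "answer"])
rl_ready = matching_schemas(["prompt", "solutions"])
dpo_ready = matching_schemas(["prompt", "chosen", "rejected"])
```

Raw GSM8K columns match none of the schemas, which is the point the page should make explicit.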
Acceptance
- Short, exact, example-heavy; no marketing language.
- Cites existing examples/math/ files.
- No code or CLI changes.