IgnitionAI · salim4n · May 29, 2026 · May 29, 2026
diff --git a/README.md b/README.md
@@ -91,6 +91,8 @@ IgnitionRL
 
 ## Headless Demo
 
+To author an environment from a blank TypeScript project, follow the first guide in [`docs/BUILD_YOUR_FIRST_ENVIRONMENT.md`](docs/BUILD_YOUR_FIRST_ENVIRONMENT.md).
+
 After cloning and installing dependencies, generate a local project with traces, metrics and JSON exports:
 
 ```sh

diff --git a/docs/BUILD_YOUR_FIRST_ENVIRONMENT.md b/docs/BUILD_YOUR_FIRST_ENVIRONMENT.md
@@ -0,0 +1,317 @@
+# Build Your First Environment
+
+This guide starts from an empty TypeScript project and ends with a working
+IgnitionRL environment that can reset, step, emit observations, validate
+actions, explain rewards and produce a replay trace.
+
+The example is a small `Target2D-v0` task: an agent starts at the origin and
+must move toward a seeded target.
+
+## 1. Create a Blank Project
+
+```sh
+mkdir target-2d-env
+cd target-2d-env
+bun init -y
+bun add @ignitionrl/core
+mkdir src
+```
+
+Inside this repository, the same APIs are available from
+`packages/core/src/index.ts`; in a user project, import from
+`@ignitionrl/core`.
+
+## 2. Define State and Helpers
+
+Create `src/target-2d.ts`.
+
+```ts
+import { defineEnvironment, reward } from "@ignitionrl/core";
+
+type Vec2 = {
+  x: number;
+  y: number;
+};
+
+export type Target2DState = {
+  agent: Vec2;
+  target: Vec2;
+  previousDistance: number;
+  steps: number;
+};
+
+const ACTIONS = ["up", "down", "left", "right"] as const;
+
+export type Target2DAction = typeof ACTIONS[number];
+
+const STEP_SIZE = 0.25;
+const TARGET_RADIUS = 0.5;
+const MAX_STEPS = 120;
+
+function distance(a: Vec2, b: Vec2): number {
+  return Math.hypot(a.x - b.x, a.y - b.y);
+}
+```
+
+The state can be any serializable shape that your environment owns. Keep the
+state explicit: observations should be derived from it, rewards should be
+computed from it and `step()` should return the next state without mutating the
+previous one.
+
+## 3. Add Reset and Seeding
+
+In IgnitionRL, reset logic lives in `createInitialState()`. The runner calls it
+when it is created and every time you call `runner.reset()`.
+
+```ts
+export const Target2D = defineEnvironment({
+  id: "Target2D-v0",
+  metadata: {
+    description: "A compact 2D target-reaching tutorial environment.",
+    maxSteps: MAX_STEPS,
+    tags: ["tutorial", "2d", "discrete"],
+    observationLabels: [
+      "agent.x",
+      "agent.y",
+      "target.dx",
+      "target.dy",
+      "previousDistance",
+    ],
+    actionLabels: ACTIONS,
+  },
+
+  createInitialState: ({ rng }): Target2DState => {
+    const agent = { x: 0, y: 0 };
+    const target = {
+      x: rng.float(-5, 5),
+      y: rng.float(-5, 5),
+    };
+
+    return {
+      agent,
+      target,
+      previousDistance: distance(agent, target),
+      steps: 0,
+    };
+  },
+```
+
+Use the provided `rng` instead of `Math.random()`. A runner created with the
+same seed and stepped with the same action sequence should produce the same
+observations, rewards and done results.
+
+## 4. Return Observations
+
+Observations are numeric vectors. Arrays and numeric typed arrays are accepted.
+
+```ts
+  observations: ({ state }) => [
+    state.agent.x,
+    state.agent.y,
+    state.target.x - state.agent.x,
+    state.target.y - state.agent.y,
+    state.previousDistance,
+  ],
+```
+
+Keep the observation order stable. Learners treat dimension `0`, dimension `1`
+and so on as separate signals, so reordering values between runs changes the
+meaning of a checkpoint.
+
+## 5. Declare Actions
+
+This guide uses a discrete action space with literal values.
+
+```ts
+  actions: {
+    type: "discrete",
+    values: ACTIONS,
+    labels: ACTIONS,
+  },
+```
+
+Other supported action spaces are:
+
+```ts
+// Discrete index action: action is 0, 1, 2, ...
+{ type: "discrete", n: 4, labels: ["up", "down", "left", "right"] }
+
+// Continuous action: action is a number array of length 4.
+{ type: "continuous", shape: [4], low: -1, high: 1, labels: ["throttle", "yaw", "pitch", "roll"] }
+
+// Multi-discrete action: action is a number array with one integer per branch.
+{ type: "multi-discrete", nvec: [3, 2], labels: ["move", "fire"] }
+```
+
+## 6. Step the Environment
+
+`step()` receives the current state and a validated action. Return the next
+state.
+
+```ts
+  step: ({ state, action }) => {
+    const next: Target2DState = structuredClone(state);
+
+    next.previousDistance = distance(state.agent, state.target);
+
+    if (action === "up") next.agent.y += STEP_SIZE;
+    if (action === "down") next.agent.y -= STEP_SIZE;
+    if (action === "left") next.agent.x -= STEP_SIZE;
+    if (action === "right") next.agent.x += STEP_SIZE;
+
+    next.steps += 1;
+
+    return next;
+  },
+```
+
+Avoid mutating `state` in place. Traces, reward debugging and comparisons are
+easier to reason about when `state` and `nextState` are distinct snapshots.
+
+## 7. Name Reward Terms
+
+A reward is ultimately a single scalar, but it is easier to debug as a set of
+named terms. Use one term for each reason the agent gains or loses reward.
+
+```ts
+  reward: ({ state, nextState }) => {
+    const previousDistance = distance(state.agent, state.target);
+    const nextDistance = distance(nextState.agent, nextState.target);
+
+    return reward()
+      .add("progress", previousDistance - nextDistance)
+      .add("target_reached", nextDistance < TARGET_RADIUS, 10)
+      .stepPenalty(-0.01);
+  },
+```
+
+The reward builder also supports penalties:
+
+```ts
+reward()
+  .add("progress", 0.1)
+  .penalty("collision", didCollide, 3)
+  .stepPenalty(-0.01)
+```
+
+## 8. End Episodes
+
+`done()` can return a boolean or a structured result. Prefer structured results
+when you know whether the episode succeeded or was truncated.
+
+```ts
+  done: ({ nextState }) => {
+    const nextDistance = distance(nextState.agent, nextState.target);
+
+    if (nextDistance < TARGET_RADIUS) {
+      return { done: true, reason: "target_reached", success: true };
+    }
+
+    if (nextState.steps >= MAX_STEPS) {
+      return { done: true, reason: "max_steps", truncated: true };
+    }
+
+    return false;
+  },
+});
+```
+
+The full `src/target-2d.ts` file now exports `Target2D`, a typed environment
+with reset, observation, action, step, reward and done behavior.
+
+## 9. Run a Smoke Script
+
+Create `src/smoke.ts`.
+
+```ts
+import { randomPolicy } from "@ignitionrl/core";
+import { Target2D } from "./target-2d.ts";
+
+const spec = Target2D.getSpec({ seed: "tutorial-spec" });
+
+console.log("Environment:", spec.id);
+console.log("Observation shape:", spec.observation.shape);
+console.log("Action spec:", spec.actions);
+
+const runner = Target2D.createRunner({
+  seed: "tutorial-run",
+  runId: "tutorial-target-2d",
+  maxSteps: 120,
+  collectTrace: true,
+});
+
+const reset = runner.reset({ seed: "tutorial-reset" });
+
+console.log("Initial observation:", reset.observation);
+
+for (const action of ["right", "up", "right", "up"] as const) {
+  const step = runner.step(action);
+
+  console.log({
+    t: step.t,
+    action: step.action,
+    reward: step.reward.total,
+    terms: step.reward.terms,
+    done: step.done,
+    reason: step.reason,
+  });
+
+  if (step.done) break;
+}
+
+const trace = await Target2D
+  .createRunner({ seed: "tutorial-random", maxSteps: 120, collectTrace: true })
+  .runEpisode(randomPolicy(Target2D.actions, { seed: "tutorial-policy" }), {
+    reset: false,
+    maxSteps: 120,
+  });
+
+console.log("Random baseline:", trace.summary);
+console.log("Trace steps:", trace.steps.length);
+```
+
+Run it:
+
+```sh
+bun src/smoke.ts
+```
+
+You should see:
+
+- an environment spec with a vector observation shape;
+- an action spec with four discrete values;
+- per-step rewards with named terms;
+- a replay trace summary from the random baseline episode.
+
+## 10. What IgnitionRL Validates
+
+IgnitionRL validates the contract at definition time and step time:
+
+- `defineEnvironment()` checks that `id`, action spec and required functions are valid.
+- `getSpec()` calls `createInitialState()` and `observations()` to infer observation shape.
+- `runner.step(action)` validates the action before calling your `step()`.
+- Every observation is normalized into a finite number array.
+- Every reward is normalized into `{ total, terms }`.
+- Every done result is normalized into `{ done, reason?, success?, truncated? }`.
+
+## Troubleshooting
+
+| Symptom | Likely cause | Fix |
+| --- | --- | --- |
+| `[IgnitionRL] observations() must return a numeric array.` | `observations()` returned an object, scalar, `undefined` or mixed structure. | Return `number[]`, `Float32Array` or another numeric typed array. |
+| `[IgnitionRL] observation[2] must be finite, got NaN.` | An observation dimension is `NaN` or `Infinity`. | Guard division by zero, missing state fields and invalid physics values. |
+| `[IgnitionRL] Invalid discrete action: jump.` | The policy returned a value not listed in `actions.values`. | Return one of the exact literal values, or use `env.sampleAction()` while debugging. |
+| `[IgnitionRL] Continuous action length must be 4, got 3.` | A continuous policy returned the wrong vector size. | Match the flattened size of `actions.shape`. |
+| `[IgnitionRL] continuous action[0] must be in [-1, 1]` | A continuous policy exceeded `low` or `high`. | Clamp action outputs or widen the action spec intentionally. |
+| `[IgnitionRL] reward() must return a RewardBuilder or RewardResult.` | `reward()` returned a raw number or a malformed object. | Return `reward().add(...)` or `{ total, terms }`. |
+| `[IgnitionRL] Reward term names must be non-empty.` | A reward term name is `""` or whitespace. | Name every reward cause, such as `progress`, `collision` or `step_penalty`. |
+| `[IgnitionRL] done() must return a boolean or DoneResult.` | `done()` returned `undefined`, a number or another shape. | Return `false`, `true` or `{ done: boolean, reason?: string }`. |
+| `[IgnitionRL] Episode is done. Call reset() before stepping again.` | Code called `runner.step()` after termination. | Call `runner.reset()` before continuing. |
+| Runs are not reproducible. | Randomness is coming from `Math.random()` or time-based state. | Use the provided `rng` in `createInitialState()` and policy code. |
+
+## Next Step
+
+Once the environment works under `@ignitionrl/core`, wire it into a local
+project through the SDK or one of the CLI templates. For now, the first-class
+CLI training command supports the built-in `GridWorld-v0`, `Target2D-v0` and
+`DroneTarget-v0` environments, while custom project registration is still
+maturing.
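
Section 3 of the added guide asks authors to draw randomness from the provided `rng` instead of `Math.random()`. The sketch below illustrates why, under stated assumptions: `hashSeed`, `mulberry32`, `makeRng` and `sampleTarget` are hypothetical names introduced here, and the generator is a generic seeded PRNG, not the one `@ignitionrl/core` actually uses. Only the `rng.float(low, high)` call shape is taken from the guide.

```typescript
// Illustrative only: a tiny seeded PRNG, NOT the generator used by @ignitionrl/core.

// FNV-1a hash: turns a string seed into a 32-bit unsigned integer.
function hashSeed(seed: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < seed.length; i++) {
    h ^= seed.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small deterministic generator returning floats in [0, 1).
function mulberry32(state: number): () => number {
  let a = state >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical stand-in for the `rng` object passed to createInitialState().
function makeRng(seed: string) {
  const next = mulberry32(hashSeed(seed));
  return {
    float: (low: number, high: number): number => low + (high - low) * next(),
  };
}

// Same sampling order as the guide: x first, then y.
function sampleTarget(seed: string): { x: number; y: number } {
  const rng = makeRng(seed);
  const x = rng.float(-5, 5);
  const y = rng.float(-5, 5);
  return { x, y };
}

const a = sampleTarget("tutorial-run");
const b = sampleTarget("tutorial-run");
const c = sampleTarget("another-seed");

// Same seed: both draws use an identical number stream, so the targets match.
console.log("same seed matches:", a.x === b.x && a.y === b.y);
// Different seed: the streams differ, so the targets almost certainly differ.
console.log("different seed matches:", a.x === c.x && a.y === c.y);
```

Because the target depends only on the seed and the order of draws, an episode can be replayed by reusing its seed. A call to `Math.random()` anywhere in that path would break the guarantee.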