Merged
2 changes: 2 additions & 0 deletions README.md
@@ -91,6 +91,8 @@ IgnitionRL

## Headless Demo

To author an environment from a blank TypeScript project, follow the first guide in [`docs/BUILD_YOUR_FIRST_ENVIRONMENT.md`](docs/BUILD_YOUR_FIRST_ENVIRONMENT.md).

After cloning and installing dependencies, generate a local project with traces, metrics and JSON exports:

```sh
317 changes: 317 additions & 0 deletions docs/BUILD_YOUR_FIRST_ENVIRONMENT.md
@@ -0,0 +1,317 @@
# Build Your First Environment

This guide starts from an empty TypeScript project and ends with a working
IgnitionRL environment that can reset, step, emit observations, validate
actions, explain rewards and produce a replay trace.

The example is a small `Target2D-v0` task: an agent starts at the origin and
must move toward a seeded target.

## 1. Create a Blank Project

```sh
mkdir target-2d-env
cd target-2d-env
bun init -y
bun add @ignitionrl/core
mkdir src
```

Inside this repository, the same APIs are available from
`packages/core/src/index.ts`; in a user project, import from
`@ignitionrl/core`.

## 2. Define State and Helpers

Create `src/target-2d.ts`.

```ts
import { defineEnvironment, reward } from "@ignitionrl/core";

type Vec2 = {
  x: number;
  y: number;
};

export type Target2DState = {
  agent: Vec2;
  target: Vec2;
  previousDistance: number;
  steps: number;
};

const ACTIONS = ["up", "down", "left", "right"] as const;

export type Target2DAction = typeof ACTIONS[number];

const STEP_SIZE = 0.25;
const TARGET_RADIUS = 0.5;
const MAX_STEPS = 120;

function distance(a: Vec2, b: Vec2): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}
```

The state can be any serializable shape that your environment owns. Keep the
state explicit: observations should be derived from it, rewards should be
computed from it and `step()` should return the next state without mutating the
previous one.

## 3. Add Reset and Seeding

In IgnitionRL, reset logic lives in `createInitialState()`. The runner calls it
once when the runner is created and again every time you call `runner.reset()`.

```ts
export const Target2D = defineEnvironment({
  id: "Target2D-v0",
  metadata: {
    description: "A compact 2D target-reaching tutorial environment.",
    maxSteps: MAX_STEPS,
    tags: ["tutorial", "2d", "discrete"],
    observationLabels: [
      "agent.x",
      "agent.y",
      "target.dx",
      "target.dy",
      "previousDistance",
    ],
    actionLabels: ACTIONS,
  },

  createInitialState: ({ rng }): Target2DState => {
    const agent = { x: 0, y: 0 };
    const target = {
      x: rng.float(-5, 5),
      y: rng.float(-5, 5),
    };

    return {
      agent,
      target,
      previousDistance: distance(agent, target),
      steps: 0,
    };
  },
```

Use the provided `rng` instead of `Math.random()`. A runner created with the
same seed and stepped with the same action sequence should produce the same
observations, rewards and done results.
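
The reproducibility contract can be illustrated without the library. The sketch below is not part of `src/target-2d.ts` and does not show IgnitionRL's generator: it uses a small stand-in PRNG (mulberry32) with a `float(min, max)` method shaped like the `rng.float` call above. The names `createSeededRng` and `sampleTarget` are hypothetical, and the seed is numeric here rather than a string.

```ts
// Stand-in seeded generator (mulberry32); NOT IgnitionRL's `rng` implementation.
type SeededRng = { float: (min: number, max: number) => number };

function createSeededRng(seed: number): SeededRng {
  let s = seed >>> 0;
  const next = (): number => {
    s = (s + 0x6d2b79f5) >>> 0;
    let t = s;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
  return { float: (min, max) => min + next() * (max - min) };
}

// Mirrors the target sampling in createInitialState().
function sampleTarget(rng: SeededRng): { x: number; y: number } {
  return { x: rng.float(-5, 5), y: rng.float(-5, 5) };
}

const firstTarget = sampleTarget(createSeededRng(42));
const secondTarget = sampleTarget(createSeededRng(42));

// Same seed, same sequence of draws, same target.
console.log(firstTarget.x === secondTarget.x && firstTarget.y === secondTarget.y); // true
```

Replacing either draw with `Math.random()` would break the equality check, which is the failure mode the paragraph above warns about.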

## 4. Return Observations

Observations are numeric vectors. Both plain `number[]` arrays and numeric typed arrays such as `Float32Array` are accepted.

```ts
  observations: ({ state }) => [
    state.agent.x,
    state.agent.y,
    state.target.x - state.agent.x,
    state.target.y - state.agent.y,
    state.previousDistance,
  ],
```

Keep the observation order stable. Learners treat dimension `0`, dimension `1`
and so on as separate signals, so reordering values between runs changes the
meaning of a checkpoint.
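
One way to keep labels and values from drifting apart is to declare them side by side. This is a standalone sketch under that assumption, not an IgnitionRL feature: `OBSERVATION_FIELDS`, `observationLabelList` and `observe` are hypothetical names, and the state type is a trimmed copy of `Target2DState`.

```ts
type Point = { x: number; y: number };
type ObservedState = { agent: Point; target: Point; previousDistance: number };

// Each entry pairs a label with the function that reads its value, so
// reordering the table reorders labels and values together.
const OBSERVATION_FIELDS: ReadonlyArray<readonly [string, (s: ObservedState) => number]> = [
  ["agent.x", (s) => s.agent.x],
  ["agent.y", (s) => s.agent.y],
  ["target.dx", (s) => s.target.x - s.agent.x],
  ["target.dy", (s) => s.target.y - s.agent.y],
  ["previousDistance", (s) => s.previousDistance],
];

const observationLabelList = OBSERVATION_FIELDS.map(([label]) => label);

function observe(state: ObservedState): number[] {
  return OBSERVATION_FIELDS.map(([, read]) => read(state));
}

const sampleObservation = observe({
  agent: { x: 1, y: 2 },
  target: { x: 4, y: 6 },
  previousDistance: 5,
});

console.log(observationLabelList.length === sampleObservation.length); // true
console.log(sampleObservation); // [ 1, 2, 3, 4, 5 ]
```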

## 5. Declare Actions

This guide uses a discrete action space with literal values.

```ts
  actions: {
    type: "discrete",
    values: ACTIONS,
    labels: ACTIONS,
  },
```

Other supported action spaces are:

```ts
// Discrete index action: action is 0, 1, 2, ...
{ type: "discrete", n: 4, labels: ["up", "down", "left", "right"] }

// Continuous action: action is a number array of length 4 (the flattened size of `shape`).
{ type: "continuous", shape: [4], low: -1, high: 1, labels: ["throttle", "yaw", "pitch", "roll"] }

// Multi-discrete action: action is a number array with one integer per branch.
{ type: "multi-discrete", nvec: [3, 2], labels: ["move", "fire"] }
```
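
To make the four shapes concrete, the following standalone sketch checks an action against each of them. It is an illustration only: the type names and `isValidAction` are hypothetical, they mirror just the fields shown above, and the library's own validation may apply different or additional rules.

```ts
// Hypothetical spec types mirroring the shapes listed above; the real
// @ignitionrl/core types may differ.
type DiscreteValues = { type: "discrete"; values: readonly string[] };
type DiscreteIndex = { type: "discrete"; n: number };
type Continuous = { type: "continuous"; shape: readonly number[]; low: number; high: number };
type MultiDiscrete = { type: "multi-discrete"; nvec: readonly number[] };
type ActionSpecSketch = DiscreteValues | DiscreteIndex | Continuous | MultiDiscrete;

function isValidAction(spec: ActionSpecSketch, action: unknown): boolean {
  if (spec.type === "discrete") {
    if ("values" in spec) {
      // Literal-valued discrete space: the action must be one of the listed values.
      return typeof action === "string" && spec.values.includes(action);
    }
    // Index-valued discrete space: the action must be an integer in [0, n).
    return typeof action === "number" && Number.isInteger(action) && action >= 0 && action < spec.n;
  }

  if (!Array.isArray(action)) return false;
  const values: unknown[] = action;

  if (spec.type === "continuous") {
    const { low, high } = spec;
    const size = spec.shape.reduce((product, dim) => product * dim, 1);
    return values.length === size &&
      values.every((v) => typeof v === "number" && v >= low && v <= high);
  }

  // Multi-discrete: one integer per branch, each in [0, nvec[i]).
  const nvec = spec.nvec;
  return values.length === nvec.length &&
    values.every((v, i) => {
      const branches = nvec[i];
      return typeof v === "number" && Number.isInteger(v) &&
        branches !== undefined && v >= 0 && v < branches;
    });
}

console.log(isValidAction({ type: "discrete", values: ["up", "down"] }, "jump")); // false
console.log(isValidAction({ type: "multi-discrete", nvec: [3, 2] }, [2, 1])); // true
```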

## 6. Step the Environment

`step()` receives the current state and a validated action. Return the next
state.

```ts
  step: ({ state, action }) => {
    const next: Target2DState = structuredClone(state);

    next.previousDistance = distance(state.agent, state.target);

    if (action === "up") next.agent.y += STEP_SIZE;
    if (action === "down") next.agent.y -= STEP_SIZE;
    if (action === "left") next.agent.x -= STEP_SIZE;
    if (action === "right") next.agent.x += STEP_SIZE;

    next.steps += 1;

    return next;
  },
```

Avoid mutating `state` in place. Traces, reward debugging and comparisons are
easier to reason about when `state` and `nextState` are distinct snapshots.
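
An alternative to cloning the whole state is to rebuild only the parts that change. The sketch below is a standalone illustration of that pattern, not the tutorial's `step()`: `MoveState`, `MOVE_DELTAS` and `pureStep` are hypothetical names, and the frozen input is only there to make accidental in-place writes fail in strict mode.

```ts
type Coord = { x: number; y: number };
type MoveState = { agent: Coord; target: Coord; steps: number };
type Move = "up" | "down" | "left" | "right";

const MOVE_DELTAS: Record<Move, Coord> = {
  up: { x: 0, y: 1 },
  down: { x: 0, y: -1 },
  left: { x: -1, y: 0 },
  right: { x: 1, y: 0 },
};

// Returns a new snapshot; `state` is read but never written.
function pureStep(state: MoveState, action: Move, stepSize = 0.25): MoveState {
  const delta = MOVE_DELTAS[action];
  return {
    ...state, // `target` is shared by reference, which is safe while nothing mutates it
    agent: {
      x: state.agent.x + delta.x * stepSize,
      y: state.agent.y + delta.y * stepSize,
    },
    steps: state.steps + 1,
  };
}

const beforeState: MoveState = Object.freeze({
  agent: Object.freeze({ x: 0, y: 0 }),
  target: Object.freeze({ x: 1, y: 1 }),
  steps: 0,
});
const afterState = pureStep(beforeState, "right");

console.log(beforeState.agent.x, afterState.agent.x); // 0 0.25
console.log(beforeState.steps, afterState.steps); // 0 1
```

The spread copy is shallow, so it is cheaper than `structuredClone` but relies on every nested object being treated as read-only.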

## 7. Name Reward Terms

A reward is ultimately a single scalar, but it is much easier to debug as a sum
of named terms. Use one term for each reason the agent gains or loses reward.

```ts
  reward: ({ state, nextState }) => {
    const previousDistance = distance(state.agent, state.target);
    const nextDistance = distance(nextState.agent, nextState.target);

    return reward()
      .add("progress", previousDistance - nextDistance)
      .add("target_reached", nextDistance < TARGET_RADIUS, 10)
      .stepPenalty(-0.01);
  },
```

The reward builder also supports penalties:

```ts
reward()
  .add("progress", 0.1)
  .penalty("collision", didCollide, 3)
  .stepPenalty(-0.01)
```
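
To see why named terms help, here is a worked example of one step in which the agent moves from distance 5 to distance 4.75. The sketch is standalone and does not use the library's `reward()` builder: `sumRewardTerms` is a hypothetical accumulator, and it assumes a conditional term contributes its weight when the condition is true and 0 otherwise.

```ts
type RewardBreakdown = { total: number; terms: Record<string, number> };

// Hypothetical accumulator, not the @ignitionrl/core builder: sums named
// terms and keeps the per-term values for inspection.
function sumRewardTerms(entries: ReadonlyArray<readonly [string, number]>): RewardBreakdown {
  const terms: Record<string, number> = {};
  let total = 0;
  for (const [termName, value] of entries) {
    if (termName.trim() === "") throw new Error("Reward term names must be non-empty.");
    terms[termName] = (terms[termName] ?? 0) + value;
    total += value;
  }
  return { total, terms };
}

const distanceBefore = 5;
const distanceAfter = 4.75;
const reachedTarget = distanceAfter < 0.5; // false: still outside the target radius

const breakdown = sumRewardTerms([
  ["progress", distanceBefore - distanceAfter], // 0.25
  ["target_reached", reachedTarget ? 10 : 0], // 0
  ["step_penalty", -0.01],
]);

// The total is about 0.24; the terms show that progress outweighed the step penalty.
console.log(breakdown.terms, breakdown.total);
```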

## 8. End Episodes

`done()` can return a boolean or a structured result. Prefer structured results
when you know whether the episode succeeded or was truncated.

```ts
  done: ({ nextState }) => {
    const nextDistance = distance(nextState.agent, nextState.target);

    if (nextDistance < TARGET_RADIUS) {
      return { done: true, reason: "target_reached", success: true };
    }

    if (nextState.steps >= MAX_STEPS) {
      return { done: true, reason: "max_steps", truncated: true };
    }

    return false;
  },
});
```
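
The difference between a bare boolean and a structured result shows up when code downstream has to interpret the ending. The standalone sketch below assumes a result shape with the fields used in the example above; `normalizeDone` and `summarizeDone` are hypothetical helpers, not library functions.

```ts
type DoneShape = { done: boolean; reason?: string; success?: boolean; truncated?: boolean };

// Lifts a bare boolean into the structured shape so callers handle one type.
function normalizeDone(result: boolean | DoneShape): DoneShape {
  return typeof result === "boolean" ? { done: result } : { ...result };
}

function summarizeDone(result: boolean | DoneShape): string {
  const r = normalizeDone(result);
  if (!r.done) return "running";
  const reason = r.reason ?? "unspecified";
  if (r.success) return `success (${reason})`;
  if (r.truncated) return `truncated (${reason})`;
  return `ended (${reason})`;
}

console.log(summarizeDone(false)); // running
console.log(summarizeDone({ done: true, reason: "target_reached", success: true })); // success (target_reached)
console.log(summarizeDone({ done: true, reason: "max_steps", truncated: true })); // truncated (max_steps)
console.log(summarizeDone(true)); // ended (unspecified)
```

Returning plain `true` loses the distinction between reaching the target and running out of steps, which is why the structured form is preferred when that information is available.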

The full `src/target-2d.ts` file now exports `Target2D`, a typed environment
with reset, observation, action, step, reward and done behavior.

## 9. Run a Smoke Script

Create `src/smoke.ts`.

```ts
import { randomPolicy } from "@ignitionrl/core";
import { Target2D } from "./target-2d.ts";

const spec = Target2D.getSpec({ seed: "tutorial-spec" });

console.log("Environment:", spec.id);
console.log("Observation shape:", spec.observation.shape);
console.log("Action spec:", spec.actions);

const runner = Target2D.createRunner({
  seed: "tutorial-run",
  runId: "tutorial-target-2d",
  maxSteps: 120,
  collectTrace: true,
});

const reset = runner.reset({ seed: "tutorial-reset" });

console.log("Initial observation:", reset.observation);

for (const action of ["right", "up", "right", "up"] as const) {
  const step = runner.step(action);

  console.log({
    t: step.t,
    action: step.action,
    reward: step.reward.total,
    terms: step.reward.terms,
    done: step.done,
    reason: step.reason,
  });

  if (step.done) break;
}

const trace = await Target2D
  .createRunner({ seed: "tutorial-random", maxSteps: 120, collectTrace: true })
  .runEpisode(randomPolicy(Target2D.actions, { seed: "tutorial-policy" }), {
    reset: false,
    maxSteps: 120,
  });

console.log("Random baseline:", trace.summary);
console.log("Trace steps:", trace.steps.length);
```

Run it:

```sh
bun src/smoke.ts
```

You should see:

- an environment spec with a vector observation shape;
- an action spec with four discrete values;
- per-step rewards with named terms;
- a replay trace summary from the random baseline episode.

## 10. What IgnitionRL Validates

IgnitionRL validates the contract at definition time and step time:

- `defineEnvironment()` checks that `id`, action spec and required functions are valid.
- `getSpec()` calls `createInitialState()` and `observations()` to infer observation shape.
- `runner.step(action)` validates the action before calling your `step()`.
- Every observation is normalized into a finite number array.
- Every reward is normalized into `{ total, terms }`.
- Every done result is normalized into `{ done, reason?, success?, truncated? }`.
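
As a rough picture of what normalizing an observation into a finite number array involves, consider the standalone sketch below. It is an assumption about the general technique, not the library's code: `toFiniteVector` is a hypothetical name, and the error wording only loosely follows the messages in the troubleshooting table.

```ts
// Accepts a plain array or a typed array and returns a plain number[] in
// which every entry is a finite number; throws otherwise.
function toFiniteVector(raw: unknown): number[] {
  const isTypedArray = ArrayBuffer.isView(raw) && !(raw instanceof DataView);
  if (!Array.isArray(raw) && !isTypedArray) {
    throw new Error("observations() must return a numeric array.");
  }
  const values = Array.from(raw as ArrayLike<unknown>);
  return values.map((value, index) => {
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new Error(`observation[${index}] must be finite, got ${String(value)}.`);
    }
    return value;
  });
}

console.log(toFiniteVector(new Float32Array([1, 2.5]))); // [ 1, 2.5 ]

try {
  toFiniteVector([0, 1, 0 / 0]);
} catch (error) {
  console.log((error as Error).message); // observation[2] must be finite, got NaN.
}
```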

## Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `[IgnitionRL] observations() must return a numeric array.` | `observations()` returned an object, scalar, `undefined` or mixed structure. | Return `number[]`, `Float32Array` or another numeric typed array. |
| `[IgnitionRL] observation[2] must be finite, got NaN.` | An observation dimension is `NaN` or `Infinity`. | Guard division by zero, missing state fields and invalid physics values. |
| `[IgnitionRL] Invalid discrete action: jump.` | The policy returned a value not listed in `actions.values`. | Return one of the exact literal values, or use `env.sampleAction()` while debugging. |
| `[IgnitionRL] Continuous action length must be 4, got 3.` | A continuous policy returned the wrong vector size. | Match the flattened size of `actions.shape`. |
| `[IgnitionRL] continuous action[0] must be in [-1, 1]` | A continuous policy exceeded `low` or `high`. | Clamp action outputs or widen the action spec intentionally. |
| `[IgnitionRL] reward() must return a RewardBuilder or RewardResult.` | `reward()` returned a raw number or a malformed object. | Return `reward().add(...)` or `{ total, terms }`. |
| `[IgnitionRL] Reward term names must be non-empty.` | A reward term name is `""` or whitespace. | Name every reward cause, such as `progress`, `collision` or `step_penalty`. |
| `[IgnitionRL] done() must return a boolean or DoneResult.` | `done()` returned `undefined`, a number or another shape. | Return `false`, `true` or `{ done: boolean, reason?: string }`. |
| `[IgnitionRL] Episode is done. Call reset() before stepping again.` | Code called `runner.step()` after termination. | Call `runner.reset()` before continuing. |
| Runs are not reproducible. | Randomness is coming from `Math.random()` or time-based state. | Use the provided `rng` in `createInitialState()` and policy code. |
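
For the out-of-range continuous action row, clamping can be done in the policy before the action reaches the runner. This is a minimal standalone sketch; `clampAction` is a hypothetical helper, and it assumes scalar `low` and `high` bounds as in the continuous spec example earlier in this guide.

```ts
// Clamps each component into [low, high]. NaN is rejected rather than
// clamped, because Math.min/Math.max would silently propagate it.
function clampAction(action: readonly number[], low: number, high: number): number[] {
  return action.map((value, index) => {
    if (Number.isNaN(value)) {
      throw new Error(`action[${index}] is NaN and cannot be clamped`);
    }
    return Math.min(high, Math.max(low, value));
  });
}

console.log(clampAction([1.5, -2, 0.25, 1], -1, 1)); // [ 1, -1, 0.25, 1 ]
```

Clamping hides how often the policy overshoots, so logging the unclamped values while debugging can help decide whether the bounds or the policy need to change.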

## Next Step

Once the environment works under `@ignitionrl/core`, wire it into a local
project through the SDK or one of the CLI templates. The current first-class CLI
training command supports the built-in `GridWorld-v0`, `Target2D-v0` and
`DroneTarget-v0` environments while custom project registration matures.