39 changes: 39 additions & 0 deletions docs/adr/0006-neural-learner-adapter-contract.md
@@ -0,0 +1,39 @@
# ADR 0006: Neural Learner Adapter Contract

## Status

Accepted

## Context

IgnitionRL will eventually need DQN, PPO and SAC learners, but environment authors should not be exposed to algorithm-specific internals. The public boundary already exists as the TypeScript environment contract: vector observations, action specs, rewards, done conditions, traces and checkpoints.

M3.1 needs an adapter contract before implementing neural algorithms so Studio, CI and future native backends can agree on action-space support, update cadence, metrics and checkpoint shape.

## Decision

Add a `@ignitionrl/learning` neural adapter contract that is derived from `EnvironmentSpec`:

- `NeuralObservationSpace` mirrors vector observation shape and dtype.
- `NeuralActionSpace` normalizes discrete, continuous and multi-discrete action specs.
- `NeuralUpdateCadence` models step-based updates, episode-based updates and rollout-based updates.
- `NeuralMetricSpec` defines stable metric names, scopes, reducers and optimization direction.
- `NeuralCheckpointEnvelope` wraps backend-specific checkpoint payloads in a stable JSON envelope.
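As an illustration of the action-space item above, the following is a minimal sketch of how an environment action spec might be normalized into a single discriminated union. Every name here (`SketchActionSpec`, `SketchNeuralActionSpace`, `sketchActionSpaceFromSpec`) and every field is a hypothetical stand-in; the real `NeuralActionSpace` and `neuralActionSpaceFromSpec` in `@ignitionrl/learning` may be shaped differently.

```ts
// Hypothetical input shapes; the real EnvironmentSpec action specs may differ.
type SketchActionSpec =
  | { type: "discrete"; n: number }
  | { type: "continuous"; low: number[]; high: number[] }
  | { type: "multi-discrete"; nvec: number[] };

// Hypothetical normalized form that a learner backend would consume.
type SketchNeuralActionSpace =
  | { kind: "discrete"; size: number }
  | { kind: "continuous"; dims: number; low: number[]; high: number[] }
  | { kind: "multi-discrete"; sizes: number[] };

function sketchActionSpaceFromSpec(spec: SketchActionSpec): SketchNeuralActionSpace {
  switch (spec.type) {
    case "discrete":
      if (!Number.isInteger(spec.n) || spec.n < 1) {
        throw new Error(`discrete action count must be a positive integer, got ${spec.n}`);
      }
      return { kind: "discrete", size: spec.n };
    case "continuous":
      if (spec.low.length !== spec.high.length) {
        throw new Error("continuous action bounds must have the same length");
      }
      // Copy the bounds so later mutation of the spec cannot change the contract.
      return {
        kind: "continuous",
        dims: spec.low.length,
        low: [...spec.low],
        high: [...spec.high],
      };
    case "multi-discrete":
      return { kind: "multi-discrete", sizes: [...spec.nvec] };
  }
}
```

A discriminated union keeps downstream checks (for example, algorithm support rules) down to a single comparison on `kind`.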

The built-in algorithm support profiles are:

- DQN: discrete action spaces.
- PPO: discrete, continuous and multi-discrete action spaces.
- SAC: continuous action spaces.

Unsupported action spaces fail before training starts. Custom adapters must declare supported action spaces explicitly.
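The fail-fast rule can be sketched roughly as below. This is an assumption-laden illustration rather than the package's `assertActionSpaceSupported`: the names `SKETCH_SUPPORT_PROFILES` and `sketchAssertActionSpaceSupported`, the third parameter, and the error wording are all hypothetical. Only the three built-in profiles are taken from this ADR.

```ts
type SketchActionSpaceKind = "discrete" | "continuous" | "multi-discrete";

// Built-in profiles exactly as listed in this ADR.
const SKETCH_SUPPORT_PROFILES = new Map<string, readonly SketchActionSpaceKind[]>([
  ["dqn", ["discrete"]],
  ["ppo", ["discrete", "continuous", "multi-discrete"]],
  ["sac", ["continuous"]],
]);

// Hypothetical check, intended to run before any training step is taken.
function sketchAssertActionSpaceSupported(
  algorithm: string,
  kind: SketchActionSpaceKind,
  declaredByCustomAdapter?: readonly SketchActionSpaceKind[],
): void {
  // Built-in algorithms use their fixed profile; anything else must declare one.
  const supported = SKETCH_SUPPORT_PROFILES.get(algorithm) ?? declaredByCustomAdapter;
  if (supported === undefined) {
    throw new Error(
      `Custom algorithm "${algorithm}" must declare its supported action spaces`,
    );
  }
  if (supported.indexOf(kind) === -1) {
    throw new Error(
      `Algorithm "${algorithm}" does not support ${kind} action spaces ` +
        `(supported: ${supported.join(", ")})`,
    );
  }
}
```

Under these assumptions, `sketchAssertActionSpaceSupported("dqn", "continuous")` throws, while `sketchAssertActionSpaceSupported("ppo", "multi-discrete")` returns without error.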

## Consequences

Future DQN, PPO and SAC implementations can plug in without changing `defineEnvironment()`.

Studio can reason about learner metrics and checkpoint payloads before the actual neural backend exists.
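To make the metric side concrete, here is a small sketch of how a consumer such as Studio or CI might use a metric spec without knowing anything about the backend. The type and function names, the reducer set and the scope values are assumptions chosen for illustration; the actual `NeuralMetricSpec`, `NeuralMetricReducer` and `NeuralMetricDirection` types may define different members.

```ts
// Hypothetical member sets; the real reducer, scope and direction unions may differ.
type SketchMetricReducer = "mean" | "sum" | "last" | "min" | "max";
type SketchMetricDirection = "maximize" | "minimize";

interface SketchMetricSpec {
  name: string;
  scope: "step" | "episode" | "update";
  reducer: SketchMetricReducer;
  direction: SketchMetricDirection;
}

// Collapse raw samples into one number according to the spec's reducer.
function sketchReduceMetric(spec: SketchMetricSpec, values: number[]): number {
  if (values.length === 0) {
    throw new Error(`no samples recorded for metric "${spec.name}"`);
  }
  switch (spec.reducer) {
    case "mean":
      return values.reduce((acc, v) => acc + v, 0) / values.length;
    case "sum":
      return values.reduce((acc, v) => acc + v, 0);
    case "last":
      return values.reduce((_prev, v) => v);
    case "min":
      return Math.min(...values);
    case "max":
      return Math.max(...values);
  }
}

// Decide whether a new value is better than a previous one for this metric.
function sketchIsImprovement(
  spec: SketchMetricSpec,
  previous: number,
  current: number,
): boolean {
  return spec.direction === "maximize" ? current > previous : current < previous;
}
```

Because the direction travels with the spec, a CI gate can compare two runs on a loss-like metric and a return-like metric with the same code path.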

Native backends can keep tensor payloads opaque while still producing stable IgnitionRL metadata around observations, actions, update cadence and metrics.

The current tabular Q-learning and linear policy search baselines remain unchanged; they map conceptually to the same boundaries but are not forced through the neural adapter until a concrete neural implementation needs it.
32 changes: 32 additions & 0 deletions packages/learning/README.md
@@ -81,6 +81,38 @@ Checkpoints are JSON-serializable and include:

`linear-policy-search` keeps v1 checkpoint loading backward-compatible when newer diagnostic metrics are missing. Loading normalizes those fields before inference, so older demo artifacts can still be replayed through the current learner.

## Neural Learner Adapter Contract

`@ignitionrl/learning` now exposes an adapter contract for future neural learners without changing environment definitions:

```ts
import { defineNeuralLearnerAdapterContract } from "@ignitionrl/learning"
import { Target2D } from "@ignitionrl/examples"

const contract = defineNeuralLearnerAdapterContract(Target2D.getSpec(), {
  algorithm: "dqn",
})
```

The contract records:

- vector observation shape and dtype;
- discrete, continuous or multi-discrete action space details;
- algorithm support rules for DQN, PPO and SAC;
- update cadence (`step`, `episode` or `rollout`);
- stable metric names for Studio and CI;
- a JSON checkpoint envelope with an opaque backend payload.
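The last item, a JSON envelope around an opaque backend payload, might look roughly like the sketch below. The field names, the version constant and both helper functions are hypothetical and are not the package's `createNeuralCheckpointEnvelope` or `assertNeuralCheckpointEnvelope`. The payload is assumed to be a string that the backend has already encoded (for example as base64).

```ts
// Hypothetical schema version; the real NEURAL_CHECKPOINT_SCHEMA_VERSION may differ.
const SKETCH_CHECKPOINT_SCHEMA_VERSION = 1;

interface SketchCheckpointEnvelope {
  schemaVersion: number;
  algorithm: string;
  backend: { name: string; version: string };
  // Opaque to IgnitionRL: only the producing backend knows how to decode it.
  payload: string;
}

function sketchCreateEnvelope(
  algorithm: string,
  backend: { name: string; version: string },
  payload: string,
): SketchCheckpointEnvelope {
  return { schemaVersion: SKETCH_CHECKPOINT_SCHEMA_VERSION, algorithm, backend, payload };
}

// Validate untrusted JSON before handing the payload back to a backend.
function sketchAssertEnvelope(value: unknown): asserts value is SketchCheckpointEnvelope {
  if (typeof value !== "object" || value === null) {
    throw new Error("checkpoint envelope must be an object");
  }
  const record = value as Record<string, unknown>;
  if (record["schemaVersion"] !== SKETCH_CHECKPOINT_SCHEMA_VERSION) {
    throw new Error(`unsupported checkpoint schema version: ${String(record["schemaVersion"])}`);
  }
  if (typeof record["algorithm"] !== "string") {
    throw new Error("checkpoint envelope is missing an algorithm name");
  }
  const backend = record["backend"];
  if (typeof backend !== "object" || backend === null) {
    throw new Error("checkpoint envelope is missing a backend descriptor");
  }
  if (typeof record["payload"] !== "string") {
    throw new Error("checkpoint payload must be an encoded string");
  }
}
```

With this shape, tooling can read `algorithm` and `backend` from a saved file without ever decoding `payload`, which is the property the contract is after.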

Built-in support profiles are intentionally conservative:

- `dqn`: discrete action spaces;
- `ppo`: discrete, continuous and multi-discrete action spaces;
- `sac`: continuous action spaces.

Unsupported combinations fail before a run starts with an algorithm-specific error. Custom neural adapters must declare their supported action spaces explicitly.

The current `TabularQLearner` and `LinearPolicySearchLearner` remain direct TypeScript learners. Future DQN, PPO and SAC implementations can sit behind this contract whether the backend is TypeScript, Rust/Burn, Rust/Candle or another native process. Environment authors still implement only `defineEnvironment()`.

## Scope

Use this package to prove that environments can be trained and reloaded through the public contract. Neural RL algorithms and native training backends should come later behind the same `Learner` and checkpoint boundary.
31 changes: 31 additions & 0 deletions packages/learning/src/index.ts
@@ -34,6 +34,37 @@ export {
  type SelectActionOptions,
  type TabularQOptions,
} from "./tabular-q.js";
export {
  NEURAL_ADAPTER_CONTRACT_VERSION,
  NEURAL_CHECKPOINT_SCHEMA_VERSION,
  assertActionSpaceSupported,
  assertNeuralCheckpointEnvelope,
  createNeuralCheckpointEnvelope,
  defaultMetricSpecs,
  defineNeuralLearnerAdapterContract,
  neuralActionSpaceFromSpec,
  neuralAdapterContractToJson,
  neuralObservationSpaceFromSpec,
  type BuiltInNeuralLearnerAlgorithm,
  type CreateNeuralCheckpointEnvelopeOptions,
  type DefineNeuralLearnerAdapterContractOptions,
  type NeuralActionSpace,
  type NeuralActionSpaceKind,
  type NeuralBackendDescriptor,
  type NeuralCheckpointContract,
  type NeuralCheckpointEnvelope,
  type NeuralContinuousActionSpace,
  type NeuralDiscreteActionSpace,
  type NeuralLearnerAdapterContract,
  type NeuralLearnerAlgorithm,
  type NeuralMetricDirection,
  type NeuralMetricReducer,
  type NeuralMetricScope,
  type NeuralMetricSpec,
  type NeuralMultiDiscreteActionSpace,
  type NeuralObservationSpace,
  type NeuralUpdateCadence,
} from "./neural-adapter.js";
export {
  trainLinearPolicySearch,
  trainTabularQ,