packages/learning/README.md (35 changes: 35 additions, 0 deletions)

@@ -61,6 +61,41 @@ console.log(diagnostics.bestPolicy?.weights)

`LinearPolicySearchLearner` is a small non-neural continuous-control baseline. It learns a linear observation-to-action policy with seeded candidate perturbations, optional label-based warm starts, elite averaging, bounded actions and JSON checkpoints. It also exposes numeric learner metrics such as candidate index, population rewards, improvement flags and policy weight norms, plus label-aware diagnostics for inspecting the current, mean and best policy matrices. It is meant to prove the continuous learner/checkpoint/debugging path before PPO/SAC or native tensor backends exist.

## Configuration and Metrics

Learner hyperparameters are normalized through exported config helpers:

```ts
import {
  defineTabularQConfig,
  learnerMetricSpecsForAlgorithm,
} from "@ignitionrl/learning"

const learnerConfig = defineTabularQConfig({
  epsilon: 0.2,
  learningRate: 0.3,
})

const metrics = learnerMetricSpecsForAlgorithm("tabular-q-learning")
```

Default baseline configs are:

- `tabular-q-learning`: `learningRate: 0.2`, `discount: 0.95`, `epsilon: 0.1`, `initialQ: 0`, `observationPrecision: 2`, `seed: 0`;
- `linear-policy-search`: `sigma: 0.2`, `actionNoise: 0.03`, `initialWeightScale: 0.05`, `populationSize: 6`, `eliteCount: 2`, `seed: 0`.
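
As an illustration of what normalizing against these defaults amounts to, the sketch below fills the README example's overrides into the documented `tabular-q-learning` defaults. It is not the package's `defineTabularQConfig`: the `TabularQSketchConfig` type, the frozen `TABULAR_Q_DEFAULTS` object and `normalizeTabularQ` are hypothetical, and only the default values come from the list above.

```ts
// Hypothetical mirror of the documented tabular-q-learning defaults.
type TabularQSketchConfig = {
  readonly learningRate: number;
  readonly discount: number;
  readonly epsilon: number;
  readonly initialQ: number;
  readonly observationPrecision: number;
  readonly seed: number;
};

const TABULAR_Q_DEFAULTS: TabularQSketchConfig = Object.freeze({
  learningRate: 0.2,
  discount: 0.95,
  epsilon: 0.1,
  initialQ: 0,
  observationPrecision: 2,
  seed: 0,
});

// Fill every field from the overrides, falling back to the frozen defaults.
function normalizeTabularQ(
  overrides: Partial<TabularQSketchConfig> = {},
): TabularQSketchConfig {
  return {
    learningRate: overrides.learningRate ?? TABULAR_Q_DEFAULTS.learningRate,
    discount: overrides.discount ?? TABULAR_Q_DEFAULTS.discount,
    epsilon: overrides.epsilon ?? TABULAR_Q_DEFAULTS.epsilon,
    initialQ: overrides.initialQ ?? TABULAR_Q_DEFAULTS.initialQ,
    observationPrecision:
      overrides.observationPrecision ?? TABULAR_Q_DEFAULTS.observationPrecision,
    seed: overrides.seed ?? TABULAR_Q_DEFAULTS.seed,
  };
}

// The README example's overrides; every other field keeps its default.
const effective = normalizeTabularQ({ epsilon: 0.2, learningRate: 0.3 });
```

Under these assumptions, `effective` takes `epsilon: 0.2` and `learningRate: 0.3` and keeps `discount: 0.95`, `initialQ: 0`, `observationPrecision: 2` and `seed: 0`.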

Default neural adapter cadences are:

- `dqn`: step-based updates on every step after `warmupSteps: 1000`, with `batchSize: 64`;
- `ppo`: rollout-based updates every `2048` steps, with `epochs: 10` and `minibatches: 32`;
- `sac`: step-based updates on every step after `warmupSteps: 1000`, with `batchSize: 64`.
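
One way to read these cadences is as an update schedule keyed on the number of environment steps collected. The sketch below encodes that reading; `UpdateCadence`, `DEFAULT_CADENCES`, `shouldUpdate` and `rolloutUpdateShape` are hypothetical names, and whether the first step-based update happens at step 1000 or 1001 is an assumption (here: strictly after the warmup). Assuming PPO splits each rollout evenly into minibatches, the defaults give 2048 / 32 = 64 transitions per minibatch and 10 × 32 = 320 gradient steps per rollout update.

```ts
// Hypothetical encoding of the documented default update cadences.
type UpdateCadence =
  | { readonly kind: "step"; readonly warmupSteps: number; readonly batchSize: number }
  | {
      readonly kind: "rollout";
      readonly rolloutSteps: number;
      readonly epochs: number;
      readonly minibatches: number;
    };

const DEFAULT_CADENCES: Readonly<Record<"dqn" | "ppo" | "sac", UpdateCadence>> = {
  dqn: { kind: "step", warmupSteps: 1000, batchSize: 64 },
  ppo: { kind: "rollout", rolloutSteps: 2048, epochs: 10, minibatches: 32 },
  sac: { kind: "step", warmupSteps: 1000, batchSize: 64 },
};

// `stepsTaken` is the number of environment steps collected so far.
function shouldUpdate(cadence: UpdateCadence, stepsTaken: number): boolean {
  if (cadence.kind === "step") {
    // Assumption: updates start strictly after the warmup steps.
    return stepsTaken > cadence.warmupSteps;
  }
  // Update each time a full rollout has been collected.
  return stepsTaken > 0 && stepsTaken % cadence.rolloutSteps === 0;
}

// Per-update shape of a rollout cadence, assuming minibatches split the rollout evenly.
function rolloutUpdateShape(
  cadence: UpdateCadence,
): { minibatchSize: number; gradientSteps: number } | undefined {
  if (cadence.kind !== "rollout") {
    return undefined;
  }
  return {
    minibatchSize: cadence.rolloutSteps / cadence.minibatches,
    gradientSteps: cadence.epochs * cadence.minibatches,
  };
}
```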

Network, optimizer and replay-buffer hyperparameters are intentionally left to the future neural learner implementations, while the adapter contract already keeps those learners' metric names and checkpoint format stable.

Invalid hyperparameters throw during config normalization or learner initialization, before a Studio/SDK learner run is created. Learners also expose `getConfig()`, so `@ignitionrl/sdk` can persist the effective JSON config under `run.config.learnerConfig` for reproducibility.
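
The validate-early and `getConfig()` pattern can be sketched as follows. `SketchLearner`, `SketchRun`, `serializeRun` and the `[0, 1]` range check are hypothetical stand-ins, not the learners' or `@ignitionrl/sdk`'s actual API; the point is that construction fails before any run record exists, and that the persisted config is a copy rather than the learner's live object.

```ts
type ValidatedSketchConfig = { readonly epsilon: number; readonly learningRate: number };

// Hypothetical learner that validates eagerly, before any run record exists.
class SketchLearner {
  private readonly config: ValidatedSketchConfig;

  constructor(config: ValidatedSketchConfig) {
    // Assumed rule for illustration: both values must lie in [0, 1].
    for (const [name, value] of Object.entries(config)) {
      if (!Number.isFinite(value) || value < 0 || value > 1) {
        throw new RangeError(`${name} must be a finite number in [0, 1], got ${value}`);
      }
    }
    this.config = { ...config };
  }

  // Return a copy so callers cannot mutate the learner's effective config.
  getConfig(): ValidatedSketchConfig {
    return { ...this.config };
  }
}

// Hypothetical persisted run shape with the config under run.config.learnerConfig.
type SketchRun = { readonly config: { readonly learnerConfig: ValidatedSketchConfig } };

function serializeRun(learner: SketchLearner): string {
  const run: SketchRun = { config: { learnerConfig: learner.getConfig() } };
  return JSON.stringify(run);
}
```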

Metric catalogs use the persisted run metric names that Studio panels and CI gates should depend on. Episode metrics are `totalReward`, `length`, `success`, `terminated` and `truncated`. Learner metrics are namespaced as `learner.*`, for example `learner.lastTdError`, `learner.epsilon` or `learner.tdLoss`.
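
A CI gate built on these names mostly needs set membership. The sketch below assumes a run reports a flat list of metric names; `learnerMetricName` and `missingMetrics` are hypothetical helpers, not the exported `stableMetricNamesForAlgorithm`, whose return shape is not shown here.

```ts
// Episode metric names listed above.
const EPISODE_METRICS = ["totalReward", "length", "success", "terminated", "truncated"] as const;

// Hypothetical helper: learner metrics live under the `learner.` namespace.
function learnerMetricName(key: string): string {
  return key.startsWith("learner.") ? key : `learner.${key}`;
}

// Hypothetical CI-gate check: which required metric names did a run not report?
function missingMetrics(required: readonly string[], reported: Iterable<string>): string[] {
  const seen = new Set(reported);
  return required.filter((name) => !seen.has(name));
}

const requiredMetrics: string[] = [
  ...EPISODE_METRICS,
  learnerMetricName("tdLoss"),
  learnerMetricName("epsilon"),
];
```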

## Checkpoints

packages/learning/src/index.ts (19 changes: 19 additions, 0 deletions)

```diff
@@ -11,10 +11,12 @@ export {
 export {
   LINEAR_POLICY_SEARCH_ALGORITHM,
   LINEAR_POLICY_SEARCH_CHECKPOINT_VERSION,
+  DEFAULT_LINEAR_POLICY_SEARCH_CONFIG,
   LinearPolicySearchLearner,
   assertCheckpointMatchesContinuousSpec,
   assertLinearPolicySearchCheckpoint,
   createLinearPolicySearchLearner,
+  defineLinearPolicySearchConfig,
   normalizeLinearPolicySearchCheckpoint,
   type LinearPolicySearchActOptions,
   type LinearPolicySearchCandidateSummary,
@@ -28,12 +30,29 @@ export {
   type LinearPolicySearchPolicyWeights,
 } from "./linear-policy-search.js";
 export {
+  DEFAULT_TABULAR_Q_CONFIG,
   TabularQLearner,
   createTabularQLearner,
+  defineTabularQConfig,
   transitionFromStep,
   type SelectActionOptions,
   type TabularQOptions,
 } from "./tabular-q.js";
+export {
+  EPISODE_METRIC_SPECS,
+  LINEAR_POLICY_SEARCH_METRIC_SPECS,
+  TABULAR_Q_METRIC_SPECS,
+  assertLearnerMetricSpecs,
+  learnerMetricSpecsForAlgorithm,
+  learnerMetricSpecsFromNeuralAdapterContract,
+  stableMetricNamesForAlgorithm,
+  stableMetricNamesFromNeuralAdapterContract,
+  type LearnerMetricCatalogOptions,
+  type LearnerMetricDirection,
+  type LearnerMetricReducer,
+  type LearnerMetricScope,
+  type LearnerMetricSpec,
+} from "./metrics.js";
 export {
   NEURAL_ADAPTER_CONTRACT_VERSION,
   NEURAL_CHECKPOINT_SCHEMA_VERSION,
```
packages/learning/src/linear-policy-search.ts (39 changes: 33 additions, 6 deletions)

```diff
@@ -108,14 +108,14 @@ type LinearPolicySearchMetricDefaults = {
   readonly bestWeights?: readonly number[];
 };

-const DEFAULT_CONFIG: LinearPolicySearchConfig = {
+export const DEFAULT_LINEAR_POLICY_SEARCH_CONFIG: LinearPolicySearchConfig = Object.freeze({
   seed: 0,
   sigma: 0.2,
   actionNoise: 0.03,
   initialWeightScale: 0.05,
   populationSize: 6,
   eliteCount: 2,
-};
+});

 type Candidate = {
   readonly weights: readonly number[];
```
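
`Object.freeze` stops importers of `DEFAULT_LINEAR_POLICY_SEARCH_CONFIG` from changing the shared defaults, but it is shallow: top-level numbers such as `sigma` are locked, while nested objects or arrays stay mutable. The sketch below shows both cases on a hypothetical `FROZEN_DEFAULTS` object with an invented `nested` field. It uses `Reflect.set`, which returns `false` for a rejected write instead of relying on strict mode to throw.

```ts
// Hypothetical defaults object with one nested field to show how deep freezing goes.
const FROZEN_DEFAULTS = Object.freeze({
  sigma: 0.2,
  populationSize: 6,
  nested: { weights: [0.05, 0.05] },
});

// Top-level writes are rejected on a frozen object.
const topLevelWriteSucceeded = Reflect.set(FROZEN_DEFAULTS, "sigma", 1);

// Nested objects are not frozen, so this write goes through.
const nestedWriteSucceeded = Reflect.set(FROZEN_DEFAULTS.nested.weights, 0, 9);
```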
```diff
@@ -158,13 +158,13 @@ export class LinearPolicySearchLearner implements Learner<number[]> {
   private lastImproved = 0;

   constructor(options: LinearPolicySearchOptions = {}) {
-    this.config = normalizeConfig(options);
+    this.config = defineLinearPolicySearchConfig(options);
     this.metadata = options.metadata;
     this.rng = createSeededRng(this.config.seed);
   }

   async init(spec: EnvironmentSpec, config: LearnerConfig = {}): Promise<void> {
-    const nextConfig = normalizeConfig({
+    const nextConfig = defineLinearPolicySearchConfig({
       ...this.config,
       ...config,
     });
@@ -327,6 +327,10 @@ export class LinearPolicySearchLearner implements Learner<number[]> {
     };
   }

+  getConfig(): LinearPolicySearchConfig {
+    return cloneLinearPolicySearchConfig(this.config);
+  }
+
   getPolicyWeights(
     policy: LinearPolicySearchPolicyKind = "best",
   ): LinearPolicySearchPolicyWeights {
@@ -481,6 +485,12 @@ export function createLinearPolicySearchLearner(
   return new LinearPolicySearchLearner(options);
 }

+export function defineLinearPolicySearchConfig(
+  options: Partial<LinearPolicySearchConfig> = {},
+): LinearPolicySearchConfig {
+  return normalizeConfig(options);
+}
+
 export function normalizeLinearPolicySearchCheckpoint(
   value: unknown,
 ): LinearPolicySearchCheckpoint {
@@ -599,8 +609,14 @@ export function assertCheckpointMatchesContinuousSpec(

 function normalizeConfig(value: Partial<LinearPolicySearchConfig>): LinearPolicySearchConfig {
   const config = {
-    ...DEFAULT_CONFIG,
-    ...value,
+    seed: value.seed ?? DEFAULT_LINEAR_POLICY_SEARCH_CONFIG.seed,
+    sigma: value.sigma ?? DEFAULT_LINEAR_POLICY_SEARCH_CONFIG.sigma,
+    actionNoise: value.actionNoise ?? DEFAULT_LINEAR_POLICY_SEARCH_CONFIG.actionNoise,
+    initialWeightScale: value.initialWeightScale
+      ?? DEFAULT_LINEAR_POLICY_SEARCH_CONFIG.initialWeightScale,
+    populationSize: value.populationSize
+      ?? DEFAULT_LINEAR_POLICY_SEARCH_CONFIG.populationSize,
+    eliteCount: value.eliteCount ?? DEFAULT_LINEAR_POLICY_SEARCH_CONFIG.eliteCount,
     ...(value.initialWeightMap !== undefined
       ? { initialWeightMap: cloneInitialWeightMap(value.initialWeightMap) }
       : {}),
```
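
Replacing the `...DEFAULT_CONFIG, ...value` spread with per-field `??` changes how explicitly `undefined` fields behave: with object spread, an own property whose value is `undefined` still overwrites the default, whereas `??` falls back to the default for both `undefined` and `null`. A minimal illustration, with hypothetical `MergeSketchConfig` and `MergeSketchOverrides` types standing in for the real config:

```ts
type MergeSketchConfig = { readonly sigma: number; readonly eliteCount: number };
type MergeSketchOverrides = { sigma?: number | undefined; eliteCount?: number | undefined };

const MERGE_DEFAULTS: MergeSketchConfig = { sigma: 0.2, eliteCount: 2 };

// An override object that carries an explicit `undefined` for sigma.
const overrides: MergeSketchOverrides = { sigma: undefined, eliteCount: 3 };

// Spread merge (the removed code path): the explicit undefined replaces the default.
const spreadMerged = { ...MERGE_DEFAULTS, ...overrides };

// Per-field `??` merge (the added code path): undefined or null falls back to the default.
const coalesced: MergeSketchConfig = {
  sigma: overrides.sigma ?? MERGE_DEFAULTS.sigma,
  eliteCount: overrides.eliteCount ?? MERGE_DEFAULTS.eliteCount,
};
```

As the hunks read, `init()` still spreads `config` over `this.config` before normalizing, so a `LearnerConfig` field that is explicitly `undefined` would resolve to `DEFAULT_LINEAR_POLICY_SEARCH_CONFIG` rather than to the value the learner was constructed with.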
```diff
@@ -611,6 +627,17 @@ function normalizeConfig(value: Partial<LinearPolicySearchConfig>): LinearPolicy
   return config;
 }

+function cloneLinearPolicySearchConfig(
+  config: LinearPolicySearchConfig,
+): LinearPolicySearchConfig {
+  return {
+    ...config,
+    ...(config.initialWeightMap !== undefined
+      ? { initialWeightMap: cloneInitialWeightMap(config.initialWeightMap) }
+      : {}),
+  };
+}
+
 function assertLinearPolicySearchConfig(
   value: unknown,
 ): asserts value is LinearPolicySearchConfig {
```
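
`getConfig()` returns `cloneLinearPolicySearchConfig(this.config)` rather than `this.config`, so a consumer such as the SDK persisting `run.config.learnerConfig` cannot mutate the learner's live `initialWeightMap`. The sketch below follows the same pattern; the `WeightMap` shape (label to `number[]`), `cloneWeightMap`, `cloneSketchConfig` and `ConfigHolder` are assumptions, since the real `initialWeightMap` type and `cloneInitialWeightMap` are not shown in this diff.

```ts
// Assumed shape: action label -> per-observation weights.
type WeightMap = Readonly<Record<string, readonly number[]>>;

type WeightMapConfig = {
  readonly sigma: number;
  readonly initialWeightMap?: WeightMap;
};

// Hypothetical stand-in for cloneInitialWeightMap: copies every weight array.
function cloneWeightMap(map: WeightMap): Record<string, number[]> {
  const copy: Record<string, number[]> = {};
  for (const [label, weights] of Object.entries(map)) {
    copy[label] = [...weights];
  }
  return copy;
}

// Same pattern as the diff: spread the scalars, deep-copy the map only when present.
function cloneSketchConfig(config: WeightMapConfig): WeightMapConfig {
  return {
    ...config,
    ...(config.initialWeightMap !== undefined
      ? { initialWeightMap: cloneWeightMap(config.initialWeightMap) }
      : {}),
  };
}

class ConfigHolder {
  private readonly config: WeightMapConfig;

  constructor(config: WeightMapConfig) {
    // Copy on the way in as well, so the caller's map is not shared.
    this.config = cloneSketchConfig(config);
  }

  // Hand out a copy so consumers cannot mutate the live config.
  getConfig(): WeightMapConfig {
    return cloneSketchConfig(this.config);
  }
}
```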