Pluggable red selector + extras factory (architecture follow-up) by PaulHax · Pull Request #15 · ITM-Kitware/jaxborg

PaulHax · 2026-05-06T19:14:37Z

Stacked on #14. Reviewing this PR shows only the architecture delta — the rebased resilience work it depends on is in the parent PR.

What it does

Collapses the multi-class, hand-rolled red-agent surface into a single parameterised env + a registry of selectors. The next biased-red PR becomes ~200 lines instead of ~1300.

How roles are picked

Each episode picks 3 of the operational-zone server hostnames at random (out of ~6) and tags them auth / db / web. The resilience metric scores impact actions against those 3; the red bias points at those 3 (or a CIA subset of them). Every other host — including untagged op-zone servers — is unbiased and unscored that episode.

Same (env_seed) reproduces the same map. Different seeds → different shuffle → different 3 hosts get the 3 roles. Over many episodes, every op-zone server gets exposure to every role, so the policy can't memorise "host_0 is always auth."
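The seeded pick described above can be sketched in plain Python. This is only an illustration of the contract (same rng gives the same map, input order does not matter, fewer than 3 candidates is tolerated); the function name `assign_roles_sketch`, the integer role values, and the hostnames are assumptions, not the repository's implementation.

```python
import random

# Illustrative role constants; the real values in topology_roles may differ.
ROLE_NONE, ROLE_AUTH, ROLE_DB, ROLE_WEB = 0, 1, 2, 3


def assign_roles_sketch(hostnames, rng):
    """Tag up to three hosts as auth/db/web; every other host stays ROLE_NONE."""
    # Sort first so the result depends only on the set of names and the rng,
    # not on the order the caller happened to pass them in.
    candidates = sorted(hostnames)
    rng.shuffle(candidates)
    role_map = {name: ROLE_NONE for name in candidates}
    # zip stops at the shorter input, so fewer than 3 candidates is handled.
    for name, role in zip(candidates, (ROLE_AUTH, ROLE_DB, ROLE_WEB)):
        role_map[name] = role
    return role_map


hosts = [f"op_server_{i}" for i in range(6)]  # hypothetical hostnames
map_a = assign_roles_sketch(hosts, random.Random(7))
map_b = assign_roles_sketch(list(reversed(hosts)), random.Random(7))
```

Here `map_a` and `map_b` are equal because both calls sort the candidates before shuffling with an identically seeded rng.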

Coordination:

JAX: env extras_factory(key, const) calls assign_resilience_roles_from_const(const, key) at every reset; host_resilience_role rides in env state and gets read directly by the selector.
CybORG: callers run inject_role_map(env, ep_seed) after every env.reset(), which builds the map from the env's full host list and pushes it into every ResilienceRedAgent via set_role_map. Wired in training (ippo_cyborg.env_worker), eval (cyborg_runner / jax_runner), and the trajectory recorder (cc4_trajectory_eval).

This restores @Dmujt's original PR #11 intent ("each episode randomly assigns three Operational Zone servers to auth/db/web roles") which had drifted to deterministic-by-sort during the rebase, and unifies the JAX side and CybORG side onto a single role-assignment rule (the old CybORG index mod 3 is gone).
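The CybORG-side coordination (build one map per episode, push it into every role-aware agent) might look roughly like the sketch below. `StubRedAgent` and `inject_role_map_sketch` are hypothetical stand-ins: the real `inject_role_map(env, ep_seed)` takes an env rather than a list of agents, and its internals are not shown in this PR description.

```python
import random


class StubRedAgent:
    """Stand-in for a role-aware red agent; only the set_role_map hook matters here."""

    def __init__(self):
        self.role_map = None

    def set_role_map(self, role_map):
        self.role_map = role_map


def inject_role_map_sketch(agents, hostnames, ep_seed):
    """Build one map from ep_seed and hand the same object to every agent."""
    rng = random.Random(ep_seed)
    # Sorting makes the pick depend only on the host set and the seed.
    picked = rng.sample(sorted(hostnames), k=min(3, len(hostnames)))
    role_map = dict(zip(picked, ("auth", "db", "web")))
    for agent in agents:
        agent.set_role_map(role_map)
    return role_map


agents = [StubRedAgent() for _ in range(6)]
hosts = [f"op_server_{i}" for i in range(6)]  # hypothetical hostnames
shared = inject_role_map_sketch(agents, hosts, ep_seed=123)
```

Because every agent receives the same map, all red agents bias toward the same three hosts within an episode, and re-running with the same `ep_seed` reproduces the map.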

Commits

01ac145 feat: pluggable red action selectors with name-based registry. New scenarios/cc4/red_selectors.py with RedSelector Protocol-style callable, role_biased_selector factory, REGISTRY (fsm/resilience/cia_a/cia_i/cia_c plus aliases), and make_red_selector(name, **kwargs) recipe-friendly entry.
07a1cf6 refactor: collapse FsmRedCC4Env + ResilienceRedCC4Env into one parameterised env. FsmRedCC4Env now takes red_selector + extras_factory. FsmRedEnvState keeps the flat state/const layout (existing callers untouched) plus a stable-shape extras dict.
4e6567b refactor: ippo_jax uses make_fsm_red_env, dropping the if/elif ladder. 13-line if/elif/else × 2 sites → 1 line × 2 sites.
33c7703 refactor: drop back-compat shims; consolidate FSM helpers in red_selectors. Deletes parity/resilience_red_env.py, scenarios/cc4/resilience_red_fsm.py, scenarios/cc4/resilience_topology.py. Moves JAX role assignment into topology_roles.py as assign_resilience_roles_from_const.
ded1e70 refactor: drop cc4_ prefix from generic eval scripts. cc4_aggregate_cia.py and cc4_score_trajectories.py are pure trajectory-format consumers — they don't know which game produced the JSONL. Renamed to aggregate_cia.py / score_trajectories.py. Kept cc4_ prefix on cc4_trajectory_eval.py because it uses EnterpriseScenarioGenerator / EnterpriseMAE and hardcodes CC4 dims (OBS_DIM=210 / ACT_DIM=242 / NUM_AGENTS=5) — bound to CC4 by construction.
87c5069 perf(eval): parallel rollouts in eval_recipe. Episodes are independent; evaluate_on_cyborg / evaluate_jax_on_cyborg now fan out across processes (default cpu_count() - 2). 300-episode comparison drops from ~30 min to ~3 min on a 64-core box.
3b2f4e1 fix(eval): jax_runner honors recipe.eval.red_agent + dedupe dispatch. Four call sites had drifting _red_classes tables; consolidated into jaxborg.evaluation.cyborg_red_dispatch.cyborg_red_class with proper c/i/a → CRedAgent/IRedAgent/ARedAgent mapping. Unknown names now raise instead of silently falling back to finite_state.
2d129a2 feat(resilience): per-episode random role assignment, globally coordinated. See "How roles are picked" above.
67a9ce2 simplify: drop role-assignment fallbacks. No deterministic-by-sort branch, no lazy _ensure_role_map, no TYPE_CHECKING guard. One path: caller supplies rng / key, agents get the map via set_role_map. -51 lines net.

What "next biased red" looks like under this design

def _database_tier(**_):
    return role_biased_selector(target_roles=(ROLE_DB,), target_weight=20.0)

REGISTRY["database_tier"] = _database_tier

Recipe: red_agent: database_tier. Done. No env class. No env wrapper. No ippo_jax.py change.
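For readers unfamiliar with the pattern, a name-based registry of factories can be sketched as follows. The names `REGISTRY`, `role_biased_selector`, and `make_red_selector` come from this PR, but the bodies here are stand-ins written for illustration; the alias table and the `ROLE_DB` value are assumptions.

```python
from typing import Callable, Dict

ROLE_DB = 2  # illustrative value only


def role_biased_selector(target_roles, target_weight):
    """Stand-in factory: returns a callable that just reports its configuration."""

    def selector():
        return {"target_roles": target_roles, "target_weight": target_weight}

    return selector


REGISTRY: Dict[str, Callable] = {}
ALIASES = {"db": "database_tier"}  # hypothetical alias


def _database_tier(**_):
    return role_biased_selector(target_roles=(ROLE_DB,), target_weight=20.0)


REGISTRY["database_tier"] = _database_tier


def make_red_selector(name, **kwargs):
    """Resolve an alias, look up the factory, and build the selector."""
    return REGISTRY[ALIASES.get(name, name)](**kwargs)


selector = make_red_selector("database_tier")
```

A recipe string such as `red_agent: database_tier` then only needs to reach `make_red_selector`; nothing else in the training code has to know the new name exists.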

New tests

tests/test_red_selectors.py (13 tests): registry names, alias resolution, end-to-end env construction per registered selector, default-extras-zero invariant.

tests/test_resilience_roles.py (8 tests): pinned to the random-with-rng contract — same rng → same map, varies across seeds, input-order invariant, handles <3 candidates.

Validation

uv run ruff check clean
850+ fast tests + 9 slow test_fsm_red_env + 13 selector contract tests + 8 role-assignment tests + 21 recipe smoke tests, all green
300-episode CybORG eval comparison (default vs resilience checkpoints) running cleanly under the parallel runner

Merge order

This PR's base is the parent PR's branch; merge #14 first, then GitHub auto-retargets this to main.

A RedSelector is a callable wired into FsmRedCC4Env that picks red actions each step. All selectors share one signature so they're interchangeable: recipes pick a selector by name via make_red_selector, with no env subclassing.

The four hand-rolled CIA selectors in resilience_red_fsm.py (resilience/c/i/a) collapse into one role_biased_selector factory call parameterised by target_roles, target_weight, and an optional FSM action-prob matrix override. Adding "target the database tier" or "ignore user hosts" becomes one REGISTRY entry: no env class, no env wrapper.

The selector signature always takes host_resilience_role; selectors that don't care (e.g. fsm_selector) ignore it. This keeps the wrapper env unaware of which selectors need which extras.

This commit only adds the module; FsmRedCC4Env is still the old monolithic class. The next commit migrates the env wrapper.
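One plausible reading of target_roles and target_weight is a per-host sampling weight, sketched below. This is an assumption for illustration: the real selector runs inside the JAX FSM and may apply the weight differently, and `biased_host_weights` / `pick_host` are hypothetical names.

```python
import random

# Illustrative role constants; the real values in topology_roles may differ.
ROLE_NONE, ROLE_AUTH, ROLE_DB, ROLE_WEB = 0, 1, 2, 3


def biased_host_weights(host_roles, target_roles, target_weight):
    """Unnormalised weight per host: target_weight if its role is targeted, else 1."""
    return [target_weight if role in target_roles else 1.0 for role in host_roles]


def pick_host(rng, host_roles, target_roles=(), target_weight=1.0):
    """Sample one host index in proportion to its weight."""
    weights = biased_host_weights(host_roles, target_roles, target_weight)
    return rng.choices(range(len(host_roles)), weights=weights, k=1)[0]


# Ten hosts: one each of auth/db/web, seven untagged.
host_roles = [ROLE_AUTH, ROLE_DB, ROLE_WEB] + [ROLE_NONE] * 7
weights = biased_host_weights(host_roles, target_roles=(ROLE_DB,), target_weight=20.0)
choice = pick_host(random.Random(0), host_roles, (ROLE_DB,), 20.0)
```

With an empty `target_roles` every weight is 1, so the same code path degrades to a uniform pick, which mirrors how a selector can accept host_resilience_role and simply not use it.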

…terised env

FsmRedCC4Env now takes a red_selector (default vanilla FSM) and an extras_factory (default empty roles) at construction time. Any biased red agent (resilience, CIA-targeted, future variants) is just a different (selector, extras_factory) pair, not a different class.

ResilienceRedCC4Env survives as a thin compat shim over make_fsm_red_env so existing imports keep working. ResilienceEnvState is aliased to the new FsmRedEnvState.

State shape: FsmRedEnvState extends ScenarioEnvState's flat layout (state, const) plus an extras dict. Existing callers reading env_state.state / env_state.const are unaffected; only the type changed. The only extras key today is host_resilience_role, defaulting to zeros so role-biased selectors degrade gracefully when no factory is supplied.

Tests: tests/test_fsm_red_env.py asserts isinstance against the new FsmRedEnvState; tests/subsystems and tests/test_cc4_env all pass.
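The two injection points can be illustrated with a toy env. Everything below is a hedged sketch with hypothetical names (`RedEnvSketch`, `EnvStateSketch`, the toy selectors); it only shows how a (selector, extras_factory) pair replaces a subclass and how zero-valued default extras keep the state shape stable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, NamedTuple

NUM_HOSTS = 4  # toy size


class EnvStateSketch(NamedTuple):
    state: Any
    const: Any
    extras: Dict[str, Any]


def zero_roles_factory(key, const):
    """Default extras: same key as any other factory, all roles zero."""
    return {"host_resilience_role": [0] * const["num_hosts"]}


def fsm_selector_sketch(state, const, host_resilience_role):
    """Accepts the roles argument but ignores it."""
    return state["t"] % const["num_hosts"]


def first_tagged_selector(state, const, host_resilience_role):
    """Targets the first tagged host; falls back to the unbiased pick if none."""
    tagged = [h for h, role in enumerate(host_resilience_role) if role != 0]
    return tagged[0] if tagged else fsm_selector_sketch(state, const, host_resilience_role)


@dataclass
class RedEnvSketch:
    red_selector: Callable = fsm_selector_sketch
    extras_factory: Callable = zero_roles_factory

    def reset(self, key):
        const = {"num_hosts": NUM_HOSTS}
        return EnvStateSketch({"t": 0}, const, self.extras_factory(key, const))

    def red_target(self, env_state):
        roles = env_state.extras["host_resilience_role"]
        return self.red_selector(env_state.state, env_state.const, roles)


plain = RedEnvSketch()
biased = RedEnvSketch(
    red_selector=first_tagged_selector,
    extras_factory=lambda key, const: {"host_resilience_role": [0, 0, 1, 0]},
)
```

Pairing `first_tagged_selector` with the default factory still works: with all-zero roles it falls through to the unbiased pick.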

Both env-construction sites in ippo_jax (the make_train builder and the top-level probe for action_dim) collapse from a 5-branch if/elif/else into one make_fsm_red_env call. The recipe's RED_AGENT name flows directly to the selector registry; RESILIENCE_MODE is honoured as a fallback so old recipes keep working without modification. Removes the FsmRedCC4Env / ResilienceRedCC4Env import distinction.

…ctors

The user has explicitly accepted the breaking API change: no callers outside this branch import the deleted symbols, so there is no reason to keep shims.

Deleted:
- parity/resilience_red_env.py (the ResilienceRedCC4Env class is now make_fsm_red_env)
- scenarios/cc4/resilience_red_fsm.py (the 4 hand-rolled selectors collapsed into role_biased_selector, the helpers move to red_selectors.py)
- scenarios/cc4/resilience_topology.py (build_resilience_topology was unused; _assign_resilience_roles moves to topology_roles as the public assign_resilience_roles_from_const)

Result:
- One canonical role-assignment module: scenarios/cc4/topology_roles.py exports both flavours (hostname-list for the Python recorder + CybORG side, const-based for JAX). They use the same lowest-3-sorted convention so they agree on every input.
- One canonical selector module: scenarios/cc4/red_selectors.py with the Protocol-style RedSelector callable, the role_biased_selector factory, the REGISTRY, and all the internal FSM machinery in one place.
- One env class: parity/fsm_red_env.py with red_selector + extras_factory injection points. The make_fsm_red_env helper does the recipe-name dispatch.

Deleted RESILIENCE_ROLE_* aliases; ROLE_AUTH/DB/WEB/NONE in topology_roles are the canonical constants. CybORG-side mirror docstrings updated.

Tests still green: tests/test_red_selectors.py (13 incl. one slow E2E), tests/test_resilience_roles.py (6), tests/test_fsm_red_env.py (9 slow), tests/test_cc4_env.py + tests/subsystems/* (~830 fast).

Per the naming convention discussion: "cc4" should mean the unmodified upstream game (specific topology + FiniteStateRedAgent + BlueRewardMachine), not "any tool that happens to live in this repo". Tools that work across topologies / red agents shouldn't carry the cc4 brand.

Renamed:
- scripts/eval/cc4_aggregate_cia.py -> aggregate_cia.py
- scripts/eval/cc4_score_trajectories.py -> score_trajectories.py

These are pure trajectory-format consumers: they don't know or care which game produced the JSONL, only that it follows the documented schema.

Kept cc4_ prefix on:
- cc4_trajectory_eval.py, which uses EnterpriseScenarioGenerator / EnterpriseMAE and hardcodes EPISODE_LENGTH=500 / NUM_AGENTS=5 / OBS_DIM=210 / ACT_DIM=242. Bound to CC4 by construction. A future "record_trajectories.py" that takes a scenario factory would supersede it.

Updated cross-references in:
- README usage example
- recipes/resilience.yaml comment
- cc4_trajectory_eval.py docstring

scripts/dev/check_red_bias.py rolls out short episodes with each registered selector under a sleep-blue policy and reports what fraction of red attacks land on hosts of each role (NONE / AUTH / DB / WEB). Validates that the registry-based architecture preserves the per-selector bias semantics PR #11 specified.

Run output (3 episodes × 30 steps × 5 selectors, all under role_assignment='resilience' for an apples-to-apples baseline):

selector     NONE    AUTH    DB      WEB     tagged%
fsm          95.1%   3.3%    0.8%    0.8%    4.9%     (uniform baseline)
resilience   82.8%   6.1%    5.7%    5.3%    17.2%    (weight=5, all 3)
cia_c        83.2%   5.3%    11.1%   0.4%    16.8%    (weight=10, AUTH+DB)
cia_i        82.8%   8.2%    1.2%    7.8%    17.2%    (weight=10, AUTH+WEB)
cia_a        79.9%   4.9%    8.2%    7.0%    20.1%    (weight=10, all 3)

Each biased selector shifts ~3.5–4× over baseline on its target role set. The cleanest signal is cia_c vs cia_i: cia_c is heavy on DB (11.1%) and almost nothing on WEB (0.4%); cia_i flips that (DB 1.2%, WEB 7.8%), exactly the selector spec.
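The "~3.5–4×" figure can be reproduced from the reported percentages by dividing each selector's share on its own target roles by the fsm baseline's share on the same roles. The snippet below is just that arithmetic, using the numbers from the run output; the variable names are illustrative.

```python
# Percent of red attacks landing on each role, copied from the run output.
rows = {
    "fsm":        {"AUTH": 3.3, "DB": 0.8,  "WEB": 0.8},
    "resilience": {"AUTH": 6.1, "DB": 5.7,  "WEB": 5.3},
    "cia_c":      {"AUTH": 5.3, "DB": 11.1, "WEB": 0.4},
    "cia_i":      {"AUTH": 8.2, "DB": 1.2,  "WEB": 7.8},
    "cia_a":      {"AUTH": 4.9, "DB": 8.2,  "WEB": 7.0},
}
# Target role set per biased selector, as annotated in the run output.
targets = {
    "resilience": ("AUTH", "DB", "WEB"),
    "cia_c": ("AUTH", "DB"),
    "cia_i": ("AUTH", "WEB"),
    "cia_a": ("AUTH", "DB", "WEB"),
}


def share(selector, roles):
    """Total percentage of attacks on the given roles for one selector."""
    return sum(rows[selector][role] for role in roles)


# Ratio of each selector's on-target share to the unbiased baseline's share.
shift = {name: share(name, roles) / share("fsm", roles) for name, roles in targets.items()}
for name, ratio in shift.items():
    print(f"{name:<11} {ratio:.2f}x over baseline")
```

All four ratios fall between roughly 3.5 and 4.1, consistent with the summary sentence.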

4 slow parity tests build a ScenarioEnvState directly from a CybORG seed via build_const_from_cyborg + _init_red_state, then feed it to env.step(). After the FsmRedEnvState refactor, that path skipped extras_factory and tripped ``AttributeError: 'ScenarioEnvState' object has no attribute 'extras'``.

Add FsmRedCC4Env.wrap_scenario_state(env_state, key=None) that synthesizes extras via the env's own factory and returns the proper FsmRedEnvState. Update the two test fixtures (jax_env_from_cyborg, jax_fsm_from_cyborg) to use it.

Verified the 4 failures pass locally:
- test_fsm_red_env_differential::test_random_blue_reward_distribution
- test_fsm_red_env_differential::test_sleep_blue_cumulative_reward_same_sign
- test_reward_comparison::test_sleep_baseline_both_nonpositive
- test_reward_comparison::test_returns_are_finite

Reorder imports + format long print() in scripts/dev/check_red_bias.py to satisfy ruff check + format.

Episodes don't share state, so the serial loop in evaluate_on_cyborg / evaluate_jax_on_cyborg leaves all but 1 of N CPUs idle. Add a workers arg to both runners that fans out (idx, seed) chunks across a ProcessPoolExecutor with mp_context='spawn'; each worker loads the checkpoint once and runs its slice. eval_recipe.py exposes --workers (default cpu_count() - 2). Single-process path preserved at workers=1. 300-episode comparison drops from ~30 min to ~3 min on a 64-core box.
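The fan-out can be sketched with the standard library alone. This is a minimal illustration, not the runners' actual code: `run_episode`, `run_chunk`, `split_chunks`, and `evaluate` are hypothetical names, and the episode body is a placeholder. Note that ProcessPoolExecutor expects a context object for mp_context, obtained here with multiprocessing.get_context("spawn").

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor


def run_episode(seed):
    """Placeholder episode: returns a fake episode return derived from the seed."""
    return float(seed % 7)


def run_chunk(chunk):
    # A real worker would load the checkpoint once here, then loop over its slice.
    return [(idx, run_episode(seed)) for idx, seed in chunk]


def split_chunks(jobs, workers):
    """Deal (idx, seed) jobs round-robin into at most `workers` non-empty chunks."""
    chunks = [jobs[i::workers] for i in range(workers)]
    return [chunk for chunk in chunks if chunk]


def evaluate(seeds, workers=None):
    if workers is None:
        workers = max(1, (os.cpu_count() or 1) - 2)
    jobs = list(enumerate(seeds))
    if workers == 1:
        results = run_chunk(jobs)  # single-process path, no pool
    else:
        ctx = mp.get_context("spawn")
        with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
            parts = pool.map(run_chunk, split_chunks(jobs, workers))
            results = [item for part in parts for item in part]
    # Sort by episode index so output order does not depend on the chunking.
    return [ret for _, ret in sorted(results)]


serial = evaluate([3, 8, 14], workers=1)
```

With the spawn start method, the multi-worker call should sit under an `if __name__ == "__main__":` guard in a script, since each child re-imports the main module.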

Two bugs surfaced when comparing default vs resilience checkpoints:
- jax_runner.make_env hardcoded FiniteStateRedAgent and ignored the recipe. eval_recipe.py only routed eval_cfg['red_agent'] through the torch branch. So a JAX-trained policy with eval.red_agent=c was silently evaluated against finite_state.
- Four call sites carried near-identical _red_classes dispatch tables (cyborg_runner, jax_runner, ippo_cyborg, cc4_trajectory_eval) that drifted (one supported c/i/a, three didn't).

Add jaxborg.evaluation.cyborg_red_dispatch.cyborg_red_class as the single dispatch: finite_state/fsm, sleep, resilience, plus c/cia_c, i/cia_i, a/cia_a → CRedAgent/IRedAgent/ARedAgent (existing CybORG-side classes). Unknown names raise instead of silently falling back to finite_state.

Replace inline dispatch in all four sites; eval_recipe.py passes red_agent + target_weight to evaluate_jax_on_cyborg.

Verified: resilience checkpoint now evaluates against c per its recipe sidecar (-1647.6 ± 546.5, n=300) vs default against finite_state (-1758.7 ± 685.6, n=300).
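The difference between a silent fallback and a strict dispatch is small in code but matters for evaluation correctness. Below is a hedged sketch: the classes are empty stand-ins for the CybORG-side agents, `SleepRedAgentStub` is a hypothetical name, and `cyborg_red_class_sketch` is not the real function body.

```python
# Empty stand-ins so the sketch is self-contained.
class FiniteStateRedAgent: ...
class SleepRedAgentStub: ...
class ResilienceRedAgent: ...
class CRedAgent(ResilienceRedAgent): ...
class IRedAgent(ResilienceRedAgent): ...
class ARedAgent(ResilienceRedAgent): ...


_RED_CLASSES = {
    "finite_state": FiniteStateRedAgent, "fsm": FiniteStateRedAgent,
    "sleep": SleepRedAgentStub,
    "resilience": ResilienceRedAgent,
    "c": CRedAgent, "cia_c": CRedAgent,
    "i": IRedAgent, "cia_i": IRedAgent,
    "a": ARedAgent, "cia_a": ARedAgent,
}


def cyborg_red_class_sketch(name):
    """Strict lookup: an unknown name is an error, never a default agent."""
    try:
        return _RED_CLASSES[name]
    except KeyError:
        known = ", ".join(sorted(_RED_CLASSES))
        raise ValueError(f"unknown red agent {name!r}; known names: {known}") from None


# The pattern being replaced: a typo quietly evaluates against the wrong opponent.
silent = _RED_CLASSES.get("cia-c", FiniteStateRedAgent)
```

In the sketch, the misspelled `"cia-c"` resolves to FiniteStateRedAgent under the `.get` pattern, which is the kind of silent mismatch the strict version turns into an immediate error.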

…nated

Restores Dena's original PR #11 intent: each episode randomly picks 3 of the operational-zone server hostnames to tag as auth/db/web, instead of always pinning roles to the lowest-3-sorted hostnames. The policy trains against a moving role map and learns position-agnostic defense.

JAX side:
- assign_resilience_roles_from_const(const, key) takes an optional key. None: deterministic-by-sort (replay / tests). Key: jax.random.uniform noise drives a candidate-host shuffle, first 3 → AUTH/DB/WEB.
- The 'resilience' extras_factory now passes the per-episode key_extras through (previously discarded), so each env.reset(key) gets a fresh random role map.

CybORG side (formerly index-mod-3 across all op-zone servers):
- ResilienceRedAgent + CIA subclasses now hold a per-episode role map (set_role_map / _ensure_role_map). _CIARedAgent._target_roles uses the canonical ROLE_AUTH / DB / WEB constants.
- inject_role_map(env, ep_seed) builds the role map deterministically from ep_seed + the env's full host list and pushes it into every ResilienceRedAgent in the env. This makes the map global to the episode: all 6 red agents bias toward the same 3 hosts and the trajectory recorder writes the matching map for the scorer.
- All call sites that reset CybORG envs now inject after every reset: cc4_trajectory_eval (eval recording), env_worker in ippo_cyborg (CybORG training), and the cyborg/jax eval runners.

Smoke-verified: 6 red agents per episode all share the same _role_map after inject_role_map. Different ep_seed → different map; same ep_seed reproduces. JAX side: 5 distinct maps for 5 keys.

Dropped:
- assign_resilience_roles(hostnames, rng=None) deterministic-by-sort branch → rng is now required
- assign_resilience_roles_from_const(const, key=None) deterministic branch → key is now required
- ResilienceRedAgent._ensure_role_map lazy fallback; set_role_map is the only path. inject_role_map is called after every env.reset() in every real call site (training + eval + recorder)
- TYPE_CHECKING guard on SimulatorConst: there's no circular import to defend against, just import directly

The tests previously pinned the deterministic-by-sort behavior; rewrote them to pin the random-with-rng contract: same rng → same map, varies across seeds, input-order invariant, handles <3 candidates gracefully.

PaulHax mentioned this pull request May 6, 2026

Resilience CIA metric + biased red FSM (rebased + cleanups) #14

Merged

PaulHax force-pushed the pr-11-refactor branch 4 times, most recently from f6ea0d8 to 5c8f82c Compare May 7, 2026 13:36

PaulHax mentioned this pull request May 8, 2026

feat: GameVariant encapsulation — variant-driven CC4 env construction #17

Merged

4 tasks

PaulHax changed the base branch from pr-11-rebase to main May 8, 2026 18:50

PaulHax added 12 commits May 8, 2026 14:53

style: ruff fixes for dev scripts

8f786cf


PaulHax force-pushed the pr-11-refactor branch 2 times, most recently from 47d5628 to 9908fd1 Compare May 8, 2026 19:05

PaulHax merged commit e5cf25f into main May 8, 2026
2 of 3 checks passed

PaulHax merged 12 commits into main from pr-11-refactor
