Pluggable red selector + extras factory (architecture follow-up) #15
Merged
f6ea0d8 to 5c8f82c
A RedSelector is a callable wired into FsmRedCC4Env that picks red actions each step. All selectors share one signature, so they are interchangeable: recipes pick a selector by name via make_red_selector, with no env subclassing. The four hand-rolled CIA selectors in resilience_red_fsm.py (resilience/c/i/a) collapse into one role_biased_selector factory call parameterised by target_roles, target_weight, and an optional FSM action-prob matrix override. Adding "target the database tier" or "ignore user hosts" becomes one REGISTRY entry: no env class, no env wrapper.

The selector signature always takes host_resilience_role; selectors that don't care (e.g. fsm_selector) ignore it. This keeps the wrapper env unaware of which selectors need which extras.

This commit only adds the module; FsmRedCC4Env is still the old monolithic class. The next commit migrates the env wrapper.
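A minimal sketch of the registry pattern described above, in plain Python rather than JAX. Everything here is illustrative: the role ids, the simplified selector signature `(rng, candidate_hosts, host_resilience_role) -> host`, and the `database_tier` entry are assumptions, and the real selectors pick FSM actions (optionally with an action-prob matrix override) rather than bare target hosts. The weights mirror the ones listed in the bias-check run output (5 for `resilience`, 10 for the CIA variants).

```python
import random
from typing import Callable, Dict, Iterable, Protocol, Sequence

# Hypothetical role ids; the PR's canonical names are ROLE_NONE/AUTH/DB/WEB.
ROLE_NONE, ROLE_AUTH, ROLE_DB, ROLE_WEB = 0, 1, 2, 3


class RedSelector(Protocol):
    """One shared signature, so selectors are interchangeable."""

    def __call__(
        self,
        rng: random.Random,
        candidate_hosts: Sequence[int],
        host_resilience_role: Sequence[int],
    ) -> int: ...


def fsm_selector(rng, candidate_hosts, host_resilience_role):
    # Unbiased baseline: receives the roles but ignores them.
    return rng.choice(list(candidate_hosts))


def role_biased_selector(target_roles: Iterable[int], target_weight: float) -> RedSelector:
    """Factory: hosts whose role is in target_roles get target_weight, others 1."""
    targets = frozenset(target_roles)

    def select(rng, candidate_hosts, host_resilience_role):
        hosts = list(candidate_hosts)
        weights = [target_weight if host_resilience_role[h] in targets else 1.0 for h in hosts]
        return rng.choices(hosts, weights=weights, k=1)[0]

    return select


ALL_ROLES = (ROLE_AUTH, ROLE_DB, ROLE_WEB)

# Each entry is a factory taking recipe kwargs; adding a variant is one line.
REGISTRY: Dict[str, Callable[..., RedSelector]] = {
    "fsm": lambda **_: fsm_selector,
    "resilience": lambda target_weight=5.0: role_biased_selector(ALL_ROLES, target_weight),
    "cia_c": lambda target_weight=10.0: role_biased_selector((ROLE_AUTH, ROLE_DB), target_weight),
    "cia_i": lambda target_weight=10.0: role_biased_selector((ROLE_AUTH, ROLE_WEB), target_weight),
    "cia_a": lambda target_weight=10.0: role_biased_selector(ALL_ROLES, target_weight),
    # Hypothetical "target the database tier" variant.
    "database_tier": lambda target_weight=10.0: role_biased_selector((ROLE_DB,), target_weight),
}


def make_red_selector(name: str, **kwargs) -> RedSelector:
    """Resolve a recipe name to a selector; unknown names raise."""
    if name not in REGISTRY:
        raise ValueError(f"unknown red selector {name!r}; known: {sorted(REGISTRY)}")
    return REGISTRY[name](**kwargs)
```

Because the name is resolved when the recipe is loaded, the env only ever sees a callable with the shared signature and never needs to know which variant it is running.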
…terised env

FsmRedCC4Env now takes a red_selector (default vanilla FSM) and an extras_factory (default empty roles) at construction time. Any biased red agent (resilience, CIA-targeted, future variants) is just a different (selector, extras_factory) pair, not a different class. ResilienceRedCC4Env survives as a thin compat shim over make_fsm_red_env so existing imports keep working. ResilienceEnvState is aliased to the new FsmRedEnvState.

State shape: FsmRedEnvState extends ScenarioEnvState's flat layout (state, const) with an extras dict. Existing callers reading env_state.state / env_state.const are unaffected; only the type changed. The only extras key today is host_resilience_role, defaulting to zeros so role-biased selectors degrade gracefully when no factory is supplied.

Tests: tests/test_fsm_red_env.py asserts isinstance against the new FsmRedEnvState; tests/subsystems and tests/test_cc4_env all pass.
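The construction-time injection can be sketched as follows. `FsmRedEnvSketch`, `zero_roles_extras`, the dict-shaped `const`, and its `num_hosts` key are hypothetical stand-ins (the real `FsmRedCC4Env` works on JAX arrays); the point is that the state keeps its flat `state`/`const` fields, gains an `extras` dict, and defaults `host_resilience_role` to zeros when no factory is supplied.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

ExtrasFactory = Callable[[Any, Any], Dict[str, Any]]


@dataclass(frozen=True)
class FsmRedEnvState:
    state: Any  # same flat field as ScenarioEnvState
    const: Any  # same flat field as ScenarioEnvState
    extras: Dict[str, Any]


def zero_roles_extras(key: Any, const: Dict[str, Any]) -> Dict[str, Any]:
    # Default: every host has role 0 (NONE), so a role-biased selector finds no
    # targeted hosts and weights every host equally.
    return {"host_resilience_role": [0] * const["num_hosts"]}


class FsmRedEnvSketch:
    """Hypothetical stand-in: variants differ only in (selector, extras_factory)."""

    def __init__(self, red_selector: Callable[..., Any], extras_factory: Optional[ExtrasFactory] = None):
        self.red_selector = red_selector
        self.extras_factory = extras_factory or zero_roles_extras

    def make_state(self, key: Any, state: Any, const: Dict[str, Any]) -> FsmRedEnvState:
        # Extras are computed once per reset from the per-episode key.
        return FsmRedEnvState(state=state, const=const, extras=self.extras_factory(key, const))
```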
Both env-construction sites in ippo_jax (the make_train builder and the top-level probe for action_dim) collapse from a 5-branch if/elif/else into one make_fsm_red_env call. The recipe's RED_AGENT name flows directly to the selector registry; RESILIENCE_MODE is honoured as a fallback so old recipes keep working without modification. Removes the FsmRedCC4Env / ResilienceRedCC4Env import distinction.
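A sketch of the name resolution that lets both ippo_jax sites share one call. The precedence rule (an explicit `RED_AGENT` wins, then a truthy `RESILIENCE_MODE` maps to `"resilience"`, else `"fsm"`) is an assumption about how the fallback works, and `resolve_red_agent_name` is a hypothetical helper, not the actual code.

```python
from typing import Any, Mapping


def resolve_red_agent_name(config: Mapping[str, Any]) -> str:
    """Hypothetical: map a recipe config to a selector-registry name."""
    name = config.get("RED_AGENT")
    if name:
        # New-style recipes name the selector directly.
        return str(name)
    if config.get("RESILIENCE_MODE"):
        # Assumed fallback for old recipes that only set the legacy flag.
        return "resilience"
    return "fsm"
```

Both construction sites would then pass the resolved name through to the selector registry, so adding a selector never requires another branch in ippo_jax.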
…ctors

The user has explicitly accepted the breaking API change: no callers outside this branch import the deleted symbols, so there is no reason to keep shims.

Deleted:
- parity/resilience_red_env.py (the ResilienceRedCC4Env class is now make_fsm_red_env)
- scenarios/cc4/resilience_red_fsm.py (the 4 hand-rolled selectors collapsed into role_biased_selector; the helpers move to red_selectors.py)
- scenarios/cc4/resilience_topology.py (build_resilience_topology was unused; _assign_resilience_roles moves to topology_roles as the public assign_resilience_roles_from_const)

Result:
- One canonical role-assignment module: scenarios/cc4/topology_roles.py exports both flavours (hostname-list for the Python recorder + CybORG side, const-based for JAX). They use the same lowest-3-sorted convention, so they agree on every input.
- One canonical selector module: scenarios/cc4/red_selectors.py with the Protocol-style RedSelector callable, the role_biased_selector factory, the REGISTRY, and all the internal FSM machinery in one place.
- One env class: parity/fsm_red_env.py with red_selector + extras_factory injection points. The make_fsm_red_env helper does the recipe-name dispatch.

Deleted RESILIENCE_ROLE_* aliases; ROLE_AUTH/DB/WEB/NONE in topology_roles are the canonical constants. CybORG-side mirror docstrings updated.

Tests still green:
- tests/test_red_selectors.py (13, incl. one slow E2E)
- tests/test_resilience_roles.py (6)
- tests/test_fsm_red_env.py (9 slow)
- tests/test_cc4_env.py + tests/subsystems/* (~830 fast)
Per the naming convention discussion: "cc4" should mean the unmodified upstream game (specific topology + FiniteStateRedAgent + BlueRewardMachine), not "any tool that happens to live in this repo". Tools that work across topologies / red agents shouldn't carry the cc4 brand.

Renamed:
- scripts/eval/cc4_aggregate_cia.py -> aggregate_cia.py
- scripts/eval/cc4_score_trajectories.py -> score_trajectories.py

These are pure trajectory-format consumers: they don't know or care which game produced the JSONL, only that it follows the documented schema.

Kept cc4_ prefix on:
- cc4_trajectory_eval.py: uses EnterpriseScenarioGenerator / EnterpriseMAE and hardcodes EPISODE_LENGTH=500 / NUM_AGENTS=5 / OBS_DIM=210 / ACT_DIM=242, so it is bound to CC4 by construction. A future "record_trajectories.py" that takes a scenario factory would supersede it.

Updated cross-references in:
- README usage example
- recipes/resilience.yaml comment
- cc4_trajectory_eval.py docstring
scripts/dev/check_red_bias.py rolls out short episodes with each registered selector under a sleep-blue policy and reports what fraction of red attacks land on hosts of each role (NONE / AUTH / DB / WEB). Validates that the registry-based architecture preserves the per-selector bias semantics PR #11 specified.

Run output (3 episodes × 30 steps × 5 selectors, all under role_assignment='resilience' for an apples-to-apples baseline; tagged% = AUTH + DB + WEB):

  selector    NONE    AUTH    DB     WEB    tagged%
  fsm         95.1%   3.3%   0.8%   0.8%    4.9%   (uniform baseline)
  resilience  82.8%   6.1%   5.7%   5.3%   17.2%   (weight=5, all 3)
  cia_c       83.2%   5.3%  11.1%   0.4%   16.8%   (weight=10, AUTH+DB)
  cia_i       82.8%   8.2%   1.2%   7.8%   17.2%   (weight=10, AUTH+WEB)
  cia_a       79.9%   4.9%   8.2%   7.0%   20.1%   (weight=10, all 3)

Each biased selector shifts ~3.5–4× over baseline on its target role set. The cleanest signal is cia_c vs cia_i: cia_c is heavy on DB (11.1%) with almost nothing on WEB (0.4%); cia_i flips that (DB 1.2%, WEB 7.8%), exactly the selector spec.
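The per-role percentages and the lift figures can be reproduced with a few lines of arithmetic. This is a hypothetical re-derivation, not the code in check_red_bias.py: it assumes role ids 0 to 3 map to NONE/AUTH/DB/WEB, and it measures lift as a selector's share on its own target roles divided by the fsm baseline's share on the same roles.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

ROLE_NAMES = ("NONE", "AUTH", "DB", "WEB")  # assumed: role id i -> ROLE_NAMES[i]


def role_shares(attacked_roles: Iterable[int]) -> Dict[str, float]:
    """Percent of red attacks per role, plus 'tagged' = AUTH + DB + WEB."""
    counts = Counter(attacked_roles)
    total = sum(counts.values())
    shares = {
        name: (100.0 * counts[i] / total if total else 0.0)
        for i, name in enumerate(ROLE_NAMES)
    }
    shares["tagged"] = shares["AUTH"] + shares["DB"] + shares["WEB"]
    return shares


def lift(biased: Dict[str, float], baseline: Dict[str, float], targets: Sequence[str]) -> float:
    """How many times more often a selector hits its target roles than the baseline."""
    return sum(biased[r] for r in targets) / sum(baseline[r] for r in targets)


# Rows copied from the run output above.
fsm = {"NONE": 95.1, "AUTH": 3.3, "DB": 0.8, "WEB": 0.8}
cia_c = {"NONE": 83.2, "AUTH": 5.3, "DB": 11.1, "WEB": 0.4}
cia_c_lift = lift(cia_c, fsm, ["AUTH", "DB"])  # (5.3 + 11.1) / (3.3 + 0.8) ≈ 4.0
```

For resilience over all three roles the same ratio is about 3.5 (17.2 / 4.9), which brackets the quoted range.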
4 slow parity tests build a ScenarioEnvState directly from a CybORG seed via build_const_from_cyborg + _init_red_state, then feed it to env.step(). After the FsmRedEnvState refactor, that path skipped extras_factory and tripped ``AttributeError: 'ScenarioEnvState' object has no attribute 'extras'``.

Add FsmRedCC4Env.wrap_scenario_state(env_state, key=None), which synthesizes extras via the env's own factory and returns the proper FsmRedEnvState. Update the two test fixtures (jax_env_from_cyborg, jax_fsm_from_cyborg) to use it.

Verified that the 4 previously failing tests now pass locally:
- test_fsm_red_env_differential::test_random_blue_reward_distribution
- test_fsm_red_env_differential::test_sleep_blue_cumulative_reward_same_sign
- test_reward_comparison::test_sleep_baseline_both_nonpositive
- test_reward_comparison::test_returns_are_finite
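A sketch of the shape of wrap_scenario_state, with NamedTuples standing in for the real JAX pytree state classes and a hypothetical `FsmRedEnvSketch` owner; only the method name and its `(env_state, key=None)` parameters come from the commit.

```python
from typing import Any, Callable, Dict, NamedTuple, Optional


class ScenarioEnvState(NamedTuple):
    state: Any
    const: Any


class FsmRedEnvState(NamedTuple):
    state: Any
    const: Any
    extras: Dict[str, Any]


class FsmRedEnvSketch:
    """Hypothetical owner of wrap_scenario_state; only the method shape is from the commit."""

    def __init__(self, extras_factory: Callable[[Any, Any], Dict[str, Any]]):
        self.extras_factory = extras_factory

    def wrap_scenario_state(self, env_state: ScenarioEnvState, key: Optional[Any] = None) -> FsmRedEnvState:
        # Reuse the env's own factory so hand-built states get the same extras as reset().
        extras = self.extras_factory(key, env_state.const)
        return FsmRedEnvState(state=env_state.state, const=env_state.const, extras=extras)
```

Routing through the env's own factory, rather than building extras in each fixture, keeps the parity tests on the same code path as a normal reset.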
Reorder imports + format long print() in scripts/dev/check_red_bias.py to satisfy ruff check + format.
Episodes don't share state, so the serial loop in evaluate_on_cyborg / evaluate_jax_on_cyborg leaves all but 1 of N CPUs idle. Add a workers arg to both runners that fans out (idx, seed) chunks across a ProcessPoolExecutor with mp_context='spawn'; each worker loads the checkpoint once and runs its slice. eval_recipe.py exposes --workers (default cpu_count() - 2). Single-process path preserved at workers=1. 300-episode comparison drops from ~30 min to ~3 min on a 64-core box.
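A sketch of the fan-out, assuming a module-level `run_slice` callable that loads the checkpoint once and returns one result per `(idx, seed)` job; `chunk_jobs`, `run_parallel`, and `default_workers` are illustrative names, and the floor of 1 worker is an addition of this sketch. `ProcessPoolExecutor(mp_context=...)` and `multiprocessing.get_context("spawn")` are standard-library APIs.

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence, Tuple

Job = Tuple[int, int]  # (episode index, seed)


def default_workers() -> int:
    # cpu_count() - 2 as in eval_recipe.py, floored at 1 in this sketch.
    return max(1, (os.cpu_count() or 1) - 2)


def chunk_jobs(seeds: Sequence[int], workers: int) -> List[List[Job]]:
    """Split (idx, seed) jobs into at most `workers` contiguous, non-empty slices."""
    jobs = list(enumerate(seeds))
    if not jobs:
        return []
    n = max(1, min(workers, len(jobs)))
    size, extra = divmod(len(jobs), n)
    chunks, start = [], 0
    for w in range(n):
        end = start + size + (1 if w < extra else 0)
        chunks.append(jobs[start:end])
        start = end
    return chunks


def run_parallel(run_slice: Callable[[List[Job]], List[float]], seeds: Sequence[int], workers: int) -> List[float]:
    """run_slice returns one result per job; under 'spawn' it must be a
    picklable, module-level function."""
    chunks = chunk_jobs(seeds, workers)
    if len(chunks) <= 1:
        # Single-process path (workers=1 or a single job): no pool at all.
        return run_slice(chunks[0] if chunks else [])
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=len(chunks), mp_context=ctx) as pool:
        parts = list(pool.map(run_slice, chunks))  # map preserves chunk order
    return [result for part in parts for result in part]
```

Contiguous slices plus the order-preserving `Executor.map` keep results in episode order. A plausible reason for spawn over fork is that forking a process with an initialised, multithreaded JAX runtime is unsafe, though the commit does not state its rationale.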
Two bugs surfaced when comparing default vs resilience checkpoints:
- jax_runner.make_env hardcoded FiniteStateRedAgent and ignored the recipe, and eval_recipe.py only routed eval_cfg['red_agent'] through the torch branch. So a JAX-trained policy with eval.red_agent=c was silently evaluated against finite_state.
- Four call sites (cyborg_runner, jax_runner, ippo_cyborg, cc4_trajectory_eval) carried near-identical _red_classes dispatch tables that drifted (one supported c/i/a, three didn't).

Add jaxborg.evaluation.cyborg_red_dispatch.cyborg_red_class as the single dispatch: finite_state/fsm, sleep, resilience, plus c/cia_c, i/cia_i, a/cia_a → CRedAgent/IRedAgent/ARedAgent (existing CybORG-side classes). Unknown names raise instead of silently falling back to finite_state. Replace inline dispatch in all four sites; eval_recipe.py passes red_agent + target_weight to evaluate_jax_on_cyborg.

Verified: the resilience checkpoint now evaluates against c per its recipe sidecar (-1647.6 ± 546.5, n=300) vs default against finite_state (-1758.7 ± 685.6, n=300).
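A sketch of a single name-to-class dispatch that raises on unknown names. To stay self-contained it maps to class names as strings; the real cyborg_red_class returns the CybORG classes themselves, and the `SleepAgent` name for the sleep entry is an assumption.

```python
from typing import Dict

# Hypothetical stand-in: the real dispatch returns CybORG agent classes, not strings.
_RED_CLASS_BY_NAME: Dict[str, str] = {
    "finite_state": "FiniteStateRedAgent",
    "fsm": "FiniteStateRedAgent",
    "sleep": "SleepAgent",  # assumed class name
    "resilience": "ResilienceRedAgent",
    "c": "CRedAgent",
    "cia_c": "CRedAgent",
    "i": "IRedAgent",
    "cia_i": "IRedAgent",
    "a": "ARedAgent",
    "cia_a": "ARedAgent",
}


def cyborg_red_class_name(name: str) -> str:
    """Resolve a recipe red_agent name; unknown names raise instead of defaulting."""
    try:
        return _RED_CLASS_BY_NAME[name]
    except KeyError:
        known = ", ".join(sorted(_RED_CLASS_BY_NAME))
        raise ValueError(f"unknown red agent {name!r}; expected one of: {known}") from None
```

With one table shared by every call site, the sites can no longer drift apart, and a typo such as `finite` fails loudly instead of quietly evaluating against the default agent.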
…nated

Restores Dena's original PR #11 intent: each episode randomly picks 3 of the operational-zone server hostnames to tag as auth/db/web, instead of always pinning roles to the lowest-3-sorted hostnames. The policy trains against a moving role map and learns position-agnostic defense.

JAX side:
- assign_resilience_roles_from_const(const, key) takes an optional key. None: deterministic-by-sort (replay / tests). Key: jax.random.uniform noise drives a candidate-host shuffle; the first 3 → AUTH/DB/WEB.
- The 'resilience' extras_factory now passes the per-episode key_extras through (previously discarded), so each env.reset(key) gets a fresh random role map.

CybORG side (formerly index-mod-3 across all op-zone servers):
- ResilienceRedAgent + CIA subclasses now hold a per-episode role map (set_role_map / _ensure_role_map). _CIARedAgent._target_roles uses the canonical ROLE_AUTH / DB / WEB constants.
- inject_role_map(env, ep_seed) builds the role map deterministically from ep_seed + the env's full host list and pushes it into every ResilienceRedAgent in the env. This makes the map global to the episode: all 6 red agents bias toward the same 3 hosts, and the trajectory recorder writes the matching map for the scorer.
- All call sites that reset CybORG envs now inject after every reset: cc4_trajectory_eval (eval recording), env_worker in ippo_cyborg (CybORG training), and the cyborg/jax eval runners.

Smoke-verified: 6 red agents per episode all share the same _role_map after inject_role_map. Different ep_seed → different map; same ep_seed reproduces. JAX side: 5 distinct maps for 5 keys.
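A stdlib-only sketch of the CybORG-side flow: derive the map from `ep_seed`, then push the identical map into every agent. `assign_roles`, `inject_role_map_sketch`, `FakeRedAgent`, the role ids, and the `is_candidate` filter are hypothetical; the real inject_role_map(env, ep_seed) reads the host list from the env, and the JAX side uses jax.random.uniform noise instead of `random.Random`. Sorting the candidates before drawing noise is what makes the map independent of input order.

```python
import random
from typing import Callable, Dict, Iterable, Sequence

ROLE_AUTH, ROLE_DB, ROLE_WEB = 1, 2, 3  # hypothetical ids


def assign_roles(candidates: Iterable[str], rng: random.Random) -> Dict[str, int]:
    """Randomly tag up to 3 candidate hosts as AUTH/DB/WEB for one episode."""
    # Sort first so the result depends on the set of hostnames, not their order.
    pool = sorted(set(candidates))
    noise = [rng.random() for _ in pool]
    shuffled = [host for _, host in sorted(zip(noise, pool))]
    # zip stops at the shorter input: fewer than 3 candidates gives a partial map.
    return dict(zip(shuffled, (ROLE_AUTH, ROLE_DB, ROLE_WEB)))


class FakeRedAgent:
    """Hypothetical stand-in for ResilienceRedAgent."""

    def __init__(self) -> None:
        self.role_map: Dict[str, int] = {}

    def set_role_map(self, role_map: Dict[str, int]) -> None:
        self.role_map = dict(role_map)


def inject_role_map_sketch(
    agents: Sequence[object],
    hostnames: Iterable[str],
    ep_seed: int,
    is_candidate: Callable[[str], bool],
) -> Dict[str, int]:
    """Build one map per episode from ep_seed and push it into every agent."""
    role_map = assign_roles((h for h in hostnames if is_candidate(h)), random.Random(ep_seed))
    for agent in agents:
        if hasattr(agent, "set_role_map"):  # skip non-resilience agents
            agent.set_role_map(role_map)
    return role_map
```

Because the map is computed once and pushed to all agents, every red agent in the episode biases toward the same three hosts, and the same map can be written out for the scorer.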
Dropped:
- assign_resilience_roles(hostnames, rng=None) deterministic-by-sort branch → rng is now required
- assign_resilience_roles_from_const(const, key=None) deterministic branch → key is now required
- ResilienceRedAgent._ensure_role_map lazy fallback: set_role_map is the only path. inject_role_map is called after every env.reset() in every real call site (training + eval + recorder)
- TYPE_CHECKING guard on SimulatorConst: there's no circular import to defend against, so import directly

The tests previously pinned the deterministic-by-sort behavior; they are rewritten to pin the random-with-rng contract: same rng → same map, varies across seeds, input-order invariant, handles <3 candidates gracefully.
47d5628 to 9908fd1
Stacked on #14. Reviewing this PR shows only the architecture delta — the rebased resilience work it depends on is in the parent PR.
What it does
Collapses the multi-class, hand-rolled red-agent surface into a single parameterised env + a registry of selectors. The next biased-red PR becomes ~200 lines instead of ~1300.
How roles are picked
Each episode picks 3 of the operational-zone server hostnames at random (out of ~6) and tags them auth/db/web. The resilience metric scores impact actions against those 3; the red bias points at those 3 (or a CIA subset of them). Every other host, including untagged op-zone servers, is unbiased and unscored that episode.

The same `env_seed` reproduces the same map. Different seeds → different shuffle → different 3 hosts get the 3 roles. Over many episodes, every op-zone server gets exposure to every role, so the policy can't memorise "host_0 is always auth."

Coordination:
- JAX side: `extras_factory(key, const)` calls `assign_resilience_roles_from_const(const, key)` at every reset; `host_resilience_role` rides in env state and is read directly by the selector.
- CybORG side: `inject_role_map(env, ep_seed)` runs after every `env.reset()`; it builds the map from the env's full host list and pushes it into every `ResilienceRedAgent` via `set_role_map`. Wired in training (`ippo_cyborg.env_worker`), eval (`cyborg_runner`/`jax_runner`), and the trajectory recorder (`cc4_trajectory_eval`).

This restores @Dmujt's original PR #11 intent ("each episode randomly assigns three Operational Zone servers to auth/db/web roles"), which had drifted to deterministic-by-sort during the rebase, and unifies the JAX side and CybORG side onto a single role-assignment rule (the old CybORG `index mod 3` is gone).

Commits
- `01ac145` feat: pluggable red action selectors with name-based registry. New `scenarios/cc4/red_selectors.py` with `RedSelector` Protocol-style callable, `role_biased_selector` factory, `REGISTRY` (fsm/resilience/cia_a/cia_i/cia_c plus aliases), and `make_red_selector(name, **kwargs)` recipe-friendly entry.
- `07a1cf6` refactor: collapse FsmRedCC4Env + ResilienceRedCC4Env into one parameterised env. `FsmRedCC4Env` now takes `red_selector` + `extras_factory`. `FsmRedEnvState` keeps the flat `state`/`const` layout (existing callers untouched) plus a stable-shape `extras` dict.
- `4e6567b` refactor: ippo_jax uses make_fsm_red_env, dropping the if/elif ladder. 13-line if/elif/else × 2 sites → 1 line × 2 sites.
- `33c7703` refactor: drop back-compat shims; consolidate FSM helpers in red_selectors. Deletes `parity/resilience_red_env.py`, `scenarios/cc4/resilience_red_fsm.py`, `scenarios/cc4/resilience_topology.py`. Moves JAX role assignment into `topology_roles.py` as `assign_resilience_roles_from_const`.
- `ded1e70` refactor: drop cc4_ prefix from generic eval scripts. `cc4_aggregate_cia.py` and `cc4_score_trajectories.py` are pure trajectory-format consumers: they don't know which game produced the JSONL. Renamed to `aggregate_cia.py` / `score_trajectories.py`. Kept the `cc4_` prefix on `cc4_trajectory_eval.py` because it uses `EnterpriseScenarioGenerator` / `EnterpriseMAE` and hardcodes CC4 dims (`OBS_DIM=210` / `ACT_DIM=242` / `NUM_AGENTS=5`), so it is bound to CC4 by construction.
- `87c5069` perf(eval): parallel rollouts in eval_recipe. Episodes are independent; `evaluate_on_cyborg` / `evaluate_jax_on_cyborg` now fan out across processes (default `cpu_count() - 2`). 300-episode comparison drops from ~30 min to ~3 min on a 64-core box.
- `3b2f4e1` fix(eval): jax_runner honors recipe.eval.red_agent + dedupe dispatch. Four call sites had drifting `_red_classes` tables; consolidated into `jaxborg.evaluation.cyborg_red_dispatch.cyborg_red_class` with proper `c`/`i`/`a` → `CRedAgent`/`IRedAgent`/`ARedAgent` mapping. Unknown names now raise instead of silently falling back to `finite_state`.
- `2d129a2` feat(resilience): per-episode random role assignment, globally coordinated. See "How roles are picked" above.
- `67a9ce2` simplify: drop role-assignment fallbacks. No deterministic-by-sort branch, no lazy `_ensure_role_map`, no `TYPE_CHECKING` guard. One path: caller supplies `rng`/`key`, agents get the map via `set_role_map`. -51 lines net.

What "next biased red" looks like under this design
Recipe: `red_agent: database_tier`. Done. No env class. No env wrapper. No `ippo_jax.py` change.

New tests
- `tests/test_red_selectors.py` (13 tests): registry names, alias resolution, end-to-end env construction per registered selector, default-extras-zero invariant.
- `tests/test_resilience_roles.py` (8 tests): pinned to the random-with-rng contract (same rng → same map, varies across seeds, input-order invariant, handles <3 candidates).

Validation
- `uv run ruff check` clean
- `test_fsm_red_env` + 13 selector contract tests + 8 role-assignment tests + 21 recipe smoke tests, all green

Merge order
This PR's base is the parent PR's branch; merge #14 first, then GitHub auto-retargets this to main.