test(parity): replace retired xfail tests with sync-green harness cov…#22

Merged
PaulHax merged 1 commit into main from worktree-remove-xfail-tests
May 15, 2026
@PaulHax PaulHax commented May 15, 2026

Collaborator

…erage

Remove three xfail tests in test_fsm_red_env_differential.py that depended on retired CybORG green/replay tape infrastructure. Their first two checks (red_4 known-hosts parity and red_4 action-selection parity) are already covered by test_red_policy_matches_cyborg_multistep across 200 steps x 5 seeds. The third (end-state host_compromised/red_privilege parity under FSM red + green phish) had no equivalent — existing green-sync tests use SleepAgent for red, so no exploit/privesc chains fire.

Add TestFsmRedGreenSyncParity::test_no_critical_state_diffs_over_10_steps, which closes that gap via CC4DifferentialHarness(FSM red + EnterpriseGreen + sync_green_rng=True) and asserts at least one privesc fired so the test can't pass on a degenerate trajectory. Also adds seed=0 to the existing red_policy_parity parametrize to preserve the original tests' seed.
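
For reviewers unfamiliar with the pattern, here is a rough, self-contained sketch of why the privesc guard matters. It does not use the real CC4DifferentialHarness; `ToySim`, `run_differential`, and `check_parity` are hypothetical stand-ins invented for illustration, and only the field names `host_compromised` and `red_privilege` come from the description above. The idea it tries to show: a state-diff check can pass vacuously when red never acts, so the test should also assert that the interesting event actually occurred.

```python
import random

# Field names taken from the PR description; everything else is hypothetical.
CRITICAL_FIELDS = ("host_compromised", "red_privilege")
NUM_HOSTS = 4


class ToySim:
    """Toy stand-in for one side of a differential harness (not the real API)."""

    def __init__(self, seed, red_active=True, buggy=False):
        self.rng = random.Random(seed)
        self.red_active = red_active
        self.buggy = buggy  # simulates a port that drops the privilege update
        self.privesc_count = 0
        self.state = {
            "host_compromised": [False] * NUM_HOSTS,
            "red_privilege": [0] * NUM_HOSTS,  # 0 = none, 1 = user, 2 = root
        }

    def step(self):
        host = self.rng.randrange(NUM_HOSTS)  # same seed gives the same draw
        if not self.red_active:
            return  # a sleeping red never touches the state
        if not self.state["host_compromised"][host]:
            # First visit: exploit the host and gain user-level access.
            self.state["host_compromised"][host] = True
            self.state["red_privilege"][host] = 1
        elif self.state["red_privilege"][host] == 1:
            # Second visit: escalate privileges.
            self.privesc_count += 1
            if not self.buggy:
                self.state["red_privilege"][host] = 2


def run_differential(seed, steps=10, red_active=True, port_buggy=False):
    """Step a reference and a port in lockstep; collect critical-field diffs."""
    ref = ToySim(seed, red_active=red_active)
    port = ToySim(seed, red_active=red_active, buggy=port_buggy)
    diffs = []
    for t in range(steps):
        ref.step()
        port.step()
        for name in CRITICAL_FIELDS:
            if ref.state[name] != port.state[name]:
                diffs.append((t, name))
    return diffs, ref.privesc_count


def check_parity(seed, **kwargs):
    diffs, privesc = run_differential(seed, **kwargs)
    # Guard first: without a privesc, an empty diff list proves nothing.
    assert privesc >= 1, "degenerate trajectory: no privesc fired"
    assert not diffs, f"critical state diffs: {diffs[:3]}"
```

In this toy, a buggy port paired with a sleeping red produces no diffs at all, so only the guard rejects the run; with an active red, the same bug shows up as a `red_privilege` diff.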

@PaulHax PaulHax merged commit c6ad860 into main May 15, 2026
2 of 3 checks passed