test(parity): replace retired xfail tests with sync-green harness cov…#22
Merged
…erage

Remove three xfail tests in test_fsm_red_env_differential.py that depended on retired CybORG green/replay tape infrastructure. Their first two checks (red_4 known-hosts parity and red_4 action-selection parity) are already covered by test_red_policy_matches_cyborg_multistep across 200 steps x 5 seeds. The third (end-state host_compromised/red_privilege parity under FSM red + green phish) had no equivalent: existing green-sync tests use SleepAgent for red, so no exploit/privesc chains fire.

Add TestFsmRedGreenSyncParity::test_no_critical_state_diffs_over_10_steps, which closes that gap via CC4DifferentialHarness(FSM red + EnterpriseGreen + sync_green_rng=True) and asserts at least one privesc fired so the test can't pass on a degenerate trajectory.

Also adds seed=0 to the existing red_policy_parity parametrize to preserve the original tests' seed.
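For reviewers unfamiliar with the pattern, the sketch below illustrates the shape of the new check: step two simulators in lockstep, compare only the critical fields, and guard against a degenerate trajectory by requiring at least one privesc. It is not the code from this PR. `StubSim` and `run_differential` are hypothetical stand-ins, since the real `CC4DifferentialHarness` API is not shown here, and sharing a seed between the two stubs is only an assumed analogue of `sync_green_rng=True`.

```python
import random
from dataclasses import dataclass, field

# Fields named in the PR description as the end-state parity targets.
CRITICAL_FIELDS = ("host_compromised", "red_privilege")
NUM_HOSTS = 4  # hypothetical network size for the stub


@dataclass
class StubSim:
    """Hypothetical stand-in for one side of a differential harness."""
    seed: int
    host_compromised: list = field(default_factory=lambda: [False] * NUM_HOSTS)
    red_privilege: list = field(default_factory=lambda: [0] * NUM_HOSTS)
    privesc_count: int = 0

    def __post_init__(self):
        self.rng = random.Random(self.seed)

    def step(self):
        # Toy red behaviour: exploit a host first, escalate on a later visit.
        host = self.rng.randrange(NUM_HOSTS)
        if not self.host_compromised[host]:
            self.host_compromised[host] = True
            self.red_privilege[host] = 1  # user-level access
        elif self.red_privilege[host] < 2:
            self.red_privilege[host] = 2  # escalated access
            self.privesc_count += 1


def run_differential(seed, steps=10):
    """Step both sides in lockstep and collect (step, field) mismatches."""
    reference, candidate = StubSim(seed), StubSim(seed)  # shared seed: assumed RNG sync
    diffs = []
    for t in range(steps):
        reference.step()
        candidate.step()
        for name in CRITICAL_FIELDS:
            if getattr(reference, name) != getattr(candidate, name):
                diffs.append((t, name))
    return diffs, candidate.privesc_count


def test_no_critical_state_diffs_over_10_steps(seed=0):
    diffs, privescs = run_differential(seed, steps=10)
    # Guard: parity on a trajectory with no privesc would prove nothing.
    assert privescs >= 1, "degenerate trajectory: no privesc fired"
    assert diffs == [], f"critical state diverged: {diffs}"
```

The ordering of the two assertions is a design choice in this sketch: checking the privesc guard first makes a degenerate run fail with a message about coverage rather than silently passing the parity check.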