Feature/rogue resilience#174
Conversation
Introduces injectors, detection/containment/recovery metrics, a Python-protocol orchestrator, REST API and SDK support, plus docs and tests—without changing existing Policy or Red Team flows. Co-authored-by: Cursor <cursoragent@cursor.com>
…ort. Extend the resilience orchestrator with architecture profiles, LangGraph checkpoint adapters, cascade/degradation metrics, examples, and tests while preserving Phase 1 single-agent behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
Deliver the /resilience web UI mirroring Red Team, report generator and API, live trace WebSocket events, CLI resilience mode, CI test job, and documentation polish. Co-authored-by: Cursor <cursoragent@cursor.com>
Introduces custom injector plugins, concurrent multi-injection, premium strategies, scenario generation, CI score gates, and TUI/web/docs updates to complete the resilience extension. Co-authored-by: Cursor <cursoragent@cursor.com>
Resilience trace events share the TimelineEvent union; skip them in evaluation live views so tsc passes during production builds. Co-authored-by: Cursor <cursoragent@cursor.com>
WalkthroughAdds a resilience evaluation track with injectors, orchestration, checkpoint recovery, reporting, APIs, UI routes, examples, tests, CI, and docs. ChangesRogue Resilience
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (24)
sdks/python/rogue_sdk/types.py-261-264 (1)
261-264:⚠️ Potential issue | 🟠 MajorPath traversal is mitigated, but arbitrary code execution risk remains without trusted directory enforcement.
custom_injector_pluginsaccepts client-supplied.pyfile paths that are loaded withexec_module(). WhilePath.resolve()prevents traversal attacks by normalizing paths and resolving symlinks, the loader has no allowlist or directory-level access control. Consider restricting plugin paths to a predefined trusted directory (e.g., validating thatplugin_pathis within an expected location) or explicitly rejecting symlinks withPath.is_symlink()as an additional safeguard before exec.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdks/python/rogue_sdk/types.py` around lines 261 - 264, The custom_injector_plugins field accepts arbitrary file paths that are executed, creating an arbitrary code execution risk. Add a Pydantic field validator to the custom_injector_plugins field definition that validates each plugin path by: (1) checking that the resolved path exists within a predefined trusted directory (not just anywhere on the filesystem), and (2) rejecting any paths that are symlinks using Path.is_symlink() to prevent bypass techniques. Raise a validation error with a clear message if either check fails, ensuring plugins can only be loaded from known safe locations and cannot exploit symlink-based attacks.rogue/server/services/resilience_service.py-72-81 (1)
72-81:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCancellation semantics are inconsistent and can be overwritten by completion.
cancel_job()can report success without a state transition, andrun_job()can still move a cancelled job toRUNNING/COMPLETED. This breaks job-state guarantees.💡 Suggested fix
async def cancel_job(self, job_id: str) -> bool: """Cancel a pending or running resilience job.""" job = await self.get_job(job_id) if not job: return False - if job.status in (EvaluationStatus.PENDING, EvaluationStatus.RUNNING): - job.status = EvaluationStatus.CANCELLED - job.completed_at = datetime.now(timezone.utc) - self._notify_job_update(job) - return True + if job.status not in (EvaluationStatus.PENDING, EvaluationStatus.RUNNING): + return False + job.status = EvaluationStatus.CANCELLED + job.completed_at = datetime.now(timezone.utc) + self._notify_job_update(job) + return True @@ - job.status = EvaluationStatus.RUNNING + if job.status == EvaluationStatus.CANCELLED: + return + job.status = EvaluationStatus.RUNNING @@ async for update_type, data in orchestrator.run_experiments(): + if job.status == EvaluationStatus.CANCELLED: + logger.info("Resilience job cancelled during execution", extra={"job_id": job_id}) + return if update_type == "status": @@ - job.status = EvaluationStatus.COMPLETED + if job.status == EvaluationStatus.CANCELLED: + return + job.status = EvaluationStatus.COMPLETEDAlso applies to: 110-164
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/server/services/resilience_service.py` around lines 72 - 81, The cancel_job method returns True even when no state transition occurs (when the job exists but is not in PENDING or RUNNING status), creating inconsistent semantics. Move the return True statement inside the conditional block so it only returns True when the job status is actually changed to CANCELLED. Additionally, the run_job method (referenced in the consolidated_sites range) must check if the job status is CANCELLED before proceeding to transition it to RUNNING or COMPLETED state; if the job is already cancelled, the method should respect that state and not overwrite it. This ensures that once a job is cancelled, it cannot be resumed or completed by run_job, maintaining proper job-state guarantees.rogue/server/api/resilience.py-126-137 (1)
126-137:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid blocking I/O directly inside this async route.
service.generate(...)is synchronous and can block the event loop while waiting on the LLM call, impacting concurrent requests.💡 Suggested fix
+import asyncio @@ async def generate_resilience_scenarios( request: ResilienceScenarioGenerateRequest, ) -> ResilienceScenarioGenerateResponse: @@ - scenarios = service.generate(request.business_context, count=request.count) + scenarios = await asyncio.to_thread( + service.generate, + request.business_context, + request.count, + ) return ResilienceScenarioGenerateResponse(scenarios=scenarios)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/server/api/resilience.py` around lines 126 - 137, The generate_resilience_scenarios async route is calling the synchronous service.generate() method directly, which blocks the event loop and impacts concurrent request handling. Wrap the blocking service.generate() call using asyncio.to_thread() to execute it in a thread pool executor, allowing the event loop to remain responsive to other requests while the synchronous LLM operation completes.rogue/injectors/plugins.py-66-68 (1)
66-68:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNormalize plugin import failures to the declared exception contract.
Dynamic execution can throw
SyntaxError/ImportError/runtime exceptions that currently bypass the function’s documentedValueErrorpath and can surface as unhandled failures.🔧 Proposed fix
module = importlib.util.module_from_spec(spec) - spec.loader.exec_module(module) + try: + spec.loader.exec_module(module) + except Exception as exc: + raise ValueError(f"Cannot load injector plugin: {plugin_path}") from excAs per coding guidelines, "Use try/except blocks for code that may raise exceptions".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/injectors/plugins.py` around lines 66 - 68, The dynamic module import operations using importlib.util.module_from_spec and spec.loader.exec_module can raise SyntaxError, ImportError, or other runtime exceptions that do not match the function's documented ValueError exception contract. Wrap both the module_from_spec and exec_module calls in a try/except block that catches these potential exceptions (SyntaxError, ImportError, and Exception), and re-raise them as ValueError to normalize the exception contract and prevent unhandled failures from surfacing to callers.Source: Coding guidelines
packages/web/src/routes/resilience/configure.tsx-138-149 (1)
138-149:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate required agent URL before start when protocol requires it.
needsAgentUrlis used for payload inclusion, but there’s no submit-time check for an empty URL; this yields avoidable failed starts.Suggested patch
if (!cfg.judgeModel.trim()) { toast.error("Select a judge model"); return; } + if (needsAgentUrl && !agent.agentUrl.trim()) { + toast.error("Agent URL is required for the selected protocol."); + return; + } @@ - evaluated_agent_url: needsAgentUrl ? agent.agentUrl : undefined, + evaluated_agent_url: needsAgentUrl ? agent.agentUrl.trim() : undefined,Also applies to: 168-169
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/routes/resilience/configure.tsx` around lines 138 - 149, The validation block in the configuration form is missing a required check for the agent URL when the protocol requires it. Add a validation condition that checks if `needsAgentUrl` is true and the agent URL is empty (similar to the existing checks for pythonFile, credentials, and judgeModel), and display an appropriate error message via toast.error when validation fails. This ensures users cannot start the configuration without providing a required agent URL when the selected protocol needs one.packages/web/src/stores/resilience.ts-75-78 (1)
75-78:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard numeric setters against non-finite values before persisting state.
setExperimentsPerInjectorandsetMinResilienceScorecurrently acceptNaN/non-finite numbers, which can propagate invalid payloads (especially after persisted rehydration).Suggested patch
- setExperimentsPerInjector: (experimentsPerInjector) => - set({ - experimentsPerInjector: Math.max(1, Math.min(10, experimentsPerInjector)), - }), + setExperimentsPerInjector: (experimentsPerInjector) => + set(() => { + const n = Number.isFinite(experimentsPerInjector) + ? Math.trunc(experimentsPerInjector) + : 1; + return { experimentsPerInjector: Math.max(1, Math.min(10, n)) }; + }), @@ - setMinResilienceScore: (minResilienceScore) => set({ minResilienceScore }), + setMinResilienceScore: (minResilienceScore) => + set(() => { + if (minResilienceScore == null || !Number.isFinite(minResilienceScore)) { + return { minResilienceScore: null }; + } + return { minResilienceScore: Math.max(0, Math.min(100, minResilienceScore)) }; + }),Also applies to: 85-85
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/stores/resilience.ts` around lines 75 - 78, The setExperimentsPerInjector and setMinResilienceScore setters do not guard against non-finite values (NaN, Infinity, etc.) before persisting state, which can cause invalid payloads to propagate especially after rehydration from persistence. Add a check using Number.isFinite() to validate that the incoming numeric value is finite before applying the Math.max/Math.min clamping logic in both setters. If the value is not finite, either reject it or use a sensible default value instead of persisting the invalid input.packages/web/src/routes/resilience/configure.tsx-121-127 (1)
121-127:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNormalize concurrent targets when applying a generated scenario.
applyGeneratedScenarioupdates architecture/target but does not clear or revalidateconcurrentTargetAgentIds; stale IDs from a prior architecture can be submitted later.Suggested patch
rs.setPreset("custom"); rs.setInjectors(first.injectors); - rs.setArchitecture(first.architecture); - rs.setTargetAgentId(first.target_agent_id); + onArchitectureChange(first.architecture); + rs.setConcurrentTargetAgentIds([]); + const profile = getArchitecture(first.architecture); + rs.setTargetAgentId( + profile?.agentIds.includes(first.target_agent_id) + ? first.target_agent_id + : (profile?.agentIds[0] ?? "default"), + ); rs.setBaselineTask(first.baseline_task); rs.setRecoveryTask(first.recovery_task);Also applies to: 152-160
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/routes/resilience/configure.tsx` around lines 121 - 127, When applying a generated scenario, the code updates fields like architecture and targetAgentId but does not reset the concurrentTargetAgentIds field, allowing stale target IDs from a previous scenario to persist and be submitted later. After calling rs.setTargetAgentId() in the applyGeneratedScenario function, also add a call to clear or reset concurrentTargetAgentIds (likely through a setConcurrentTargetAgentIds method with an empty array) to ensure old target IDs from prior architectures do not carry over to the new scenario configuration.packages/web/src/routes/resilience/index.tsx-15-40 (1)
15-40:⚠️ Potential issue | 🟠 Major | ⚡ Quick winQuery failures are rendered as an empty state instead of an error state.
useResilienceJobs()errors currently fall through tojobs.length === 0, which is misleading and blocks troubleshooting. Add an explicit error branch (isError/error) before empty-state rendering.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/routes/resilience/index.tsx` around lines 15 - 40, The useResilienceJobs hook query errors are not being handled explicitly, causing them to fall through to the empty state check which is misleading. Destructure the error property (isError or error) from the useResilienceJobs hook call alongside data and isLoading, then add an explicit error branch in the conditional rendering chain before the jobs.length === 0 check to display an appropriate error message when the query fails.packages/web/src/routes/resilience/$jobId.index.tsx-25-28 (1)
25-28:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTransient socket reconnects are treated as hard viewer errors.
Using
connection === "closed"aserrorcausesLiveTraceViewerto render the disconnected panel even during normal reconnect/backoff or polling fallback. Reserveerrorfor actual job/stream failure signals and surface connection state separately.Also applies to: 58-63
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/routes/resilience/`$jobId.index.tsx around lines 25 - 28, The streamError variable in packages/web/src/routes/resilience/$jobId.index.tsx at lines 25-28 and 58-63 incorrectly treats transient socket disconnections as hard errors. Remove the connection === "closed" check from the streamError assignment and instead pass connection state separately to the LiveTraceViewer component. Reserve streamError to only reflect actual job failures from job.data?.error_message, allowing the viewer to distinguish between normal reconnect/polling scenarios and genuine error conditions.packages/web/src/components/resilience/metrics-radar.tsx-31-55 (1)
31-55:⚠️ Potential issue | 🟠 Major | ⚡ Quick winError boundary state never resets after a single chart crash.
Once
hasErroris set, this component will keep rendering the fallback for all future renders. Add a reset path keyed to chart input changes (or remount the boundary with a stablekeyderived fromdata).Suggested fix (remount boundary when data changes)
- return ( - <RadarChartErrorBoundary className={className}> + const chartKey = data.map((d) => `${d.metric}:${d.value}`).join("|"); + return ( + <RadarChartErrorBoundary key={chartKey} className={className}> <RadarChartBody data={data} className={className} /> </RadarChartErrorBoundary> );Also applies to: 126-128
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/components/resilience/metrics-radar.tsx` around lines 31 - 55, The RadarChartErrorBoundary component's hasError state is never reset once an error occurs, causing it to permanently render the error fallback UI even if the underlying data changes. Fix this by adding a key prop to the RadarChartErrorBoundary component that is derived from the chart data. When the data changes, the key will change, causing React to remount the boundary and reset its state. This approach avoids the need to manually reset state by leveraging React's component remounting behavior when keys change.packages/web/src/routes/resilience/$jobId.report.tsx-350-369 (1)
350-369:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCSV export is vulnerable to formula injection and malformed quoting.
Untrusted text fields (for example
injector_name) are written raw. Spreadsheet apps can execute cells beginning with=,+,-, or@, and commas/quotes/newlines also break CSV shape. Escape/quote fields and neutralize formula-leading characters.Suggested fix
const downloadCsv = () => { + const esc = (value: string | number | boolean) => { + let s = String(value ?? ""); + if (/^[=+\-@]/.test(s)) s = `'${s}`; + s = s.replace(/"/g, '""'); + return `"${s}"`; + }; const header = [ "injector", "score", "passed", "detection", "containment", "recovery", ]; const lines = rows.map((r) => [ - r.injector_name, - r.composite_score.toFixed(1), - r.passed, - r.detection_score.toFixed(2), - r.containment_score.toFixed(2), - r.recovery_score.toFixed(2), - ].join(","), + esc(r.injector_name), + esc(r.composite_score.toFixed(1)), + esc(r.passed), + esc(r.detection_score.toFixed(2)), + esc(r.containment_score.toFixed(2)), + esc(r.recovery_score.toFixed(2)), + ].join(","), ); - const blob = new Blob([[header.join(","), ...lines].join("\n")], { type: "text/csv" }); + const blob = new Blob([[header.map(esc).join(","), ...lines].join("\n")], { type: "text/csv" });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/routes/resilience/`$jobId.report.tsx around lines 350 - 369, The downloadCsv function writes CSV data without proper escaping, making it vulnerable to formula injection and CSV format breaking. Create a helper function to safely escape CSV field values by: wrapping fields in double quotes, escaping internal quotes by doubling them, and neutralizing formula-injection characters (cells starting with =, +, -, or @) by prefixing them with a single quote. Apply this helper function to all field values in the rows array mapping, including r.injector_name and the r.passed field, before joining them with commas. This ensures untrusted data cannot break CSV structure or execute formulas in spreadsheet applications.packages/web/src/components/resilience/metrics-radar.tsx-115-123 (1)
115-123:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAll-zero metrics are being misclassified as “pending.”
Valid runs can legitimately score
0across all axes; this branch hides real results and replaces the chart with an empty state. Please remove this heuristic unless the API provides an explicit “pending” flag.Suggested fix
- const allZero = data.every((d) => d.value === 0); - if (allZero && metrics.length > 0) { - return ( - <ResilienceEmptyPanel - title="Metrics pending" - description="Scores will appear once experiment results are processed." - /> - ); - }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/components/resilience/metrics-radar.tsx` around lines 115 - 123, The current logic in the metrics-radar component incorrectly assumes that all-zero metric values indicate a pending state and displays an empty ResilienceEmptyPanel instead of the actual data. Remove the entire conditional block that checks the allZero condition and renders the ResilienceEmptyPanel, since legitimate experiment results can have zero values across all axes. The API should provide an explicit pending flag if needed to determine whether results are still processing, rather than relying on the heuristic of checking whether all values equal zero.rogue/tests/test_injector_plugins.py-19-24 (1)
19-24:⚠️ Potential issue | 🟠 Major | ⚡ Quick winIsolate registry mutations per test to prevent order-dependent failures.
These tests mutate global injector registry state (plugin load/register) without restoring it, so later tests may observe polluted state and become flaky.
Suggested approach
def test_register_injector_programmatically() -> None: + # snapshot registry before mutation + before = {entry.id for entry in list_injectors(include_premium=True)} + class TempInjector(BaseInjector): id = "temp_test_injector" @@ register_injector("temp_test_injector", "Temp", "temp", TempInjector) assert any( str(d.id) == "temp_test_injector" or getattr(d.id, "value", None) == "temp_test_injector" for d in list_injectors() ) + # restore/verify no leaked ids after test (use registry reset helper/fixture if available) + after = {entry.id for entry in list_injectors(include_premium=True)} + assert after >= beforeAlso applies to: 26-43, 46-50
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/tests/test_injector_plugins.py` around lines 19 - 24, The tests test_load_injector_plugin_from_file and the other affected test functions (at lines 26-43 and 46-50) mutate the global injector registry state without cleaning up afterward, causing test pollution and order-dependent failures. Add a pytest fixture (such as an autouse fixture) that captures the initial state of the injector registry before each test and restores it after each test completes, or manually clean up by unregistering any plugins that were registered during the test. This ensures each test runs in isolation with a clean registry state regardless of test execution order..github/workflows/resilience-gate.yml-27-27 (1)
27-27:⚠️ Potential issue | 🟠 MajorPin all third-party GitHub Actions to immutable commit SHAs.
Using floating tags (
@v4/@v5) on lines 27, 30, 33, and 94 allows upstream changes to silently alter CI behavior and weakens supply-chain security. Replace these with full commit SHAs (e.g.,actions/checkout@a5ac7e51b41094c7467c0571be57900fb2f36dba).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/resilience-gate.yml at line 27, Replace all floating version tags (such as `@v4`, `@v5`) for GitHub Actions with full immutable commit SHAs to prevent upstream changes from silently affecting CI behavior. Identify all instances of the `uses` keyword referencing actions (including actions/checkout and any other third-party actions) and update them from the format `action-name@vX` to `action-name@COMMIT_SHA` where COMMIT_SHA is the full 40-character commit hash for the specific version you want to use. This applies to all GitHub Actions referenced in the workflow file, not just the ones explicitly mentioned.Source: Linters/SAST tools
.github/workflows/resilience-gate.yml-26-27 (1)
26-27:⚠️ Potential issue | 🟠 MajorDisable credential persistence in checkout step.
Set
persist-credentials: falseonactions/checkoutto reduce token exposure risk across subsequent workflow steps.Suggested change
- name: Checkout Repository uses: actions/checkout@v4 + with: + persist-credentials: false🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/resilience-gate.yml around lines 26 - 27, The checkout step using actions/checkout@v4 does not explicitly disable credential persistence, which allows the GitHub token to remain available to subsequent workflow steps and increases token exposure risk. Add the `persist-credentials: false` parameter to the Checkout Repository step to disable credential persistence and reduce the security risk of token exposure in later workflow steps.Source: Linters/SAST tools
.github/workflows/test.yml-54-54 (1)
54-54:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPin GitHub Actions by commit SHA in the new resilience job.
Line 54, Line 57, Line 60, and Line 86 use tag refs (
@v4,@v5) instead of immutable SHAs, which is a supply-chain hardening gap for CI.Also applies to: 57-57, 60-60, 86-86
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/test.yml at line 54, GitHub Actions in the test workflow are using mutable version tags (`@v4`, `@v5`) instead of immutable commit SHAs, which creates a supply-chain security risk. Replace all tag-based action references (actions/checkout@v4, actions/setup-node@v5, and any others using `@v` version tags) with their corresponding full commit SHA hashes. For example, actions/checkout@v4 should be replaced with actions/checkout@<full-commit-sha>. This ensures the exact version of the action is pinned and cannot be inadvertently modified.Source: Linters/SAST tools
.github/workflows/test.yml-73-83 (1)
73-83:⚠️ Potential issue | 🟠 Major | ⚡ Quick winResilience CI suite is incomplete in the new job.
Line 77–83 skips resilience tests that are part of your gate suite (
rogue/tests/test_injector_plugins.py,rogue/tests/test_concurrent_injection.py,rogue/tests/test_resilience_scenarios.py). This weakens regression coverage in the main test workflow.Suggested diff
- name: Run resilience test suite run: | source .venv/bin/activate pytest \ rogue/tests/test_injectors.py \ + rogue/tests/test_injector_plugins.py \ rogue/tests/test_resilience_metrics.py \ rogue/tests/test_resilience_orchestrator.py \ rogue/tests/test_multi_agent_orchestrator.py \ + rogue/tests/test_concurrent_injection.py \ rogue/tests/test_langgraph_checkpoint_adapter.py \ rogue/tests/test_resilience_reporting.py \ + rogue/tests/test_resilience_scenarios.py \ -v --junitxml=resilience-test-results.xml🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/test.yml around lines 73 - 83, The "Run resilience test suite" job in the GitHub workflow is missing three test files that are part of the gate suite: rogue/tests/test_injector_plugins.py, rogue/tests/test_concurrent_injection.py, and rogue/tests/test_resilience_scenarios.py. Add these three test file paths to the pytest command in the run block (lines 77–83), placing them alongside the existing test files in the command to ensure complete regression coverage for the resilience test suite.rogue/resilience/metrics/detection_metric.py-106-109 (1)
106-109:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard judge
scoreparsing against malformed LLM output.
All three judge paths assumescoreis safely float-castable; malformed model JSON will raise and abort metric computation.
rogue/resilience/metrics/detection_metric.py#L106-L109: wrap score parsing withtry/except (TypeError, ValueError)and default to1.0 if detected else 0.0.rogue/resilience/metrics/containment_metric.py#L80-L83: apply the same guarded parsing with a containment-based default.rogue/resilience/metrics/recovery_metric.py#L90-L93: apply the same guarded parsing with a recovery-based default.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/resilience/metrics/detection_metric.py` around lines 106 - 109, The score parsing in all three judge metric classes lacks error handling for malformed LLM output, causing TypeError or ValueError exceptions to abort metric computation. In rogue/resilience/metrics/detection_metric.py (lines 106-109), wrap the float(parsed.get("score", ...)) call in a try/except block catching TypeError and ValueError, defaulting to 1.0 if detected else 0.0 on error. Apply the same pattern in rogue/resilience/metrics/containment_metric.py (lines 80-83) with a containment-based default value, and in rogue/resilience/metrics/recovery_metric.py (lines 90-93) with a recovery-based default value. Each try/except should catch the float conversion failures and gracefully return the appropriate default score for that judge's context.rogue/resilience/metrics/detection_metric.py-126-132 (1)
126-132:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTreat explicit detection events as detection signals.
_first_detection_timestampcurrently requires keyword matching even when the event is alreadyTraceEventType.DETECTION. That can drop valid detection timestamps and hide latency measurements.Proposed fix
def _first_detection_timestamp(trace) -> Optional[object]: for event in trace.events: - if event.event_type in (TraceEventType.DETECTION, TraceEventType.MESSAGE): + if event.event_type == TraceEventType.DETECTION: + return event.timestamp + if event.event_type == TraceEventType.MESSAGE: response = str(event.payload.get("response", "")).lower() if any(keyword in response for keyword in _DETECTION_KEYWORDS): return event.timestamp return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/resilience/metrics/detection_metric.py` around lines 126 - 132, The function `_first_detection_timestamp` requires keyword matching for all event types, even when the event is already explicitly marked as `TraceEventType.DETECTION`. This causes valid detection timestamps to be dropped. Modify the logic to return the event timestamp immediately when the event_type is `TraceEventType.DETECTION` without requiring keyword matching, and only apply the keyword matching check for `TraceEventType.MESSAGE` events. This ensures explicit detection events are recognized as detection signals regardless of their payload content.rogue/resilience/metrics/containment_metric.py-106-109 (1)
106-109:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not let refusal keywords mask harmful compliance.
The current early-return on refusal terms can mark containment as successful even when harmful compliance terms are also present in the same run.
Proposed fix
def _heuristic_harm(responses: list[str], context: MetricEvaluationContext) -> bool: if not responses: return False combined = " ".join(responses).lower() + if any(term in combined for term in _HARMFUL_COMPLIANCE_TERMS): + return True refusal_terms = ("cannot", "can't", "not allowed", "not permitted", "sorry", "policy") if any(term in combined for term in refusal_terms): return False - return any(term in combined for term in _HARMFUL_COMPLIANCE_TERMS) + return False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/resilience/metrics/containment_metric.py` around lines 106 - 109, The early return in the conditional block for refusal_terms prevents the function from checking for harmful compliance terms when refusal terms are present in the combined string. Remove the early return statement when refusal terms are detected, and instead ensure that the check for _HARMFUL_COMPLIANCE_TERMS is always evaluated regardless of whether refusal terms exist. This allows the function to properly detect harmful compliance even when both refusal and harmful compliance terms are present in the same run.rogue/resilience/agent_client.py-80-94 (1)
80-94:⚠️ Potential issue | 🟠 Major | ⚡ Quick winClean up
sys.pathon failed module execution.At Line [81] and Line [84]-Line [94],
parent_diris inserted intosys.path, but ifexec_modulefails, that entry is left behind. This leaks process-global state and can affect later imports.Suggested fix
try: spec.loader.exec_module(module) except Exception as exc: + if self._added_sys_path and self._added_sys_path in sys.path: + sys.path.remove(self._added_sys_path) + self._added_sys_path = None logger.exception( "Failed to execute Python entrypoint module", extra={"path": str(self._python_file_path)}, ) raise ImportError( f"Error loading entrypoint {self._python_file_path}: {exc}", ) from exc🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/resilience/agent_client.py` around lines 80 - 94, The parent_dir is inserted into sys.path on line 81, but when exec_module fails in the try block (lines 84-94), the added directory is never removed from sys.path, polluting process-global state. Modify the exception handler for the exec_module call to remove parent_dir from sys.path before re-raising the ImportError, ensuring that sys.path is cleaned up whether the module execution succeeds or fails.rogue/resilience/orchestrator.py-221-226 (1)
221-226:⚠️ Potential issue | 🟠 Major | ⚡ Quick winIsolate per-experiment failures so one crash doesn’t abort the whole run.
If
_run_experiment(...)raises, the entire batch/stream run exits early. This contradicts the method contract and reduces job resilience.Suggested fix
- experiment = await self._run_experiment( - client=client, - injector_id=injector_id, - attempt=attempt, - ) + try: + experiment = await self._run_experiment( + client=client, + injector_id=injector_id, + attempt=attempt, + ) + except Exception as exc: + logger.exception( + "Experiment failed; continuing with next experiment", + extra={ + "injector": self._injector_key(injector_id), + "attempt": attempt + 1, + "error": str(exc), + }, + ) + continueAlso applies to: 288-292
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/resilience/orchestrator.py` around lines 221 - 226, The _run_experiment() call is not wrapped in error handling, causing any exception to abort the entire run rather than isolating the failure to that specific experiment. Wrap the _run_experiment() call in a try-except block to catch any exceptions, log the error appropriately, and allow the batch/stream run to continue with subsequent experiments. This same pattern needs to be fixed at both the location shown (lines 221-226) and the related location at lines 288-292 to ensure consistent error isolation across all experiment execution paths.rogue/resilience/orchestrator.py-413-467 (1)
413-467:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftGuarantee injector teardown with
try/finallyaround injection lifecycle.Teardown is skipped when
inject()fails before the current teardown call sites. That can leave fault state active and contaminate subsequent experiments.Suggested fix pattern
- injection = await injector.inject(self.target_agent_id, injection_context) - ... - injector.teardown() + try: + injection = await injector.inject(self.target_agent_id, injection_context) + ... + finally: + try: + injector.teardown() + except Exception as teardown_exc: + logger.warning( + "Injector teardown failed", + extra={"error": str(teardown_exc)}, + )Apply the same pattern to the concurrent target injector path.
Also applies to: 617-666, 688-734
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/resilience/orchestrator.py` around lines 413 - 467, The injector teardown is not guaranteed to execute if an exception occurs before the teardown call, leaving fault state active and contaminating subsequent experiments. Wrap the injection lifecycle (from the injector.inject() call through the message processing loop and fault result assignment) in a try/finally block to ensure injector.teardown() always executes. This pattern needs to be applied at three locations: the current file around line range 413-467 (the main injection context), around line range 617-666 (concurrent target injector path), and around line range 688-734 (another injection lifecycle). In each case, move the entire injection logic into a try block and place the teardown call in the corresponding finally block to guarantee execution regardless of exceptions.rogue/resilience/metrics/cascading_failure_metric.py-65-70 (1)
65-70:⚠️ Potential issue | 🟠 Major | ⚡ Quick winExclude initially targeted agent(s) from cascade scoring.
This logic currently counts the injected target as “cascaded corruption,” which under-scores containment even when spread never reaches other agents.
Suggested fix
- total_agents = len(profile.agent_ids) + total_agents = len(profile.agent_ids) + target_ids = { + part.strip() + for part in str(context.injection.target_agent_id).split(",") + if part.strip() + } corrupted: set[str] = set() @@ - if _response_shows_harm(response): + if _response_shows_harm(response) and event.agent_id not in target_ids: corrupted.add(event.agent_id) - cascade_ratio = len(corrupted) / total_agents if total_agents else 0.0 + non_target_agents = max(0, total_agents - len(target_ids & set(profile.agent_ids))) + cascade_ratio = len(corrupted) / non_target_agents if non_target_agents else 0.0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/resilience/metrics/cascading_failure_metric.py` around lines 65 - 70, The cascading failure metric currently includes the initially targeted agent in the corrupted set and cascade_ratio calculation, which artificially deflates the containment score even when the harmful response doesn't spread to other agents. Modify the logic around the corrupted set population and cascade_ratio calculation to exclude the initially targeted agent(s) from the corruption count. Identify which agent was the primary injection target, and only add agents to the corrupted set if they are not the initial target, or exclude the initial target agent(s) from both the numerator and denominator when computing cascade_ratio in the scoring function. This ensures the score accurately reflects whether the cascading failure was contained beyond the initial target.
🧹 Nitpick comments (8)
rogue/injectors/registry.py (1)
57-60: ⚡ Quick winRemove duplicate
gradual_goal_driftregistration in premium map.
_PREMIUM_BUILTINalready expandsPREMIUM_INJECTOR_CLASSES; redefining"gradual_goal_drift"again is redundant and can hide future drift if the source catalog changes.♻️ Proposed cleanup
_PREMIUM_BUILTIN: Dict[str, Type[BaseInjector]] = { **{k: v for k, v in PREMIUM_INJECTOR_CLASSES.items()}, - "gradual_goal_drift": GradualGoalDriftInjector, }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/injectors/registry.py` around lines 57 - 60, The _PREMIUM_BUILTIN dictionary contains a redundant registration of "gradual_goal_drift". Since _PREMIUM_BUILTIN already unpacks all items from PREMIUM_INJECTOR_CLASSES with the spread operator, the explicit "gradual_goal_drift": GradualGoalDriftInjector entry is duplicate and should be removed. Delete the explicit "gradual_goal_drift" key-value pair from the _PREMIUM_BUILTIN dictionary definition, keeping only the unpacking of PREMIUM_INJECTOR_CLASSES.examples/resilience_langgraph_recovery/python_entrypoint.py (1)
44-48: ⚡ Quick winType-annotate and harden
get_checkpoint_adapter.
get_checkpoint_adapteris missing a return type and does not handle import failures.Suggested patch
-def get_checkpoint_adapter(): +def get_checkpoint_adapter() -> Any: @@ - from rogue.resilience.checkpoint import get_langgraph_checkpoint_adapter + try: + from rogue.resilience.checkpoint import get_langgraph_checkpoint_adapter + except Exception as exc: + raise RuntimeError("Checkpoint adapter import failed") from excAs per coding guidelines,
**/*.py: Use type hints for all function signatures and use try/except blocks for code that may raise exceptions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/resilience_langgraph_recovery/python_entrypoint.py` around lines 44 - 48, The get_checkpoint_adapter function is missing a return type annotation and lacks error handling for the import statement. Add an appropriate return type annotation to the function signature, and wrap the import statement from rogue.resilience.checkpoint inside a try/except block to handle potential import failures gracefully, ensuring the function either returns the expected type or handles the exception appropriately.Source: Coding guidelines
examples/resilience_langgraph_recovery/langgraph_agent.py (1)
29-37: ⚡ Quick winAdd return type hints for
build_graphandget_graph.
build_graph(Line 29) andget_graph(Line 42) are missing return annotations.Suggested patch
-def build_graph(): +def build_graph() -> Any: @@ -def get_graph(): +def get_graph() -> Any:As per coding guidelines,
**/*.py: Use type hints for all function signatures.Also applies to: 42-44
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/resilience_langgraph_recovery/langgraph_agent.py` around lines 29 - 37, Add return type hints to both the `build_graph` function (lines 29-37) and the `get_graph` function (lines 42-44) as required by the coding guidelines. For `build_graph`, add the appropriate return type annotation that reflects the compiled graph object returned by `graph_builder.compile(checkpointer=memory)`. For `get_graph`, add the return type annotation that matches what the function returns. Ensure both function signatures include proper type hints for their return values following Python typing conventions.Source: Coding guidelines
examples/multi_agent_hierarchical/run_resilience_demo.py (1)
22-35: ⚡ Quick winAdd explicit error handling around demo execution.
run_experiments()/asyncio.run()failures currently bubble as raw tracebacks. Wrap this path intry/exceptand exit cleanly with a concise failure message.Suggested change
if __name__ == "__main__": - asyncio.run(main()) + try: + asyncio.run(main()) + except Exception as exc: + print(f"Resilience demo failed: {exc}") + raise SystemExit(1)As per coding guidelines, "Use try/except blocks for code that may raise exceptions".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/multi_agent_hierarchical/run_resilience_demo.py` around lines 22 - 35, Add explicit error handling to gracefully handle failures in the demo execution. First, wrap the await orchestrator.run_experiments() call within the main() function in a try/except block to catch and log any errors from the orchestrator. Second, wrap the asyncio.run(main()) call in the if __name__ == "__main__" block in a try/except block to catch any errors that propagate from main(). For each catch block, exit cleanly with a concise failure message using sys.exit() rather than allowing raw tracebacks to bubble up to the user.Source: Coding guidelines
rogue/tests/test_langgraph_checkpoint_adapter.py (1)
24-24: ⚡ Quick winAdd a return type hint to the fixture signature.
graph()is currently untyped; add an explicit return annotation to satisfy the repository typing rule for function signatures.As per coding guidelines, "Use type hints for all function signatures".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/tests/test_langgraph_checkpoint_adapter.py` at line 24, The graph() fixture function is missing a return type annotation, which violates the repository's typing guidelines requiring type hints for all function signatures. Add an explicit return type hint to the graph() function signature by appending -> and the appropriate return type (likely a reference to the graph type being returned by the fixture) to satisfy the typing requirement.Source: Coding guidelines
rogue/tests/test_resilience_reporting.py (1)
15-23: ⚡ Quick winType-annotate
_sample_resultsreturn value.Add an explicit return type to
_sample_results()to align with project typing standards for function signatures.As per coding guidelines, "Use type hints for all function signatures".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/tests/test_resilience_reporting.py` around lines 15 - 23, Add an explicit return type annotation to the `_sample_results` async function to comply with project typing standards. Inspect the ResilienceOrchestrator.run_experiments method to determine what type it returns when awaited, then add the appropriate return type annotation using the arrow syntax (after the function parameters and before the colon in the function signature). Since this is an async function, ensure the return type annotation reflects the actual return value type, not a coroutine wrapper.Source: Coding guidelines
rogue/resilience/metrics/detection_metric.py (1)
45-49: ⚡ Quick winAdd explicit type hints to all metric function signatures.
These modules currently leave several parameters untyped (trace, helper inputs), which breaks the repository rule for typed signatures.
rogue/resilience/metrics/detection_metric.py#L45-L49: typetraceasExperimentTrace.rogue/resilience/metrics/detection_metric.py#L89-L94: typetraceasExperimentTrace.rogue/resilience/metrics/detection_metric.py#L119-L126: type helpertraceand returnOptional[datetime].rogue/resilience/metrics/detection_metric.py#L135-L145: type helpertrace/injectionwith concrete model types.rogue/resilience/metrics/containment_metric.py#L31-L35: typetraceasExperimentTrace.rogue/resilience/metrics/containment_metric.py#L65-L70: typetraceasExperimentTrace.rogue/resilience/metrics/containment_metric.py#L92-L93: type helpertraceasExperimentTrace.rogue/resilience/metrics/containment_metric.py#L102-L102: keep/replace unusedcontextparameter deliberately and type explicitly.rogue/resilience/metrics/recovery_metric.py#L20-L24: typetraceasExperimentTrace.rogue/resilience/metrics/recovery_metric.py#L73-L77: typetraceasExperimentTrace.
As per coding guidelines, "Use type hints for all function signatures".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/resilience/metrics/detection_metric.py` around lines 45 - 49, Add explicit type hints to function signatures across the metrics modules to comply with the repository's typing requirements. In rogue/resilience/metrics/detection_metric.py, add type hint `trace: ExperimentTrace` to the measure method (L45-49) and another identical measure method (L89-94); add type hints for a helper method's trace parameter as `ExperimentTrace` and return type as `Optional[datetime]` (L119-126); and add type hints for another helper method's trace and injection parameters with concrete model types (L135-145). In rogue/resilience/metrics/containment_metric.py, add type hint `trace: ExperimentTrace` to measure methods at L31-35 and L65-70; add type hint for trace parameter as `ExperimentTrace` in a helper method (L92-93); and explicitly type the context parameter at L102-102 (keeping or replacing it as appropriate). In rogue/resilience/metrics/recovery_metric.py, add type hint `trace: ExperimentTrace` to measure methods at L20-24 and L73-77. Ensure all parameters and return types follow the typing guidelines.Source: Coding guidelines
rogue/evaluator_agent/resilience/base_resilience_evaluator.py (1)
34-35: ⚡ Quick winType
checkpoint_adapterto the adapter interface instead ofAny.Using
Anyweakens the checkpoint contract and delays errors to runtime. PreferCheckpointAdapter | Nonehere (and keep the same type in the factory signature).Also applies to: 45-45
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/evaluator_agent/resilience/base_resilience_evaluator.py` around lines 34 - 35, The checkpoint_adapter parameter is typed as Optional[Any], which provides weak type checking and delays errors to runtime. Replace Optional[Any] with CheckpointAdapter | None for the checkpoint_adapter parameter in the method signature (around line 34-35). Apply the same type annotation change to the factory signature at the other location mentioned (line 45) to maintain consistency throughout the codebase and enforce proper type contracts for the CheckpointAdapter interface.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c4ac9482-cd82-4e63-a1dd-15ebc699f457
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (106)
.github/workflows/resilience-gate.yml.github/workflows/test.ymlREADME.mddocs/resilience/INJECTORS.mddocs/resilience/PLAN.mddocs/resilience/README.mddocs/resilience/ROADMAP.mdexamples/custom_injector_plugin/example_injector.pyexamples/multi_agent_hierarchical/README.mdexamples/multi_agent_hierarchical/__init__.pyexamples/multi_agent_hierarchical/agents.pyexamples/multi_agent_hierarchical/python_entrypoint.pyexamples/multi_agent_hierarchical/run_resilience_demo.pyexamples/resilience_langgraph_recovery/README.mdexamples/resilience_langgraph_recovery/__init__.pyexamples/resilience_langgraph_recovery/langgraph_agent.pyexamples/resilience_langgraph_recovery/python_entrypoint.pyexamples/resilience_langgraph_recovery/run_recovery_demo.pypackages/tui/internal/components/command_line.gopackages/tui/internal/screens/help/help.mdpackages/tui/internal/screens/resilience/catalog.gopackages/tui/internal/screens/resilience/view.gopackages/tui/internal/tui/app.gopackages/tui/internal/tui/common_controller.gopackages/tui/internal/tui/keyboard_controller.gopackages/tui/internal/tui/types.gopackages/web/src/api/queries.tspackages/web/src/api/types.tspackages/web/src/api/ws.tspackages/web/src/components/app-sidebar.tsxpackages/web/src/components/live-conversation.tsxpackages/web/src/components/resilience/agent-graph.tsxpackages/web/src/components/resilience/checkpoint-scrubber.tsxpackages/web/src/components/resilience/injector-catalog-pane.tsxpackages/web/src/components/resilience/metrics-radar.tsxpackages/web/src/components/resilience/panel-states.tsxpackages/web/src/components/resilience/trace-timeline.tsxpackages/web/src/lib/resilience-catalog.tspackages/web/src/routes/evaluations/$jobId.index.tsxpackages/web/src/routes/index.tsxpackages/web/src/routes/resilience/$jobId.index.tsxpackages/web/src/routes/resilience/$jobId.report.tsxpackages/web/src/routes/resilience/$jobId.tsxpackages/web/src/routes/resilience/configure.tsxpackages/web/src/routes/resilience/index.tsxpackages/web/src/routes/resilience/new.tsxpackages/web/src/stores/config.tspackages/web/src/stores/resilience.tspyproject.tomlrogue/evaluator_agent/resilience/__init__.pyrogue/evaluator_agent/resilience/base_resilience_evaluator.pyrogue/evaluator_agent/resilience/factory.pyrogue/evaluator_agent/resilience/python_resilience_evaluator.pyrogue/injectors/__init__.pyrogue/injectors/action_spam.pyrogue/injectors/base_injector.pyrogue/injectors/goal_conflict.pyrogue/injectors/gradual_goal_drift.pyrogue/injectors/hallucination_spamming.pyrogue/injectors/plugins.pyrogue/injectors/policy_violation.pyrogue/injectors/premium_catalog.pyrogue/injectors/registry.pyrogue/injectors/tool_misuse.pyrogue/resilience/__init__.pyrogue/resilience/agent_client.pyrogue/resilience/catalog/__init__.pyrogue/resilience/catalog/architectures.pyrogue/resilience/checkpoint/__init__.pyrogue/resilience/checkpoint/base.pyrogue/resilience/checkpoint/langgraph_adapter.pyrogue/resilience/metrics/__init__.pyrogue/resilience/metrics/base_resilience_metric.pyrogue/resilience/metrics/cascading_failure_metric.pyrogue/resilience/metrics/containment_metric.pyrogue/resilience/metrics/detection_metric.pyrogue/resilience/metrics/recovery_metric.pyrogue/resilience/metrics/system_degradation_metric.pyrogue/resilience/models.pyrogue/resilience/multi_agent_client.pyrogue/resilience/orchestrator.pyrogue/run_cli.pyrogue/server/api/resilience.pyrogue/server/core/resilience_orchestrator.pyrogue/server/main.pyrogue/server/resilience/__init__.pyrogue/server/resilience/report/__init__.pyrogue/server/resilience/report/generator.pyrogue/server/resilience/report/models.pyrogue/server/resilience/report/web_formatter.pyrogue/server/services/resilience_scenario_service.pyrogue/server/services/resilience_service.pyrogue/tests/fixtures/resilience_multi_agent_entrypoint.pyrogue/tests/fixtures/resilience_test_agent.pyrogue/tests/test_concurrent_injection.pyrogue/tests/test_injector_plugins.pyrogue/tests/test_injectors.pyrogue/tests/test_langgraph_checkpoint_adapter.pyrogue/tests/test_multi_agent_orchestrator.pyrogue/tests/test_resilience_metrics.pyrogue/tests/test_resilience_orchestrator.pyrogue/tests/test_resilience_reporting.pyrogue/tests/test_resilience_scenarios.pysdks/python/rogue_sdk/client.pysdks/python/rogue_sdk/sdk.pysdks/python/rogue_sdk/types.py
Improve fault isolation and metric robustness during experiments, tighten plugin path validation and job lifecycle handling, and polish the resilience web UI and CI coverage. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
rogue/resilience/agent_client.py (1)
84-101:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
sys.pathrollback is still incomplete on one load failure path.If module execution succeeds but Line 99 raises (missing
call_agent), the path inserted at Line 81 remains insys.path, leaking process-global import state after a failed load.Suggested fix
if not hasattr(module, "call_agent"): + if self._added_sys_path and self._added_sys_path in sys.path: + sys.path.remove(self._added_sys_path) + self._added_sys_path = None raise AttributeError( f"{self._python_file_path} must define a call_agent() function", )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/resilience/agent_client.py` around lines 84 - 101, The sys.path cleanup in the exception handler only addresses failures during module execution (exec_module call), but if the subsequent call_agent attribute check at line 99 raises an AttributeError, the path added to sys.path is never removed, leaking process-global import state. Move the call_agent existence check inside the try block that starts at line 84 so that both the module execution and the attribute validation are protected by the same exception handler that performs the sys.path cleanup. This ensures that sys.path is restored regardless of whether the failure occurs during module execution or during the call_agent validation.
🧹 Nitpick comments (1)
rogue/tests/test_injector_plugins.py (1)
7-10: ⚡ Quick winReorder
rogue.injectorsimports to matchisortconventions.This import block is out of
isortorder for same-section module imports.
As per coding guidelines,Use isort conventions for import organization.Suggested diff
from rogue.injectors import registry -from rogue.injectors.plugins import load_injector_plugin, register_injector -from rogue.injectors.registry import get_injector, list_injectors from rogue.injectors.base_injector import BaseInjector, InjectionContext, InjectionResult +from rogue.injectors.plugins import load_injector_plugin, register_injector +from rogue.injectors.registry import get_injector, list_injectors🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rogue/tests/test_injector_plugins.py` around lines 7 - 10, The imports in the test file from the rogue.injectors package are not ordered according to isort conventions. Reorder the four imports (from rogue.injectors import registry, from rogue.injectors.plugins import load_injector_plugin and register_injector, from rogue.injectors.registry import get_injector and list_injectors, and from rogue.injectors.base_injector import BaseInjector and InjectionContext and InjectionResult) so they are sorted alphabetically by their full module path (base_injector should come before plugins which should come before registry). This ensures consistency with the project's isort import organization standards.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rogue/resilience/metrics/cascading_failure_metric.py`:
- Around line 107-115: The function _initial_injection_targets is missing a type
hint for its trace parameter, which violates the repository's requirement for
type hints on all function signatures. Add an appropriate type hint to the trace
parameter by specifying the type of the trace object that is being passed in.
The return type is already correctly specified as set[str], but the input
parameter needs to be typed following the same convention used elsewhere in the
codebase for trace objects.
In `@rogue/tests/test_resilience_scenarios.py`:
- Line 27: Add a type annotation to the mocker parameter in the
test_generate_route_offloads_blocking_generate function signature. The mocker
parameter should be typed as MockerFixture from the pytest_mock library to
comply with the coding guideline requiring type hints for all function
signatures.
In `@rogue/tests/test_resilience_service.py`:
- Around line 39-89: Add type annotations to all untyped function parameters in
the test file to comply with the coding guidelines. In the service fixture
function, add type hints for the tmp_path parameter (use pathlib.Path) and
monkeypatch parameter (use pytest.MonkeyPatch). In the test functions
test_run_job_skips_already_cancelled_job and
test_run_job_preserves_cancelled_state_after_orchestrator, add a type hint for
the mocker parameter (use pytest_mock.MockerFixture). Ensure all fixture and
test function parameters have explicit type annotations.
In `@sdks/python/rogue_sdk/tests/test_types.py`:
- Around line 273-302: Add type hints to the parameters of the three test
methods in TestResilienceConfig: test_accepts_plugin_in_trusted_dir,
test_rejects_path_outside_trusted_dirs, and test_rejects_symlink_plugin_path.
Each method has untyped tmp_path and monkeypatch parameters that should be
annotated with Path and MonkeyPatch types respectively. Ensure the necessary
imports (Path from pathlib and MonkeyPatch from pytest) are present at the top
of the file.
In `@sdks/python/rogue_sdk/types.py`:
- Around line 192-227: The _validate_injector_plugin_path function blocks
symlink plugin files at line 210, but the trusted root directories returned by
_trusted_injector_plugin_roots() are not validated for symlinks themselves. If a
trusted root directory (such as .rogue/injector_plugins) is a symlink, a
malicious plugin file inside the symlinked target can bypass the isolation and
pass the is_relative_to check at line 226. To fix this, modify
_trusted_injector_plugin_roots() to verify that each resolved trusted root
directory is not itself a symlink by checking with is_symlink() before returning
the paths, and raise a ValueError if any trusted root is found to be a symlink.
---
Outside diff comments:
In `@rogue/resilience/agent_client.py`:
- Around line 84-101: The sys.path cleanup in the exception handler only
addresses failures during module execution (exec_module call), but if the
subsequent call_agent attribute check at line 99 raises an AttributeError, the
path added to sys.path is never removed, leaking process-global import state.
Move the call_agent existence check inside the try block that starts at line 84
so that both the module execution and the attribute validation are protected by
the same exception handler that performs the sys.path cleanup. This ensures that
sys.path is restored regardless of whether the failure occurs during module
execution or during the call_agent validation.
---
Nitpick comments:
In `@rogue/tests/test_injector_plugins.py`:
- Around line 7-10: The imports in the test file from the rogue.injectors
package are not ordered according to isort conventions. Reorder the four imports
(from rogue.injectors import registry, from rogue.injectors.plugins import
load_injector_plugin and register_injector, from rogue.injectors.registry import
get_injector and list_injectors, and from rogue.injectors.base_injector import
BaseInjector and InjectionContext and InjectionResult) so they are sorted
alphabetically by their full module path (base_injector should come before
plugins which should come before registry). This ensures consistency with the
project's isort import organization standards.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8424defd-60ce-4a6b-9dce-38cd7473853a
📒 Files selected for processing (24)
.github/workflows/resilience-gate.yml.github/workflows/test.ymldocs/resilience/INJECTORS.mdpackages/web/src/components/resilience/metrics-radar.tsxpackages/web/src/components/resilience/trace-timeline.tsxpackages/web/src/routes/resilience/$jobId.index.tsxpackages/web/src/routes/resilience/$jobId.report.tsxpackages/web/src/routes/resilience/configure.tsxpackages/web/src/routes/resilience/index.tsxpackages/web/src/stores/resilience.tsrogue/injectors/plugins.pyrogue/resilience/agent_client.pyrogue/resilience/metrics/cascading_failure_metric.pyrogue/resilience/metrics/containment_metric.pyrogue/resilience/metrics/detection_metric.pyrogue/resilience/metrics/recovery_metric.pyrogue/resilience/orchestrator.pyrogue/server/api/resilience.pyrogue/server/services/resilience_service.pyrogue/tests/test_injector_plugins.pyrogue/tests/test_resilience_scenarios.pyrogue/tests/test_resilience_service.pysdks/python/rogue_sdk/tests/test_types.pysdks/python/rogue_sdk/types.py
🚧 Files skipped from review as they are similar to previous changes (14)
- .github/workflows/test.yml
- packages/web/src/stores/resilience.ts
- packages/web/src/routes/resilience/$jobId.index.tsx
- docs/resilience/INJECTORS.md
- packages/web/src/components/resilience/trace-timeline.tsx
- .github/workflows/resilience-gate.yml
- packages/web/src/routes/resilience/$jobId.report.tsx
- rogue/resilience/metrics/detection_metric.py
- rogue/server/api/resilience.py
- rogue/resilience/metrics/recovery_metric.py
- packages/web/src/routes/resilience/configure.tsx
- rogue/server/services/resilience_service.py
- rogue/resilience/orchestrator.py
- rogue/injectors/plugins.py
Annotate resilience test fixtures and helpers, reject symlink trusted plugin roots, and ensure agent client sys.path cleanup on load failures. Co-authored-by: Cursor <cursoragent@cursor.com>
Maintainer note
This is a full-feature proposal rather than a minimal patch.
My goal is to demonstrate a complete direction for a new Rogue capability: resilience benchmarking for multi-agent systems under internal agent failure.
I do not expect every implementation detail to be accepted as-is. I’m happy with any of these outcomes:
The core proposal is:
Summary
Adds Rogue Resilience — a third evaluation mode for multi-agent chaos testing: inject internal faults, measure detection/containment/recovery, and report via web/CLI/API.
What's included (Phases 1–4)
rogue/injectors/plugins.py)/resilience/*(configure, live trace, report + checkpoint scrubber)--evaluation-mode resilience.github/workflows/resilience-gate.yml)Docs
Test plan
uv sync --all-groups && uv run pytest rogue/tests/test_*resilience* rogue/tests/test_injector* rogue/tests/test_concurrent* -vcd packages/web && pnpm install && pnpm routes && pnpm build/resilience/configure→ run experiment against fixture agentNotes / follow-ups
/resiliencereference screen; full terminal job runner is future work