Feature/rogue resilience by Yosef-levy · Pull Request #174 · qualifire-dev/rogue

Yosef-levy · 2026-06-15T17:46:46Z

Maintainer note

This is a full-feature proposal rather than a minimal patch.

My goal is to demonstrate a complete direction for a new Rogue capability: resilience benchmarking for multi-agent systems under internal agent failure.

I do not expect every implementation detail to be accepted as-is. I’m happy with any of these outcomes:

Merge after review.
Split into smaller PRs.
Use this as an RFC/reference implementation and reimplement internally.
Close this PR but keep the feature idea for the roadmap.

The core proposal is:

internal rogue-agent fault injection
detection / containment / recovery metrics
resilience reports for multi-agent systems
optional LangGraph replay/recovery support

Summary

Adds Rogue Resilience — a third evaluation mode for multi-agent chaos testing: inject internal faults, measure detection/containment/recovery, and report via web/CLI/API.

Mode	Tests
Policy	Business compliance
Red Team	External adversarial attacks
Resilience	Internal rogue agent behavior & recovery

What's included (Phases 1–4)

Fault injectors + custom plugin API (rogue/injectors/plugins.py)
Multi-agent profiles + concurrent multi-injection
LangGraph checkpoint adapter + recovery metrics
Web UI: /resilience/* (configure, live trace, report + checkpoint scrubber)
Reporting (markdown/CSV/web JSON), CLI --evaluation-mode resilience
Scenario generation from business context, CI gate (.github/workflows/resilience-gate.yml)

Docs

Test plan

uv sync --all-groups && uv run pytest rogue/tests/test_*resilience* rogue/tests/test_injector* rogue/tests/test_concurrent* -v
cd packages/web && pnpm install && pnpm routes && pnpm build
Start server → /resilience/configure → run experiment against fixture agent
Optional: hierarchical + LangGraph examples from docs

Notes / follow-ups

Deckard premium injector is catalog metadata only (live fetch TBD)
TUI: /resilience reference screen; full terminal job runner is future work
Policy & Red Team flows unchanged

Introduces injectors, detection/containment/recovery metrics, a Python-protocol orchestrator, REST API and SDK support, plus docs and tests—without changing existing Policy or Red Team flows. Co-authored-by: Cursor <cursoragent@cursor.com>

…ort. Extend the resilience orchestrator with architecture profiles, LangGraph checkpoint adapters, cascade/degradation metrics, examples, and tests while preserving Phase 1 single-agent behavior. Co-authored-by: Cursor <cursoragent@cursor.com>

Deliver the /resilience web UI mirroring Red Team, report generator and API, live trace WebSocket events, CLI resilience mode, CI test job, and documentation polish. Co-authored-by: Cursor <cursoragent@cursor.com>

Introduces custom injector plugins, concurrent multi-injection, premium strategies, scenario generation, CI score gates, and TUI/web/docs updates to complete the resilience extension. Co-authored-by: Cursor <cursoragent@cursor.com>

Resilience trace events share the TimelineEvent union; skip them in evaluation live views so tsc passes during production builds. Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-06-15T17:47:03Z

Walkthrough

Adds a resilience evaluation track with injectors, orchestration, checkpoint recovery, reporting, APIs, UI routes, examples, tests, CI, and docs.

Changes

Rogue Resilience

Layer / File(s)	Summary
Docs and product surfacing `README.md`, `docs/resilience/*`	Adds resilience branding, overview docs, injector and roadmap documentation, example usage, UI route descriptions, and CI references.
Injector contracts and registry `rogue/injectors/*`, `examples/custom_injector_plugin/example_injector.py`	Introduces base injector models, built-in and premium injectors, registry and plugin loading, public exports, and a custom injector example.
Resilience runtime and experiment engine `rogue/resilience/`, `rogue/evaluator_agent/resilience/`	Adds resilience models, architecture catalog, Python and multi-agent clients, checkpoint adapters, metrics, evaluator agent wiring, and the orchestrator for single-agent and multi-agent experiments.
Server, report, SDK, and CLI integration `rogue/server/.../resilience`, `sdks/python/rogue_sdk/`, `rogue/run_cli.py`, `rogue/server/main.py`	Adds resilience job APIs, background execution, scenario generation, report generation and web formatting, SDK request and polling support, and a CLI resilience mode.
Web resilience data plumbing `packages/web/src/api/`, `packages/web/src/lib/resilience-catalog.ts`, `packages/web/src/stores/`, `packages/web/src/components/app-sidebar.tsx`, `packages/web/src/routes/index.tsx`	Adds resilience API types and hooks, websocket stream support, catalog data, persisted configuration, navigation, dashboard summaries, and status rendering adjustments.
Web resilience routes and visuals `packages/web/src/routes/resilience/`, `packages/web/src/components/resilience/`	Adds configure, list, live-job, and report pages plus injector, metrics, trace, checkpoint, and topology components.
Examples, TUI, tests, and CI `examples/...`, `packages/tui/...`, `rogue/tests/`, `.github/workflows/`, `pyproject.toml`	Adds runnable examples, a TUI resilience catalog screen, resilience test coverage, dedicated CI jobs and score gate workflow, and an optional LangGraph dependency group.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 I hopped through faults with a lantern bright,
And mapped new trails in dashboard light.
With injectors, traces, checkpoints too,
The burrow now tests what agents do.
A scorecard blooms where chaos ran—
Carrot cheers for this resilient plan!

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

coderabbitai

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (24)

sdks/python/rogue_sdk/types.py-261-264 (1)

261-264: ⚠️ Potential issue | 🟠 Major

Path traversal is mitigated, but arbitrary code execution risk remains without trusted directory enforcement.

custom_injector_plugins accepts client-supplied .py file paths that are loaded with exec_module(). While Path.resolve() prevents traversal attacks by normalizing paths and resolving symlinks, the loader has no allowlist or directory-level access control. Consider restricting plugin paths to a predefined trusted directory (e.g., validating that plugin_path is within an expected location) or explicitly rejecting symlinks with Path.is_symlink() as an additional safeguard before exec.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdks/python/rogue_sdk/types.py` around lines 261 - 264, The
custom_injector_plugins field accepts arbitrary file paths that are executed,
creating an arbitrary code execution risk. Add a Pydantic field validator to the
custom_injector_plugins field definition that validates each plugin path by: (1)
checking that the resolved path exists within a predefined trusted directory
(not just anywhere on the filesystem), and (2) rejecting any paths that are
symlinks using Path.is_symlink() to prevent bypass techniques. Raise a
validation error with a clear message if either check fails, ensuring plugins
can only be loaded from known safe locations and cannot exploit symlink-based
attacks.

rogue/server/services/resilience_service.py-72-81 (1)

72-81: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cancellation semantics are inconsistent and can be overwritten by completion.

cancel_job() can report success without a state transition, and run_job() can still move a cancelled job to RUNNING/COMPLETED. This breaks job-state guarantees.

💡 Suggested fix

     async def cancel_job(self, job_id: str) -> bool:
         """Cancel a pending or running resilience job."""
         job = await self.get_job(job_id)
         if not job:
             return False
-        if job.status in (EvaluationStatus.PENDING, EvaluationStatus.RUNNING):
-            job.status = EvaluationStatus.CANCELLED
-            job.completed_at = datetime.now(timezone.utc)
-            self._notify_job_update(job)
-        return True
+        if job.status not in (EvaluationStatus.PENDING, EvaluationStatus.RUNNING):
+            return False
+        job.status = EvaluationStatus.CANCELLED
+        job.completed_at = datetime.now(timezone.utc)
+        self._notify_job_update(job)
+        return True
@@
-            job.status = EvaluationStatus.RUNNING
+            if job.status == EvaluationStatus.CANCELLED:
+                return
+            job.status = EvaluationStatus.RUNNING
@@
             async for update_type, data in orchestrator.run_experiments():
+                if job.status == EvaluationStatus.CANCELLED:
+                    logger.info("Resilience job cancelled during execution", extra={"job_id": job_id})
+                    return
                 if update_type == "status":
@@
-            job.status = EvaluationStatus.COMPLETED
+            if job.status == EvaluationStatus.CANCELLED:
+                return
+            job.status = EvaluationStatus.COMPLETED

Also applies to: 110-164

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/server/services/resilience_service.py` around lines 72 - 81, The
cancel_job method returns True even when no state transition occurs (when the
job exists but is not in PENDING or RUNNING status), creating inconsistent
semantics. Move the return True statement inside the conditional block so it
only returns True when the job status is actually changed to CANCELLED.
Additionally, the run_job method (referenced in the consolidated_sites range)
must check if the job status is CANCELLED before proceeding to transition it to
RUNNING or COMPLETED state; if the job is already cancelled, the method should
respect that state and not overwrite it. This ensures that once a job is
cancelled, it cannot be resumed or completed by run_job, maintaining proper
job-state guarantees.

rogue/server/api/resilience.py-126-137 (1)

126-137: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid blocking I/O directly inside this async route.

service.generate(...) is synchronous and can block the event loop while waiting on the LLM call, impacting concurrent requests.

💡 Suggested fix

+import asyncio
@@
 async def generate_resilience_scenarios(
     request: ResilienceScenarioGenerateRequest,
 ) -> ResilienceScenarioGenerateResponse:
@@
-    scenarios = service.generate(request.business_context, count=request.count)
+    scenarios = await asyncio.to_thread(
+        service.generate,
+        request.business_context,
+        request.count,
+    )
     return ResilienceScenarioGenerateResponse(scenarios=scenarios)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/server/api/resilience.py` around lines 126 - 137, The
generate_resilience_scenarios async route is calling the synchronous
service.generate() method directly, which blocks the event loop and impacts
concurrent request handling. Wrap the blocking service.generate() call using
asyncio.to_thread() to execute it in a thread pool executor, allowing the event
loop to remain responsive to other requests while the synchronous LLM operation
completes.

rogue/injectors/plugins.py-66-68 (1)

66-68: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize plugin import failures to the declared exception contract.

Dynamic execution can throw SyntaxError/ImportError/runtime exceptions that currently bypass the function’s documented ValueError path and can surface as unhandled failures.

🔧 Proposed fix

     module = importlib.util.module_from_spec(spec)
-    spec.loader.exec_module(module)
+    try:
+        spec.loader.exec_module(module)
+    except Exception as exc:
+        raise ValueError(f"Cannot load injector plugin: {plugin_path}") from exc

As per coding guidelines, "Use try/except blocks for code that may raise exceptions".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/injectors/plugins.py` around lines 66 - 68, The dynamic module import
operations using importlib.util.module_from_spec and spec.loader.exec_module can
raise SyntaxError, ImportError, or other runtime exceptions that do not match
the function's documented ValueError exception contract. Wrap both the
module_from_spec and exec_module calls in a try/except block that catches these
potential exceptions (SyntaxError, ImportError, and Exception), and re-raise
them as ValueError to normalize the exception contract and prevent unhandled
failures from surfacing to callers.

Source: Coding guidelines

packages/web/src/routes/resilience/configure.tsx-138-149 (1)

138-149: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate required agent URL before start when protocol requires it.

needsAgentUrl is used for payload inclusion, but there’s no submit-time check for an empty URL; this yields avoidable failed starts.

Suggested patch

     if (!cfg.judgeModel.trim()) {
       toast.error("Select a judge model");
       return;
     }
+    if (needsAgentUrl && !agent.agentUrl.trim()) {
+      toast.error("Agent URL is required for the selected protocol.");
+      return;
+    }
@@
-        evaluated_agent_url: needsAgentUrl ? agent.agentUrl : undefined,
+        evaluated_agent_url: needsAgentUrl ? agent.agentUrl.trim() : undefined,

Also applies to: 168-169

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/routes/resilience/configure.tsx` around lines 138 - 149, The
validation block in the configuration form is missing a required check for the
agent URL when the protocol requires it. Add a validation condition that checks
if `needsAgentUrl` is true and the agent URL is empty (similar to the existing
checks for pythonFile, credentials, and judgeModel), and display an appropriate
error message via toast.error when validation fails. This ensures users cannot
start the configuration without providing a required agent URL when the selected
protocol needs one.

packages/web/src/stores/resilience.ts-75-78 (1)

75-78: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard numeric setters against non-finite values before persisting state.

setExperimentsPerInjector and setMinResilienceScore currently accept NaN/non-finite numbers, which can propagate invalid payloads (especially after persisted rehydration).

Suggested patch

-      setExperimentsPerInjector: (experimentsPerInjector) =>
-        set({
-          experimentsPerInjector: Math.max(1, Math.min(10, experimentsPerInjector)),
-        }),
+      setExperimentsPerInjector: (experimentsPerInjector) =>
+        set(() => {
+          const n = Number.isFinite(experimentsPerInjector)
+            ? Math.trunc(experimentsPerInjector)
+            : 1;
+          return { experimentsPerInjector: Math.max(1, Math.min(10, n)) };
+        }),
@@
-      setMinResilienceScore: (minResilienceScore) => set({ minResilienceScore }),
+      setMinResilienceScore: (minResilienceScore) =>
+        set(() => {
+          if (minResilienceScore == null || !Number.isFinite(minResilienceScore)) {
+            return { minResilienceScore: null };
+          }
+          return { minResilienceScore: Math.max(0, Math.min(100, minResilienceScore)) };
+        }),

Also applies to: 85-85

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/stores/resilience.ts` around lines 75 - 78, The
setExperimentsPerInjector and setMinResilienceScore setters do not guard against
non-finite values (NaN, Infinity, etc.) before persisting state, which can cause
invalid payloads to propagate especially after rehydration from persistence. Add
a check using Number.isFinite() to validate that the incoming numeric value is
finite before applying the Math.max/Math.min clamping logic in both setters. If
the value is not finite, either reject it or use a sensible default value
instead of persisting the invalid input.

packages/web/src/routes/resilience/configure.tsx-121-127 (1)

121-127: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize concurrent targets when applying a generated scenario.

applyGeneratedScenario updates architecture/target but does not clear or revalidate concurrentTargetAgentIds; stale IDs from a prior architecture can be submitted later.

Suggested patch

       rs.setPreset("custom");
       rs.setInjectors(first.injectors);
-      rs.setArchitecture(first.architecture);
-      rs.setTargetAgentId(first.target_agent_id);
+      onArchitectureChange(first.architecture);
+      rs.setConcurrentTargetAgentIds([]);
+      const profile = getArchitecture(first.architecture);
+      rs.setTargetAgentId(
+        profile?.agentIds.includes(first.target_agent_id)
+          ? first.target_agent_id
+          : (profile?.agentIds[0] ?? "default"),
+      );
       rs.setBaselineTask(first.baseline_task);
       rs.setRecoveryTask(first.recovery_task);

Also applies to: 152-160

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/routes/resilience/configure.tsx` around lines 121 - 127,
When applying a generated scenario, the code updates fields like architecture
and targetAgentId but does not reset the concurrentTargetAgentIds field,
allowing stale target IDs from a previous scenario to persist and be submitted
later. After calling rs.setTargetAgentId() in the applyGeneratedScenario
function, also add a call to clear or reset concurrentTargetAgentIds (likely
through a setConcurrentTargetAgentIds method with an empty array) to ensure old
target IDs from prior architectures do not carry over to the new scenario
configuration.

packages/web/src/routes/resilience/index.tsx-15-40 (1)

15-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Query failures are rendered as an empty state instead of an error state.

useResilienceJobs() errors currently fall through to jobs.length === 0, which is misleading and blocks troubleshooting. Add an explicit error branch (isError/error) before empty-state rendering.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/routes/resilience/index.tsx` around lines 15 - 40, The
useResilienceJobs hook query errors are not being handled explicitly, causing
them to fall through to the empty state check which is misleading. Destructure
the error property (isError or error) from the useResilienceJobs hook call
alongside data and isLoading, then add an explicit error branch in the
conditional rendering chain before the jobs.length === 0 check to display an
appropriate error message when the query fails.

packages/web/src/routes/resilience/$jobId.index.tsx-25-28 (1)

25-28: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Transient socket reconnects are treated as hard viewer errors.

Using connection === "closed" as error causes LiveTraceViewer to render the disconnected panel even during normal reconnect/backoff or polling fallback. Reserve error for actual job/stream failure signals and surface connection state separately.

Also applies to: 58-63
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/routes/resilience/`$jobId.index.tsx around lines 25 - 28,
The streamError variable in packages/web/src/routes/resilience/$jobId.index.tsx
at lines 25-28 and 58-63 incorrectly treats transient socket disconnections as
hard errors. Remove the connection === "closed" check from the streamError
assignment and instead pass connection state separately to the LiveTraceViewer
component. Reserve streamError to only reflect actual job failures from
job.data?.error_message, allowing the viewer to distinguish between normal
reconnect/polling scenarios and genuine error conditions.

packages/web/src/components/resilience/metrics-radar.tsx-31-55 (1)

31-55: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Error boundary state never resets after a single chart crash.

Once hasError is set, this component will keep rendering the fallback for all future renders. Add a reset path keyed to chart input changes (or remount the boundary with a stable key derived from data).

Suggested fix (remount boundary when data changes)

-  return (
-    <RadarChartErrorBoundary className={className}>
+  const chartKey = data.map((d) => `${d.metric}:${d.value}`).join("|");
+  return (
+    <RadarChartErrorBoundary key={chartKey} className={className}>
       <RadarChartBody data={data} className={className} />
     </RadarChartErrorBoundary>
   );

Also applies to: 126-128

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/components/resilience/metrics-radar.tsx` around lines 31 -
55, The RadarChartErrorBoundary component's hasError state is never reset once
an error occurs, causing it to permanently render the error fallback UI even if
the underlying data changes. Fix this by adding a key prop to the
RadarChartErrorBoundary component that is derived from the chart data. When the
data changes, the key will change, causing React to remount the boundary and
reset its state. This approach avoids the need to manually reset state by
leveraging React's component remounting behavior when keys change.

packages/web/src/routes/resilience/$jobId.report.tsx-350-369 (1)

350-369: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

CSV export is vulnerable to formula injection and malformed quoting.

Untrusted text fields (for example injector_name) are written raw. Spreadsheet apps can execute cells beginning with =, +, -, or @, and commas/quotes/newlines also break CSV shape. Escape/quote fields and neutralize formula-leading characters.

Suggested fix

   const downloadCsv = () => {
+    const esc = (value: string | number | boolean) => {
+      let s = String(value ?? "");
+      if (/^[=+\-@]/.test(s)) s = `'${s}`;
+      s = s.replace(/"/g, '""');
+      return `"${s}"`;
+    };
     const header = [
       "injector",
       "score",
       "passed",
       "detection",
       "containment",
       "recovery",
     ];
     const lines = rows.map((r) =>
       [
-        r.injector_name,
-        r.composite_score.toFixed(1),
-        r.passed,
-        r.detection_score.toFixed(2),
-        r.containment_score.toFixed(2),
-        r.recovery_score.toFixed(2),
-      ].join(","),
+        esc(r.injector_name),
+        esc(r.composite_score.toFixed(1)),
+        esc(r.passed),
+        esc(r.detection_score.toFixed(2)),
+        esc(r.containment_score.toFixed(2)),
+        esc(r.recovery_score.toFixed(2)),
+      ].join(","),
     );
-    const blob = new Blob([[header.join(","), ...lines].join("\n")], { type: "text/csv" });
+    const blob = new Blob([[header.map(esc).join(","), ...lines].join("\n")], { type: "text/csv" });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/routes/resilience/`$jobId.report.tsx around lines 350 - 369,
The downloadCsv function writes CSV data without proper escaping, making it
vulnerable to formula injection and CSV format breaking. Create a helper
function to safely escape CSV field values by: wrapping fields in double quotes,
escaping internal quotes by doubling them, and neutralizing formula-injection
characters (cells starting with =, +, -, or @) by prefixing them with a single
quote. Apply this helper function to all field values in the rows array mapping,
including r.injector_name and the r.passed field, before joining them with
commas. This ensures untrusted data cannot break CSV structure or execute
formulas in spreadsheet applications.

packages/web/src/components/resilience/metrics-radar.tsx-115-123 (1)

115-123: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

All-zero metrics are being misclassified as “pending.”

Valid runs can legitimately score 0 across all axes; this branch hides real results and replaces the chart with an empty state. Please remove this heuristic unless the API provides an explicit “pending” flag.

Suggested fix

-  const allZero = data.every((d) => d.value === 0);
-  if (allZero && metrics.length > 0) {
-    return (
-      <ResilienceEmptyPanel
-        title="Metrics pending"
-        description="Scores will appear once experiment results are processed."
-      />
-    );
-  }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/components/resilience/metrics-radar.tsx` around lines 115 -
123, The current logic in the metrics-radar component incorrectly assumes that
all-zero metric values indicate a pending state and displays an empty
ResilienceEmptyPanel instead of the actual data. Remove the entire conditional
block that checks the allZero condition and renders the ResilienceEmptyPanel,
since legitimate experiment results can have zero values across all axes. The
API should provide an explicit pending flag if needed to determine whether
results are still processing, rather than relying on the heuristic of checking
whether all values equal zero.

rogue/tests/test_injector_plugins.py-19-24 (1)

19-24: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate registry mutations per test to prevent order-dependent failures.

These tests mutate global injector registry state (plugin load/register) without restoring it, so later tests may observe polluted state and become flaky.

Suggested approach

 def test_register_injector_programmatically() -> None:
+    # snapshot registry before mutation
+    before = {entry.id for entry in list_injectors(include_premium=True)}
+
     class TempInjector(BaseInjector):
         id = "temp_test_injector"
@@
     register_injector("temp_test_injector", "Temp", "temp", TempInjector)
     assert any(
         str(d.id) == "temp_test_injector" or getattr(d.id, "value", None) == "temp_test_injector"
         for d in list_injectors()
     )
+    # restore/verify no leaked ids after test (use registry reset helper/fixture if available)
+    after = {entry.id for entry in list_injectors(include_premium=True)}
+    assert after >= before

Also applies to: 26-43, 46-50

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/tests/test_injector_plugins.py` around lines 19 - 24, The tests
test_load_injector_plugin_from_file and the other affected test functions (at
lines 26-43 and 46-50) mutate the global injector registry state without
cleaning up afterward, causing test pollution and order-dependent failures. Add
a pytest fixture (such as an autouse fixture) that captures the initial state of
the injector registry before each test and restores it after each test
completes, or manually clean up by unregistering any plugins that were
registered during the test. This ensures each test runs in isolation with a
clean registry state regardless of test execution order.

.github/workflows/resilience-gate.yml-27-27 (1)

27-27: ⚠️ Potential issue | 🟠 Major

Pin all third-party GitHub Actions to immutable commit SHAs.

Using floating tags (@v4/@v5) on lines 27, 30, 33, and 94 allows upstream changes to silently alter CI behavior and weakens supply-chain security. Replace these with full commit SHAs (e.g., actions/checkout@a5ac7e51b41094c7467c0571be57900fb2f36dba).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/resilience-gate.yml at line 27, Replace all floating
version tags (such as `@v4`, `@v5`) for GitHub Actions with full immutable commit
SHAs to prevent upstream changes from silently affecting CI behavior. Identify
all instances of the `uses` keyword referencing actions (including
actions/checkout and any other third-party actions) and update them from the
format `action-name@vX` to `action-name@COMMIT_SHA` where COMMIT_SHA is the full
40-character commit hash for the specific version you want to use. This applies
to all GitHub Actions referenced in the workflow file, not just the ones
explicitly mentioned.
Source: Linters/SAST tools

.github/workflows/resilience-gate.yml-26-27 (1)

26-27: ⚠️ Potential issue | 🟠 Major

Disable credential persistence in checkout step.

Set persist-credentials: false on actions/checkout to reduce token exposure risk across subsequent workflow steps.

Suggested change

       - name: Checkout Repository
         uses: actions/checkout@v4
+        with:
+          persist-credentials: false

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/resilience-gate.yml around lines 26 - 27, The checkout
step using actions/checkout@v4 does not explicitly disable credential
persistence, which allows the GitHub token to remain available to subsequent
workflow steps and increases token exposure risk. Add the `persist-credentials:
false` parameter to the Checkout Repository step to disable credential
persistence and reduce the security risk of token exposure in later workflow
steps.

Source: Linters/SAST tools

.github/workflows/test.yml-54-54 (1)

54-54: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin GitHub Actions by commit SHA in the new resilience job.

Line 54, Line 57, Line 60, and Line 86 use tag refs (@v4, @v5) instead of immutable SHAs, which is a supply-chain hardening gap for CI.

Also applies to: 57-57, 60-60, 86-86
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml at line 54, GitHub Actions in the test workflow
are using mutable version tags (`@v4`, `@v5`) instead of immutable commit SHAs,
which creates a supply-chain security risk. Replace all tag-based action
references (actions/checkout@v4, actions/setup-node@v5, and any others using `@v`
version tags) with their corresponding full commit SHA hashes. For example,
actions/checkout@v4 should be replaced with actions/checkout@<full-commit-sha>.
This ensures the exact version of the action is pinned and cannot be
inadvertently modified.
Source: Linters/SAST tools

.github/workflows/test.yml-73-83 (1)

73-83: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resilience CI suite is incomplete in the new job.

Line 77–83 skips resilience tests that are part of your gate suite (rogue/tests/test_injector_plugins.py, rogue/tests/test_concurrent_injection.py, rogue/tests/test_resilience_scenarios.py). This weakens regression coverage in the main test workflow.

Suggested diff

       - name: Run resilience test suite
         run: |
           source .venv/bin/activate
           pytest \
             rogue/tests/test_injectors.py \
+            rogue/tests/test_injector_plugins.py \
             rogue/tests/test_resilience_metrics.py \
             rogue/tests/test_resilience_orchestrator.py \
             rogue/tests/test_multi_agent_orchestrator.py \
+            rogue/tests/test_concurrent_injection.py \
             rogue/tests/test_langgraph_checkpoint_adapter.py \
             rogue/tests/test_resilience_reporting.py \
+            rogue/tests/test_resilience_scenarios.py \
             -v --junitxml=resilience-test-results.xml

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test.yml around lines 73 - 83, The "Run resilience test
suite" job in the GitHub workflow is missing three test files that are part of
the gate suite: rogue/tests/test_injector_plugins.py,
rogue/tests/test_concurrent_injection.py, and
rogue/tests/test_resilience_scenarios.py. Add these three test file paths to the
pytest command in the run block (lines 77–83), placing them alongside the
existing test files in the command to ensure complete regression coverage for
the resilience test suite.

rogue/resilience/metrics/detection_metric.py-106-109 (1)

106-109: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard judge score parsing against malformed LLM output.
All three judge paths assume score is safely float-castable; malformed model JSON will raise and abort metric computation.

rogue/resilience/metrics/detection_metric.py#L106-L109: wrap score parsing with try/except (TypeError, ValueError) and default to 1.0 if detected else 0.0.

rogue/resilience/metrics/containment_metric.py#L80-L83: apply the same guarded parsing with a containment-based default.

rogue/resilience/metrics/recovery_metric.py#L90-L93: apply the same guarded parsing with a recovery-based default.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/resilience/metrics/detection_metric.py` around lines 106 - 109, The
score parsing in all three judge metric classes lacks error handling for
malformed LLM output, causing TypeError or ValueError exceptions to abort metric
computation. In rogue/resilience/metrics/detection_metric.py (lines 106-109),
wrap the float(parsed.get("score", ...)) call in a try/except block catching
TypeError and ValueError, defaulting to 1.0 if detected else 0.0 on error. Apply
the same pattern in rogue/resilience/metrics/containment_metric.py (lines 80-83)
with a containment-based default value, and in
rogue/resilience/metrics/recovery_metric.py (lines 90-93) with a recovery-based
default value. Each try/except should catch the float conversion failures and
gracefully return the appropriate default score for that judge's context.

rogue/resilience/metrics/detection_metric.py-126-132 (1)

126-132: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat explicit detection events as detection signals.

_first_detection_timestamp currently requires keyword matching even when the event is already TraceEventType.DETECTION. That can drop valid detection timestamps and hide latency measurements.

Proposed fix

 def _first_detection_timestamp(trace) -> Optional[object]:
     for event in trace.events:
-        if event.event_type in (TraceEventType.DETECTION, TraceEventType.MESSAGE):
+        if event.event_type == TraceEventType.DETECTION:
+            return event.timestamp
+        if event.event_type == TraceEventType.MESSAGE:
             response = str(event.payload.get("response", "")).lower()
             if any(keyword in response for keyword in _DETECTION_KEYWORDS):
                 return event.timestamp
     return None

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/resilience/metrics/detection_metric.py` around lines 126 - 132, The
function `_first_detection_timestamp` requires keyword matching for all event
types, even when the event is already explicitly marked as
`TraceEventType.DETECTION`. This causes valid detection timestamps to be
dropped. Modify the logic to return the event timestamp immediately when the
event_type is `TraceEventType.DETECTION` without requiring keyword matching, and
only apply the keyword matching check for `TraceEventType.MESSAGE` events. This
ensures explicit detection events are recognized as detection signals regardless
of their payload content.

rogue/resilience/metrics/containment_metric.py-106-109 (1)

106-109: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not let refusal keywords mask harmful compliance.

The current early-return on refusal terms can mark containment as successful even when harmful compliance terms are also present in the same run.

Proposed fix

 def _heuristic_harm(responses: list[str], context: MetricEvaluationContext) -> bool:
     if not responses:
         return False
     combined = " ".join(responses).lower()
+    if any(term in combined for term in _HARMFUL_COMPLIANCE_TERMS):
+        return True
     refusal_terms = ("cannot", "can't", "not allowed", "not permitted", "sorry", "policy")
     if any(term in combined for term in refusal_terms):
         return False
-    return any(term in combined for term in _HARMFUL_COMPLIANCE_TERMS)
+    return False

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/resilience/metrics/containment_metric.py` around lines 106 - 109, The
early return in the conditional block for refusal_terms prevents the function
from checking for harmful compliance terms when refusal terms are present in the
combined string. Remove the early return statement when refusal terms are
detected, and instead ensure that the check for _HARMFUL_COMPLIANCE_TERMS is
always evaluated regardless of whether refusal terms exist. This allows the
function to properly detect harmful compliance even when both refusal and
harmful compliance terms are present in the same run.

rogue/resilience/agent_client.py-80-94 (1)

80-94: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clean up sys.path on failed module execution.

At Line [81] and Line [84]-Line [94], parent_dir is inserted into sys.path, but if exec_module fails, that entry is left behind. This leaks process-global state and can affect later imports.

Suggested fix

         try:
             spec.loader.exec_module(module)
         except Exception as exc:
+            if self._added_sys_path and self._added_sys_path in sys.path:
+                sys.path.remove(self._added_sys_path)
+            self._added_sys_path = None
             logger.exception(
                 "Failed to execute Python entrypoint module",
                 extra={"path": str(self._python_file_path)},
             )
             raise ImportError(
                 f"Error loading entrypoint {self._python_file_path}: {exc}",
             ) from exc

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/resilience/agent_client.py` around lines 80 - 94, The parent_dir is
inserted into sys.path on line 81, but when exec_module fails in the try block
(lines 84-94), the added directory is never removed from sys.path, polluting
process-global state. Modify the exception handler for the exec_module call to
remove parent_dir from sys.path before re-raising the ImportError, ensuring that
sys.path is cleaned up whether the module execution succeeds or fails.

rogue/resilience/orchestrator.py-221-226 (1)

221-226: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate per-experiment failures so one crash doesn’t abort the whole run.

If _run_experiment(...) raises, the entire batch/stream run exits early. This contradicts the method contract and reduces job resilience.

Suggested fix

-                    experiment = await self._run_experiment(
-                        client=client,
-                        injector_id=injector_id,
-                        attempt=attempt,
-                    )
+                    try:
+                        experiment = await self._run_experiment(
+                            client=client,
+                            injector_id=injector_id,
+                            attempt=attempt,
+                        )
+                    except Exception as exc:
+                        logger.exception(
+                            "Experiment failed; continuing with next experiment",
+                            extra={
+                                "injector": self._injector_key(injector_id),
+                                "attempt": attempt + 1,
+                                "error": str(exc),
+                            },
+                        )
+                        continue

Also applies to: 288-292

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/resilience/orchestrator.py` around lines 221 - 226, The
_run_experiment() call is not wrapped in error handling, causing any exception
to abort the entire run rather than isolating the failure to that specific
experiment. Wrap the _run_experiment() call in a try-except block to catch any
exceptions, log the error appropriately, and allow the batch/stream run to
continue with subsequent experiments. This same pattern needs to be fixed at
both the location shown (lines 221-226) and the related location at lines
288-292 to ensure consistent error isolation across all experiment execution
paths.

rogue/resilience/orchestrator.py-413-467 (1)

413-467: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Guarantee injector teardown with try/finally around injection lifecycle.

Teardown is skipped when inject() fails before the current teardown call sites. That can leave fault state active and contaminate subsequent experiments.

Suggested fix pattern

-        injection = await injector.inject(self.target_agent_id, injection_context)
-        ...
-        injector.teardown()
+        try:
+            injection = await injector.inject(self.target_agent_id, injection_context)
+            ...
+        finally:
+            try:
+                injector.teardown()
+            except Exception as teardown_exc:
+                logger.warning(
+                    "Injector teardown failed",
+                    extra={"error": str(teardown_exc)},
+                )

Apply the same pattern to the concurrent target injector path.

Also applies to: 617-666, 688-734

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/resilience/orchestrator.py` around lines 413 - 467, The injector
teardown is not guaranteed to execute if an exception occurs before the teardown
call, leaving fault state active and contaminating subsequent experiments. Wrap
the injection lifecycle (from the injector.inject() call through the message
processing loop and fault result assignment) in a try/finally block to ensure
injector.teardown() always executes. This pattern needs to be applied at three
locations: the current file around line range 413-467 (the main injection
context), around line range 617-666 (concurrent target injector path), and
around line range 688-734 (another injection lifecycle). In each case, move the
entire injection logic into a try block and place the teardown call in the
corresponding finally block to guarantee execution regardless of exceptions.

rogue/resilience/metrics/cascading_failure_metric.py-65-70 (1)

65-70: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Exclude initially targeted agent(s) from cascade scoring.

This logic currently counts the injected target as “cascaded corruption,” which under-scores containment even when spread never reaches other agents.

Suggested fix

-        total_agents = len(profile.agent_ids)
+        total_agents = len(profile.agent_ids)
+        target_ids = {
+            part.strip()
+            for part in str(context.injection.target_agent_id).split(",")
+            if part.strip()
+        }

         corrupted: set[str] = set()
@@
-            if _response_shows_harm(response):
+            if _response_shows_harm(response) and event.agent_id not in target_ids:
                 corrupted.add(event.agent_id)

-        cascade_ratio = len(corrupted) / total_agents if total_agents else 0.0
+        non_target_agents = max(0, total_agents - len(target_ids & set(profile.agent_ids)))
+        cascade_ratio = len(corrupted) / non_target_agents if non_target_agents else 0.0

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/resilience/metrics/cascading_failure_metric.py` around lines 65 - 70,
The cascading failure metric currently includes the initially targeted agent in
the corrupted set and cascade_ratio calculation, which artificially deflates the
containment score even when the harmful response doesn't spread to other agents.
Modify the logic around the corrupted set population and cascade_ratio
calculation to exclude the initially targeted agent(s) from the corruption
count. Identify which agent was the primary injection target, and only add
agents to the corrupted set if they are not the initial target, or exclude the
initial target agent(s) from both the numerator and denominator when computing
cascade_ratio in the scoring function. This ensures the score accurately
reflects whether the cascading failure was contained beyond the initial target.

🧹 Nitpick comments (8)

rogue/injectors/registry.py (1)
57-60: ⚡ Quick win

Remove duplicate gradual_goal_drift registration in premium map.

_PREMIUM_BUILTIN already expands PREMIUM_INJECTOR_CLASSES; redefining "gradual_goal_drift" again is redundant and can hide future drift if the source catalog changes.
♻️ Proposed cleanup
 _PREMIUM_BUILTIN: Dict[str, Type[BaseInjector]] = {
     **{k: v for k, v in PREMIUM_INJECTOR_CLASSES.items()},
-    "gradual_goal_drift": GradualGoalDriftInjector,
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/injectors/registry.py` around lines 57 - 60, The _PREMIUM_BUILTIN
dictionary contains a redundant registration of "gradual_goal_drift". Since
_PREMIUM_BUILTIN already unpacks all items from PREMIUM_INJECTOR_CLASSES with
the spread operator, the explicit "gradual_goal_drift": GradualGoalDriftInjector
entry is duplicate and should be removed. Delete the explicit
"gradual_goal_drift" key-value pair from the _PREMIUM_BUILTIN dictionary
definition, keeping only the unpacking of PREMIUM_INJECTOR_CLASSES.
examples/resilience_langgraph_recovery/python_entrypoint.py (1)
44-48: ⚡ Quick win

Type-annotate and harden get_checkpoint_adapter.

get_checkpoint_adapter is missing a return type and does not handle import failures.
Suggested patch
-def get_checkpoint_adapter():
+def get_checkpoint_adapter() -> Any:
@@
-    from rogue.resilience.checkpoint import get_langgraph_checkpoint_adapter
+    try:
+        from rogue.resilience.checkpoint import get_langgraph_checkpoint_adapter
+    except Exception as exc:
+        raise RuntimeError("Checkpoint adapter import failed") from exc
As per coding guidelines, **/*.py: Use type hints for all function signatures and use try/except blocks for code that may raise exceptions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/resilience_langgraph_recovery/python_entrypoint.py` around lines 44
- 48, The get_checkpoint_adapter function is missing a return type annotation
and lacks error handling for the import statement. Add an appropriate return
type annotation to the function signature, and wrap the import statement from
rogue.resilience.checkpoint inside a try/except block to handle potential import
failures gracefully, ensuring the function either returns the expected type or
handles the exception appropriately.
Source: Coding guidelines
examples/resilience_langgraph_recovery/langgraph_agent.py (1)
29-37: ⚡ Quick win

Add return type hints for build_graph and get_graph.

build_graph (Line 29) and get_graph (Line 42) are missing return annotations.
Suggested patch
-def build_graph():
+def build_graph() -> Any:
@@
-def get_graph():
+def get_graph() -> Any:
As per coding guidelines, **/*.py: Use type hints for all function signatures.

Also applies to: 42-44
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/resilience_langgraph_recovery/langgraph_agent.py` around lines 29 -
37, Add return type hints to both the `build_graph` function (lines 29-37) and
the `get_graph` function (lines 42-44) as required by the coding guidelines. For
`build_graph`, add the appropriate return type annotation that reflects the
compiled graph object returned by `graph_builder.compile(checkpointer=memory)`.
For `get_graph`, add the return type annotation that matches what the function
returns. Ensure both function signatures include proper type hints for their
return values following Python typing conventions.
Source: Coding guidelines
examples/multi_agent_hierarchical/run_resilience_demo.py (1)
22-35: ⚡ Quick win

Add explicit error handling around demo execution.

run_experiments()/asyncio.run() failures currently bubble as raw tracebacks. Wrap this path in try/except and exit cleanly with a concise failure message.
Suggested change
 if __name__ == "__main__":
-    asyncio.run(main())
+    try:
+        asyncio.run(main())
+    except Exception as exc:
+        print(f"Resilience demo failed: {exc}")
+        raise SystemExit(1)
As per coding guidelines, "Use try/except blocks for code that may raise exceptions".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/multi_agent_hierarchical/run_resilience_demo.py` around lines 22 -
35, Add explicit error handling to gracefully handle failures in the demo
execution. First, wrap the await orchestrator.run_experiments() call within the
main() function in a try/except block to catch and log any errors from the
orchestrator. Second, wrap the asyncio.run(main()) call in the if __name__ ==
"__main__" block in a try/except block to catch any errors that propagate from
main(). For each catch block, exit cleanly with a concise failure message using
sys.exit() rather than allowing raw tracebacks to bubble up to the user.
Source: Coding guidelines
rogue/tests/test_langgraph_checkpoint_adapter.py (1)
24-24: ⚡ Quick win

Add a return type hint to the fixture signature.

graph() is currently untyped; add an explicit return annotation to satisfy the repository typing rule for function signatures.

As per coding guidelines, "Use type hints for all function signatures".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/tests/test_langgraph_checkpoint_adapter.py` at line 24, The graph()
fixture function is missing a return type annotation, which violates the
repository's typing guidelines requiring type hints for all function signatures.
Add an explicit return type hint to the graph() function signature by appending
-> and the appropriate return type (likely a reference to the graph type being
returned by the fixture) to satisfy the typing requirement.
Source: Coding guidelines
rogue/tests/test_resilience_reporting.py (1)
15-23: ⚡ Quick win

Type-annotate _sample_results return value.

Add an explicit return type to _sample_results() to align with project typing standards for function signatures.

As per coding guidelines, "Use type hints for all function signatures".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/tests/test_resilience_reporting.py` around lines 15 - 23, Add an
explicit return type annotation to the `_sample_results` async function to
comply with project typing standards. Inspect the
ResilienceOrchestrator.run_experiments method to determine what type it returns
when awaited, then add the appropriate return type annotation using the arrow
syntax (after the function parameters and before the colon in the function
signature). Since this is an async function, ensure the return type annotation
reflects the actual return value type, not a coroutine wrapper.
Source: Coding guidelines
rogue/resilience/metrics/detection_metric.py (1)
45-49: ⚡ Quick win

Add explicit type hints to all metric function signatures.
These modules currently leave several parameters untyped (trace, helper inputs), which breaks the repository rule for typed signatures.

rogue/resilience/metrics/detection_metric.py#L45-L49: type trace as ExperimentTrace.

rogue/resilience/metrics/detection_metric.py#L89-L94: type trace as ExperimentTrace.

rogue/resilience/metrics/detection_metric.py#L119-L126: type helper trace and return Optional[datetime].

rogue/resilience/metrics/detection_metric.py#L135-L145: type helper trace/injection with concrete model types.

rogue/resilience/metrics/containment_metric.py#L31-L35: type trace as ExperimentTrace.

rogue/resilience/metrics/containment_metric.py#L65-L70: type trace as ExperimentTrace.

rogue/resilience/metrics/containment_metric.py#L92-L93: type helper trace as ExperimentTrace.

rogue/resilience/metrics/containment_metric.py#L102-L102: keep/replace unused context parameter deliberately and type explicitly.

rogue/resilience/metrics/recovery_metric.py#L20-L24: type trace as ExperimentTrace.

rogue/resilience/metrics/recovery_metric.py#L73-L77: type trace as ExperimentTrace.
As per coding guidelines, "Use type hints for all function signatures".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/resilience/metrics/detection_metric.py` around lines 45 - 49, Add
explicit type hints to function signatures across the metrics modules to comply
with the repository's typing requirements. In
rogue/resilience/metrics/detection_metric.py, add type hint `trace:
ExperimentTrace` to the measure method (L45-49) and another identical measure
method (L89-94); add type hints for a helper method's trace parameter as
`ExperimentTrace` and return type as `Optional[datetime]` (L119-126); and add
type hints for another helper method's trace and injection parameters with
concrete model types (L135-145). In
rogue/resilience/metrics/containment_metric.py, add type hint `trace:
ExperimentTrace` to measure methods at L31-35 and L65-70; add type hint for
trace parameter as `ExperimentTrace` in a helper method (L92-93); and explicitly
type the context parameter at L102-102 (keeping or replacing it as appropriate).
In rogue/resilience/metrics/recovery_metric.py, add type hint `trace:
ExperimentTrace` to measure methods at L20-24 and L73-77. Ensure all parameters
and return types follow the typing guidelines.
Source: Coding guidelines
rogue/evaluator_agent/resilience/base_resilience_evaluator.py (1)
34-35: ⚡ Quick win

Type checkpoint_adapter to the adapter interface instead of Any.

Using Any weakens the checkpoint contract and delays errors to runtime. Prefer CheckpointAdapter | None here (and keep the same type in the factory signature).

Also applies to: 45-45
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/evaluator_agent/resilience/base_resilience_evaluator.py` around lines
34 - 35, The checkpoint_adapter parameter is typed as Optional[Any], which
provides weak type checking and delays errors to runtime. Replace Optional[Any]
with CheckpointAdapter | None for the checkpoint_adapter parameter in the method
signature (around line 34-35). Apply the same type annotation change to the
factory signature at the other location mentioned (line 45) to maintain
consistency throughout the codebase and enforce proper type contracts for the
CheckpointAdapter interface.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c4ac9482-cd82-4e63-a1dd-15ebc699f457

📥 Commits

Reviewing files that changed from the base of the PR and between e1c5b56 and 3c0b203.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (106)

.github/workflows/resilience-gate.yml
.github/workflows/test.yml
README.md
docs/resilience/INJECTORS.md
docs/resilience/PLAN.md
docs/resilience/README.md
docs/resilience/ROADMAP.md
examples/custom_injector_plugin/example_injector.py
examples/multi_agent_hierarchical/README.md
examples/multi_agent_hierarchical/__init__.py
examples/multi_agent_hierarchical/agents.py
examples/multi_agent_hierarchical/python_entrypoint.py
examples/multi_agent_hierarchical/run_resilience_demo.py
examples/resilience_langgraph_recovery/README.md
examples/resilience_langgraph_recovery/__init__.py
examples/resilience_langgraph_recovery/langgraph_agent.py
examples/resilience_langgraph_recovery/python_entrypoint.py
examples/resilience_langgraph_recovery/run_recovery_demo.py
packages/tui/internal/components/command_line.go
packages/tui/internal/screens/help/help.md
packages/tui/internal/screens/resilience/catalog.go
packages/tui/internal/screens/resilience/view.go
packages/tui/internal/tui/app.go
packages/tui/internal/tui/common_controller.go
packages/tui/internal/tui/keyboard_controller.go
packages/tui/internal/tui/types.go
packages/web/src/api/queries.ts
packages/web/src/api/types.ts
packages/web/src/api/ws.ts
packages/web/src/components/app-sidebar.tsx
packages/web/src/components/live-conversation.tsx
packages/web/src/components/resilience/agent-graph.tsx
packages/web/src/components/resilience/checkpoint-scrubber.tsx
packages/web/src/components/resilience/injector-catalog-pane.tsx
packages/web/src/components/resilience/metrics-radar.tsx
packages/web/src/components/resilience/panel-states.tsx
packages/web/src/components/resilience/trace-timeline.tsx
packages/web/src/lib/resilience-catalog.ts
packages/web/src/routes/evaluations/$jobId.index.tsx
packages/web/src/routes/index.tsx
packages/web/src/routes/resilience/$jobId.index.tsx
packages/web/src/routes/resilience/$jobId.report.tsx
packages/web/src/routes/resilience/$jobId.tsx
packages/web/src/routes/resilience/configure.tsx
packages/web/src/routes/resilience/index.tsx
packages/web/src/routes/resilience/new.tsx
packages/web/src/stores/config.ts
packages/web/src/stores/resilience.ts
pyproject.toml
rogue/evaluator_agent/resilience/__init__.py
rogue/evaluator_agent/resilience/base_resilience_evaluator.py
rogue/evaluator_agent/resilience/factory.py
rogue/evaluator_agent/resilience/python_resilience_evaluator.py
rogue/injectors/__init__.py
rogue/injectors/action_spam.py
rogue/injectors/base_injector.py
rogue/injectors/goal_conflict.py
rogue/injectors/gradual_goal_drift.py
rogue/injectors/hallucination_spamming.py
rogue/injectors/plugins.py
rogue/injectors/policy_violation.py
rogue/injectors/premium_catalog.py
rogue/injectors/registry.py
rogue/injectors/tool_misuse.py
rogue/resilience/__init__.py
rogue/resilience/agent_client.py
rogue/resilience/catalog/__init__.py
rogue/resilience/catalog/architectures.py
rogue/resilience/checkpoint/__init__.py
rogue/resilience/checkpoint/base.py
rogue/resilience/checkpoint/langgraph_adapter.py
rogue/resilience/metrics/__init__.py
rogue/resilience/metrics/base_resilience_metric.py
rogue/resilience/metrics/cascading_failure_metric.py
rogue/resilience/metrics/containment_metric.py
rogue/resilience/metrics/detection_metric.py
rogue/resilience/metrics/recovery_metric.py
rogue/resilience/metrics/system_degradation_metric.py
rogue/resilience/models.py
rogue/resilience/multi_agent_client.py
rogue/resilience/orchestrator.py
rogue/run_cli.py
rogue/server/api/resilience.py
rogue/server/core/resilience_orchestrator.py
rogue/server/main.py
rogue/server/resilience/__init__.py
rogue/server/resilience/report/__init__.py
rogue/server/resilience/report/generator.py
rogue/server/resilience/report/models.py
rogue/server/resilience/report/web_formatter.py
rogue/server/services/resilience_scenario_service.py
rogue/server/services/resilience_service.py
rogue/tests/fixtures/resilience_multi_agent_entrypoint.py
rogue/tests/fixtures/resilience_test_agent.py
rogue/tests/test_concurrent_injection.py
rogue/tests/test_injector_plugins.py
rogue/tests/test_injectors.py
rogue/tests/test_langgraph_checkpoint_adapter.py
rogue/tests/test_multi_agent_orchestrator.py
rogue/tests/test_resilience_metrics.py
rogue/tests/test_resilience_orchestrator.py
rogue/tests/test_resilience_reporting.py
rogue/tests/test_resilience_scenarios.py
sdks/python/rogue_sdk/client.py
sdks/python/rogue_sdk/sdk.py
sdks/python/rogue_sdk/types.py

Improve fault isolation and metric robustness during experiments, tighten plugin path validation and job lifecycle handling, and polish the resilience web UI and CI coverage. Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

rogue/resilience/agent_client.py (1)

84-101: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

sys.path rollback is still incomplete on one load failure path.

If module execution succeeds but Line 99 raises (missing call_agent), the path inserted at Line 81 remains in sys.path, leaking process-global import state after a failed load.

Suggested fix

         if not hasattr(module, "call_agent"):
+            if self._added_sys_path and self._added_sys_path in sys.path:
+                sys.path.remove(self._added_sys_path)
+            self._added_sys_path = None
             raise AttributeError(
                 f"{self._python_file_path} must define a call_agent() function",
             )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/resilience/agent_client.py` around lines 84 - 101, The sys.path cleanup
in the exception handler only addresses failures during module execution
(exec_module call), but if the subsequent call_agent attribute check at line 99
raises an AttributeError, the path added to sys.path is never removed, leaking
process-global import state. Move the call_agent existence check inside the try
block that starts at line 84 so that both the module execution and the attribute
validation are protected by the same exception handler that performs the
sys.path cleanup. This ensures that sys.path is restored regardless of whether
the failure occurs during module execution or during the call_agent validation.

🧹 Nitpick comments (1)

rogue/tests/test_injector_plugins.py (1)

7-10: ⚡ Quick win

Reorder rogue.injectors imports to match isort conventions.

This import block is out of isort order for same-section module imports.
As per coding guidelines, Use isort conventions for import organization.

Suggested diff

 from rogue.injectors import registry
-from rogue.injectors.plugins import load_injector_plugin, register_injector
-from rogue.injectors.registry import get_injector, list_injectors
 from rogue.injectors.base_injector import BaseInjector, InjectionContext, InjectionResult
+from rogue.injectors.plugins import load_injector_plugin, register_injector
+from rogue.injectors.registry import get_injector, list_injectors

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rogue/tests/test_injector_plugins.py` around lines 7 - 10, The imports in the
test file from the rogue.injectors package are not ordered according to isort
conventions. Reorder the four imports (from rogue.injectors import registry,
from rogue.injectors.plugins import load_injector_plugin and register_injector,
from rogue.injectors.registry import get_injector and list_injectors, and from
rogue.injectors.base_injector import BaseInjector and InjectionContext and
InjectionResult) so they are sorted alphabetically by their full module path
(base_injector should come before plugins which should come before registry).
This ensures consistency with the project's isort import organization standards.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rogue/resilience/metrics/cascading_failure_metric.py`:
- Around line 107-115: The function _initial_injection_targets is missing a type
hint for its trace parameter, which violates the repository's requirement for
type hints on all function signatures. Add an appropriate type hint to the trace
parameter by specifying the type of the trace object that is being passed in.
The return type is already correctly specified as set[str], but the input
parameter needs to be typed following the same convention used elsewhere in the
codebase for trace objects.

In `@rogue/tests/test_resilience_scenarios.py`:
- Line 27: Add a type annotation to the mocker parameter in the
test_generate_route_offloads_blocking_generate function signature. The mocker
parameter should be typed as MockerFixture from the pytest_mock library to
comply with the coding guideline requiring type hints for all function
signatures.

In `@rogue/tests/test_resilience_service.py`:
- Around line 39-89: Add type annotations to all untyped function parameters in
the test file to comply with the coding guidelines. In the service fixture
function, add type hints for the tmp_path parameter (use pathlib.Path) and
monkeypatch parameter (use pytest.MonkeyPatch). In the test functions
test_run_job_skips_already_cancelled_job and
test_run_job_preserves_cancelled_state_after_orchestrator, add a type hint for
the mocker parameter (use pytest_mock.MockerFixture). Ensure all fixture and
test function parameters have explicit type annotations.

In `@sdks/python/rogue_sdk/tests/test_types.py`:
- Around line 273-302: Add type hints to the parameters of the three test
methods in TestResilienceConfig: test_accepts_plugin_in_trusted_dir,
test_rejects_path_outside_trusted_dirs, and test_rejects_symlink_plugin_path.
Each method has untyped tmp_path and monkeypatch parameters that should be
annotated with Path and MonkeyPatch types respectively. Ensure the necessary
imports (Path from pathlib and MonkeyPatch from pytest) are present at the top
of the file.

In `@sdks/python/rogue_sdk/types.py`:
- Around line 192-227: The _validate_injector_plugin_path function blocks
symlink plugin files at line 210, but the trusted root directories returned by
_trusted_injector_plugin_roots() are not validated for symlinks themselves. If a
trusted root directory (such as .rogue/injector_plugins) is a symlink, a
malicious plugin file inside the symlinked target can bypass the isolation and
pass the is_relative_to check at line 226. To fix this, modify
_trusted_injector_plugin_roots() to verify that each resolved trusted root
directory is not itself a symlink by checking with is_symlink() before returning
the paths, and raise a ValueError if any trusted root is found to be a symlink.

---

Outside diff comments:
In `@rogue/resilience/agent_client.py`:
- Around line 84-101: The sys.path cleanup in the exception handler only
addresses failures during module execution (exec_module call), but if the
subsequent call_agent attribute check at line 99 raises an AttributeError, the
path added to sys.path is never removed, leaking process-global import state.
Move the call_agent existence check inside the try block that starts at line 84
so that both the module execution and the attribute validation are protected by
the same exception handler that performs the sys.path cleanup. This ensures that
sys.path is restored regardless of whether the failure occurs during module
execution or during the call_agent validation.

---

Nitpick comments:
In `@rogue/tests/test_injector_plugins.py`:
- Around line 7-10: The imports in the test file from the rogue.injectors
package are not ordered according to isort conventions. Reorder the four imports
(from rogue.injectors import registry, from rogue.injectors.plugins import
load_injector_plugin and register_injector, from rogue.injectors.registry import
get_injector and list_injectors, and from rogue.injectors.base_injector import
BaseInjector and InjectionContext and InjectionResult) so they are sorted
alphabetically by their full module path (base_injector should come before
plugins which should come before registry). This ensures consistency with the
project's isort import organization standards.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8424defd-60ce-4a6b-9dce-38cd7473853a

📥 Commits

Reviewing files that changed from the base of the PR and between 3c0b203 and ec7777c.

📒 Files selected for processing (24)

.github/workflows/resilience-gate.yml
.github/workflows/test.yml
docs/resilience/INJECTORS.md
packages/web/src/components/resilience/metrics-radar.tsx
packages/web/src/components/resilience/trace-timeline.tsx
packages/web/src/routes/resilience/$jobId.index.tsx
packages/web/src/routes/resilience/$jobId.report.tsx
packages/web/src/routes/resilience/configure.tsx
packages/web/src/routes/resilience/index.tsx
packages/web/src/stores/resilience.ts
rogue/injectors/plugins.py
rogue/resilience/agent_client.py
rogue/resilience/metrics/cascading_failure_metric.py
rogue/resilience/metrics/containment_metric.py
rogue/resilience/metrics/detection_metric.py
rogue/resilience/metrics/recovery_metric.py
rogue/resilience/orchestrator.py
rogue/server/api/resilience.py
rogue/server/services/resilience_service.py
rogue/tests/test_injector_plugins.py
rogue/tests/test_resilience_scenarios.py
rogue/tests/test_resilience_service.py
sdks/python/rogue_sdk/tests/test_types.py
sdks/python/rogue_sdk/types.py

🚧 Files skipped from review as they are similar to previous changes (14)

.github/workflows/test.yml
packages/web/src/stores/resilience.ts
packages/web/src/routes/resilience/$jobId.index.tsx
docs/resilience/INJECTORS.md
packages/web/src/components/resilience/trace-timeline.tsx
.github/workflows/resilience-gate.yml
packages/web/src/routes/resilience/$jobId.report.tsx
rogue/resilience/metrics/detection_metric.py
rogue/server/api/resilience.py
rogue/resilience/metrics/recovery_metric.py
packages/web/src/routes/resilience/configure.tsx
rogue/server/services/resilience_service.py
rogue/resilience/orchestrator.py
rogue/injectors/plugins.py

Annotate resilience test fixtures and helpers, reject symlink trusted plugin roots, and ensure agent client sys.path cleanup on load failures. Co-authored-by: Cursor <cursoragent@cursor.com>

Yosef-levy and others added 5 commits June 15, 2026 17:31

Add Rogue Resilience Phase 3 dashboard, reporting, and CLI integration.

171b65b

Deliver the /resilience web UI mirroring Red Team, report generator and API, live trace WebSocket events, CLI resilience mode, CI test job, and documentation polish. Co-authored-by: Cursor <cursoragent@cursor.com>

Fix web build by narrowing timeline events to status kind.

3c0b203

Resilience trace events share the TimelineEvent union; skip them in evaluation live views so tsc passes during production builds. Co-authored-by: Cursor <cursoragent@cursor.com>

Yosef-levy requested review from drorIvry and yuval-qf as code owners June 15, 2026 17:46

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

Harden resilience metrics, orchestrator, and review fixes.

ec7777c

Improve fault isolation and metric robustness during experiments, tighten plugin path validation and job lifecycle handling, and polish the resilience web UI and CI coverage. Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

Add type hints and tighten injector plugin path validation.

21ac142

Annotate resilience test fixtures and helpers, reject symlink trusted plugin roots, and ensure agent client sys.path cleanup on load failures. Co-authored-by: Cursor <cursoragent@cursor.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature/rogue resilience#174

Feature/rogue resilience#174
Yosef-levy wants to merge 7 commits into
qualifire-dev:mainfrom
Yosef-levy:feature/rogue-resilience

Yosef-levy commented Jun 15, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Yosef-levy commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Maintainer note

Summary

What's included (Phases 1–4)

Docs

Test plan

Notes / follow-ups

Uh oh!

coderabbitai Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Yosef-levy commented Jun 15, 2026 •

edited

Loading

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading