Added SADE agent #5
Conversation
…layout
End-to-end step1+step2+step3 on simple_bgp + frr_service_down hung or
degraded in five places. Verified fixed by a clean run that diagnoses the
fault via the SADE workflow (Skill -> h.py -> helper) and exits cleanly
with the eval-pipeline token summary populated.
src/agent/claude_code_agent_sade.py
- Restore `break` after `ResultMessage` so the async-for loop exits when
the SDK emits the result. Without it step3 hangs at the end of every
session.
- Restore the `token_summary` block + flat `tokens.*` fields in the
`llm_end` event. `trace_parser.py` and the evaluation CSV read these
flat fields; on the previous wrapper they populated as zero even when
a run consumed ~450k input + ~7k output tokens.
- Restore the `system_logger.info("REMINDER at API turn N/M")` line, the
reproducibility marker for the half-way pivot.
- Repoint `cwd` from `Path(__file__).parent` to `parents[2] /
"network_agents/SADE/src/agent"` so the SDK loads `.claude/skills/`
and `CLAUDE.md` from the embedded SADE engine. The wrapper file lives
at `src/agent/...` but the engine lives nested under the
contributor-agent dir; the unchanged path resolved to a dir with no
skills, and the agent never made a single Skill call.
src/agent/utils/mcp_servers.py
- Restore the Windows-aware venv-python detection (`sys.platform ==
"win32"` selects `.venv/Scripts/python.exe`, otherwise `bin/python`,
falling back to `sys.executable`) and `BASE_DIR` env forwarding.
Hardcoding `"python3"` failed on Windows where that name is not on
PATH, so all five MCP servers failed to spawn and the agent saw zero
`mcp__*` tools.
network_agents/SADE/h.py
- Detect the project venv at the outer repo root when SADE is embedded
(`../../.venv/`), falling back to the standalone layout (`./.venv/`).
Without this the launcher ran helper scripts under the system Python
(no `nika` installed) and every helper bailed with a misleading
"missing project dependency" error pointing at a path that does not
exist on the embedded layout.
- Apply the same standalone-vs-embedded fallback to
`runtime/current_session.json` lookup so `_child_env` injects the
correct `LAB_NAME` instead of silently defaulting to a hardcoded lab.
_utils/wireguard/keys.txt + .gitignore
- Removed the file (committed in b3aebe5 with four WireGuard
private/public key pairs in plaintext) and added the path to
`.gitignore`. `kathara_lab_generator.py` raises a clear
FileNotFoundError with regeneration instructions when the file is
missing, so no scaffold is needed.
- NOTE: removal here is the working-tree fix only. The keys are still
in git history (commit b3aebe5) and must be rotated, with history
scrubbed (BFG / git-filter-repo), before this branch goes public.
…parsing + Windows-safe teardown
Surfaced while running step4_result_eval.py end-to-end on the previously
committed SADE wrapper fix (d66ccae): no `llm_judge.json` was produced
and the eval CSV showed `in_tokens=0, out_tokens=0` for a run that
actually used ~380k input + ~5k output tokens. All five reverts hash-
match upstream main; the only non-revert addition is the explanatory
comment.
src/nika/evaluator/trace_parser.py
- Restore `encoding="utf-8"` on the conversation log read. Windows
defaults to cp1252 and the SADE run logs contain en-dashes, arrows,
and other non-ASCII characters; without UTF-8 the parser crashed
before counting anything.
- Restore the `_accumulate_tokens()` dispatch covering all three
llm_end token formats: LangGraph `usage_metadata`, Claude Code SDK
`tokens.total_input_tokens`, and a raw Anthropic `usage` fallback.
The simplified version only handled the LangGraph schema, so every
Claude Code / SADE run wrote `in_tokens=0, out_tokens=0` to the eval
CSV even when the conversation log captured the real counts. This
pairs with the wrapper's `tokens.*` field restored in d66ccae.
src/nika/evaluator/llm_judge.py
- Restore `encoding="utf-8"` on both the trace read and the judge JSON
write. The judge response routinely contains `->`, arrows, and other
non-ASCII characters; without UTF-8 the write crashed with
`UnicodeEncodeError 'charmap'` and no `llm_judge.json` was produced.
src/nika/utils/logger.py + src/nika/utils/session.py
- Restore the `close_logger()` function in `logger.py` (closes and
detaches every handler on the `SystemLogger`) and its call at the
top of `Session.clear_session()`. Without releasing the file handle
first, `shutil.move("runtime/system.log", ...)` raises `PermissionError
[WinError 32]` on Windows because the file is still open; the
fallback copy-then-unlink leaves both the source and destination on
disk and the eval script aborts mid-teardown.
src/scripts/step4_result_eval.py
- Restore `encoding="utf-8"` on the `ground_truth.json` and
`submission.json` reads (same cp1252 fallback risk as above for any
payload containing non-ASCII tokens).
- Restore the `try/except` wrapper around `net_env.undeploy()`. The
eval row is already written by `_eval_problem` before this point;
a transient Docker 403 "active endpoints" race during undeploy is
cleanup-only and must not bubble up, or the benchmark runner will
record a duplicate SKIP row for a case that actually succeeded.
Verified by a full clean step1 -> step2 -> step3 -> step4 on
`simple_bgp + frr_service_down`:
- step4 exits 0 (was: PermissionError teardown crash)
- `runtime/` empties on teardown (was: orphaned `system.log`)
- eval CSV: `in_tokens=379311, out_tokens=4972` (was: `0, 0`)
- `llm_judge.json` produced with scores 5/5 across relevance,
correctness, efficiency, clarity, final_outcome, overall
- detection / localization / rca all 1.0
There was a problem hiding this comment.
@YasodGinige please review with the following changes:
Remove all experiments-related artifacts: remove SADE figures, tables, execution artifacts.
Conform to the existing project structure, and do not use a separate folder network_agents for the new contributions. Suggestions:
- if you made changes to
nika(e.g., fixed failure injectors, added new problems, or added new network topologies and configurations) pull frommainthe latest commit, and merge your changes there. Do not add duplicate code as a separate folder. - move all changes that include the SADE agent (e.g., prompts, SADE skills, agent code) under
src/agent/community/sade. You can create thecommunity/sadesub-folder.
Briefly document relevant changes: please add a comment in your PR discussing an overview of the changes and why they have been necessary.
|
Dear Alessandro, Thank you very much for the detailed review and suggestions. We've addressed all of them in a new pull request, #6, which we rebased onto the latest main, so it fits the current pluggable agent architecture. In short: SADE now lives under src/agent/community/sade/ as a pluggable "-a sade" agent (implementing protocols.TroubleshootingAgent), the separate network_agents/ folder and all experiment artifacts have been removed. Since this branch is now outdated and conflicts with the current main, we'd suggest continuing the review on #6 - happy to close this one in its favor whenever you prefer. Best regards, |
We created this pull request at your request in our repository. We have added the SADE agent to the network_agents directory. Please review and merge.