From ae4cd549dc70c47b705f1c51b1fc899af88685fb Mon Sep 17 00:00:00 2001
From: Evgeny Kiriyak <224408464+evkir@users.noreply.github.com>
Date: Fri, 19 Jun 2026 00:06:25 +0300
Subject: [PATCH 1/4] docs: comprehensive README rewrite for v1.0 positioning

---
 README.md | 223 +++++++++++++++++++++++++++---------------------------
 1 file changed, 112 insertions(+), 111 deletions(-)

diff --git a/README.md b/README.md
index 8f2e2d3..73c7c30 100644
--- a/README.md
+++ b/README.md
@@ -1,16 +1,14 @@
-
-![CI](https://github.com/evkir/CyberAI/actions/workflows/ci.yml/badge.svg) ![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12-blue) ![License](https://img.shields.io/badge/license-MIT-green)
+![CI](https://github.com/evkir/CyberAI/actions/workflows/ci.yml/badge.svg)
+![Python](https://img.shields.io/badge/python-3.11%2B-blue)
+![License](https://img.shields.io/badge/license-MIT-green)
+![Status](https://img.shields.io/badge/status-v0.5.0-orange)
+![LLM](https://img.shields.io/badge/LLM-OpenAI%20%7C%20Anthropic-blueviolet)

 # 🤖 CyberAI

-**AI-powered pentest orchestration platform**
-
-![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=flat-square&logo=python&logoColor=white)
-![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)
-![Status](https://img.shields.io/badge/Status-Active%20Development-orange?style=flat-square)
-![LLM](https://img.shields.io/badge/LLM-OpenAI%20%7C%20Anthropic-blueviolet?style=flat-square)
+**OOB-driven, agent-trust-aware AI pentest platform**

 > Built by someone who red-teams AI, not just with it.

@@ -20,121 +18,132 @@

 ## What is CyberAI?

-CyberAI is a multi-agent orchestration layer for offensive security workflows.
-It connects the **phantom toolchain** - OOB detection, CVE intelligence, TLS analysis -
-and routes findings through an AI pipeline that surfaces actionable attack paths.
+CyberAI is a multi-agent orchestration layer for offensive security. Five
+specialized agents - **Recon, Intel, Exploit, Report, Web3** - run a typed,
+auditable pipeline that turns a target into actionable attack paths and a
+validated report.
+
+Two things set it apart from "LLM wrapper over nmap":

-This is not a chatbot wrapper for pentesters.
-It's an agentic system where specialized AI agents handle recon, correlation,
-and reporting autonomously - while you focus on what matters: exploitation.
+- **OOB-driven exploitation.** Blind vulns (SSRF, XXE, blind injection) are
+  confirmed through out-of-band callbacks captured by
+  [phantom-grid](https://github.com/evkir/phantom-grid), not guessed from
+  response diffs.
+- **Agent-trust-aware design.** Every banner and tool output is treated as
+  untrusted input: sanitized, injection-scanned, and parsed before it ever
+  reaches the LLM context. Adversarial thinking is a design input, not a
+  disclaimer.
+
+Reach beyond the network: the **Web3 agent** runs Slither static analysis and
+maps detectors to Immunefi severity tiers for smart-contract audits.

 ---

-## Architecture
+## Architecture

+                     +------------------+
+ target -----------> |   Orchestrator   |  typed pipeline, dry-run, budget
-```
-┌──────────────────────────────────────────────────────────┐
-│                      CyberAI Core                        │
-│                                                          │
-│  ┌──────────────────┐       ┌────────────────────────┐   │
-│  │   Orchestrator   │──────▶│       Agent Pool       │   │
-│  │      Agent       │       │  ┌─────────────────┐   │   │
-│  └──────────────────┘       │  │  Recon Agent    │   │   │
-│           │                 │  │  Intel Agent    │   │   │
-│           │                 │  │  Exploit Agent  │   │   │
-│           │                 │  │  Report Agent   │   │   │
-│           │                 │  └─────────────────┘   │   │
-│           │                 └────────────────────────┘   │
-│           ▼                                              │
-│  ┌──────────────────────────────────────────────────┐    │
-│  │                  Phantom Stack                   │    │
-│  │   phantom-grid  ·  phantom-intel                 │    │
-│  │   reality-probe                                  │    │
-│  └──────────────────────────────────────────────────┘    │
-└──────────────────────────────────────────────────────────┘
-```
+                     +--------+---------+  injection-scan at phase boundaries

-### Agent responsibilities
+                              |

-| Agent | Role |
-|-------|------|
-| **Orchestrator** | Routes tasks, manages agent lifecycle, aggregates results |
-| **Recon** | Target enumeration - DNS, WHOIS, subdomains, open ports |
-| **Intel** | CVE lookups, CVSS scoring, exploit availability |
-| **Exploit** | CVE → PoC mapping, attack surface analysis |
-| **Report** | Findings aggregation → structured Markdown / PDF output |
+       +-----------+----------+-----------+------------+

----
+       v           v          v           v            v

-## Security design
+    +------+   +------+   +--------+  +--------+    +------+
+    |Recon |-->|Intel |-->|Exploit |->|Report  |    | Web3 |  (standalone)
+    +------+   +------+   +---+----+  +--------+    +--+---+

-Multi-agent security is a first-class concern, not an afterthought:
+     DNS       NVD/CVE    OOB |       PoC judge        | Slither

-- **Agent trust boundaries** - each agent operates with minimal necessary permissions
-- **Input validation** - all external data sanitized before entering the LLM context
-- **Prompt injection resistance** - structured prompts, output parsing, no raw passthrough
-- **Audit trail** - every agent action logged with full inputs and outputs
+     nmap      EPSS       nuclei      H1-export        | Immunefi

-> The irony of building an AI pentest tool while studying AI attack surfaces
-> is intentional. Adversarial thinking is a design input.
+     subdom    prioritize     |                        | severity
+                              v
+                       +-------------+
+                       | phantom-grid|  OOB callback capture
+                       +-------------+
+Observability: SQLite audit log . session export/import . cyberai replay
+
+Interfaces: CLI . FastAPI dashboard (SSE) . MCP server (Claude Desktop)
+
+### Agents
+
+| Agent | Input | Output | Key tools |
+|-------|-------|--------|-----------|
+| **Recon** | target | open ports, DNS, WHOIS, subdomains | nmap (flag-whitelisted), async DNS, subdomain enum |
+| **Intel** | recon kb | ranked CVEs | NVD client, EPSS enrichment, risk prioritizer |
+| **Exploit** | intel kb | attack paths, OOB findings | nuclei, searchsploit, OOB/SSRF/XXE workflows |
+| **Report** | session kb | structured Markdown / H1 export | LLM summary + LLM-as-judge validation |
+| **Web3** | .sol path / address | severity-tiered findings | Slither, Etherscan, Immunefi classifier |

 ---

-## Project structure
+## Security design

-```
-CyberAI/
-├── cyberai/
-│   ├── core/           # Orchestrator, config, LLM client
-│   ├── agents/
-│   │   ├── recon/      # Target enumeration pipeline
-│   │   ├── intel/      # CVE intelligence feed
-│   │   ├── exploit/    # CVE → PoC mapping
-│   │   └── report/     # Report generation
-│   ├── integrations/   # Phantom stack connectors
-│   └── utils/          # Shared helpers
-├── templates/          # Jinja2 report templates
-├── tests/
-│   ├── unit/
-│   └── integration/
-├── config.example.yml
-├── .env.example
-├── requirements.txt
-└── setup.py
-```
+- **Agent trust boundaries** - each agent runs with minimal permissions.
+- **Untrusted input handling** - banners sanitized, length-capped, marked
+  `UNTRUSTED` before LLM context.
+- **Prompt-injection detection** - 33-pattern detector at every phase boundary;
+  hits become MEDIUM findings, visible in the report.
+- **Scope enforcement** - wildcard + `!`-exclusion matching honors HackerOne /
+  Bugcrowd briefs (`cyberai scope import`).
+- **Audit trail** - every agent action logged (JSONL or SQLite) with full
+  inputs/outputs; sessions are replayable.

 ---

 ## Quick start

-**1. Clone and install**
-
 ```bash
 git clone https://github.com/evkir/CyberAI.git
 cd CyberAI
 pip install -e .
 ```

-> Prefer isolation? Run `python -m venv venv && source venv/bin/activate` first.
-
-**2. Configure**
-
 ```bash
 cp config.example.yml config.yml
 cp .env.example .env
-# Edit .env -- add your OPENAI_API_KEY or ANTHROPIC_API_KEY
+# Edit .env - add OPENAI_API_KEY or ANTHROPIC_API_KEY (not needed for --dry-run)
 ```

-**3. Run a scan**
-
 ```bash
-# Dry-run: walks all 4 phases, no network calls, no API key needed
+# Dry-run: walks all 4 phases, no network, no API key
 python -m cyberai scan example.com --dry-run

-# Real scan
-python -m cyberai scan target.htb
+# Real scan, scope-restricted
+python -m cyberai scan target.htb --scope '*.target.htb'
+
+# Replay a saved session deterministically
+python -m cyberai replay
+
+# Import a bug-bounty scope
+python -m cyberai scope import h1 --program acme
+
+# Status / config
+python -m cyberai status
+```
+
+### Web dashboard
+
+```bash
+uvicorn cyberai.web.app:app --reload
+# http://127.0.0.1:8000 - session list, live SSE progress, report view
+```
+
+### MCP server (Claude Desktop / Cursor)
+
+```bash
+python -m cyberai.mcp.server
 ```

+Exposes recon/intel tools (`nmap_scan`, `dns_enum`, `cve_search`,
+`epss_score`, …) over the Model Context Protocol. See
+[docs/mcp/integration.md](docs/mcp/integration.md).
+
 ---

 ## Configuration
@@ -142,37 +151,31 @@ python -m cyberai scan target.htb

 ```yaml
 # config.yml
 llm:
-  provider: openai # openai | anthropic
+  provider: openai          # openai | anthropic
   model: gpt-4o
   max_tokens: 4096
   temperature: 0.2

 phantom:
-  grid_url: http://127.0.0.1:8080
-  intel_db: ~/.phantom/intel.db
+  grid_url: http://127.0.0.1:9090

 output_dir: reports/
-verbose: false
-timeout: 60
+max_cost_usd: 0.0           # 0 = disabled; set to enforce a budget
 ```

+Optional feature flags (default off, no-regression):
+`use_native_tools`, `use_nuclei`, `use_llm_summary`, `use_judge`.
+
 ---

-## Roadmap
+## Documentation

-```
-[x] Project structure & scaffolding
-[x] Config system (.env + YAML)
-[ ] LLM client abstraction (OpenAI / Anthropic)
-[ ] Orchestrator agent core loop
-[ ] Recon agent - DNS, WHOIS, subdomain enum
-[ ] phantom-intel integration - CVE context injection
-[ ] phantom-grid integration - OOB result correlation
-[ ] Exploit suggestion agent - CVE → PoC mapping
-[ ] Report generation - Markdown + PDF output
-[ ] Multi-agent safety protocol layer
-[ ] CLI interface (click)
-```
+| Doc | What |
+|-----|------|
+| [docs/api/agents.md](docs/api/agents.md) | Agent API reference |
+| [docs/exploit/oob-exploitation-workflow.md](docs/exploit/oob-exploitation-workflow.md) | OOB / SSRF walkthrough |
+| [docs/web3/web3-audit.md](docs/web3/web3-audit.md) | Smart-contract audit for Immunefi |
+| [docs/mcp/integration.md](docs/mcp/integration.md) | MCP server setup |

 ---

@@ -180,7 +183,7 @@ timeout: 60

 | Tool | Role |
 |------|------|
-| [phantom-grid](https://github.com/evkir/phantom-grid) | OOB interaction capture & analysis |
+| [phantom-grid](https://github.com/evkir/phantom-grid) | OOB interaction capture |
 | [phantom-intel](https://github.com/evkir/phantom-intel) | CVE intelligence feed |
 | [reality-probe](https://github.com/evkir/reality-probe) | TLS analysis & config auditing |

@@ -188,9 +191,9 @@ timeout: 60

 ## Requirements

-- Python 3.10+
-- OpenAI API key **or** Anthropic API key
-- phantom-grid (optional, for OOB correlation)
+- Python 3.11+
+- OpenAI **or** Anthropic API key (not required for `--dry-run`)
+- Optional: phantom-grid (OOB), nuclei, slither, NVD API key

 ---

@@ -198,8 +201,6 @@ timeout: 60

 MIT - see [LICENSE](LICENSE)

----
-
Part of the evkir security toolchain.
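
Note on the README's "Scope enforcement" bullet: the wildcard plus `!`-exclusion matching it describes could be sketched roughly as below. This is a minimal illustration, not CyberAI's actual matcher: the function name `in_scope`, the "exclusions always win" rule, and the use of `fnmatch` are all assumptions made for the example.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def in_scope(host: str, rules: Iterable[str]) -> bool:
    """Hypothetical scope check: include globs plus '!'-prefixed exclusions."""
    host = host.lower().rstrip(".")
    included = False
    for rule in rules:
        rule = rule.strip().lower()
        if not rule:
            continue
        if rule.startswith("!"):
            # Assumption: an exclusion wins regardless of rule order.
            if fnmatchcase(host, rule[1:]):
                return False
        elif fnmatchcase(host, rule):
            included = True
    return included


# Example rules in the style of a bug-bounty brief (values are made up).
rules = ["*.target.htb", "target.htb", "!admin.target.htb"]
print(in_scope("api.target.htb", rules))
print(in_scope("admin.target.htb", rules))
print(in_scope("evil.example.com", rules))
```

Note that `fnmatch`-style `*` also matches dots, so `*.target.htb` covers nested subdomains; a real implementation may want stricter label-by-label matching.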
From 5af221e1764e3577533dd2e236c07ce17c9557ad Mon Sep 17 00:00:00 2001
From: Evgeny Kiriyak <224408464+evkir@users.noreply.github.com>
Date: Fri, 19 Jun 2026 00:07:46 +0300
Subject: [PATCH 2/4] docs: rewrite agent API reference with real contract

---
 docs/api/agents.md | 112 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 95 insertions(+), 17 deletions(-)

diff --git a/docs/api/agents.md b/docs/api/agents.md
index 3507dd4..304edc2 100644
--- a/docs/api/agents.md
+++ b/docs/api/agents.md
@@ -1,22 +1,100 @@
-# CyberAI Agent API
+# Agent API Reference

-## AsyncPipeline
-from cyberai.core.pipeline import AsyncPipeline
-result = AsyncPipeline.execute("10.10.10.1")
-print(result.success, result.recon, result.intel, result.exploit)
+All pipeline agents share the `BaseAgent` contract. The orchestrator constructs
+each agent with explicit dependencies and calls `run()`:

-## AsyncReconAgent
-from cyberai.agents.recon.async_agent import AsyncReconAgent
-result = await AsyncReconAgent().run("10.10.10.1")
+```python
+agent = ReconAgent(config, session, llm, audit)
+result = agent.run(target, context=None)  # -> dict; data also written to session.kb
+```

-## AsyncIntelAgent
-from cyberai.agents.recon.async_agent import AsyncIntelAgent
-result = await AsyncIntelAgent().run(recon_result)
+- `config` - `CyberAIConfig` (feature flags, budget, output_dir)
+- `session` - `ScanSession`; agents read/write findings and `session.kb`
+- `llm` - `LLMClient` (may be `None` for deterministic / dry-run paths)
+- `audit` - `AuditLogger`; every action is recorded

-## AsyncExploitAgent
-from cyberai.agents.recon.async_agent import AsyncExploitAgent
-result = await AsyncExploitAgent().run(intel_result)
+`run()` returns a status dict and persists structured data into `session.kb`
+under the agent's key. Agents never mutate each other directly - the knowledge
+base is the single source of truth between phases.

-## Safety
-from cyberai.core.safety import InputSanitizer, ScopeValidator, ScopeConfig
-clean = InputSanitizer.sanitize(untrusted_string)
+---
+
+## ReconAgent
+
+`cyberai/agents/recon/agent.py`
+
+- **Input:** `target` (host / domain / IP)
+- **Output dict + kb key `recon`:** open ports, DNS records, WHOIS, subdomains
+- **Tools:** `nmap_scan` (flag-whitelisted), `dns_lookup`, `whois_lookup`,
+  `subdomain_enum`
+- **Edge cases:**
+  - nmap flags are validated against a whitelist; unknown flags are rejected
+    before subprocess (no shell, argv list).
+  - Results are cached by `target + flags` hash (TTL); failed scans (rc != 0)
+    are not cached.
+  - Async variant (`AsyncReconAgent`) gathers DNS + subdomain enumeration
+    concurrently; nmap/TLS stay on executor (blocking subprocess).
+
+## IntelAgent
+
+`cyberai/agents/intel/agent.py`
+
+- **Input:** `target` + recon data from `session.kb`
+- **Output dict + kb key `intel`:** ranked CVEs with CVSS, EPSS, exploit factor
+- **Tools:** `cve_search` (NVD), `epss_score`
+- **Edge cases:**
+  - NVD rate-limited (50/30s with API key, 5/30s without); 429/503 →
+    exponential backoff, max 3 retries.
+  - EPSS HTTP failure → silent `0.0`, pipeline survives `api.first.org` outage.
+  - Composite score boosts EPSS non-linearly (EPSS > 0.5 → 🔥, > 0.2 → ⚠).
+
+## ExploitAgent
+
+`cyberai/agents/exploit/agent.py`
+
+- **Input:** `target` + intel data from `session.kb`
+- **Output dict + kb key `exploit`:** attack paths, PoC mappings, OOB findings
+- **Tools:** `build_chain`, `map_poc`; optional nuclei / OOB workflows
+- **Flags:** `use_native_tools` (LLM-driven chain via native tool calling),
+  `use_nuclei` (nuclei engine + OOB wiring)
+- **Edge cases:**
+  - OOB workflows (SSRF/XXE) confirm blind vulns via phantom-grid callbacks -
+    see [../exploit/oob-exploitation-workflow.md](../exploit/oob-exploitation-workflow.md).
+  - Native tool args carry identifiers (`cve_id`/`target`), not full CVE dicts;
+    real data is resolved agent-side (anti-hallucination, fewer tokens).
+  - Falls back to the deterministic path if the model never calls `build_chain`.
+  - searchsploit / nuclei absent → graceful (`available = False`), not fatal.
+
+## ReportAgent
+
+`cyberai/agents/report/agent.py`
+
+- **Input:** `target` + full `session.kb`
+- **Output dict + kb key `report`:** Markdown report path; optional H1 export
+- **Tools:** deterministic renderer; optional LLM summary + judge
+- **Flags:** `use_llm_summary` (structured LLM summary), `use_judge`
+  (LLM-as-judge validation)
+- **Edge cases:**
+  - Deterministic report never fails on LLM error (fail-safe try/except).
+  - Judge validates each claim against kb evidence; score < 0.7 triggers a
+    regeneration with feedback. Hallucinated CVEs (not in kb) are caught.
+  - HackerOne export follows the H1 template (title / severity / steps /
+    impact / recommendation).
+
+## SmartContractAgent (Web3)
+
+`cyberai/agents/web3/agent.py`
+
+- **Standalone** - not part of the network pipeline; a contract is not a
+  network target.
+- **Input:** `target` = local `.sol` path **or** contract address
+- **Output dict + kb key `web3`:** findings, `highest_severity`,
+  `slither_available`; for addresses, `source_meta` from Etherscan
+- **Tools:** `slither_scan`, `fetch_source` (Etherscan)
+- **Edge cases:**
+  - Local `.sol` is the primary path; Etherscan is graceful without an API key.
+  - Slither absent → `available = False`, findings empty, no crash.
+  - Detectors map to Immunefi tiers (reentrancy-eth / arbitrary-send /
+    suicidal / delegatecall → Critical); unknown detectors fall back to
+    impact × confidence.
+  - See [../web3/web3-audit.md](../web3/web3-audit.md).
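
Note on the IntelAgent edge case above (429/503 → exponential backoff, max 3 retries): the retry policy could look roughly like the sketch below. It is not the NVD client's real code. `fetch_with_backoff`, the `(status, body)` return shape, and the 1-second base delay are hypothetical choices for illustration; only the retryable status codes and the retry cap come from the doc.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUSES = {429, 503}  # from the doc: rate-limited or unavailable


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    base_delay: float = 1.0,  # assumed value, not taken from the source
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(); on 429/503 wait base_delay * 2**attempt and retry."""
    attempt = 0
    while True:
        status, body = fetch()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt >= max_retries:
            # Retries exhausted: hand the last response back to the caller.
            return status, body
        sleep(base_delay * (2 ** attempt))
        attempt += 1


# Usage with a fake transport: two throttled responses, then success.
responses = iter([(429, ""), (503, ""), (200, "ok")])
delays = []
result = fetch_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the policy testable without real waiting; a production client would typically also honor a `Retry-After` header if the server sends one.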
From 6934f431c970c2e355b397006db7507e5d92eb1b Mon Sep 17 00:00:00 2001
From: Evgeny Kiriyak <224408464+evkir@users.noreply.github.com>
Date: Fri, 19 Jun 2026 00:09:31 +0300
Subject: [PATCH 3/4] docs: rewrite OOB exploitation walkthrough for phantom-grid v2 token-flow

---
 docs/exploit/oob-exploitation-workflow.md | 107 +++++++++++++++-------
 1 file changed, 74 insertions(+), 33 deletions(-)

diff --git a/docs/exploit/oob-exploitation-workflow.md b/docs/exploit/oob-exploitation-workflow.md
index d2924a7..cb81e26 100644
--- a/docs/exploit/oob-exploitation-workflow.md
+++ b/docs/exploit/oob-exploitation-workflow.md
@@ -2,46 +2,87 @@

 ## Overview

-Out-of-band (OOB) techniques confirm blind vulnerabilities where the
-application response gives no direct feedback. CyberAI routes OOB
-payloads through phantom-grid, which captures DNS/HTTP callbacks.
+Out-of-band (OOB) techniques confirm **blind** vulnerabilities - cases where
+the application response gives no direct feedback (blind SSRF, blind XXE, blind
+injection). Instead of diffing responses, CyberAI plants a payload that forces
+the target to call back to [phantom-grid](https://github.com/evkir/phantom-grid),
+which captures the DNS/HTTP interaction out of band. A captured callback is
+proof of execution.

-## Architecture
+## Components

+ExploitAgent
-ExploitAgent
-│
-├── SSRFWorkflow ──► target app ──► phantom-grid (OOB callback)
-│                                        │
-└── XXEWorkflow ──► target XML parser ───┘
-│
-PhantomGridPoller
-(polls for callback)
+    +-- SSRFWorkflow --> target app ----+

-## SSRF Detection Flow
+    +-- XXEWorkflow  --> XML parser ----+--> phantom-grid (captures callback)

-1. Generate unique `interaction_id`
-2. Build payload: `http:///`
-3. Inject into URL parameter via GET or POST
-4. Poll phantom-grid `/api/interactions/` for DNS/HTTP hit
-5. Confirmed hit → HIGH severity finding
+    +-- OOBWorkflow --> generic inject -+

-## Blind XXE Flow
+                                        |

-1. Generate XXE payload referencing phantom-grid domain
-2. Deliver via POST body, SOAP envelope, or file upload
-3. Parser resolves external entity → OOB DNS/HTTP to phantom-grid
-4. Poll for callback → confirmed blind XXE
+PhantomGridClient.get_interactions(id) <+ (polls captured interactions)
+
+- `cyberai/integrations/phantom_grid.py` - `PhantomGridClient` (token-flow, v2 API)
+- `cyberai/agents/exploit/ssrf_workflow.py` - `SSRFWorkflow`
+- `cyberai/agents/exploit/xxe_workflow.py` - `XXEWorkflow`
+- `cyberai/agents/exploit/oob_workflow.py` - `OOBWorkflow` (generic orchestration)

-## Payload Types
+## phantom-grid v2 token-flow

-| Type | Technique | Confirms |
-|------|-----------|---------|
-| Basic OOB | `SYSTEM "http://phantom/id"` | HTTP callback |
-| Parameter entity | `%remote` DTD load | DNS + HTTP |
+The grid runs on port **9090**. The capture URL is derived from a server-issued
+**token**, not a client-generated id:

-## Operational Notes
+```python
+from cyberai.integrations.phantom_grid import PhantomGridClient

-- Set `max_wait` based on target response time (default 30s)
-- Use per-test `interaction_id` - never reuse across targets
-- phantom-grid must be reachable from target server, not just attacker
-- All payloads logged in AuditTrail automatically via decorators
+grid = PhantomGridClient(base_url="http://127.0.0.1:9090")
+if not grid.available():          # health check; graceful if grid is down
+    ...
+
+token = grid.create_token(label="ssrf-example")   # POST /api/tokens -> token
+url = grid.capture_url(token)                     # http:///c/
+# ... inject `url` into the target ...
+hits = grid.get_interactions(token)               # GET captured interactions
+```
+
+`OOBInteraction.confirmed` is `True` once a DNS/HTTP hit lands on the token.
+
+## SSRF detection flow
+
+1. `SSRFWorkflow` requests a capture URL from phantom-grid (server token).
+2. Build the SSRF payload pointing at that URL (`_make_payload`).
+3. Inject into the candidate parameter via GET or POST (`test` / `test_batch`).
+4. Poll `get_interactions(token)` for a DNS/HTTP callback.
+5. Confirmed hit → `SSRFResult` with HIGH severity; recorded as a finding.
+
+## Blind XXE flow
+
+1. `XXEWorkflow` builds an XML payload with an external entity referencing the
+   phantom-grid capture URL.
+2. Submit to the XML-parsing endpoint.
+3. A parser that resolves the entity triggers the OOB callback.
+4. Poll for the interaction → confirmed blind XXE.
+
+## Worked example - blind SSRF on example.com
+
+> Authorized targets only. Confirm scope before running
+> (`cyberai scope import`).
+
+1. Start phantom-grid locally (or point `phantom.grid_url` at your instance).
+2. Run the exploit phase with OOB enabled:
+
+```bash
+   python -m cyberai scan example.com --scope example.com
+```
+
+3. ExploitAgent picks an SSRF-candidate parameter, requests a token from the
+   grid, and injects `http:///c/` into it.
+4. If `example.com` fetches the URL server-side, phantom-grid records the hit.
+5. `get_interactions(token)` returns a confirmed `OOBInteraction`; the agent
+   raises a HIGH-severity SSRF finding into the report.
+
+## Notes
+
+- No callback within the poll window → reported as **unconfirmed**, never a
+  false HIGH. Absence of evidence is not evidence.
+- phantom-grid absent / unreachable → workflows degrade gracefully
+  (`available = False`); the deterministic pipeline still completes.
+- WebSocket push is on the phantom-grid roadmap; the current client polls over
+  HTTP.
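
Note on the polling steps and the "unconfirmed, never a false HIGH" rule above: the loop might be sketched as follows. This is an illustration under assumptions, not the workflow's actual code. `poll_for_callback`, the result dict shape, and the 30 s / 2 s timings are hypothetical, and `get_interactions` is passed in as a plain callable rather than calling a real `PhantomGridClient`.

```python
import time
from typing import Callable, Dict, List


def poll_for_callback(
    get_interactions: Callable[[str], List[str]],
    token: str,
    max_wait: float = 30.0,   # assumed poll window
    interval: float = 2.0,    # assumed delay between polls
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Poll until a callback is captured or the window closes."""
    deadline = clock() + max_wait
    while True:
        hits = get_interactions(token)
        if hits:
            # A captured callback is the only path to a HIGH finding.
            return {"status": "confirmed", "severity": "HIGH", "hits": hits}
        if clock() >= deadline:
            # No evidence: report unconfirmed instead of guessing.
            return {"status": "unconfirmed", "severity": None, "hits": []}
        sleep(interval)


# Usage with a fake grid that records a DNS hit on the third poll.
polls = []

def fake_get_interactions(token: str) -> List[str]:
    polls.append(token)
    return ["dns-hit"] if len(polls) >= 3 else []

outcome = poll_for_callback(fake_get_interactions, "tok-1", sleep=lambda s: None)
print(outcome["status"], len(polls))
```

Passing `sleep` and `clock` as parameters keeps the loop deterministic under test; the severity is only ever set on the confirmed branch.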
From 4314d522f0b1e0cbb67a3b366ab9451c50205021 Mon Sep 17 00:00:00 2001
From: Evgeny Kiriyak <224408464+evkir@users.noreply.github.com>
Date: Fri, 19 Jun 2026 00:11:05 +0300
Subject: [PATCH 4/4] docs: add Web3 / Immunefi audit workflow guide

---
 docs/web3/web3-audit.md | 81 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 docs/web3/web3-audit.md

diff --git a/docs/web3/web3-audit.md b/docs/web3/web3-audit.md
new file mode 100644
index 0000000..a452efa
--- /dev/null
+++ b/docs/web3/web3-audit.md
@@ -0,0 +1,81 @@
+# Web3 Audit Workflow - SmartContractAgent → Slither → Immunefi
+
+## Overview
+
+The `SmartContractAgent` runs static analysis on a Solidity contract and maps
+each finding to an [Immunefi](https://immunefi.com/) severity tier - so you can
+triage before submitting to a bounty program. It is **standalone**: a contract
+is not a network target, so this agent runs outside the recon→intel→exploit
+pipeline.
+
+## Components
+
+SmartContractAgent.run(target)
+  +-- local .sol --> SlitherTool.analyze() --> parse_slither_json()
+  |                                                    |
+  |                                                    v
+  |                                     immunefi_severity.classify_all()
+  |                                     immunefi_severity.highest_tier()
+  +-- address --> EtherscanClient.fetch_source() (graceful w/o API key)
+
+- `cyberai/agents/web3/agent.py` - `SmartContractAgent`
+- `cyberai/agents/web3/slither_tool.py` - `SlitherTool`, `parse_slither_json`,
+  `SlitherFinding`
+- `cyberai/agents/web3/immunefi_severity.py` - `classify`, `classify_all`,
+  `highest_tier`
+- `cyberai/agents/web3/etherscan.py` - `EtherscanClient`
+
+## Input modes
+
+| `target` | Mode | Path |
+|----------|------|------|
+| local `*.sol` file | `local` | Slither analysis (primary) |
+| contract address | `address` | Etherscan source fetch (needs API key) |
+
+Local `.sol` is the primary, fully-offline path. Address mode is graceful
+without `ETHERSCAN_API_KEY` (returns source metadata only).
+
+## Severity mapping
+
+Slither detectors are mapped to Immunefi tiers by a per-check table; unknown
+detectors fall back to `impact × confidence`:
+
+| Slither detector | Immunefi tier |
+|------------------|---------------|
+| `reentrancy-eth` | Critical |
+| `arbitrary-send` | Critical |
+| `suicidal` | Critical |
+| `controlled-delegatecall` | Critical |
+| (unknown) | impact × confidence fallback |
+
+`highest_tier(findings)` returns the worst tier across all findings - your
+headline severity for a submission.
+
+## Worked example - reentrancy audit
+
+1. Point the agent at a local contract:
+
+```python
+   from cyberai.agents.web3.agent import SmartContractAgent
+
+   agent = SmartContractAgent(config, session, llm=None, audit=audit)
+   result = agent.run("contracts/Vault.sol")
+   print(result["highest_severity"], len(result["findings"]))
+```
+
+2. On a TheDAO-style reentrant contract, Slither reports `reentrancy-eth`
+   (alongside `solc-version`, `low-level-calls`).
+3. `reentrancy-eth` maps to **Critical** → `highest_severity = "Critical"`.
+4. Triage the finding against the program's scope and PoC requirements before
+   submitting to Immunefi.
+
+## Notes
+
+- Slither absent → `slither_available = False`, findings empty, no crash;
+  CI covers the logic with mocked Slither output.
+- JSON parsing is verified against Slither 0.11.5
+  (`results.detectors[].{check,impact,confidence,description}`).
+- This is a triage aid, **not** a substitute for manual review - static
+  analysis has false positives; confirm exploitability before submission.
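
Note on the severity mapping above: the per-check table with an `impact × confidence` fallback could be sketched as below. Only the four Critical detectors come from the doc. The numeric weights, the thresholds, the `"None"` tier, and the names `classify_finding` and `worst_tier` are assumptions for illustration; they are not the real `immunefi_severity` API.

```python
from typing import Iterable, Tuple

# Ordered from least to most severe; "None" is an assumed bucket for
# informational results and not an Immunefi tier.
TIER_ORDER = ["None", "Low", "Medium", "High", "Critical"]

# Detector -> tier table taken from the doc above.
CRITICAL_DETECTORS = {
    "reentrancy-eth",
    "arbitrary-send",
    "suicidal",
    "controlled-delegatecall",
}

# Assumed numeric weights for Slither's impact / confidence levels.
LEVEL_WEIGHT = {"High": 3, "Medium": 2, "Low": 1, "Informational": 0}


def classify_finding(check: str, impact: str, confidence: str) -> str:
    """Table lookup first, then a hypothetical impact x confidence score."""
    if check in CRITICAL_DETECTORS:
        return "Critical"
    score = LEVEL_WEIGHT.get(impact, 0) * LEVEL_WEIGHT.get(confidence, 0)
    if score >= 9:
        return "High"
    if score >= 4:
        return "Medium"
    if score >= 1:
        return "Low"
    return "None"


def worst_tier(findings: Iterable[Tuple[str, str, str]]) -> str:
    """Return the most severe tier across (check, impact, confidence) tuples."""
    tiers = [classify_finding(*finding) for finding in findings]
    return max(tiers, key=TIER_ORDER.index, default="None")


# Findings in the shape of the worked example (levels are illustrative).
findings = [
    ("reentrancy-eth", "High", "Medium"),
    ("solc-version", "Informational", "High"),
    ("low-level-calls", "Informational", "High"),
]
print(worst_tier(findings))
```

Keeping the table lookup ahead of the score means a known-dangerous detector is never downgraded by a low confidence value, which matches the doc's "per-check table first, fallback second" description.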