OOB-driven, agent-trust-aware AI pentest platform
Built by someone who red-teams AI, not just with it.
CyberAI is a multi-agent orchestration layer for offensive security. Five specialized agents — Recon, Intel, Exploit, Report, Web3 — run a typed, auditable pipeline that turns a target into actionable attack paths and a validated report.
Two things set it apart from "LLM wrapper over nmap":
- OOB-driven exploitation. Blind vulns (SSRF, XXE, blind injection) are confirmed through out-of-band callbacks captured by phantom-grid, not guessed from response diffs.
- Agent-trust-aware design. Every banner and tool output is treated as untrusted input: sanitized, injection-scanned, and parsed before it ever reaches the LLM context. Adversarial thinking is a design input, not a disclaimer.
Reach beyond the network: the Web3 agent runs Slither static analysis and maps detectors to Immunefi severity tiers for smart-contract audits.
Architecture +------------------+ target -----------> | Orchestrator | typed pipeline, dry-run, budget
+--------+---------+ injection-scan at phase boundaries
|
+-----------+----------+-----------+------------+
v v v v v
+------+ +------+ +--------+ +--------+ +------+
|Recon |-->|Intel |-->|Exploit |->|Report | | Web3 | (standalone)
+------+ +------+ +---+----+ +--------+ +--+---+
DNS NVD/CVE OOB | PoC judge | Slither
nmap EPSS nuclei H1-export | Immunefi
subdom prioritize | | severity
v
+-------------+
| phantom-grid| OOB callback capture
+-------------+ Observability: SQLite audit log . session export/import . cyberai replay
Interfaces: CLI . FastAPI dashboard (SSE) . MCP server (Claude Desktop) ### Agents
| Agent | Input | Output | Key tools |
|---|---|---|---|
| Recon | target | open ports, DNS, WHOIS, subdomains | nmap (flag-whitelisted), async DNS, subdomain enum |
| Intel | recon kb | ranked CVEs | NVD client, EPSS enrichment, risk prioritizer |
| Exploit | intel kb | attack paths, OOB findings | nuclei, searchsploit, OOB/SSRF/XXE workflows |
| Report | session kb | structured Markdown / H1 export | LLM summary + LLM-as-judge validation |
| Web3 | .sol path / address | severity-tiered findings | Slither, Etherscan, Immunefi classifier |
- Agent trust boundaries — each agent runs with minimal permissions.
- Untrusted input handling — banners sanitized, length-capped, marked
UNTRUSTEDbefore LLM context. - Prompt-injection detection — 33-pattern detector at every phase boundary; hits become MEDIUM findings, visible in the report.
- Scope enforcement — wildcard +
!-exclusion matching honors HackerOne / Bugcrowd briefs (cyberai scope import). - Audit trail — every agent action logged (JSONL or SQLite) with full inputs/outputs; sessions are replayable.
git clone https://github.com/evkir/CyberAI.git
cd CyberAI
pip install -e .cp config.example.yml config.yml
cp .env.example .env
# Edit .env — add OPENAI_API_KEY or ANTHROPIC_API_KEY (not needed for --dry-run)# Dry-run: walks all 4 phases, no network, no API key
python -m cyberai scan example.com --dry-run
# Real scan, scope-restricted
python -m cyberai scan target.htb --scope '*.target.htb'
# Replay a saved session deterministically
python -m cyberai replay <session_id>
# Import a bug-bounty scope
python -m cyberai scope import h1 --program acme
# Status / config
python -m cyberai statusuvicorn cyberai.web.app:app --reload
# http://127.0.0.1:8000 — session list, live SSE progress, report viewpython -m cyberai.mcp.serverExposes recon/intel tools (nmap_scan, dns_enum, cve_search,
epss_score, …) over the Model Context Protocol. See
docs/mcp/integration.md.
# config.yml
llm:
provider: openai # openai | anthropic
model: gpt-4o
max_tokens: 4096
temperature: 0.2
phantom:
grid_url: http://127.0.0.1:9090
output_dir: reports/
max_cost_usd: 0.0 # 0 = disabled; set to enforce a budgetOptional feature flags (default off, no-regression):
use_native_tools, use_nuclei, use_llm_summary, use_judge.
| Doc | What |
|---|---|
| docs/api/agents.md | Agent API reference |
| docs/exploit/oob-exploitation-workflow.md | OOB / SSRF walkthrough |
| docs/web3/web3-audit.md | Smart-contract audit for Immunefi |
| docs/mcp/integration.md | MCP server setup |
| Tool | Role |
|---|---|
| phantom-grid | OOB interaction capture |
| phantom-intel | CVE intelligence feed |
| reality-probe | TLS analysis & config auditing |
- Python 3.11+
- OpenAI or Anthropic API key (not required for
--dry-run) - Optional: phantom-grid (OOB), nuclei, slither, NVD API key
MIT — see LICENSE