Skip to content

evkir/CyberAI

Repository files navigation

CI Python License Status LLM

🤖 CyberAI

OOB-driven, agent-trust-aware AI pentest platform

Built by someone who red-teams AI, not just with it.


What is CyberAI?

CyberAI is a multi-agent orchestration layer for offensive security. Five specialized agents — Recon, Intel, Exploit, Report, Web3 — run a typed, auditable pipeline that turns a target into actionable attack paths and a validated report.

Two things set it apart from "LLM wrapper over nmap":

  • OOB-driven exploitation. Blind vulns (SSRF, XXE, blind injection) are confirmed through out-of-band callbacks captured by phantom-grid, not guessed from response diffs.
  • Agent-trust-aware design. Every banner and tool output is treated as untrusted input: sanitized, injection-scanned, and parsed before it ever reaches the LLM context. Adversarial thinking is a design input, not a disclaimer.

Reach beyond the network: the Web3 agent runs Slither static analysis and maps detectors to Immunefi severity tiers for smart-contract audits.


Architecture +------------------+ target -----------> | Orchestrator | typed pipeline, dry-run, budget

+--------+---------+ injection-scan at phase boundaries

|

+-----------+----------+-----------+------------+

v v v v v

+------+ +------+ +--------+ +--------+ +------+

|Recon |-->|Intel |-->|Exploit |->|Report | | Web3 | (standalone)

+------+ +------+ +---+----+ +--------+ +--+---+

DNS NVD/CVE OOB | PoC judge | Slither

nmap EPSS nuclei H1-export | Immunefi

subdom prioritize | | severity

v

+-------------+

| phantom-grid| OOB callback capture

+-------------+ Observability: SQLite audit log . session export/import . cyberai replay

Interfaces: CLI . FastAPI dashboard (SSE) . MCP server (Claude Desktop) ### Agents

Agent Input Output Key tools
Recon target open ports, DNS, WHOIS, subdomains nmap (flag-whitelisted), async DNS, subdomain enum
Intel recon kb ranked CVEs NVD client, EPSS enrichment, risk prioritizer
Exploit intel kb attack paths, OOB findings nuclei, searchsploit, OOB/SSRF/XXE workflows
Report session kb structured Markdown / H1 export LLM summary + LLM-as-judge validation
Web3 .sol path / address severity-tiered findings Slither, Etherscan, Immunefi classifier

Security design

  • Agent trust boundaries — each agent runs with minimal permissions.
  • Untrusted input handling — banners sanitized, length-capped, marked UNTRUSTED before LLM context.
  • Prompt-injection detection — 33-pattern detector at every phase boundary; hits become MEDIUM findings, visible in the report.
  • Scope enforcement — wildcard + !-exclusion matching honors HackerOne / Bugcrowd briefs (cyberai scope import).
  • Audit trail — every agent action logged (JSONL or SQLite) with full inputs/outputs; sessions are replayable.

Quick start

git clone https://github.com/evkir/CyberAI.git
cd CyberAI
pip install -e .
cp config.example.yml config.yml
cp .env.example .env
# Edit .env — add OPENAI_API_KEY or ANTHROPIC_API_KEY (not needed for --dry-run)
# Dry-run: walks all 4 phases, no network, no API key
python -m cyberai scan example.com --dry-run

# Real scan, scope-restricted
python -m cyberai scan target.htb --scope '*.target.htb'

# Replay a saved session deterministically
python -m cyberai replay <session_id>

# Import a bug-bounty scope
python -m cyberai scope import h1 --program acme

# Status / config
python -m cyberai status

Web dashboard

uvicorn cyberai.web.app:app --reload
# http://127.0.0.1:8000  — session list, live SSE progress, report view

MCP server (Claude Desktop / Cursor)

python -m cyberai.mcp.server

Exposes recon/intel tools (nmap_scan, dns_enum, cve_search, epss_score, …) over the Model Context Protocol. See docs/mcp/integration.md.


Configuration

# config.yml
llm:
  provider: openai        # openai | anthropic
  model: gpt-4o
  max_tokens: 4096
  temperature: 0.2

phantom:
  grid_url: http://127.0.0.1:9090

output_dir: reports/
max_cost_usd: 0.0         # 0 = disabled; set to enforce a budget

Optional feature flags (default off, no-regression): use_native_tools, use_nuclei, use_llm_summary, use_judge.


Documentation

Doc What
docs/api/agents.md Agent API reference
docs/exploit/oob-exploitation-workflow.md OOB / SSRF walkthrough
docs/web3/web3-audit.md Smart-contract audit for Immunefi
docs/mcp/integration.md MCP server setup

Related tools

Tool Role
phantom-grid OOB interaction capture
phantom-intel CVE intelligence feed
reality-probe TLS analysis & config auditing

Requirements

  • Python 3.11+
  • OpenAI or Anthropic API key (not required for --dry-run)
  • Optional: phantom-grid (OOB), nuclei, slither, NVD API key

License

MIT — see LICENSE

Part of the evkir security toolchain.

Packages

 
 
 

Contributors