An adversarial multi-agent analysis skill for Claude Code.
A single AI response gives you one line of reasoning. It anchors early, hedges, and lands on a safe middle-ground answer. For hard questions — strategy, architecture, real trade-offs — that's not enough.
Deep Research runs the question through parallel advocate/critic agent pairs, each arguing from a genuinely distinct school of thought, then synthesizes what survived the cross-examination into a layered brief. Every layer gets adversarially checked, including the synthesis itself.
This repo contains the full implementation: a SKILL.md orchestration spec plus four agent prompt templates. It runs as a personal skill inside Claude Code — there's no separate runtime or server.
Question in
│
▼
GATE — score on 3 binary axes
(Ambiguity / Consequence / Breadth — 2+ to fire)
│
▼
"How hard should I go?" → Light / Standard / Heavy
│
▼
Propose N directions (distinct lenses) → user confirms
│
▼
WAVE 1 — N advocate agents, in parallel
each builds the strongest case for one lens
│
▼
WAVE 2 — N critic agents, in parallel
each attacks one advocate's actual output
│
▼
Orchestrator — judges the debates, takes a position
│
▼
Meta-steelman — attacks the synthesis itself
│
▼
Layered output: verdict → surviving args → killed args → meta-critique
Every incoming question is scored on three binary axes:
| Axis | Scores 1 if... |
|---|---|
| Ambiguity | Multiple valid interpretations or approaches exist |
| Consequence | Getting it wrong has real downstream cost |
| Breadth | Genuinely distinct schools of thought bear on the answer |
Two or more points and the skill offers to fire: "This looks like it'd benefit from deep research. How hard should I go?" The user picks an intensity — or declines, and the skill stands down. Simple factual questions never trip the gate.
The skill proposes N directions and waits for confirmation. Each direction is a lens, not an option: "operational complexity" and "team autonomy" are lenses; "microservices vs. monolith" restated two ways is not.
N agents launch simultaneously, one per direction, using advocate-prompt.md. Each builds the strongest possible case from its assigned lens — no hedging, no weasel words, evidence-backed, capped at 800 words.
After Wave 1 completes, N critic agents launch using critic-prompt.md. Each receives one advocate's full output and goes after structural flaws: weak assumptions, cherry-picked evidence, hidden costs, critical omissions. Running Wave 2 strictly after Wave 1 is the load-bearing design choice — critics attack real arguments, not strawmen.
A single agent (orchestrator-prompt.md) reads all N debates and judges them — which arguments survived intact, which got dismantled, where directions unexpectedly converged, where surviving arguments still conflict. It must take a position, not average everything into mush.
The orchestrator is a single point of failure, so a final agent (meta-steelman-prompt.md) attacks the synthesis: cherry-picking, false convergence, conclusions the debates don't actually support. If the synthesis is solid, it says so. If not, it amends the verdict.
| Intensity | Directions | Advocates | Critics | Orchestrator | Meta-steelman | Total agents |
|---|---|---|---|---|---|---|
| Light | 3 | 3 | 3 | 1 | 1 | 8 |
| Standard | 5 | 5 | 5 | 1 | 1 | 12 |
| Heavy | 8+ | 8+ | 8+ | 1 | 1 | 18+ |
Advocates and critics run in parallel within each wave, so N agents cost roughly the wall-clock time of one. You're buying depth, not waiting for it.
Layered so the reader stops at their depth:
- Verdict — 3 lines. Act on this alone if it's clear.
- Surviving arguments — what held up under attack and why.
- Killed arguments — what was considered and dismantled, one line each.
- Meta-critique — what might be wrong with the conclusion itself.
- Raw debates — full advocate/critic exchanges, on request.
Copy the skill/ directory into your Claude Code personal skills folder:
git clone https://github.com/upneja/deep-research.git
cp -r deep-research/skill ~/.claude/skills/deep-researchClaude Code picks it up automatically. From then on, any question that trips the gate gets the "how hard should I go?" prompt. You can also invoke it directly by asking for "deep research on [question]". Saying "just answer" or "skip" stands it down.
Requires Claude Code with subagent (Task/Agent tool) support. No other dependencies.
deep-research/
├── README.md
└── skill/
├── SKILL.md # Gate logic + orchestration flow
├── advocate-prompt.md # Wave 1 agent template
├── critic-prompt.md # Wave 2 agent template
├── orchestrator-prompt.md # Synthesis agent template
└── meta-steelman-prompt.md # Final adversarial check template
The implementation is prompts and orchestration instructions — SKILL.md tells Claude Code when to fire, how to dispatch agents in parallel waves, and how to assemble the layered output. The four templates use {{PLACEHOLDER}} substitution for the question, direction, and upstream agent outputs.
Good fits: meaningful trade-offs with no obvious right answer, decisions with real downstream cost, questions where multiple disciplines disagree, stress-testing a position you already hold.
Bad fits: factual lookups, anything where speed beats depth, questions with one established answer. The gate exists to keep these out.
- Adversarial review at every layer. Advocates get critics, the synthesis gets a meta-steelman. Remove it from any layer and you're back to theater.
- Critics engage with the actual argument. Wave ordering guarantees it.
- The synthesis judges, it doesn't average. Directions that lost their debates get discounted, not blended in.
- Output respects the reader. The verdict comes first; depth is opt-in.
Built with Claude Code.