6 changes: 6 additions & 0 deletions agent-battle-arena/.gitignore
@@ -0,0 +1,6 @@
node_modules/
.arena/
.bankr/
.env
*.log
.DS_Store
136 changes: 136 additions & 0 deletions agent-battle-arena/SKILL.md
@@ -0,0 +1,136 @@
---
name: agent-battle-arena
description: Run a trading-agent competition. Players create an AI trading agent, pick a personality (meme hunter, conservative DCA, degen sniper, whale follower, AI narrative trader), and a weekly leaderboard ranks them on PnL, drawdown, win rate, rugs avoided, best call, and worst trade. Defaults to safe paper-trading; real trades are an opt-in via the Bankr Agent API. Use when the user wants an agent battle/arena/tournament, a trading leaderboard, to "raise" or pick a trading-bot personality, or to compare trading strategies head-to-head.
metadata:
{
"clawdbot":
{
"emoji": "⚔️",
"homepage": "https://github.com/BankrBot/skills",
"requires": { "bins": ["node"] },
},
}
---

# Agent Battle Arena

A competition where AI agents trade against each other. Each player creates one
agent and picks a **personality** — they don't need to know how to trade. Every
season (default one week) a **leaderboard** ranks the agents.

The point: you raise an agent and choose its character. The strategy does the
trading; you watch it climb (or tank) the board.

- **Default mode is paper-trading** — real market dynamics, simulated money, zero
financial risk. Perfect for tournaments and demos.
- **Real trading is opt-in** — flip an agent to `--mode real` and the orders
route through the [Bankr Agent API](https://github.com/BankrBot/skills/tree/main/bankr).
Gated behind explicit env flags so nobody trades real funds by accident.

## Requirements

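- Node.js ≥ 22.6 (runs the TypeScript engine directly: no build step, no runtime deps). On Node 22.6 through 22.17, type stripping is still behind a flag, so run `node --experimental-strip-types src/cli.ts <command>` there; Node 22.18+ and 23.6+ strip types by default.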
- For real mode only: a write-enabled Bankr API key (`bk_...`) from
[bankr.bot/api](https://bankr.bot/api) and the `bankr` CLI / Agent API.

## Quick start

```bash
# from the skill directory
node src/cli.ts seed-demo # create an arena with all 5 demo personalities
node src/cli.ts run --all # play out the whole season (paper trades)
node src/cli.ts leaderboard # see the ranked board + weekly highlights
```

Or in one shot: `npm run demo`.

## The five personalities

| Key | Character | One-liner |
|-----|-----------|-----------|
| `meme-hunter` | Meme Hunter | Chases momentum memecoins; skips anything that smells like a rug. |
| `conservative-dca` | Conservative DCA | DCAs into blue chips. Lowest drawdown, steady curve. |
| `degen-sniper` | Degen Sniper | Apes fresh runners with size and high rug tolerance. High variance. |
| `whale-follower` | Whale Follower | Mirrors smart-money flow. Buys accumulation, exits distribution. |
| `ai-narrative-trader` | AI Narrative Trader | Rotates into rising-mindshare narratives, exits when the story fades. |

Full strategy logic and the signals each one reads: [references/personalities.md](references/personalities.md).

## Creating agents

```bash
node src/cli.ts new --weeks 1 # fresh empty arena (seeded market)
node src/cli.ts add-agent --name "PepeRadar" \
--personality meme-hunter --owner alice --cash 1000
node src/cli.ts list # who's in the arena
```

Each player runs `add-agent` once with their chosen `--personality`. Same arena,
same seeded market → fair head-to-head.

## Running a season

```bash
node src/cli.ts run --rounds 24 # advance 24 ticks (≈ 1 day at hourly resolution)
node src/cli.ts run --all # finish the season
```

A "tick" is one market step (think hourly). A 1-week season = 168 ticks. The
market is **deterministic from a seed**, so a season is reproducible and the
same for every agent — see [references/arena-workflow.md](references/arena-workflow.md).

## The leaderboard

`node src/cli.ts leaderboard` ranks agents by PnL% and reports, per agent:

- **PnL** (USD and %) — equity vs starting bankroll
- **Max drawdown** — worst peak-to-trough on the equity curve
- **Win rate** — share of closed trades that were profitable
- **Rugs avoided** — risky tokens the agent flagged and skipped that later rugged
- **Best call** / **Worst trade** — top and bottom closed trades by %

Exact definitions and how each is computed: [references/leaderboard.md](references/leaderboard.md).

`node src/cli.ts agent <name>` shows one agent's full card: positions, recent
trades with reasons, and its highlights.

## Real trading (opt-in)

Paper is the default. To let an agent trade **real funds** through Bankr:

```bash
export ARENA_LIVE=1
export BANKR_API_KEY=bk_your_write_enabled_key
export ARENA_MAX_TRADE_USD=25 # per-trade cap (default 25)
node src/cli.ts add-agent --name LiveBot --personality whale-follower --mode real
node src/cli.ts run --rounds 1
```

Real mode refuses to run unless both `ARENA_LIVE=1` and an API key are set. Orders become
natural-language Bankr prompts (`Buy $25 of WETH on base`) executed via the
Agent API. **Start tiny, use a dedicated agent wallet.** Full safety guidance:
[references/trading-modes.md](references/trading-modes.md).
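
The gate and the prompt format described above could be sketched as follows. This is a minimal illustration, not the skill's actual code: the helper names (`assertLiveAllowed`, `maxTradeUsd`, `buyPrompt`) are hypothetical, and clamping an oversized order down to the cap (rather than rejecting it) is an assumption. Only the env variable names, the default cap of 25, and the prompt wording come from the text. Nothing here calls the Bankr API.

```typescript
// Hypothetical helpers; only the env variable names, the default cap and the
// prompt wording are taken from the documentation above.
type Env = Record<string, string | undefined>;

// Refuse to go live unless BOTH the opt-in flag and a key are present.
function assertLiveAllowed(env: Env): void {
  if (env.ARENA_LIVE !== "1") {
    throw new Error("real mode requires ARENA_LIVE=1");
  }
  if (!env.BANKR_API_KEY) {
    throw new Error("real mode requires BANKR_API_KEY");
  }
}

// Per-trade cap in USD; falls back to 25 when unset or not a positive number.
function maxTradeUsd(env: Env): number {
  const parsed = Number(env.ARENA_MAX_TRADE_USD);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 25;
}

// Build the natural-language order. Assumption: oversized orders are clamped
// to the cap instead of being rejected.
function buyPrompt(symbol: string, usd: number, chain: string, env: Env): string {
  assertLiveAllowed(env);
  const cappedUsd = Math.min(usd, maxTradeUsd(env));
  return `Buy $${cappedUsd} of ${symbol} on ${chain}`;
}

const liveEnv: Env = { ARENA_LIVE: "1", BANKR_API_KEY: "bk_example" };
console.log(buyPrompt("WETH", 100, "base", liveEnv)); // Buy $25 of WETH on base
```

Checking the flag and the key separately keeps the error message specific about which opt-in is missing.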

## Command reference

| Command | Description |
|---------|-------------|
| `seed-demo [--seed N] [--weeks W\|--ticks T]` | Create an arena with all 5 demo agents |
| `new [--seed N] [--weeks W\|--ticks T]` | Create an empty arena |
| `add-agent --name <n> --personality <p> [--owner o] [--mode sim\|real] [--cash N]` | Add an agent |
| `list` | List agents |
| `run [--rounds N \| --all]` | Advance the season (default 24 ticks) |
| `leaderboard` (alias `lb`) | Ranked board + highlights |
| `agent <id\|name>` | Inspect one agent |
| `personalities` | List the 5 personalities |
| `reset` | Delete the current arena |

State persists to `.arena/state.json`. Override the location with `ARENA_DIR`.

## Extending

Add a personality by implementing the `Strategy` interface in
`src/personalities/` and registering it in `src/personalities/index.ts`. A
strategy reads market signals (momentum, liquidity, whale flow, narrative score,
rug risk) and returns buy/sell **orders** plus **skips** (tokens it refused on
risk grounds — the basis of the "rugs avoided" metric).
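
As a rough sketch of what a new personality might look like: the shapes below (`TokenSignals`, `Order`, `Decision`, and this version of `Strategy`) are assumptions made for illustration, as are the signal ranges and thresholds. The real `Strategy` interface in `src/personalities/` may differ; only `decide({ snapshot, agent })`, the signal names, and the orders/skips split come from the docs.

```typescript
// Hypothetical shapes for illustration; check the real Strategy interface
// in src/personalities/ before copying any of this.
interface TokenSignals {
  symbol: string;
  momentum: number; // assumed range -1..1
  rugRisk: number;  // assumed range 0..1
}
interface Order {
  side: "buy" | "sell";
  symbol: string;
  usd: number;
  reason: string;
}
interface Decision {
  orders: Order[];
  skips: string[]; // symbols refused on risk grounds
}
interface Strategy {
  key: string;
  decide(ctx: {
    snapshot: { tokens: TokenSignals[] };
    agent: { cashUsd: number };
  }): Decision;
}

// Buys strong momentum, but records a skip when rug risk is too high.
const cautiousMomentum: Strategy = {
  key: "cautious-momentum",
  decide({ snapshot, agent }) {
    const decision: Decision = { orders: [], skips: [] };
    for (const token of snapshot.tokens) {
      if (token.momentum <= 0.5) continue; // not interesting, and not a skip
      if (token.rugRisk > 0.3) {
        decision.skips.push(token.symbol); // feeds the "rugs avoided" metric
        continue;
      }
      decision.orders.push({
        side: "buy",
        symbol: token.symbol,
        usd: agent.cashUsd / 10, // risk a tenth of available cash per buy
        reason: "momentum above 0.5 with acceptable rug risk",
      });
    }
    return decision;
  },
};

const sample = cautiousMomentum.decide({
  snapshot: {
    tokens: [
      { symbol: "AAA", momentum: 0.8, rugRisk: 0.1 },
      { symbol: "BBB", momentum: 0.9, rugRisk: 0.7 },
      { symbol: "CCC", momentum: 0.1, rugRisk: 0.0 },
    ],
  },
  agent: { cashUsd: 1000 },
});
console.log(sample.orders.map((o) => o.symbol), sample.skips);
```

Note the distinction between ignoring a token (no signal) and skipping it (signal present, refused on risk): only the second kind should count toward "rugs avoided".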
23 changes: 23 additions & 0 deletions agent-battle-arena/package.json
@@ -0,0 +1,23 @@
{
"name": "agent-battle-arena",
"version": "0.1.0",
"description": "Agents compete with real or simulated trading strategies. Pick a personality, raise your agent, climb the weekly leaderboard. Bankr-native.",
"type": "module",
"engines": {
"node": ">=22.6.0"
},
"bin": {
"arena": "src/cli.ts"
},
"scripts": {
"arena": "node src/cli.ts",
"demo": "node src/cli.ts seed-demo && node src/cli.ts run --rounds 168 && node src/cli.ts leaderboard",
"typecheck": "tsc --noEmit"
},
"keywords": ["bankr", "agent", "trading", "arena", "leaderboard", "crypto"],
"license": "MIT",
"devDependencies": {
"typescript": "^5.6.0",
"@types/node": "^22.0.0"
}
}
70 changes: 70 additions & 0 deletions agent-battle-arena/references/arena-workflow.md
@@ -0,0 +1,70 @@
# Arena workflow

How a season runs, end to end.

## Concepts

- **Arena / season** — one competition over a fixed number of ticks. State lives
in `.arena/state.json` (override dir with `ARENA_DIR`).
- **Tick** — one market step. Treat it as ~1 hour; a 1-week season = **168 ticks**.
- **Seed** — the market is fully determined by `(seed, seasonTicks)`. Same seed →
same market for everyone. Reproducible and fair.
- **Agent** — one player's entry: a name, an owner, a personality, a mode, and a
starting bankroll (default $1,000).
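
To make "fully determined by `(seed, seasonTicks)`" concrete, here is a minimal sketch of a seeded price path. The generator (a mulberry32-style PRNG), the `pricePath` name, and the ±5% step are all assumptions chosen for illustration; the engine's actual generator and market model are not shown in these docs.

```typescript
// A small seeded PRNG (mulberry32-style). Illustrative only: the engine's
// real generator may be different.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Hypothetical market: one price per tick, derived only from the seed and
// the season length, so every agent sees the same series.
function pricePath(seed: number, seasonTicks: number, startPrice = 1): number[] {
  const rand = mulberry32(seed);
  const prices: number[] = [];
  let price = startPrice;
  for (let tick = 0; tick < seasonTicks; tick++) {
    price *= 1 + (rand() - 0.5) * 0.1; // random step within ±5% per tick
    prices.push(price);
  }
  return prices;
}

const firstRun = pricePath(42, 168);
const secondRun = pricePath(42, 168);
console.log(firstRun.length, firstRun.every((p, i) => p === secondRun[i]));
```

Because no wall-clock time or global randomness is involved, replaying a season only requires remembering the seed and the tick count.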

## Running a one-week tournament

```bash
# 1. Create the week's arena. Use year + week number as a numeric seed
#    (2026 week 22 becomes 202622) for a fresh-but-reproducible market.
node src/cli.ts new --weeks 1 --seed 202622

# 2. Each player adds their agent (once).
node src/cli.ts add-agent --name "PepeRadar" --personality meme-hunter --owner alice
node src/cli.ts add-agent --name "SteadyHands" --personality conservative-dca --owner bob
node src/cli.ts add-agent --name "ApeFirst" --personality degen-sniper --owner carol
node src/cli.ts add-agent --name "WhaleWatch" --personality whale-follower --owner dave
node src/cli.ts add-agent --name "NarrativeMax" --personality ai-narrative-trader --owner erin

# 3. Play it out — all at once, or in daily chunks.
node src/cli.ts run --all # whole week
# …or advance gradually for a daily check-in:
node src/cli.ts run --rounds 24 # day 1
node src/cli.ts leaderboard # standings so far
node src/cli.ts run --rounds 24 # day 2 …

# 4. Publish final standings.
node src/cli.ts leaderboard
node src/cli.ts agent ApeFirst # deep-dive any agent
```

Or just demo it: `node src/cli.ts seed-demo && node src/cli.ts run --all && node src/cli.ts leaderboard`.

## What happens each tick (`src/engine/arena.ts → runRounds`)

1. **Settle rugs.** Any token that rugs this tick is recorded; agents holding it
are force-liquidated at the collapse price (booked as a `RUGGED:` trade).
2. **Decisions.** Each agent's strategy runs `decide({ snapshot, agent })`,
returning orders and skips. New distinct skips are stored.
3. **Execution.** Orders go to the agent's broker — `SimBroker` for sim, the
gated `BankrBroker` for real.
4. **Mark to market.** Every agent records an equity snapshot for the curve that
drives PnL and drawdown.

The season ends when `tick` reaches `seasonTicks`.
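
The four phases above can be sketched as a single function. Everything here is a simplified stand-in: the type names, the `runTick` function, and the paper fill at the mark price are assumptions, and only buys are executed to keep it short. The real `runRounds` in `src/engine/arena.ts` also books `RUGGED:` trades and routes orders through a broker.

```typescript
// All names below are hypothetical stand-ins for the engine's real types.
interface Order { side: "buy" | "sell"; symbol: string; usd: number }
interface Decision { orders: Order[]; skips: string[] }
interface Snapshot {
  tick: number;
  prices: Record<string, number>; // mark price per symbol
  rugged: string[];               // symbols that rug on this tick
}
interface Agent {
  cashUsd: number;
  holdings: Record<string, number>; // symbol -> quantity held
  skips: Set<string>;
  equityCurve: number[];
  decide(ctx: { snapshot: Snapshot; agent: Agent }): Decision;
}

function runTick(snapshot: Snapshot, agents: Agent[]): void {
  for (const agent of agents) {
    // 1. Settle rugs: force-liquidate rugged holdings at the collapse price.
    for (const symbol of snapshot.rugged) {
      const quantity = agent.holdings[symbol] ?? 0;
      if (quantity > 0) {
        agent.cashUsd += quantity * snapshot.prices[symbol];
        delete agent.holdings[symbol];
      }
    }

    // 2. Decisions: run the strategy and remember any new distinct skips.
    const decision = agent.decide({ snapshot, agent });
    for (const symbol of decision.skips) agent.skips.add(symbol);

    // 3. Execution: paper fill at the mark price (buys only, for brevity).
    for (const order of decision.orders) {
      if (order.side !== "buy" || order.usd > agent.cashUsd) continue;
      agent.cashUsd -= order.usd;
      agent.holdings[order.symbol] =
        (agent.holdings[order.symbol] ?? 0) +
        order.usd / snapshot.prices[order.symbol];
    }

    // 4. Mark to market: record this tick's equity snapshot.
    const positionsUsd = Object.entries(agent.holdings).reduce(
      (sum, [symbol, quantity]) => sum + quantity * snapshot.prices[symbol],
      0,
    );
    agent.equityCurve.push(agent.cashUsd + positionsUsd);
  }
}

const buyer: Agent = {
  cashUsd: 1000,
  holdings: {},
  skips: new Set(),
  equityCurve: [],
  // Buys $500 of AAA on tick 0, then sits still; always skips BBB.
  decide: ({ snapshot }) => ({
    orders: snapshot.tick === 0 ? [{ side: "buy", symbol: "AAA", usd: 500 }] : [],
    skips: ["BBB"],
  }),
};
runTick({ tick: 0, prices: { AAA: 2 }, rugged: [] }, [buyer]);
runTick({ tick: 1, prices: { AAA: 0.25 }, rugged: ["AAA"] }, [buyer]);
console.log(buyer.equityCurve); // [1000, 562.5]
```

In the example the agent holds AAA into a rug on tick 1, so its position is liquidated at the collapse price before its strategy gets to act.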

## Resetting / new seasons

```bash
node src/cli.ts reset # delete current arena
node src/cli.ts new --weeks 2 # longer season
```

Because everything derives from the seed, you can re-run an identical season any
time by reusing the same `--seed`.

## Mixed sim + real

You can keep most agents on `sim` and run one on `--mode real` in the same arena
to benchmark a live agent against paper opponents. Real mode requires the opt-in
env flags — see [trading-modes.md](trading-modes.md).
60 changes: 60 additions & 0 deletions agent-battle-arena/references/leaderboard.md
@@ -0,0 +1,60 @@
# Leaderboard metrics

`node src/cli.ts leaderboard` ranks all agents and prints highlights. Metrics are
computed in `src/metrics/leaderboard.ts` from each agent's trade log and equity
curve. Agents are **ranked by PnL%** (descending).

Each agent records an equity snapshot every tick (`cash + Σ positions valued at
the current mark price`), giving an equity curve the metrics read from.
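
A minimal sketch of that snapshot, assuming a hypothetical `Position` shape and a plain symbol-to-price map (the engine's own types may look different):

```typescript
// Hypothetical shapes for illustration; the engine's real types may differ.
interface Position {
  symbol: string;
  quantity: number;
}

// Equity = cash + sum of each position valued at the current mark price.
function equityUsd(
  cashUsd: number,
  positions: Position[],
  markPrices: Record<string, number>,
): number {
  return positions.reduce(
    // Assumption: a position with no known mark price is valued at zero.
    (total, p) => total + p.quantity * (markPrices[p.symbol] ?? 0),
    cashUsd,
  );
}

const snapshotEquity = equityUsd(
  250,
  [
    { symbol: "WETH", quantity: 0.25 },
    { symbol: "PEPE", quantity: 1000 },
  ],
  { WETH: 2500, PEPE: 0.125 },
);
console.log(snapshotEquity); // 250 + 625 + 125 = 1000
```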

## Metrics

### PnL (USD and %)
`equityUsd − startingCashUsd`, where `equityUsd` is the last point on the equity
curve. Percent is relative to the starting bankroll.
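
As an illustration of the PnL definition and the ranking rule, under the assumption of a hypothetical `AgentResult` shape (field names follow the prose, not necessarily `src/metrics/leaderboard.ts`):

```typescript
// Hypothetical result shape; field names mirror the prose, not the engine.
interface AgentResult {
  name: string;
  startingCashUsd: number;
  equityCurve: number[]; // one equity snapshot per tick
}

function pnl(agent: AgentResult): { usd: number; pct: number } {
  // With no snapshots yet, equity is still the starting bankroll.
  const lastEquity =
    agent.equityCurve.length > 0
      ? agent.equityCurve[agent.equityCurve.length - 1]
      : agent.startingCashUsd;
  const usd = lastEquity - agent.startingCashUsd;
  return { usd, pct: (100 * usd) / agent.startingCashUsd };
}

// Rank by PnL% descending; the input array is left untouched.
function rankByPnlPct(agents: AgentResult[]): AgentResult[] {
  return [...agents].sort((a, b) => pnl(b).pct - pnl(a).pct);
}

const board = rankByPnlPct([
  { name: "SteadyHands", startingCashUsd: 1000, equityCurve: [1000, 1050, 1100] },
  { name: "PepeRadar", startingCashUsd: 1000, equityCurve: [1000, 1200, 1500] },
  { name: "ApeFirst", startingCashUsd: 1000, equityCurve: [1000, 1400, 800] },
]);
console.log(board.map((a) => `${a.name} ${pnl(a).pct}%`));
```

Ranking on percent rather than USD keeps the board fair when agents start with different `--cash` values.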

### Max drawdown
Largest peak-to-trough drop on the equity curve:
`max over time of (runningPeak − equity) / runningPeak`. Reported as a negative
percent; a smaller magnitude means a steadier curve. Conservative DCA is built to do best on this metric.
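
The running-peak formula can be written as a single pass over the equity curve. This is a sketch under the definition above; the function name `maxDrawdownPct` is made up here.

```typescript
// Largest peak-to-trough drop on an equity curve, as a negative percentage.
// Returns 0 when equity never falls below its running peak.
function maxDrawdownPct(equityCurve: number[]): number {
  let runningPeak = -Infinity;
  let worstDrop = 0; // largest fractional drop seen so far
  for (const equity of equityCurve) {
    runningPeak = Math.max(runningPeak, equity);
    if (runningPeak > 0) {
      worstDrop = Math.max(worstDrop, (runningPeak - equity) / runningPeak);
    }
  }
  return worstDrop === 0 ? 0 : -worstDrop * 100;
}

console.log(maxDrawdownPct([1000, 1200, 900, 1100, 1300])); // -25
```

In the example the peak is 1200 and the trough after it is 900, so the drawdown is 300 / 1200 = 25%, even though the curve later reaches a new high.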

### Win rate
Of all **closed** trades (sells with a realized PnL), the fraction with
`realizedPnl > 0`. Shown as `—` when an agent has no closed trades yet (e.g. a
pure accumulator still holding everything).
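
A small sketch of that rule, assuming a hypothetical `Trade` shape where only closing sells carry a `realizedPnl`:

```typescript
// Hypothetical trade shape; only closing sells carry a realized PnL.
interface Trade {
  side: "buy" | "sell";
  realizedPnl?: number;
}

// Fraction of closed trades that were profitable, or null when there are no
// closed trades yet (the case the board renders as a dash).
function winRate(trades: Trade[]): number | null {
  const closed = trades.filter(
    (t) => t.side === "sell" && t.realizedPnl !== undefined,
  );
  if (closed.length === 0) return null;
  const wins = closed.filter((t) => (t.realizedPnl ?? 0) > 0).length;
  return wins / closed.length;
}

const tradeLog: Trade[] = [
  { side: "buy" },
  { side: "sell", realizedPnl: 40 },
  { side: "sell", realizedPnl: -15 },
  { side: "sell", realizedPnl: 5 },
  { side: "sell", realizedPnl: 0 }, // break-even does not count as a win
];
console.log(winRate(tradeLog)); // 0.5
```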

### Rugs avoided
Distinct tokens the agent **skipped for risk reasons** (`Decision.skips`) that
**went on to rug**, and which the agent was *not* holding when the rug
fired. A skip only counts if it was recorded on or before the rug tick, so a
skip made after the collapse earns nothing. This rewards strategies that read
`rugRisk` and stay away.
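
The three conditions (the token rugged, the skip came on or before the rug tick, and the agent was not holding it) can be sketched like this. The `SkipRecord` and `RugEvent` shapes and the `heldAtRug` set are assumptions for illustration.

```typescript
// Hypothetical records; the engine's stored shapes may differ.
interface SkipRecord { symbol: string; tick: number }
interface RugEvent { symbol: string; tick: number }

function rugsAvoided(
  skips: SkipRecord[],
  rugs: RugEvent[],
  heldAtRug: Set<string>, // symbols the agent was holding when they rugged
): number {
  const rugTickBySymbol = new Map(rugs.map((r) => [r.symbol, r.tick] as const));
  const avoided = new Set<string>(); // distinct tokens only
  for (const skip of skips) {
    const rugTick = rugTickBySymbol.get(skip.symbol);
    if (rugTick === undefined) continue;      // token never rugged
    if (skip.tick > rugTick) continue;        // skipped only after the rug
    if (heldAtRug.has(skip.symbol)) continue; // was holding it anyway
    avoided.add(skip.symbol);
  }
  return avoided.size;
}

const avoidedCount = rugsAvoided(
  [
    { symbol: "AAA", tick: 3 },
    { symbol: "AAA", tick: 5 },  // same token again: counted once
    { symbol: "BBB", tick: 10 }, // too late, BBB rugged at tick 9
    { symbol: "CCC", tick: 2 },  // never rugged
    { symbol: "DDD", tick: 1 },  // skipped early but held at the rug
  ],
  [
    { symbol: "AAA", tick: 8 },
    { symbol: "BBB", tick: 9 },
    { symbol: "DDD", tick: 4 },
  ],
  new Set(["DDD"]),
);
console.log(avoidedCount); // 1
```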

### Rugs held (💥)
Distinct tokens the agent was **still holding when they rugged**. On a rug the
engine force-liquidates the position at the collapse price, booking the loss as a
closed trade tagged `RUGGED:`. Degen Sniper, with its high rug tolerance, is the
usual victim.

### Best call / Worst trade
The closed trades with the highest and lowest realized **PnL %**. PnL% per trade
is `realizedPnl / costBasis`, where cost basis is derived as
`proceeds − realizedPnl`. A held rug typically shows up as the worst trade
(≈ −90%).
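
A worked sketch of the per-trade percentage and the best/worst selection, assuming a hypothetical `ClosedTrade` shape with a non-zero cost basis:

```typescript
// Hypothetical closed-trade shape for illustration.
interface ClosedTrade {
  symbol: string;
  proceeds: number;    // USD received on the sell
  realizedPnl: number; // USD gained (positive) or lost (negative)
}

// PnL% = realizedPnl / costBasis, with costBasis = proceeds - realizedPnl.
// Assumes the cost basis is non-zero.
function tradePnlPct(trade: ClosedTrade): number {
  const costBasis = trade.proceeds - trade.realizedPnl;
  return (100 * trade.realizedPnl) / costBasis;
}

function bestAndWorst(trades: ClosedTrade[]) {
  if (trades.length === 0) return null;
  const sorted = [...trades].sort((a, b) => tradePnlPct(b) - tradePnlPct(a));
  return { best: sorted[0], worst: sorted[sorted.length - 1] };
}

const highlights = bestAndWorst([
  { symbol: "WETH", proceeds: 110, realizedPnl: 10 },  // cost 100, +10%
  { symbol: "PEPE", proceeds: 150, realizedPnl: 50 },  // cost 100, +50%
  { symbol: "RUGX", proceeds: 10, realizedPnl: -90 },  // cost 100, -90%
]);
console.log(highlights?.best.symbol, highlights?.worst.symbol); // PEPE RUGX
```

The `RUGX` row shows why a held rug tends to be the worst trade: a position bought for $100 and force-liquidated for $10 books a −90% result.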

## Reading the board

```
# AGENT STYLE PnL PnL% MaxDD Win Rug✓
🥇 PepeRadar meme-hunter $619.36 +61.9% -5.2% 60% 2
```

- **Rug✓** column = rugs avoided.
- The **Highlights** block under the table shows equity, best call, worst trade,
and a 💥 flag with the count if the agent got rugged.

## Weekly cadence

A "week" is just a season of 168 ticks (`--weeks 1`). To run a recurring weekly
tournament: start a fresh arena each week (`new --weeks 1 --seed <week>`), have
players re-add their agents, `run --all`, then publish `leaderboard`. Using the
week number as the seed makes every week a fresh-but-reproducible market.
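
One possible way to turn a week into a numeric seed for `--seed N`, shown as a sketch: the `weeklySeed` helper and the `year * 100 + week` encoding are assumptions, not something the CLI prescribes.

```typescript
// Hypothetical helper: encode (year, ISO week) as a single integer seed.
function weeklySeed(year: number, isoWeek: number): number {
  if (!Number.isInteger(year) || !Number.isInteger(isoWeek)) {
    throw new RangeError("year and week must be integers");
  }
  if (isoWeek < 1 || isoWeek > 53) {
    throw new RangeError("ISO week must be between 1 and 53");
  }
  return year * 100 + isoWeek;
}

const seedForWeek = weeklySeed(2026, 22);
console.log(`node src/cli.ts new --weeks 1 --seed ${seedForWeek}`);
// node src/cli.ts new --weeks 1 --seed 202622
```

Including the year keeps seeds from repeating when week numbers wrap around in January.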