Inspired by autoresearch.
Autonomous, AI-driven quantitative trading strategy optimization. AutoBacktest links LLM agents with deterministic backtesting and statistical validation to iteratively refine trading strategies — completely unattended.
LLM → code edit → preflight check → backtest → gate → commit or rollback
- Python 3.12+
- uv, installed with: curl -LsSf https://astral.sh/uv/install.sh | sh
- Git
- API key for an LLM provider (OpenAI, Anthropic, Google Gemini, etc.)
git clone https://github.com/LeFi8/autobacktest.git
cd autobacktest
uv sync
cp .env.example .env
Edit .env to set your API key and any backtest date windows.
uv run autobacktest run \
  --program program.md \
  --strategy equal_weight \
  --iterations 5
This evaluates the baseline strategy, generates candidate mutations via an LLM, validates them through a multi-stage gate (preflight, backtest, diversity, drawdown, DSR), and commits improvements to git.
uv run autobacktest report
Full visual reports, equity curves, and strategy summaries land in runs/<run_id>/.
Each strategy lives in a subdirectory with two files:
| File | Purpose |
|---|---|
| strategies/<name>/strategy.py | Signal generation; exports generate_signals(prices, config) |
| strategies/<name>/config.yaml | Parameters: universe, limits, Pydantic-validated fields |
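As a rough illustration of the strategy.py contract, the sketch below implements an equal-weight generate_signals. Only the function name and its (prices, config) parameters come from the table above. The input types (plain mappings instead of whatever the project actually passes, which may well be a DataFrame), the returned dict of weights, and the config keys universe and max_weight are assumptions made for this example.

```python
from typing import Any, Mapping, Sequence


def generate_signals(
    prices: Mapping[str, Sequence[float]],
    config: Mapping[str, Any],
) -> dict[str, float]:
    """Return target weights per ticker (hypothetical return type)."""
    # Hypothetical config key: fall back to every priced ticker if absent.
    universe = list(config.get("universe", prices.keys()))

    # Only tickers that actually have price history can be held.
    tradable = [t for t in universe if t in prices and len(prices[t]) > 0]
    if not tradable:
        return {}

    # Hypothetical per-position cap; any remainder simply stays in cash.
    max_weight = float(config.get("max_weight", 1.0))
    weight = min(1.0 / len(tradable), max_weight)
    return {ticker: weight for ticker in tradable}
```

Calling it with two priced tickers and no cap gives each a weight of one half; the shipped strategies/equal_weight/strategy.py remains the authoritative reference.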
Scaffold a new strategy:
uv run autobacktest init-strategy --name my_strategy
This generates boilerplate code and a validated YAML config with guided prompts.
Write your program:
Edit program.md, which holds the LLM's objective and constraints. The file doubles as
the template: fill in your goal, constraints, and any strategy background.
To keep multiple program specs without tracking them in git, copy program.md
to program-<name>.md — files matching program-*.md are git-ignored by default.
Run it:
uv run autobacktest run --program program.md --strategy my_strategy --iterations 10
See strategies/equal_weight/strategy.py and strategies/equal_weight/config.yaml for a complete reference strategy.
sequenceDiagram
participant O as Orchestrator
participant LS as LessonStore (SQLite)
participant LLM as LLM Provider
participant Git as Git Ledger
O->>LS: Read lessons
O->>LLM: AgentContext (+ lessons_text)
LLM-->>O: AgentEdit (+ lessons_text)
O->>LS: Ingest updated lessons
alt Edit passes gate (select + confirm)
O->>Git: Commit (strategy + config)
else Edit rejected
O->>Git: Rollback (strategy + config)
end
- Read the program spec (objective + constraints) and past lessons.
- The LLM generates N candidate mutations in parallel — code edits, YAML changes, refined lessons.
- Each candidate runs preflight validation: import whitelist, AST checks, Pydantic config, smoke test.
- Config similarity gate filters duplicate parameter proposals (Tier 1).
- Backtesting: walk-forward (in-sample) + holdout (out-of-sample) evaluation.
- Returns correlation gate filters functionally identical variants (Tier 2).
- Two-phase gate: select (in-sample metrics, DSR non-degradation) → confirm (holdout confirmation).
- Passed candidates are committed to git; failures are rolled back with structured feedback.
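To make the preflight step more concrete, here is a minimal sketch of an import whitelist check built on Python's ast module. It is not the project's validator: the function name disallowed_imports and the contents of ALLOWED_IMPORTS are hypothetical, chosen only to show the technique.

```python
import ast

# Hypothetical whitelist; the project's real list is not shown in this README.
ALLOWED_IMPORTS = frozenset({"math", "numpy", "pandas"})


def disallowed_imports(source: str, allowed: frozenset[str] = ALLOWED_IMPORTS) -> list[str]:
    """Return the imported module names whose top-level package is not allowed."""
    tree = ast.parse(source)  # raises SyntaxError on unparsable candidate code
    rejected: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports get a leading dot, so their root is "" and is rejected.
            names = ["." * node.level + (node.module or "")]
        else:
            continue
        for name in names:
            if name.split(".")[0] not in allowed:
                rejected.append(name)
    return rejected
```

A candidate edit would pass this particular check only when the returned list is empty. Because the source is parsed and never executed, a rejected import cannot run any code during validation.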
| Command | Description |
|---|---|
| run | Autonomous optimization loop |
| report | Print leaderboard from SQLite ledger |
| reset | Reset strategy to baseline, wipe caches |
| evaluate | Run walk-forward + holdout on a single strategy |
| init-strategy | Scaffold a new strategy with validated config |
| llm-test | Test an LLM edit against preflight checks |
| spa | Hansen's Superior Predictive Ability audit |
Use uv run autobacktest --help for full flag details.
uv run pytest # 447+ tests
uv run ruff check . # linting (line-length 120, target py312)
uv run mypy src/ # strict type checking
autobacktest/
├── strategies/ # Strategy subdirectories (<name>/strategy.py + config.yaml)
├── src/autobacktest/ # Core engine
│ ├── cli.py # Typer entrypoint
│ ├── commands/ # Subcommand implementations (run, report, evaluate, etc.)
│ ├── orchestrator.py # Optimization loop orchestration
│ ├── optimization/ # Candidate generation, eval mgmt, persistence
│ ├── gate.py # Two-phase (select + confirm)
│ ├── evaluator/ # Backtest, engine, metrics, CSCV/PBO, DSR, regime
│ ├── strategy/ # Validator, AST linter, sandbox, codemod, diversity
│ └── data/ # Price data, caching
├── docs/ # Architecture, API reference, setup guides
├── runs/ # Run artifacts (git-ignored)
├── program.md # LLM objective + constraints template
├── runs/lessons.db # SQLite-backed LLM lesson store
└── .env.example # Environment template
Detailed documentation is available in docs/:
| File | Description |
|---|---|
| docs/index.md | Documentation hub with full table of contents |
| docs/about-project.md | Project overview, goals, and primary interaction flows |
| docs/strategy-guide.md | Complete guide to creating, configuring, and optimizing strategies |
| docs/architecture.md | System architecture and component design |
| docs/api-reference.md | Full API reference for CLI and core modules |
| docs/developer-setup.md | Environment setup and development workflow |
| docs/optimization-config-reference.md | Complete configuration parameter catalog |
- yfinance data quality: prices can be distorted by corporate actions, dividend timing, and survivorship bias. Verify against professional data feeds before deploying live.
- Backtest overfitting: walk-forward + DSR mitigate this, but extreme iteration counts on small datasets will overfit.
- No live trading: this is a research platform — no brokerage connectivity.
- LLM costs: ~$2–$5 per 50 iterations on GPT-4o / Claude 3.5 Sonnet.
- Monthly rebalancing only: intraday / HFT is unsupported.
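The overfitting caveat above is easier to see with numbers. The sketch below follows the published Deflated Sharpe Ratio formula (Bailey and López de Prado), which discounts an observed Sharpe ratio by the best Sharpe one would expect from pure luck across many trials. It is an illustration under stated assumptions, not the project's evaluator: the function names are hypothetical, and the inputs are assumed to be per-period (not annualized) Sharpe ratios with non-excess kurtosis (3 for normal returns).

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials: int, sharpe_std: float) -> float:
    """Expected maximum Sharpe ratio across n_trials strategies with zero true skill."""
    if n_trials < 2:
        return 0.0  # a single trial has no selection bias to correct for
    z = NormalDist()
    return sharpe_std * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe_ratio(
    sharpe: float,        # observed per-period Sharpe ratio of the selected strategy
    n_obs: int,           # number of return observations
    n_trials: int,        # number of strategies tried before selecting this one
    sharpe_std: float,    # standard deviation of Sharpe ratios across the trials
    skew: float = 0.0,
    kurtosis: float = 3.0,
) -> float:
    """Probability that the true Sharpe ratio exceeds the luck-only benchmark."""
    benchmark = expected_max_sharpe(n_trials, sharpe_std)
    denom = math.sqrt(1 - skew * sharpe + (kurtosis - 1) / 4 * sharpe**2)
    return NormalDist().cdf((sharpe - benchmark) * math.sqrt(n_obs - 1) / denom)
```

Holding the observed Sharpe ratio and sample length fixed, raising n_trials raises the benchmark and lowers the result, which is why very long optimization runs on short price histories deserve extra scrutiny.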