22 May 05:15

f355567

v0.2.0 Latest

v0.2.0: Frozen Benchmark Protocol And Reproduction Pack

This is the first protocol-focused TradeArena release. It freezes the v0.2 benchmark spec, separates engineering/benchmark/scientific claim boundaries, and ships a no-key external reproduction pack.

Highlights

Frozen v0.2 benchmark spec with canonical spec hashing.
Claim boundary badge and public claim-boundary policy.
One-command external reproduction pack with command logs, environment metadata, artifact hashes, trajectory hash, and provenance flags.
Expanded classical baselines and failure-autopsy tooling.
Public notes for execution calibration priorities and known limitations.

One-command reproduction

python scripts/run_external_reproduction_pack.py --output-dir outputs/reproduction/v0_2

Expected no-key trajectory reproducibility hash:

sha256:bf3b1084aeec89f3bf0f99ab91b6c16a989dc8c8a29d9e93c8c72109548e442f

Canonical v0.2 benchmark spec hash:

sha256:a777cdfb962a07e658996c9366070d4b0ffb867659c2ccc45685a5c788bf6204
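
The release does not describe how the canonical spec hash is computed. As a minimal sketch of the general technique, assuming the spec can be represented as a JSON-serializable dict, a deterministic serialization followed by SHA-256 gives a hash that does not depend on key order or whitespace. The function name `canonical_spec_hash` and the spec fields below are hypothetical, and this sketch is not expected to reproduce the published hash.

```python
import hashlib
import json


def canonical_spec_hash(spec: dict) -> str:
    """Hash a spec dict through a deterministic JSON serialization.

    Assumed scheme for illustration; TradeArena's actual canonicalization
    may differ.
    """
    # Sorted keys and fixed separators make the byte stream independent of
    # dict insertion order and incidental whitespace.
    payload = json.dumps(
        spec, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical spec fragments; the field names are illustrative only.
spec_a = {"version": "0.2", "seed": 7, "universe": ["AAA", "BBB"]}
spec_b = {"universe": ["AAA", "BBB"], "seed": 7, "version": "0.2"}

# Same content in a different key order yields the same hash.
print(canonical_spec_hash(spec_a) == canonical_spec_hash(spec_b))
```

Any change to a value (for example a different seed) changes the digest, which is what makes a frozen spec checkable.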

Official package hashes

| File | SHA-256 |
| --- | --- |
| `tradearena_benchmark-0.2.0-py3-none-any.whl` | `sha256:2d21b11554100a9c52fd3b934e2919976e7e5ce4f2912aa7df0ff9110eda621e` |
| `tradearena_benchmark-0.2.0.tar.gz` | `sha256:25d0fc6a58914558e3197a17d85ed64dd754e67a09d4aa176c48f7a8544a2568` |
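
One way to check a downloaded package against the hashes above is to stream the file through SHA-256 and compare the result. This is a sketch using only the Python standard library; the helper names `sha256_of_file` and `verify` are illustrative and are not part of TradeArena.

```python
import hashlib
import sys
from pathlib import Path

# Published hashes copied from the release notes above.
EXPECTED = {
    "tradearena_benchmark-0.2.0-py3-none-any.whl":
        "sha256:2d21b11554100a9c52fd3b934e2919976e7e5ce4f2912aa7df0ff9110eda621e",
    "tradearena_benchmark-0.2.0.tar.gz":
        "sha256:25d0fc6a58914558e3197a17d85ed64dd754e67a09d4aa176c48f7a8544a2568",
}


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return 'sha256:<hex>' for a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its hash matches."""
    expected = EXPECTED.get(path.name)
    return expected is not None and sha256_of_file(path) == expected


if __name__ == "__main__":
    # Usage: python verify_hashes.py <downloaded files...>
    for arg in sys.argv[1:]:
        target = Path(arg)
        print(f"{target.name}: {'OK' if verify(target) else 'MISMATCH'}")
```

pip can also enforce hashes at install time through `--require-hashes` with a requirements file, which avoids a separate verification step.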

Known limitations

The no-key reproduction pack is an engineering reproducibility target, not a model-skill claim.
Provider-backed model rows remain sensitive to provider routing, prompts, rate limits, cache provenance, and model-version drift.
The default execution simulator is a stress-test simulator, not a calibrated venue-level quote/order-book/fill replay.
Scientific claims require repeated seeds or rolling windows, non-LLM baselines, statistical intervals, failure autopsy, and independent reproduction reports.
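
To illustrate what "repeated seeds" plus "statistical intervals" can look like in practice, here is a minimal percentile-bootstrap sketch for the mean return across seeds. The function name, the resampling settings, and the input numbers are all assumptions for illustration; the inputs are synthetic placeholders, not TradeArena results, and the release does not prescribe this particular interval method.

```python
import random
import statistics


def bootstrap_mean_interval(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `samples`."""
    rng = random.Random(seed)  # fixed seed so the interval itself is reproducible
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Synthetic per-seed returns, for illustration only.
per_seed_returns = [0.012, -0.004, 0.020, 0.007, -0.011, 0.015, 0.003, 0.009]

low, high = bootstrap_mean_interval(per_seed_returns)
mean = statistics.fmean(per_seed_returns)
print(f"mean={mean:.4f} 95% interval=({low:.4f}, {high:.4f})")
```

An interval that straddles zero, or that overlaps the interval of a non-LLM baseline, is a signal that a single-run ranking should not be read as a skill claim.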

PyPI: https://pypi.org/project/tradearena-benchmark/0.2.0/

17 May 12:57

github-actions

v0.1.2

2b52bc1

v0.1.2

Full Changelog: v0.1.1...v0.1.2

17 May 08:29

weich97

v0.1.1

2937fae

v0.1.1: High-Spread Execution Stress Preset

TradeArena v0.1.1 is a small maintenance release focused on making execution
realism easier to inspect and reproduce.

Highlights

Added an explicit spread_bps parameter to the realistic order simulator.
Added a high_spread row to examples/execution_realism_sweep_demo.py.
The high-spread preset models market orders crossing half the quoted bid-ask spread before market impact and volatility slippage.
The execution sweep now emits spread configuration fields into its JSON and CSV artifacts.
Added tests covering spread-driven crossing cost and the high-spread demo row.

Why It Matters

The preset separates spread cost from generic slippage. This makes it easier
to show that an agent can keep a high fill rate while still losing realized
performance to wide quoted markets.
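
The arithmetic behind the half-spread crossing cost can be sketched as follows. This is not the TradeArena simulator: `market_fill_price` is a hypothetical helper that borrows only the `spread_bps` name from the release notes, and it deliberately ignores market impact and volatility slippage so that the spread term is visible on its own.

```python
def market_fill_price(mid: float, side: str, spread_bps: float) -> float:
    """Price a market order that crosses half the quoted spread around `mid`.

    spread_bps is the full quoted bid-ask spread in basis points of the mid.
    """
    half_spread = mid * (spread_bps / 2.0) / 10_000.0
    if side == "buy":
        return mid + half_spread  # buyer lifts the ask
    if side == "sell":
        return mid - half_spread  # seller hits the bid
    raise ValueError(f"unknown side: {side!r}")


# Illustrative numbers: a 50 bps quoted spread around a mid of 100.
mid = 100.0
buy_price = market_fill_price(mid, "buy", spread_bps=50.0)
sell_price = market_fill_price(mid, "sell", spread_bps=50.0)

# Both orders fill completely, yet a round trip at an unchanged mid
# loses the full quoted spread per share.
round_trip_cost = buy_price - sell_price
print(buy_price, sell_price, round_trip_cost)
```

In this example the fill rate is 100%, while each round trip still gives up 50 bps of the mid, which is the effect the preset is meant to isolate.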

Reproduce

python -m pip install -e ".[dev]"
python examples/execution_realism_sweep_demo.py
python scripts/run_showcase.py --reuse-existing
python -m pytest tests -q

Related Issue

Closes #3.

17 May 02:29

weich97

v0.1.0

4238a9b

v0.1.0: Auditable benchmark release for LLM trading agents

TradeArena v0.1.0 is the first public benchmark release for evaluating LLM
trading agents as auditable decision-making systems under realistic market
constraints.

Highlights

Quickstart showcase: run python scripts/run_showcase.py to generate a local demo portal without model keys or live market-data downloads.
Captioned demo video: watch or regenerate a 3-minute walkthrough of the showcase portal, audit report, execution realism, extension walkthrough, and retail planning sandbox. Browser playback is available at https://weich97.github.io/TradeArena/demo_video.html.
Replayable audit trajectories: every decision records observation, signals, intended allocation, risk-gate changes, orders, fills/rejections, portfolio state, memory, and reproducibility metadata.
Execution realism: built-in simulator models fees, slippage, latency, liquidity constraints, partial fills, pending orders, and rejections.
Risk lifecycle: pre-trade gates, in-trade monitors, post-trade attribution, suitability checks, and risk-violation logs are first-class artifacts.
Hands-on extensions: examples cover custom analysts, custom risk modules, custom evaluators, A-share rules, AkShare CSV reuse, retail planning, and paper rebalance reports.
Research-grade diagnostics: tracked artifacts show representation signatures, crisis-scene probes, feedback-alignment diagnostics, and 51-stock intraday portfolio behavior without exposing raw provider prompt/response caches.
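
To make the "replayable audit trajectories" item concrete, here is a rough sketch of what one per-decision record could look like, using the fields named in the list above. The `DecisionRecord` class, its field types, the `trajectory_digest` helper, and the sample values are all hypothetical; the actual TradeArena schema is not given in these notes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DecisionRecord:
    """One audited decision step (illustrative schema, not TradeArena's)."""
    step: int
    observation: dict
    signals: dict
    intended_allocation: dict
    risk_gate_changes: list = field(default_factory=list)
    orders: list = field(default_factory=list)
    fills: list = field(default_factory=list)
    rejections: list = field(default_factory=list)
    portfolio_state: dict = field(default_factory=dict)
    memory: dict = field(default_factory=dict)
    reproducibility: dict = field(default_factory=dict)


def trajectory_digest(records) -> str:
    """Digest a list of records via sorted-key JSON (assumed scheme)."""
    payload = json.dumps(
        [asdict(r) for r in records], sort_keys=True, separators=(",", ":")
    )
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up values showing a risk gate trimming the intended allocation
# and an order that only partially fills.
record = DecisionRecord(
    step=0,
    observation={"symbol": "AAA", "close": 10.0},
    signals={"momentum": 0.3},
    intended_allocation={"AAA": 0.5},
    risk_gate_changes=[{"rule": "max_weight", "AAA": 0.4}],
    orders=[{"symbol": "AAA", "side": "buy", "qty": 40}],
    fills=[{"symbol": "AAA", "qty": 25, "price": 10.02}],
    portfolio_state={"cash": 749.5, "AAA": 25},
    reproducibility={"seed": 7},
)
print(trajectory_digest([record]))
```

Keeping intended allocation, gate changes, and fills in the same record is what lets a reader trace where the realized position diverged from the agent's intent.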

Quick Start

python -m pip install -e ".[dev]"
python scripts/run_showcase.py

Open:

outputs/examples/showcase.html

What This Release Is Not

TradeArena is not a live trading bot and does not promise profitable trading.
It is a benchmark, simulation, and audit framework for studying whether LLM
trading agents can be reproduced, inspected, risk-gated, and evaluated under
realistic constraints.

Releases: weich97/TreLLM
