alejandromav/quantarded

Quantarded

An experiment in extracting trading signals from messy, public data.

Live dashboard · Algorithm docs · Changelog


TL;DR

Quantarded is a personal research project I ran to test a simple hypothesis: public, messy, unstructured data sources leak enough signal to systematically beat a broad index — if you process them without the emotional bias humans bring to markets.

This repository is the data pipeline that powers the experiment. It ingests three signal sources in parallel — Reddit (r/wallstreetbets), US congressional trade disclosures (via Quiver Quant), and contrarian indicators — normalizes them into a unified event schema, classifies the unstructured content with an LLM, and streams everything into Tinybird for analytics.

The companion site at quantarded.com publishes a weekly trading basket derived from the data, and tracks live performance against the NASDAQ.

Important

This is an educational research project. Results below are hypothetical (paper trading on public signals), not investment advice, and past performance does not predict future returns.

Results so far

The experiment has been running continuously since week 51 of 2025 (late December 2025), publishing one signal-driven trading basket per week. As of early May 2026:

Quantarded portfolio value vs NASDAQ since inception
| Metric | Value |
| --- | --- |
| Cumulative return | +32.84% |
| Edge vs NASDAQ | +23.59 pp |
| Max drawdown | −9.34% |
| Sharpe ratio (annualized) | 1.77 |
| Weeks running | 20+ |
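
For readers who want to reproduce metrics like these from their own return series, here is a minimal sketch. It is not the site's actual code: the function names, the use of simple weekly returns, the zero risk-free rate, and the sqrt(52) annualization are all assumptions.

```typescript
// Hypothetical helpers; names and conventions are assumptions, not the project's code.

/** Compound simple weekly returns (0.02 = +2%) into a cumulative return. */
function cumulativeReturn(weekly: number[]): number {
  return weekly.reduce((equity, r) => equity * (1 + r), 1) - 1;
}

/** Largest peak-to-trough decline of the equity curve, as a negative fraction. */
function maxDrawdown(weekly: number[]): number {
  let equity = 1;
  let peak = 1;
  let worst = 0;
  for (const r of weekly) {
    equity *= 1 + r;
    peak = Math.max(peak, equity);
    worst = Math.min(worst, equity / peak - 1);
  }
  return worst;
}

/** Annualized Sharpe ratio, assuming a zero risk-free rate and 52 periods per year. */
function annualizedSharpe(weekly: number[]): number {
  const n = weekly.length;
  if (n < 2) return NaN;
  const mean = weekly.reduce((a, b) => a + b, 0) / n;
  const variance = weekly.reduce((a, r) => a + (r - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  return sd === 0 ? NaN : (mean / sd) * Math.sqrt(52);
}

/** Edge in percentage points: portfolio return minus benchmark return. */
function edgePp(portfolio: number, benchmark: number): number {
  return (portfolio - benchmark) * 100;
}
```

If the edge is a simple difference of cumulative returns, as assumed here, a +32.84% return with a +23.59 pp edge implies the NASDAQ returned roughly +9.25% over the same period.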

Live numbers, position history, and the weekly newsletter are at quantarded.com. The intent of publishing in the open was to make the experiment falsifiable: every signal, every entry, and every loss is timestamped and public.

What I learned

A few things became obvious only by running this end-to-end for several months:

  • Breadth beats depth. Baskets where many independent signals pointed the same direction were structurally more stable than baskets with one large conviction trade. Concentrated bets won the biggest weeks and lost the biggest weeks; broad consensus was less spectacular but compounded.
  • Visibility is not conviction. Tickers like TSLA and NVDA dominated raw mention counts on WSB, but sentiment was so divided that they rarely cleared the imbalance threshold. The signals worth trading were almost never the loudest.
  • Congressional trades are slow, not useless. STOCK Act disclosures (periodic transaction reports) can be filed up to 45 days after the trade, so they are stale by the time they're public. Even so, clustered, repeat purchases by the same representative over multiple weeks did indicate position-building worth tracking on a longer horizon.
  • LLMs are cheap precision filters. A naive regex over r/wallstreetbets produces thousands of false positives (every "FOR", "ALL", "ON" gets flagged as a ticker). A constrained prompt with high-precision rules cuts that to a usable signal at fractions of a cent per request.
  • Ship to a real warehouse from day one. Writing every event to Tinybird from the first commit meant I could answer "what was the signal on April 3rd?" months later without re-running anything. The instinct to write to JSONL files and "figure out storage later" would have killed the project.
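
To make the false-positive problem from the LLM bullet concrete, here is a small sketch of the naive approach. The regex and the sample comment are illustrative assumptions, not taken from the repository.

```typescript
// Naive ticker extraction: any standalone run of 1-5 capital letters.
// Illustrative only; this is the approach the LLM classifier replaces.
const NAIVE_TICKER = /\b[A-Z]{1,5}\b/g;

function naiveTickers(text: string): string[] {
  return text.match(NAIVE_TICKER) ?? [];
}

// Made-up comment in the style of r/wallstreetbets.
const comment = "ALL IN ON TSLA calls FOR earnings, I am not selling";

console.log(naiveTickers(comment));
// → [ 'ALL', 'IN', 'ON', 'TSLA', 'FOR', 'I' ]
```

Six matches, one intended ticker. Filtering against a list of listed symbols does not fully help either, since strings such as ALL and ON are real symbols; deciding whether a word is used as a ticker requires reading the sentence, which is the job handed to the LLM.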

Architecture

                      ┌──────────────────────────────────┐
                      │           SOURCES                │
                      ├──────────────────────────────────┤
   ┌──────────────────┤  Reddit  ·  Quiver Quant API     │
   │                  └──────────────────────────────────┘
   │
   │   ┌─────────────┐    ┌──────────────┐    ┌──────────────────┐    ┌───────────┐
   └──▶│   Fetch     │───▶│  Normalize   │───▶│  LLM classify    │───▶│ Tinybird  │
       │  (paginate, │    │  (event      │    │  (Reddit only:   │    │ (events + │
       │   rate-lim, │    │   schema,    │    │   ticker +       │    │  job_runs │
       │   proxy)    │    │   dedupe)    │    │   sentiment)     │    │   tables) │
       └─────────────┘    └──────────────┘    └──────────────────┘    └───────────┘

Two scrapers, one container, shared event schema:

| Scraper | Source | Schedule | Why LLM? |
| --- | --- | --- | --- |
| reddit-scraper | r/wallstreetbets submissions + comments | Every 5 min, 15-min window | Yes: content is unstructured prose |
| quiver-scraper | US House congressional trades | Cron (default 6h) | No: tickers are structured fields |

Both scrapers emit NormalizedEvent records to the same Tinybird events_landing data source, so downstream analytics queries the same table regardless of source. Job execution metrics (success, duration, counts, errors) land in a separate job_runs data source for observability.
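
As a rough illustration of the ingestion step, the sketch below builds a request for Tinybird's Events API (`POST /v0/events?name=<datasource>` with an NDJSON body and a bearer token). The helper names, the default host, and the error handling are assumptions; the actual client in src/lib/tinybird.ts may differ.

```typescript
// Hypothetical helper names; only the endpoint shape comes from Tinybird's Events API.
interface IngestRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

/** Build an Events API request: one JSON object per line (NDJSON). */
function buildIngestRequest(
  events: object[],
  datasource: string,
  token: string,
  host = "https://api.tinybird.co", // assumption: the host depends on the workspace region
): IngestRequest {
  const url = `${host}/v0/events?name=${encodeURIComponent(datasource)}`;
  const body = events.map((e) => JSON.stringify(e)).join("\n");
  return {
    url,
    init: { method: "POST", headers: { Authorization: `Bearer ${token}` }, body },
  };
}

/** Send the request using the global fetch available in Node 18+. */
async function ingest(req: IngestRequest): Promise<void> {
  const res = await fetch(req.url, req.init);
  if (!res.ok) throw new Error(`Tinybird ingest failed: ${res.status}`);
}
```

Keeping request construction separate from the network call makes the NDJSON encoding easy to check without a token.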

Why this shape

  • Single normalized event schema means new sources can be added without touching the warehouse layer. The payload field is intentionally untyped JSON so each source can preserve its native fields verbatim.
  • Deterministic event IDs (SHA-256 of the natural key) make ingestion idempotent — re-running the scraper on the same window produces identical event IDs, so duplicates are dropped at the warehouse.
  • Parallel LLM batches with bounded concurrency keep latency low without thrashing rate limits. Empirically, 3 concurrent batches of 50 items each was the sweet spot for gpt-4o-mini.
  • No queue, no broker, no orchestrator. It's a single Node process running in a container with a shell loop. The simplest thing that could work, and for ~€5/month on Hetzner, it does.
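
The bounded-concurrency batching described above can be sketched as follows. This is one common way to do it (a fixed number of "lanes" pulling from a shared batch list), not necessarily how src/lib/classify.ts does it; `chunk`, `mapBatches`, and the `worker` callback are hypothetical names.

```typescript
/** Split items into consecutive batches of at most `size` elements. */
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

/**
 * Run `worker` over batches with at most `concurrency` batches in flight.
 * Results are returned in input order regardless of completion order.
 */
async function mapBatches<T, R>(
  items: T[],
  batchSize: number,
  concurrency: number,
  worker: (batch: T[]) => Promise<R[]>,
): Promise<R[]> {
  const batches = chunk(items, batchSize);
  const results: R[][] = new Array(batches.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed batch until none remain.
  async function lane(): Promise<void> {
    while (next < batches.length) {
      const index = next++;
      results[index] = await worker(batches[index]);
    }
  }

  const lanes = Array.from({ length: Math.min(concurrency, batches.length) }, () => lane());
  await Promise.all(lanes);
  return ([] as R[]).concat(...results);
}
```

With the defaults from this README, a call would look like `mapBatches(items, 50, 3, classifyBatch)`, where `classifyBatch` stands in for whatever function performs one LLM request.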

Algorithm docs

The scoring algorithms that turn raw events into a weekly basket live in doc/algorithm/.

Quick start

Prerequisites

| Key | Where to get it |
| --- | --- |
| LLM_API_KEY | OpenAI: platform.openai.com/api-keys |
| TINYBIRD_TOKEN | Tinybird: tinybird.co |
| QUIVER_API_KEY | Quiver Quant (Hobbyist+): api.quiverquant.com |

Deploy the Tinybird datasources

tb push tinybird/datasources/events_landing__v0.datasource
tb push tinybird/datasources/job_runs__v0.datasource

Run locally

npm install
cp .env.example .env       # fill in API keys
npm run reddit:scrape      # one-shot Reddit scrape
npm run quiver:scrape      # one-shot Quiver scrape

In development (default NODE_ENV), events are also written to tmp/*.jsonl for inspection. In production, only Tinybird receives them.

Run continuously with Docker

cp docker-compose.yml.example docker-compose.yml
# edit env vars
docker compose up --build

The container runs both scrapers on their own schedules — Reddit every 5 minutes, Quiver on a cron expression — and restarts automatically on failure.

Configuration

All configuration is environment-variable driven. See .env.example for the full list with defaults; the most important ones:

| Variable | Default | Notes |
| --- | --- | --- |
| REDDIT_SCRAPER_ENABLED | true | Toggle the Reddit scraper |
| TIME_WINDOW_MINUTES | 15 | How far back each Reddit run looks |
| SCRAPER_INTERVAL_MINUTES | 5 | How often Reddit runs (Docker) |
| CLASSIFY_BATCH_SIZE | 50 | Items per LLM call |
| CLASSIFY_CONCURRENCY | 3 | Parallel LLM batches |
| MIN_CONTENT_LENGTH | 10 | Skip items shorter than this (saves tokens) |
| MAX_CONTENT_LENGTH | 2000 | Truncate longer items (caps tokens) |
| LLM_MODEL | gpt-4o-mini | Any OpenAI-compatible chat model |
| QUIVER_SCRAPER_ENABLED | false | Toggle the Quiver scraper |
| QUIVER_SCRAPER_CRON | `0 */6 * * *` | Standard 5-field cron expression |
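
A minimal sketch of how env-driven configuration with these defaults might be loaded. The variable names and defaults come from the table above; the `ScraperConfig` shape and the parsing helpers are assumptions and may not match src/config.ts.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape; field names are illustrative.
interface ScraperConfig {
  redditEnabled: boolean;
  timeWindowMinutes: number;
  scraperIntervalMinutes: number;
  classifyBatchSize: number;
  classifyConcurrency: number;
  minContentLength: number;
  maxContentLength: number;
  llmModel: string;
  quiverEnabled: boolean;
  quiverCron: string;
}

/** Parse an integer variable, falling back to the default when unset or blank. */
function int(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number.parseInt(raw, 10);
  if (Number.isNaN(parsed)) throw new Error(`${key} must be an integer, got "${raw}"`);
  return parsed;
}

/** Parse a boolean variable; only the string "true" (any case) enables it. */
function bool(env: Env, key: string, fallback: boolean): boolean {
  const raw = env[key];
  return raw === undefined ? fallback : raw.trim().toLowerCase() === "true";
}

function loadConfig(env: Env): ScraperConfig {
  return {
    redditEnabled: bool(env, "REDDIT_SCRAPER_ENABLED", true),
    timeWindowMinutes: int(env, "TIME_WINDOW_MINUTES", 15),
    scraperIntervalMinutes: int(env, "SCRAPER_INTERVAL_MINUTES", 5),
    classifyBatchSize: int(env, "CLASSIFY_BATCH_SIZE", 50),
    classifyConcurrency: int(env, "CLASSIFY_CONCURRENCY", 3),
    minContentLength: int(env, "MIN_CONTENT_LENGTH", 10),
    maxContentLength: int(env, "MAX_CONTENT_LENGTH", 2000),
    llmModel: env["LLM_MODEL"] ?? "gpt-4o-mini",
    quiverEnabled: bool(env, "QUIVER_SCRAPER_ENABLED", false),
    quiverCron: env["QUIVER_SCRAPER_CRON"] ?? "0 */6 * * *",
  };
}
```

In a Node process this would typically be called once at startup as `loadConfig(process.env)`.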

Reddit's API caps listings at 1,000 items per endpoint. A 15-minute window with a 5-minute interval comfortably fits inside that limit during peak WSB hours.
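
One consequence of a 15-minute window on a 5-minute interval is that every item is seen by several consecutive runs. The sketch below counts them, assuming each run at minute `t` looks at items in the half-open window `(t - window, t]`; that boundary convention and the function name are assumptions.

```typescript
/**
 * Return the run times (in minutes from an arbitrary start) whose lookback
 * window contains an item posted at `itemMinute`.
 */
function runsCovering(
  itemMinute: number,
  windowMinutes: number,
  intervalMinutes: number,
  horizonMinutes: number,
): number[] {
  const runs: number[] = [];
  for (let run = 0; run <= horizonMinutes; run += intervalMinutes) {
    if (itemMinute > run - windowMinutes && itemMinute <= run) runs.push(run);
  }
  return runs;
}

console.log(runsCovering(7, 15, 5, 60));
// → [ 10, 15, 20 ]
```

Under these assumptions each item is fetched three times, so an occasional failed run does not lose data, and the duplicates are harmless because deterministic event IDs let the warehouse drop them.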

Event schema

A single shape covers every source. payload is intentionally loose so each source preserves its native fields.

{
  "event_type": "reddit_comment",        // or reddit_submission, congressional_trade
  "event_id":   "<sha256 of natural key>",
  "source":     "wsb",                   // or quiver-daily
  "timestamp":  "2025-12-16T17:58:35Z",
  "version":    "1",
  "payload": {
    "reddit_link": "...",
    "content":     "...",
    "tickers": [
      { "ticker": "TSLA", "sentiment": "sell", "confidence": 0.85 }
    ]
  }
}

For congressional trades, payload carries the full Quiver row verbatim plus an ingested_at timestamp, so any new field Quiver adds is captured without a schema change.
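
In TypeScript terms, the shape above and the deterministic ID could look roughly like this. The field names follow the JSON example; the `makeEventId` helper and the exact composition of the natural key (source, event type, and a source-native identifier) are assumptions, since the README only says the ID is a SHA-256 of the natural key.

```typescript
import { createHash } from "node:crypto";

type EventType = "reddit_comment" | "reddit_submission" | "congressional_trade";

interface NormalizedEvent {
  event_type: EventType;
  event_id: string; // hex SHA-256 of the natural key
  source: string; // e.g. "wsb" or "quiver-daily"
  timestamp: string; // ISO 8601, UTC
  version: string;
  payload: Record<string, unknown>; // intentionally loose, source-specific
}

/**
 * Hypothetical ID helper: hashing the same natural key always yields the
 * same ID, which is what makes re-ingesting a window idempotent.
 */
function makeEventId(source: string, eventType: EventType, naturalKey: string): string {
  return createHash("sha256").update(`${source}:${eventType}:${naturalKey}`).digest("hex");
}

// Example with a made-up Reddit comment identifier.
const example: NormalizedEvent = {
  event_type: "reddit_comment",
  event_id: makeEventId("wsb", "reddit_comment", "t1_example"),
  source: "wsb",
  timestamp: "2025-12-16T17:58:35Z",
  version: "1",
  payload: { content: "...", tickers: [] },
};
```

Because the ID depends only on stable inputs, two scraper runs that fetch the same comment produce the same `event_id`.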

Project layout

src/
├── lib/                          # Domain modules
│   ├── reddit.ts                 # Reddit API client (paginated)
│   ├── quiver.ts                 # Quiver API client (paginated, rate-limited)
│   ├── normalize.ts              # Reddit → NormalizedEvent
│   ├── normalize-quiver.ts       # Quiver trade → NormalizedEvent
│   ├── classify.ts               # LLM ticker + sentiment extraction
│   ├── tinybird.ts               # Tinybird ingestion client
│   └── job-runner.ts             # Shared job lifecycle utilities
├── scripts/                      # CLI entry points (one per scraper)
├── utils/                        # HTTP, hashing, date helpers
├── config.ts                     # Env-driven configuration
└── types.ts                      # Shared TypeScript types

tinybird/                         # Tinybird datasources & pipes
doc/algorithm/                    # Algorithm design docs (versioned)
infra/                            # Terraform + Hetzner deploy scripts
bin/docker-entrypoint.sh          # Container scheduler (both scrapers)
.github/workflows/                # CI (lint + typecheck) + CD (GHCR + deploy)

Development

npm run lint           # ESLint
npm run lint:fix       # ESLint with --fix
npm run format         # Prettier write
npm run format:check   # Prettier check (used in CI)

A Husky pre-commit hook runs Prettier, ESLint --fix, and tsc --noEmit on staged files. CI runs the same checks on every push.

Deployment

The infra/ directory contains a Terraform setup that provisions a single Hetzner Cloud VM and a GitHub Actions pipeline that builds the Docker image, pushes it to GHCR, and deploys on every tagged release. See infra/README.md for details.

The whole production setup costs ~€5/month. The point isn't that this is the right way to deploy a serious system — it's the minimum viable infrastructure that runs the experiment reliably enough to publish weekly results.

License

MIT — use it, fork it, learn from it.

Disclaimer

This is a personal research project for educational purposes. Nothing in this repository or on quantarded.com constitutes financial advice. The published returns are hypothetical and based on paper-trading public signals; they should not be interpreted as a recommendation or as evidence of future performance. Trade your own money at your own risk.
