Run a World Cup prediction pool for your team, company, or friends — deployed in minutes on Databricks.
Quick Deploy | Features | Scoring | Configuration | Development
Predict scorelines for all 104 World Cup matches, pick the tournament champion and top goal scorers, and compete on a live leaderboard against your colleagues. Match results sync automatically — just deploy and invite your team.
Built as a Databricks App powered by Lakebase (managed PostgreSQL) and deployed via Databricks Asset Bundles.
Predict scores for every match — group stage through the final
Pick the tournament winner and top 3 goal scorers before kickoff
Live leaderboard — track rankings across 150+ participants
- 104 match predictions — fill in scorelines for every group and knockout match. Knockout brackets update automatically based on your predictions.
- Tournament picks — choose the champion and up to 3 top goal scorers before the tournament starts.
- Live leaderboard — real-time rankings with detailed points breakdown (outcome, exact score, scorer goals, advancer bonus).
- Automatic sync — match fixtures and live scores pulled from football-data.org every 5 minutes.
- Custom branding — upload your company logo and set a pool name from the admin panel.
- Player profiles — display name, nationality, and profile picture for each participant.
- Scales to hundreds of users — multi-worker FastAPI with connection pooling and batch operations.
- One-click deploy — Databricks Asset Bundle handles everything: app, sync job, and optional AI/BI dashboard.
Points are awarded per match after results are synced. Knockout rounds are worth more.
| Round | Multiplier | Outcome | Exact score | Scorer | Advancer |
|---|---|---|---|---|---|
| Group | x1 | 2 | 5 | 2 | — |
| R32 | x1.5 | 3 | 8 | 3 | 5 |
| R16 | x2 | 4 | 10 | 4 | 6 |
| QF | x2.5 | 5 | 13 | 5 | 8 |
| SF / 3rd / Final | x3 | 6 | 15 | 6 | 9 |
Tournament winner: +25 pts when the final result matches your champion pick.
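The table values are consistent with a set of group-stage base points scaled by the round multiplier and rounded half up. The sketch below reproduces the table under that reading; it is an illustration, not the app's confirmed scoring code. The advancer base of 3 is inferred from the numbers and is not stated anywhere in the README, and the names `points_for` and `round_half_up` are hypothetical.

```python
import math
from typing import Optional

# Multipliers and group-stage base points, read off the scoring table above.
MULTIPLIERS = {
    "Group": 1.0,
    "R32": 1.5,
    "R16": 2.0,
    "QF": 2.5,
    "SF / 3rd / Final": 3.0,
}
BASE_POINTS = {"outcome": 2, "exact": 5, "scorer": 2}

# Assumption: inferred because 3 x multiplier reproduces 5, 6, 8, 9.
ADVANCER_BASE = 3


def round_half_up(value: float) -> int:
    """Round .5 upwards; the built-in round() would turn 12.5 into 12."""
    return math.floor(value + 0.5)


def points_for(round_name: str, category: str) -> Optional[int]:
    """Points for one category in one round, or None where the table shows a dash."""
    multiplier = MULTIPLIERS[round_name]
    if category == "advancer":
        if round_name == "Group":
            return None  # no advancer bonus in the group stage
        return round_half_up(ADVANCER_BASE * multiplier)
    return round_half_up(BASE_POINTS[category] * multiplier)
```

For example, `points_for("QF", "exact")` gives 13, matching the QF row. Rounding half up matters here: banker's rounding would give 12 for the same cell.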
Prerequisites: A Databricks workspace with Lakebase enabled, Databricks CLI (v0.285+), Node.js 18+, psql and jq, and an API token from football-data.org — see API token tier below for which plan to pick.
# 1. Clone
git clone https://github.com/onno101/worldcup-pool.git && cd worldcup-pool
# 2. Store your football-data.org API token as a Databricks secret
# (used by the 5-minute scheduled sync job)
databricks secrets create-scope worldcup_pool
databricks secrets put-secret worldcup_pool football_data_token --string-value "YOUR_TOKEN"
# 3. Deploy — pass your email as admin so you can manage the pool
databricks bundle deploy -t dev --var admin_emails=you@company.com
# 4. Grant the app's service principal access to Lakebase (one-time, idempotent)
./scripts/bootstrap_lakebase_app.sh dev
# 5. Start the app
databricks bundle run worldcup_pool_app -t dev
Step 3 — admin emails must be set at deploy time.
--var admin_emails writes ADMIN_EMAILS into the app environment so you can access admin endpoints (config, logo upload, manual sync). You can re-run bundle deploy with a different value any time; the script in the next step is idempotent so it won't redo work.

For multiple admins, set the value in databricks.yml instead of via --var. The Databricks CLI doesn't reliably forward comma-separated values through the --var flag, so passing --var admin_emails=alice@x.com,bob@x.com ends up with only the first address taking effect. Open databricks.yml, find your target (e.g. dev) under targets:, and add an admin_emails entry to its variables: block:

    targets:
      dev:
        default: true
        mode: development
        variables:
          admin_emails: "alice@company.com,bob@company.com"

Then deploy without --var admin_emails:

    databricks bundle deploy -t dev

Commit this change to your fork so future deploys keep the same admin list.
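As a rough illustration of what a comma-separated ADMIN_EMAILS value implies on the app side, the sketch below splits the string and checks membership. The function names are hypothetical, and the whitespace trimming and case-insensitive comparison are assumptions, not behavior confirmed by this README.

```python
def parse_admin_emails(raw: str) -> set[str]:
    """Split a comma-separated ADMIN_EMAILS value into a normalized set.

    Assumption: entries are trimmed and lowercased; empty entries are dropped.
    """
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def is_admin(email: str, raw_admin_emails: str) -> bool:
    """Check an email against the parsed admin list, ignoring case."""
    return email.strip().lower() in parse_admin_emails(raw_admin_emails)
```

With the value from the example above, `is_admin("bob@company.com", "alice@company.com,bob@company.com")` is true. If only the first address reached the environment, the same call would return false.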
Why step 4? The Databricks App runs as its own service principal. Lakebase requires that SP to be registered as a Postgres OAuth role and granted schema permissions before the app can create its tables. The bootstrap script reads the SP from the deployed app and provisions everything via the Lakebase API.
Why step 6 is a UI step. The scheduled sync job reads FOOTBALL_DATA_TOKEN from your secret scope (step 2), so matches will populate within ~5 minutes regardless. Setting the token directly on the app speeds up first sync and lets the in-app admin "Sync now" button work immediately. If you'd rather wait for the cron, you can skip step 6.
The Lakebase endpoint defaults to projects/worldcup-pool/branches/production/endpoints/primary. Override with:
databricks bundle deploy -t dev --var lakebase_endpoint="projects/YOUR_PROJECT/branches/production/endpoints/primary"

Create a Lakebase project first if you don't have one yet: in the Databricks UI, go to Catalog > Lakebase > Create project.
The pool uses football-data.org for fixtures, live scores, and goal scorers. Pick your plan based on which features you want:
| Plan | Cost | Fixtures & live scores | Goal scorer events |
|---|---|---|---|
| Free Tier | Free | Yes | No — scorer points won't be awarded |
| Free + Deep Data | €29 / month | Yes | Yes — required for top-scorer scoring |
| Higher paid tiers | Paid | Yes | Yes |
If you want the top scorer picks and the per-match scorer points to actually score, you need the Free + Deep Data add-on (€29/month at the time of writing) or a higher paid tier. The standard free key returns matches and scores but omits the goal-event detail the sync job needs to credit scorer points.
Subscribe to Deep Data from your account page after registering at football-data.org/client/register. Without it, the app still runs — outcome / exact-score / advancer points all work, but every match's scorer column will be 0.
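To make the plan difference concrete, here is a minimal sketch of an authenticated request and a check for goal events. It assumes the football-data.org v4 endpoint for competition code WC, the X-Auth-Token header, and a `goals` list on each match object; the function names are hypothetical and this is not the project's sync job.

```python
import urllib.request

# Assumed endpoint: football-data.org v4 API, World Cup competition code "WC".
MATCHES_URL = "https://api.football-data.org/v4/competitions/WC/matches"


def build_matches_request(token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for World Cup matches."""
    return urllib.request.Request(MATCHES_URL, headers={"X-Auth-Token": token})


def count_goal_events(match: dict) -> int:
    """Number of goal events in a match payload.

    Assumption: goal events live under a "goals" key, which is missing or
    empty when the plan does not include goal-event detail.
    """
    return len(match.get("goals") or [])
```

A match for which `count_goal_events` returns 0 despite a non-zero scoreline is a sign that the token lacks goal-event detail, which is the case where the scorer column stays at 0.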
| Variable | Description | Default |
|---|---|---|
| LAKEBASE_ENDPOINT | Lakebase Autoscale endpoint resource name | projects/worldcup-pool/.../primary |
| LAKEBASE_DATABASE | Postgres database name | databricks_postgres |
| FOOTBALL_DATA_TOKEN | API token from football-data.org | (required) |
| ADMIN_EMAILS | Comma-separated admin emails | (empty) |
| WEB_CONCURRENCY | Uvicorn worker processes | 2 |
| PREDICTION_LOCK_BEFORE_KICKOFF_HOURS | Hours before kickoff when predictions close | 1 |
| TOURNAMENT_PICKS_LOCK_AT_UTC | When tournament picks become read-only (ISO8601) | 2026-06-11T18:00:00+00:00 |
| INIT_SCHEMA_ON_START | Run DDL on app cold start | false |
| AUTO_SYNC_MATCHES_IF_EMPTY | Auto-sync fixtures when matches table is empty | true |
See .env.example for a complete template.
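The two lock settings can be read as simple time comparisons. The sketch below shows one way to evaluate them, assuming timezone-aware datetimes and a strict "before the cutoff" rule; the function names are hypothetical and the app's actual boundary handling may differ.

```python
from datetime import datetime, timedelta


def match_predictions_open(kickoff: datetime, now: datetime, lock_hours: float = 1) -> bool:
    """True while `now` is earlier than kickoff minus the lock window.

    `lock_hours` stands in for PREDICTION_LOCK_BEFORE_KICKOFF_HOURS.
    """
    return now < kickoff - timedelta(hours=lock_hours)


def tournament_picks_open(lock_at_iso: str, now: datetime) -> bool:
    """True until the ISO 8601 timestamp in TOURNAMENT_PICKS_LOCK_AT_UTC is reached."""
    return now < datetime.fromisoformat(lock_at_iso)
```

With the defaults in the table, predictions for a 19:00 UTC kickoff would close at 18:00 UTC, and tournament picks would become read-only at 2026-06-11T18:00:00+00:00.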
Three targets ship in databricks.yml. Pick the one that matches what you're doing — they differ in which Lakebase branch they point at, whether predictions are still open, and whether schema init / fixture sync run on cold start.
| Target | When to use | Tournament lock | Auto-sync fixtures | Init schema on start | Mode |
|---|---|---|---|---|---|
| dev (default) | Active development, internal testing, your real company pool | 2026-06-11T18:00:00Z (first kickoff) | true | true | development |
| simulation | Demoing what a finished pool looks like: 150 simulated users with picks already submitted | 2026-04-01T00:00:00Z (already past; picks read-only) | false (uses pre-seeded data) | false | development |
| prod | Production deployment for your live pool | 2026-06-11T18:00:00Z | true | true | production (locks workspace path, requires explicit deploy) |
dev is the default and the lightweight option. Use it to test UI changes, try out scoring tweaks, run through the prediction flow, and generally play with the app. Deploys with a bare databricks bundle deploy — no flags needed. If you don't plan to fork the code or customize much, dev is perfectly fine as your real, live company pool too — share its URL, invite your team, you're done.
simulation is for showing the pool off before the tournament starts — recruiters, leadership, internal demos. It points at a separate Lakebase branch (branches/simulation) so it doesn't pollute your real pool's data, and tournament picks are already locked, so visitors see a fully populated leaderboard immediately. To populate the simulated users, run scripts/simulate_data.py against the simulation Lakebase after deploy.
prod is mostly a naming convention — it deploys an app called worldcup-pool-prod (separate URL from -dev) and runs under Databricks Asset Bundle production mode. As shipped, that mode mainly affects bundle-side validation (rejects deploys to per-user paths, requires run_as on jobs) — it doesn't restrict who can update the app or lock down the deployment. The real value is just having two side-by-side namespaces: a stable URL you share with your team, and a scratch URL for trying changes without breaking the first one. If you're not iterating on the code, you don't need prod at all — dev is fine as your live pool. If you do want stronger production guardrails (workspace path locking, permissions, service-principal run_as), add them to the prod target yourself.
All three targets accept --var lakebase_endpoint=projects/<your-project>/branches/<your-branch>/endpoints/primary to point at your own Lakebase project. The defaults in databricks.yml use placeholder values that you'll need to override.
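Because an unfilled placeholder is easy to miss, a small pre-deploy check on the endpoint string can help. The sketch below validates the projects/.../branches/.../endpoints/... shape shown above and rejects values that still contain angle brackets. It is a hypothetical helper, not part of the bundle, and the exact naming rules Lakebase enforces are not covered here.

```python
import re

# Shape taken from the examples in this README; angle brackets are rejected
# so that an unreplaced <your-project> placeholder fails fast.
ENDPOINT_PATTERN = re.compile(
    r"^projects/(?P<project>[^/<>]+)"
    r"/branches/(?P<branch>[^/<>]+)"
    r"/endpoints/(?P<endpoint>[^/<>]+)$"
)


def parse_lakebase_endpoint(name: str) -> dict:
    """Split an endpoint resource name into its project, branch, and endpoint parts."""
    match = ENDPOINT_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid Lakebase endpoint resource name: {name!r}")
    return match.groupdict()
```

For the default value, `parse_lakebase_endpoint("projects/worldcup-pool/branches/production/endpoints/primary")` returns the project `worldcup-pool`, the branch `production`, and the endpoint `primary`.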
┌─────────────────────────────────────────────────────┐
│ Databricks App │
│ ┌───────────┐ ┌──────────────────────────────┐ │
│ │ React SPA │───▶│ FastAPI (multi-worker) │ │
│ │ (Vite) │ │ - Match predictions API │ │
│ └───────────┘ │ - Tournament picks API │ │
│ │ - Leaderboard (cached) │ │
│ │ - Admin (sync, config) │ │
│ └──────────┬───────────────────┘ │
│ │ │
└──────────────────────────────┼──────────────────────┘
│ SQLAlchemy + OAuth
▼
┌──────────────────────┐
│ Lakebase (Postgres) │
│ Auto-scaling │
└──────────────────────┘
▲
┌───────────────────┘
│ Scheduled job (5 min)
┌──────────┴──────────┐ ┌─────────────────────┐
│ Match Sync Job │───────▶│ football-data.org │
│ (Databricks Job) │ │ (scores & fixtures) │
└─────────────────────┘ └─────────────────────┘
uv sync
cp .env.example .env # fill in your values
export DATABASE_URL_OVERRIDE=postgresql://user:pass@localhost:5432/worldcup
export WORLDCUP_DEV_CORS=1
uv run uvicorn worldcup_pool.backend.app:app --reload --host 127.0.0.1 --port 8000

In another terminal:

cd ui && npm ci && npm run dev

Without a football-data token, call POST /api/dev/seed-demo-matches to populate sample data.
Budget roughly WEB_CONCURRENCY x (DB_POOL_SIZE + DB_MAX_OVERFLOW) connections. Keep that under your Lakebase connection limit. Match prediction saves use batch upserts for efficiency.
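The budget formula above is simple arithmetic; a minimal sketch follows. The function names are hypothetical, and the example numbers use SQLAlchemy's own pool defaults (pool size 5, max overflow 10) rather than values confirmed for this project.

```python
def connection_budget(web_concurrency: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case connections: each worker can open pool_size + max_overflow."""
    return web_concurrency * (pool_size + max_overflow)


def fits_connection_limit(
    web_concurrency: int, pool_size: int, max_overflow: int, limit: int
) -> bool:
    """True when the worst-case budget stays strictly under the database limit."""
    return connection_budget(web_concurrency, pool_size, max_overflow) < limit
```

With the default of 2 workers and a pool of 5 plus 10 overflow, the worst case is 2 x (5 + 10) = 30 connections. Raising WEB_CONCURRENCY to 4 doubles that to 60.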
The bundle includes a Lakebase-to-Delta mirror that powers an AI/BI dashboard — showing that one database drives both the app and the Lakehouse.
databricks bundle deploy -t dev --var dashboard_warehouse_id=YOUR_ID
databricks bundle run worldcup_sync_matches -t dev

This deploys a sync task that mirrors four tables into Unity Catalog Delta, plus a dashboard with KPIs, champion pick distribution, and prediction activity.
The committed ui/package-lock.json resolves packages from registry.npmjs.org (pinned via ui/.npmrc). If your local network can't reach the public npm registry — for example, you're behind a corporate proxy that only allows an internal mirror — the predeploy step will fail.
To regenerate the lockfile against whichever registry your machine can reach:
rm -rf ui/node_modules ui/package-lock.json
cd ui && npm install && cd ..
databricks bundle deploy -t dev --var admin_emails=you@company.com

The regenerated lockfile is local-only; don't commit it back if it now references an internal mirror, since that would break deploys for other people on different networks.
Apache 2.0 — see LICENSE.


