
Baywire

The live wire for Tampa Bay.

A unified guide to live music, festivals, food, and family fun across the Tampa Bay area — Tampa, St. Petersburg, Clearwater, Brandon, Bradenton, Safety Harbor, Dunedin, and an Other catch-all for edge cases. Listings are aggregated daily from multiple sources, deduplicated, and ranked for readability. Curated Places (beaches, venues, food, and similar) ship on /places via a separate discovery pipeline. Lives at baywire.app.

Baywire runs 18 event adapters (src/lib/scrapers/index.ts). The daily matrix includes only sources that are enabled in the database and have a matching adapter (scripts/ci-scrape-matrix.ts). Each job tries structured data first (JSON-LD on detail pages, the WordPress Tribe Events JSON API, the Ticketmaster Discovery API, iCal exports) and falls back to OpenAI extraction when nothing structured is available; venue rows are upserted into Places as events are processed. Vercel hosts the read-only Next.js app — scrapes run on GitHub Actions, not on Vercel.

   GHA cron (daily 12:00 UTC)
       │  matrix: one job per source
       ▼
   adapter.listEvents
       │
       ├─ tryStructured() ── parse JSON-LD / ICS / vendor JSON ──┐
       │                                                          ▼
       └─ fetchAndReduce() ── HTML reducer ── OpenAI extractor ──▶ Prisma Postgres + Accelerate
                                                                      │
                                                                      │ (Places upserted from venues)
                                                                      ▼
                                                              Vercel Next.js (read-only)
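
In code, the structured-first branch of the diagram looks roughly like the sketch below, with simplified names (RawEvent, and fetchAndReduce standing in for the LLM path). The real adapters in src/lib/scrapers/ share fetch/reduce/browser helpers and handle far more edge cases:

// Sketch of an adapter's listEvents: structured data first, LLM fallback.
// Names are illustrative; the real adapters live in src/lib/scrapers/.
import * as cheerio from "cheerio";

interface RawEvent {
  title: string;
  startsAt: string; // ISO 8601
  venue?: string;
  url?: string;
}

// Pull schema.org Event objects out of JSON-LD <script> tags.
function tryStructured(html: string): RawEvent[] | null {
  const $ = cheerio.load(html);
  const events: RawEvent[] = [];
  $('script[type="application/ld+json"]').each((_, el) => {
    try {
      const doc = JSON.parse($(el).text());
      for (const node of Array.isArray(doc) ? doc : [doc]) {
        if (node["@type"] === "Event" && node.name && node.startDate) {
          events.push({
            title: node.name,
            startsAt: node.startDate,
            venue: node.location?.name,
            url: node.url,
          });
        }
      }
    } catch {
      // Malformed JSON-LD in this tag: ignore it and keep scanning.
    }
  });
  return events.length ? events : null;
}

declare function fetchAndReduce(html: string): Promise<RawEvent[]>; // LLM path

export async function listEvents(html: string): Promise<RawEvent[]> {
  return tryStructured(html) ?? (await fetchAndReduce(html));
}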

Stack

  • Next.js 16 (App Router, React 19, RSC by default) + Serwist (@serwist/turbopack) for the PWA shell
  • proxy.ts (Next.js proxy) — anonymous guest profile cookie bootstrap
  • Tailwind CSS v4 + custom coastal palette
  • Prisma ORM + Prisma Postgres + Prisma Accelerate (managed Postgres + edge cache / connection pool in one URL); URL also lives in prisma.config.ts for Prisma 7 CLI
  • OpenAI gpt-4.1-mini with Zod-typed structured outputs, as sketched after this list (also works with any OpenAI-compatible proxy via OPENAI_BASE_URL, e.g. Poe / Groq / Together)
  • Google Places API (New) + Vercel Blob for place discovery hero images (npm run places:discover, scheduled workflow)
  • Stytch for SMS sign-in / sessions (optional for local dev depending on feature work)
  • cheerio for HTML reduction, p-limit for per-host pacing, Playwright where adapters need a real browser
  • GitHub Actions — daily event scrape matrix, weekly places discovery, daily expired-row cleanup
  • Vercel hosts the read-only Next.js app (event and place pages load from the DB on the server)
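
The Zod-typed extraction bullet above follows the pattern below, sketched with a hypothetical ExtractedEvents schema (the real schema and prompt live in src/lib/extract/):

import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

// Hypothetical schema; the project's real one lives in src/lib/extract/.
const ExtractedEvents = z.object({
  events: z.array(
    z.object({
      title: z.string(),
      startDate: z.string().describe("ISO 8601, America/New_York"),
      venue: z.string().nullable(),
    })
  ),
});

// OPENAI_BASE_URL lets any OpenAI-compatible proxy stand in for api.openai.com.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: process.env.OPENAI_BASE_URL,
});

export async function extractEvents(reducedHtml: string) {
  const completion = await client.beta.chat.completions.parse({
    model: process.env.OPENAI_EXTRACT_MODEL ?? "gpt-4.1-mini",
    messages: [
      { role: "system", content: "Extract the events from this page text." },
      { role: "user", content: reducedHtml },
    ],
    response_format: zodResponseFormat(ExtractedEvents, "extracted_events"),
  });
  return completion.choices[0].message.parsed; // typed result, null on refusal
}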

Sources

| Slug | Site | Strategy | Notes |
| --- | --- | --- | --- |
| eventbrite | eventbrite.com | JSON-LD | Geo-search across all metro cities (excluding Other), 2 pages each |
| ticketmaster | ticketmaster.com/discover/tampa | Discovery API | Official Discovery API, DMA 635 (Tampa-St. Pete-Sarasota) |
| visit_tampa_bay | visittampabay.com/events | JSON-LD | Official tourism, curated |
| visit_st_pete_clearwater | visitstpeteclearwater.com | JSON-LD | Both /events and /events-festivals listings |
| tampa_gov | tampa.gov/calendar | JSON-LD + ICS | City of Tampa public events calendar |
| ilovetheburg | ilovetheburg.com | Tribe REST API | St. Pete blog |
| thats_so_tampa | thatssotampa.com | Tribe REST API | Tampa-side blog |
| tampa_bay_times | tampabay.com/things-to-do | HTML + LLM | Editorial weekend picks |
| tampa_bay_markets | tampabaymarkets.com | HTML + LLM | Recurring farmers' markets across the bay |
| safety_harbor | cityofsafetyharbor.com | RSS hint + LLM | CivicPlus RSS feed → SSR detail pages |
| side_splitters | sidesplitterscomedy.com | HTML + LLM | Comedy club; listings link out to OvationTix |
| dont_tell_comedy | donttellcomedy.com | HTML + LLM | Pop-up “secret” comedy shows |
| funny_bone_tampa | tampa.funnybone.com | HTML + LLM | Funny Bone Tampa show listings |
| straz_center | strazcenter.org | HTML + LLM | Playwright solves the Incapsula challenge automatically |
| tampa_theatre | tampatheatre.org | HTML + LLM | Live events page + detail pages |

Browser-powered sources: The following adapters use Playwright (headless Chromium) to bypass WAF challenges or render JS-heavy pages:

| Slug | Site | Strategy | Notes |
| --- | --- | --- | --- |
| dunedin_gov | dunedinfl.net | Browser render + AI listings | Akamai; JS-rendered calendar |
| unation | unation.com | Browser render | Cloudflare bot challenge |
| feverup | feverup.com | Browser render + AI listings | Full SPA |
| straz_center | strazcenter.org | Browser cookies | Incapsula WAF |
| funny_bone_tampa | tampa.funnybone.com | Browser cookies | DataDome WAF |
| visit_tampa_bay | visittampabay.com | Browser render (listing only) | JS-rendered calendar |
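
For the "Browser cookies" strategy, the shape is roughly: let headless Chromium clear the challenge once, then reuse its session cookies for plain HTTP fetches. A minimal sketch (function name and timing are illustrative, not the adapters' actual code):

import { chromium } from "playwright";

// Let headless Chromium settle the WAF challenge, then reuse its cookies
// so subsequent requests can go through plain fetch.
export async function fetchWithBrowserCookies(url: string): Promise<string> {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle" }); // challenge settles here
    const cookies = await page.context().cookies();
    const cookieHeader = cookies.map((c) => `${c.name}=${c.value}`).join("; ");
    const res = await fetch(url, { headers: { cookie: cookieHeader } });
    return await res.text();
  } finally {
    await browser.close();
  }
}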

Local setup

# 1. Install dependencies (postinstall runs `prisma generate`)
npm install

# 2. Configure environment
cp .env.example .env.local
# Edit .env.local — required for scrapes:
#   DATABASE_URL, OPENAI_API_KEY, CRON_SECRET
# Optional / feature-specific (see .env.example comments):
#   TICKETMASTER_API_KEY, GOOGLE_MAPS_API_KEY, BLOB_READ_WRITE_TOKEN,
#   STYTCH_PROJECT_ID, STYTCH_SECRET, STYTCH_ENVIRONMENT

# 3. Push the schema to the database
npm run db:push

# 4. Seed it with one full scrape
npm run scrape
# or scrape a single source:
npm run scrape eventbrite

# 5. Start the dev server
npm run dev

Open http://localhost:3000.

Provisioning Prisma Postgres

  1. Sign in at console.prisma.io and create a Prisma Postgres database.
  2. Copy the connection string — it looks like prisma+postgres://accelerate.prisma-data.net/?api_key=... — and paste it into DATABASE_URL in .env.local.
  3. Run npm run db:push to materialize the schema. (Use npm run db:migrate:dev once you want versioned migrations.)

The same URL handles both the live query path (via Accelerate, with edge caching) and migrations, so you don't need a separate "direct URL".
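
At query time, that single URL is consumed by a client extended with Accelerate, along the lines of this sketch (the project's actual client lives in src/lib/db/):

import { PrismaClient } from "@prisma/client";
import { withAccelerate } from "@prisma/extension-accelerate";

// One prisma+postgres:// URL covers queries and migrations; the Accelerate
// extension routes reads through the edge cache and connection pool.
export const prisma = new PrismaClient().$extends(withAccelerate());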

Useful scripts

| Command | What it does |
| --- | --- |
| npm run dev | Next.js dev server (clears .next first) |
| npm run build | Production next build (run npm install first so postinstall generates Prisma Client) |
| npm run typecheck | tsc --noEmit |
| npm run lint | ESLint (eslint .) |
| npm run db:push | Push schema to the database (dev only) |
| npm run db:migrate:dev | Generate + apply a development migration |
| npm run db:migrate | Apply existing migrations (production) |
| npm run db:studio | Open Prisma Studio |
| npm run scrape [slug] | Run the event scrape pipeline once, locally |
| npm run places:discover | Run the Google Places + editorial discovery pipeline (--help for flags) |
| npm run cleanup:expired | Delete stale events (and old orphan places unless --skip-places) |
| npm run ci:scrape-matrix | Emit the GitHub Actions scrape matrix JSON from the DB (CI only) |

Deployment

Production splits Vercel (HTTP app) from GitHub Actions (scheduled writes).

Vercel (read-only web app)

  1. Push to GitHub and import the repo into Vercel.
  2. In Project Settings → Environment Variables, set at minimum DATABASE_URL. Add CRON_SECRET if you want the manual POST /api/cron/scrape route bearer-gated. For profiles, places imagery, and SMS auth, add BLOB_READ_WRITE_TOKEN, GOOGLE_MAPS_API_KEY, and the STYTCH_* variables from .env.example as needed.
  3. Deploy. There are no Vercel cron schedules — vercel.json is intentionally empty.

GitHub Actions — daily event scrape

The workflow at .github/workflows/scrape.yml runs every day at 12:00 UTC. It builds a matrix from the sources rows that are both enabled in the database and backed by a code adapter (npm run ci:scrape-matrix), then runs npm run scrape -- <slug> per matrix cell. Each job parses the [scrape:result] line into its step summary and, on failure, uploads scripts/.last-html/ plus scrape.log. Playwright browsers are cached, and Chromium is installed per job.
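
The matrix builder's core idea fits in a few lines. A sketch, assuming a Source model with an enabled flag and a slug-keyed adapter map (the real script is scripts/ci-scrape-matrix.ts):

import { prisma } from "../src/lib/db";          // assumed import paths
import { adapters } from "../src/lib/scrapers";  // slug -> adapter map

async function main() {
  // Only sources that are both enabled in the DB and backed by code run.
  const sources = await prisma.source.findMany({ where: { enabled: true } });
  const include = sources
    .filter((s) => s.slug in adapters)
    .map((s) => ({ slug: s.slug }));
  // GitHub Actions reads this JSON into strategy.matrix.
  console.log(JSON.stringify({ include }));
}

main();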

Secrets for scrape:

| Secret | Required | Notes |
| --- | --- | --- |
| DATABASE_URL | yes | Same Prisma Postgres URL Vercel uses |
| OPENAI_API_KEY | yes | Used only by HTML + LLM fallbacks |
| OPENAI_BASE_URL | optional | OpenAI-compatible proxy (Poe / Groq / …) |
| OPENAI_EXTRACT_MODEL | optional | Override the default gpt-4.1-mini |
| TICKETMASTER_API_KEY | optional | Free Discovery API key; without it the ticketmaster adapter errors and the rest of the matrix continues |

GitHub Actions — weekly places discovery

.github/workflows/discover-places.yml runs Sundays at 08:00 UTC (and supports workflow_dispatch with optional city and type inputs). It executes npm run places:discover.

Additional secrets: GOOGLE_MAPS_API_KEY, BLOB_READ_WRITE_TOKEN, plus the same OpenAI variables as the scrape workflow for the editorial pass.
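
The discovery pipeline's lookups go through the Places API (New) REST surface. A minimal text-search sketch (the field mask and helper name are illustrative; the real client lives in src/lib/places/):

// Places API (New) text search, keyed by GOOGLE_MAPS_API_KEY.
export async function searchPlaces(textQuery: string) {
  const res = await fetch("https://places.googleapis.com/v1/places:searchText", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Goog-Api-Key": process.env.GOOGLE_MAPS_API_KEY!,
      // The field mask bounds both the response shape and the billing tier.
      "X-Goog-FieldMask":
        "places.id,places.displayName,places.formattedAddress,places.photos",
    },
    body: JSON.stringify({ textQuery }),
  });
  if (!res.ok) throw new Error(`Places search failed: ${res.status}`);
  return (await res.json()).places ?? [];
}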

GitHub Actions — expired rows cleanup

.github/workflows/cleanup.yml runs daily at 06:00 UTC and executes npm run cleanup:expired with DATABASE_URL only.

Manual scrape trigger

  • workflow_dispatch on the scrape workflow accepts an optional source input. Leave it blank to run every enabled adapter in parallel.

  • Scheduled GitHub runs may start 5–30 minutes late under platform load; use workflow_dispatch or the manual Vercel trigger below when you need tighter timing.

  • POST /api/cron/scrape on Vercel is preserved as a manual lever:

    curl -H "Authorization: Bearer $CRON_SECRET" \
         https://your-app.vercel.app/api/cron/scrape

    It returns immediately and continues the work in the background via next/after. GitHub Actions remains the source of truth for daily scrapes.
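
The handler behind that curl is roughly the following sketch (details are illustrative; the real route is src/app/api/cron/scrape/route.ts):

import { NextResponse, after } from "next/server";

declare function runScrapePipeline(): Promise<void>; // assumed orchestrator entry

export async function POST(req: Request) {
  if (req.headers.get("authorization") !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: "unauthorized" }, { status: 401 });
  }
  // Respond immediately; the scrape continues after the response is sent.
  after(() => runScrapePipeline());
  return NextResponse.json({ started: true });
}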

Project layout

proxy.ts                  Next.js proxy — guest profile cookie bootstrap
src/
  app/
    events/               Event list + `/events/[id]` detail
    places/               Curated places index + `/places/[slug]` detail
    api/cron/scrape/      Bearer-gated manual scrape trigger
    serwist/              Serwist worker route (PWA)
  components/             RSC + client UI (home, places, design-system, …)
  lib/
    cities.ts             City constants (shared DB enum + UI)
    auth/                 Stytch wiring + session helpers
    db/                   Prisma client (Accelerate) + query helpers
    extract/              OpenAI structured-output extraction
    pipeline/             Event orchestrator; `discoverPlaces` for places
    places/               Google Places client + discovery types
    scrapers/             Per-source adapters + shared fetch/reduce/browser
    time/                 America/New_York-aware window helpers
    utils.ts              Shared helpers (`cn`, formatting, …)
prisma/
  schema.prisma           Source, ScrapeRun, Event, Place, profiles, …
  migrations/             SQL migrations (after `db:migrate:dev`)
prisma.config.ts          Prisma 7 datasource URL for CLI
scripts/
  scrape.ts               `npm run scrape`
  places-discover.ts      `npm run places:discover`
  cleanup-expired.ts      `npm run cleanup:expired`
  ci-scrape-matrix.ts     Matrix builder for GitHub Actions

Cost & rate posture

  • Per-host pacing is one request every 1.1 seconds, with LLM extraction capped at 4 concurrent requests.
  • Structured-first: adapters with JSON-LD / ICS / vendor JSON skip the LLM entirely via tryStructured. The HTML+LLM fallback runs only when no structured surface exists for a given event.
  • Content-hash short-circuit: events whose structured payload (or reduced HTML) hasn't changed never re-hit the LLM (sketched after this list).
  • Reduced HTML is capped at 16k characters (well under 4k tokens) before being sent to gpt-4.1-mini.
  • A daily run currently touches ~100–200 unique events. With Phase 2 structured-first enabled, most adapters do zero LLM calls per event; only tampa_bay_times, tampa_bay_markets, and safety_harbor consistently use the LLM, plus any detail page where structured extraction returned null.
  • Accelerate-backed read helpers use cacheStrategy: { ttl: 60, swr: 300 } where caching applies (src/lib/db/queries.ts, queriesPlaces.ts), so repeated reads can hit the edge cache.
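
A sketch of the first and third bullets (per-host pacing and the content-hash short-circuit), with illustrative names:

import { createHash } from "node:crypto";
import pLimit from "p-limit";

declare function extractEvents(payload: string): Promise<unknown>; // LLM call

// One request in flight per host, spaced ~1.1 s apart.
const perHost = new Map<string, ReturnType<typeof pLimit>>();

async function pacedFetch(url: string): Promise<string> {
  const host = new URL(url).host;
  if (!perHost.has(host)) perHost.set(host, pLimit(1));
  return perHost.get(host)!(async () => {
    const res = await fetch(url);
    await new Promise((r) => setTimeout(r, 1100)); // per-host spacing
    return res.text();
  });
}

// Content-hash short-circuit: unchanged payloads never re-hit the LLM.
async function maybeExtract(payload: string, storedHash: string | null) {
  const hash = createHash("sha256").update(payload).digest("hex");
  if (hash === storedHash) return null; // nothing changed, skip the LLM
  return { hash, events: await extractEvents(payload) };
}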

Attribution & ToS

This project respects each source's robots.txt and only fetches public listing and detail pages. Event cards link back to the original URL, and the site footer lists enabled sources from the database. If a publisher requests removal, contact them or open an issue and we'll disable the relevant adapter.
