Live reliability dashboard for free NVIDIA NIM API endpoints.
Probes every available endpoint on a recurring schedule and surfaces throughput, latency, uptime, and congestion, so you can pick a model that actually works right now without trial and error.
NIM Stats is a public operational dashboard for the free NVIDIA NIM chat-completion endpoints (Llama, Mistral, Gemma, Phi, Qwen, DeepSeek, and more). A background worker streams a small probe against each endpoint on a schedule, measures real time-to-first-token and decode throughput, classifies the result, and persists it. The dashboard reads that telemetry and renders fleet health that's scannable in about five seconds — no login, no setup.
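Here is a minimal sketch of how the two headline measurements, TTFT and decode throughput, can be derived from chunk arrival times in a streamed response. It assumes the probe records the request start time and a timestamp for each streamed content chunk, and it treats each chunk as one token. `ProbeTiming` and `computeProbeMetrics` are hypothetical names, not the worker's actual code.

```typescript
// Hypothetical record of one streamed probe (not the project's actual types).
interface ProbeTiming {
  startedAt: number;        // ms timestamp when the request was sent
  chunkTimes: number[];     // ms timestamp of each streamed content chunk, in arrival order
  tokensPerChunk?: number;  // assumption: ~1 token per chunk unless stated otherwise
}

interface ProbeMetrics {
  ttftMs: number | null;             // time to first token
  decodeTokensPerSec: number | null; // token rate after the first token
}

function computeProbeMetrics(t: ProbeTiming): ProbeMetrics {
  if (t.chunkTimes.length === 0) {
    // No tokens arrived (timeout or error): nothing to measure.
    return { ttftMs: null, decodeTokensPerSec: null };
  }
  const first = t.chunkTimes[0];
  const last = t.chunkTimes[t.chunkTimes.length - 1];
  const ttftMs = first - t.startedAt;

  // Exclude the first token so queueing/prefill time (already in TTFT)
  // does not drag down the decode rate.
  const perChunk = t.tokensPerChunk ?? 1;
  const decodedTokens = (t.chunkTimes.length - 1) * perChunk;
  const decodeMs = last - first;
  const decodeTokensPerSec =
    decodedTokens > 0 && decodeMs > 0 ? (decodedTokens * 1000) / decodeMs : null;

  return { ttftMs, decodeTokensPerSec };
}
```

Take a request sent at t=1000 ms whose chunks arrive at 1200, 1300, 1400, 1500, and 1600 ms. That gives a TTFT of 200 ms and 4 decoded tokens over 400 ms, or 10 tokens/s.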
- Real-time fleet status — every endpoint classified `healthy`/`busy`/`jammed` from live probes, with color + shape + text (color-blind safe).
- Deep reliability metrics — TTFT, throughput, uptime, congestion, p95/p99 latency, timeout rate, session reliability, volatility, routing confidence, and queue pressure.
- Trends & history — fleet performance chart (12h / 24h / 7d), per-model uptime calendar, time-of-day latency heatmap, and SLA windows (1d / 7d / 30d).
- Incident feed — state transitions (degradation, congestion, recovery) recorded as the worker observes them.
- Explore the fleet — search, provider/status filters, favorites/watchlist, saved filter presets, shareable URL state, and CSV export.
- Public status page at `/status` — a read-only, at-a-glance health summary.
- Anomaly & quota detection — TTFT spikes and reliability drops vs. a 7-day baseline, plus rate-limit proximity, exposed via internal APIs.
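This README does not document the thresholds behind `healthy`/`busy`/`jammed` or the exact anomaly rules, so the following is only an illustrative sketch. It shows the general pattern: map a single probe to a state, and flag a TTFT spike when a recent value exceeds a multiple of the 7-day baseline median. Every threshold, the `ProbeResult` shape, and the `classify`/`isTtftSpike` names are assumptions.

```typescript
type OperationalState = "healthy" | "busy" | "jammed";

// Hypothetical probe result; fields and units are assumptions for this sketch.
interface ProbeResult {
  ok: boolean;           // request completed without error or timeout
  ttftMs: number | null; // time to first token, null if none arrived
}

// Illustrative thresholds only; the real cutoffs are not documented here.
const BUSY_TTFT_MS = 2000;
const JAMMED_TTFT_MS = 10000;

function classify(p: ProbeResult): OperationalState {
  if (!p.ok || p.ttftMs === null || p.ttftMs >= JAMMED_TTFT_MS) return "jammed";
  if (p.ttftMs >= BUSY_TTFT_MS) return "busy";
  return "healthy";
}

function median(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

// Spike = recent TTFT above `factor` times the baseline median (factor is assumed).
function isTtftSpike(recentTtftMs: number, baselineTtftMs: number[], factor = 2): boolean {
  if (baselineTtftMs.length === 0) return false; // no baseline collected yet
  return recentTtftMs > factor * median(baselineTtftMs);
}
```

Using the median rather than the mean keeps a handful of past outliers from inflating the baseline and hiding a real spike.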
The collector is decoupled from the web app: it writes telemetry to Postgres, and the dashboard server-renders straight from the database.
```mermaid
flowchart LR
    A["Worker<br/>(GitHub Actions, every 5 min)"] -->|streamed probe| B["NVIDIA NIM API"]
    A -->|write samples + incidents| C[("Postgres<br/>(Supabase)")]
    D["Next.js app<br/>(Vercel)"] -->|read + cache| C
    E["Browser"] -->|SSR dashboard| D
```
- Worker (`scripts/probe-once.ts`) discovers active endpoints, probes each one (rate-capped under NIM's 40 req/min limit), classifies the operational state, and stores a `ModelSample`. A daily pass prunes old samples and retires dead endpoints.
- Database holds raw samples, the latest snapshot per model, and incidents. Derived analytics are computed at read time.
- Web app renders Server Components directly from the database, wrapped in a short-lived data cache so concurrent traffic collapses to roughly one query per window.
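The "roughly one query per window" behavior is what a read-through cache with request coalescing provides. This README does not show the app's actual cache layer, so the following is a framework-agnostic sketch, not the Next.js caching API: every caller inside one TTL window shares a single in-flight promise. `cachedQuery`, the injectable clock, and the TTL are assumptions for illustration.

```typescript
// Framework-agnostic sketch of a short-lived read-through cache.
// Concurrent callers inside one TTL window share a single promise,
// so the underlying query runs at most once per window.
function cachedQuery<T>(
  fetcher: () => Promise<T>,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock, handy for tests
): () => Promise<T> {
  let entry: { value: Promise<T>; expiresAt: number } | null = null;

  return () => {
    const t = now();
    if (entry && t < entry.expiresAt) return entry.value; // reuse in-window result

    const value = fetcher();
    entry = { value, expiresAt: t + ttlMs };
    // Don't cache failures: drop the entry so the next caller retries.
    value.catch(() => {
      if (entry?.value === value) entry = null;
    });
    return value;
  };
}
```

A page could then wrap its database read, for example `const getOverview = cachedQuery(() => loadOverviewFromDb(), 30000)`, where `loadOverviewFromDb` stands in for a hypothetical Prisma query.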
> [!NOTE]
> The dashboard only shows data after the worker has completed at least one run. Locally, that means running `npm run worker`; in production, GitHub Actions runs it on a schedule.
| Layer | Choice |
|---|---|
| Framework | Next.js 16 (App Router, Turbopack), React 19 |
| Styling | Tailwind CSS v4, shadcn/ui + Radix |
| Charts | Recharts |
| Data | Prisma 7 + PostgreSQL (@prisma/adapter-pg) |
| Collector | Node + node-cron (local) / a one-shot script (CI) |
- Node.js 20+
- A PostgreSQL database (local or hosted)
- A free NVIDIA NIM API key from build.nvidia.com (`nvapi-…`)
```bash
# 1. Install dependencies
npm install

# 2. Configure environment
cp .env.example .env
# then set NIM_API_KEY and DATABASE_URL in .env

# 3. Create the schema
npx prisma migrate deploy

# 4. Start the collector (terminal 1); required for data
npm run worker

# 5. Start the dashboard (terminal 2)
npm run dev
```

Open http://localhost:3000. Data appears within a minute of the worker's first cycle.
| Command | Description |
|---|---|
| `npm run dev` | Start the dev server on localhost:3000 |
| `npm run build` | Production build (runs `prisma generate` first) |
| `npm run worker` | Run the always-on collector (local dev) |
| `npm run probe:once` | Run a single probe cycle and exit (used by CI) |
| `npm run lint` | Lint with `eslint-config-next` |
Set these in `.env` (see `.env.example` for the full list):

| Variable | Required | Description |
|---|---|---|
| `NIM_API_KEY` | yes | NVIDIA NIM API key (`nvapi-…`) |
| `DATABASE_URL` | yes | PostgreSQL connection string |
| `NIM_API_URL` | no | Defaults to `https://integrate.api.nvidia.com` |
| `INTERNAL_API_TOKEN` | production | Locks down non-browser API routes; sent as `Authorization: Bearer <token>` |
| `PROBE_MAX_RPM` | no | Outbound probe rate cap in requests/min (default 30; NIM allows 40) |
| `RETENTION_DAYS` | no | Prune samples older than this many days (default 30) |
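Here is a sketch of how these variables might be read using the defaults listed above (30 RPM, 30 days, and the default `NIM_API_URL`). It also computes the probe spacing implied by the RPM cap. `loadConfig` is a hypothetical helper. Rejecting values above 40 RPM is an assumption based on NIM's stated limit, not documented behavior.

```typescript
interface Config {
  nimApiKey: string;
  databaseUrl: string;
  nimApiUrl: string;
  probeMaxRpm: number;
  retentionDays: number;
  minProbeIntervalMs: number; // spacing between probes implied by the RPM cap
}

const NIM_LIMIT_RPM = 40; // NIM's stated request limit

function intFromEnv(env: Record<string, string | undefined>, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) throw new Error(`${name} must be a positive integer, got "${raw}"`);
  return n;
}

function loadConfig(env: Record<string, string | undefined>): Config {
  const nimApiKey = env.NIM_API_KEY;
  const databaseUrl = env.DATABASE_URL;
  if (!nimApiKey) throw new Error("NIM_API_KEY is required");
  if (!databaseUrl) throw new Error("DATABASE_URL is required");

  const probeMaxRpm = intFromEnv(env, "PROBE_MAX_RPM", 30);
  if (probeMaxRpm > NIM_LIMIT_RPM) {
    // Assumption: refuse to exceed NIM's limit rather than silently clamping.
    throw new Error(`PROBE_MAX_RPM=${probeMaxRpm} exceeds NIM's ${NIM_LIMIT_RPM} req/min limit`);
  }

  return {
    nimApiKey,
    databaseUrl,
    nimApiUrl: env.NIM_API_URL || "https://integrate.api.nvidia.com",
    probeMaxRpm,
    retentionDays: intFromEnv(env, "RETENTION_DAYS", 30),
    minProbeIntervalMs: 60000 / probeMaxRpm, // 30 RPM -> one probe every 2000 ms
  };
}
```

In the worker this would be called once at startup with `process.env`, so a misconfiguration fails fast instead of surfacing mid-cycle.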
Public (no auth):

| Route | Description |
|---|---|
| `GET /api/fleet/trend?range=12h\|24h\|7d` | Fleet-wide time series |
| `GET /api/fleet/reliability` | Per-model uptime / heatmap / SLA breakdown |
| `GET /api/health` | Liveness + last probe time |
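Because `GET /api/health` reports the last probe time, an uptime monitor could use it to detect a stalled collector. The sketch below treats data as stale once it is older than two 5-minute worker intervals. The response field name `lastProbeAt` and the 2x tolerance are assumptions, since this README does not document the response body.

```typescript
// Assumed response shape; the README only promises "liveness + last probe time".
interface HealthResponse {
  lastProbeAt: string | null; // ISO-8601 timestamp of the most recent probe
}

const PROBE_INTERVAL_MS = 5 * 60 * 1000; // production worker cadence (every 5 min)

function isCollectorStale(h: HealthResponse, nowMs: number, toleranceIntervals = 2): boolean {
  if (h.lastProbeAt === null) return true; // the worker has never run
  const last = Date.parse(h.lastProbeAt);
  if (Number.isNaN(last)) return true;     // unparseable timestamp: treat as unhealthy
  return nowMs - last > toleranceIntervals * PROBE_INTERVAL_MS;
}
```

GitHub Actions scheduled runs can start late under load, so a tolerance of more than one interval helps avoid false alarms.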
Internal (require INTERNAL_API_TOKEN in production): /api/fleet/anomalies, /api/fleet/quota, /api/fleet/overview, /api/models, /api/models/[id], /api/providers.
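Internal routes expect an `Authorization: Bearer <token>` header. Below is a minimal sketch of a constant-time check against `INTERNAL_API_TOKEN` using Node's `crypto.timingSafeEqual`. `isAuthorized` is a hypothetical helper. Leaving routes open when no token is configured (local development) is an assumption based on the token being required only in production.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Compare SHA-256 digests: timingSafeEqual needs equal-length buffers,
// and hashing avoids leaking the token's length through an early exit.
function safeEqual(a: string, b: string): boolean {
  const da = createHash("sha256").update(a).digest();
  const db = createHash("sha256").update(b).digest();
  return timingSafeEqual(da, db);
}

function isAuthorized(authorizationHeader: string | null, expectedToken: string | undefined): boolean {
  // Assumption: with no token configured (e.g. local dev), internal routes stay open.
  if (!expectedToken) return true;
  if (!authorizationHeader) return false;
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader.trim());
  if (!match) return false;
  return safeEqual(match[1], expectedToken);
}
```

In a route handler, this would be called with `request.headers.get("authorization")` and `process.env.INTERNAL_API_TOKEN`, returning a 401 response when it fails.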
NIM Stats is designed to run on entirely free tiers — Vercel (web), Supabase (Postgres), and GitHub Actions (the worker, on a public repo). See PRODUCTION.md for the architecture, the data-volume math, and step-by-step deploy instructions.
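The full data-volume math is in PRODUCTION.md, but the back-of-envelope version is easy to reproduce. The sketch below assumes one `ModelSample` per endpoint per 5-minute run and the default 30-day retention, and it leaves the endpoint count as a parameter because this README does not state it. It estimates retained rows and checks whether one probe cycle at the default 30 RPM cap finishes before the next scheduled run. `estimateVolume` and the example count of 60 endpoints are hypothetical.

```typescript
interface VolumeEstimate {
  samplesPerDay: number;
  retainedSamples: number;  // raw samples kept under the retention window
  cycleMinutes: number;     // time to probe every endpoint once at the RPM cap
  fitsInSchedule: boolean;  // cycle finishes before the next scheduled run
}

function estimateVolume(
  endpoints: number,
  scheduleMinutes = 5, // GitHub Actions cadence
  retentionDays = 30,  // RETENTION_DAYS default
  probeMaxRpm = 30,    // PROBE_MAX_RPM default
): VolumeEstimate {
  const runsPerDay = (24 * 60) / scheduleMinutes; // 288 runs/day at 5-minute spacing
  const samplesPerDay = endpoints * runsPerDay;
  const cycleMinutes = endpoints / probeMaxRpm;
  return {
    samplesPerDay,
    retainedSamples: samplesPerDay * retentionDays,
    cycleMinutes,
    fitsInSchedule: cycleMinutes < scheduleMinutes,
  };
}
```

With 60 endpoints, this gives 17,280 samples per day, about 518,400 retained rows, and a 2-minute cycle. At 30 RPM, a single 5-minute window can cover at most 150 probes. The estimate ignores the per-model snapshot and incident rows.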
> [!IMPORTANT]
> The worker runs as a scheduled GitHub Actions job because Vercel has no always-on process. Use Supabase's pooled connection for the app and the direct/session connection for the worker and migrations.