A small middleware layer that sits between your app and an LLM provider. It routes each request to a Perplexity model based on prompt complexity (simple, medium, complex) and records every call in PostgreSQL: tokens, cost, latency, model, prompt, and response. One place to see what you spend and where.
This repo uses Perplexity as the reference provider. The design is provider-agnostic so you can plug in another API by implementing one interface and updating config.
Without a central layer you rarely know how much you spend per model or per day. Sending every request to the most capable model is wasteful. Here, simple queries go to a lighter model and complex ones to a research model. All usage is stored so you get a single view of cost and volume.
Provider and models
Perplexity models used: sonar (simple), sonar-pro (medium), sonar-deep-research (complex). Pricing is configurable in the backend. The provider interface lives in backend/services/providers/ so you can add another API without changing tracking or analytics.
```mermaid
flowchart TB
    subgraph Client
        app[App Dashboard]
    end
    subgraph Router[LLM Router]
        classify[Classify prompt]
        choose[Choose model]
        store[Store request]
    end
    app --> classify
    classify --> choose
    choose --> store
    store --> app
    choose --> pplx[Perplexity API]
    store --> pg[(PostgreSQL)]
    store --> prom[Prometheus]
    prom --> grafana[Grafana]
```
Each prompt is classified into simple, medium, or complex:

- Token count (estimated as the whitespace-separated word count, `len(prompt.split())`):
  - < 50 tokens → lean simple
  - 50–200 tokens → lean medium
  - > 200 tokens → lean complex
- Keyword signals (case-insensitive):
  - Complex: "research", "analyze", "comprehensive", "detailed analysis", "in-depth", "investigate", "evaluate", "architecture", "explain in detail"
  - Simple: "what is", "define", "hello", "hi", "thanks", "yes", "no", "list"
  - Medium: "explain", "describe", "steps", "how would", "how do", "pros and cons", "compare", "structure", "deploy", and similar
- Final tier: A keyword match overrides the token count when a strong signal is present.
- Route:
  - simple → sonar
  - medium → sonar-pro
  - complex → sonar-deep-research
- Override: If the request includes `{"model_override": "sonar-pro"}`, that model is used regardless of routing.
| Tier | Model | Use case |
|---|---|---|
| simple | sonar | Simple queries |
| medium | sonar-pro | Reasoning tasks |
| complex | sonar-deep-research | Research/analysis |
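
The classification and routing rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `backend/services/router_service.py`. The function names, whole-word matching, the "highest matching tier wins" precedence, and the `tokens:...` reason strings are all assumptions; only the keyword lists, thresholds, model names, and the `keyword:simple` reason format come from this README.

```python
import re
from typing import Optional

# Keyword lists copied from the rules above. The medium list ends with
# "and similar" in the text, so this copy is necessarily incomplete.
KEYWORDS = {
    "complex": ["research", "analyze", "comprehensive", "detailed analysis",
                "in-depth", "investigate", "evaluate", "architecture",
                "explain in detail"],
    "medium": ["explain", "describe", "steps", "how would", "how do",
               "pros and cons", "compare", "structure", "deploy"],
    "simple": ["what is", "define", "hello", "hi", "thanks", "yes", "no", "list"],
}

MODEL_FOR_TIER = {
    "simple": "sonar",
    "medium": "sonar-pro",
    "complex": "sonar-deep-research",
}


def _matches(text: str, phrases: list) -> bool:
    """True if any phrase occurs in text as a whole word or phrase.

    Assumption: whole-word matching, so "hi" does not fire on "this".
    """
    return any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases)


def classify(prompt: str) -> tuple:
    """Return (tier, reason) for a prompt."""
    text = prompt.lower()
    # Assumption: when several tiers match, the highest one wins.
    for tier in ("complex", "medium", "simple"):
        if _matches(text, KEYWORDS[tier]):
            return tier, f"keyword:{tier}"
    # No keyword signal: fall back to the word-count estimate.
    words = len(prompt.split())
    if words < 50:
        return "simple", "tokens:<50"
    if words <= 200:
        return "medium", "tokens:50-200"
    return "complex", "tokens:>200"


def route(prompt: str, model_override: Optional[str] = None) -> str:
    """Pick a model; an explicit override bypasses classification."""
    if model_override:
        return model_override
    tier, _reason = classify(prompt)
    return MODEL_FOR_TIER[tier]
```

Checking the complex list first means "explain in detail" is treated as complex even though "explain" alone is a medium signal.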
Every request records:
| Field | Description |
|---|---|
| request_id | UUID |
| timestamp | UTC |
| prompt / response | Full text |
| model_requested / model_used | Routed model / actual model |
| prompt_tokens, completion_tokens, total_tokens | From API usage |
| prompt_cost_usd, completion_cost_usd, total_cost_usd | Calculated from config pricing |
| citation_tokens, search_queries_count | Deep-research only (else 0) |
| citation_cost_usd, search_cost_usd | Deep-research only |
| reasoning_tokens, reasoning_cost_usd | From API when present (e.g. sonar-reasoning-pro), priced at $3 per 1M tokens |
| latency_ms | Wall clock request→response |
| routing_tier, routing_reason | simple/medium/complex + short reason |
| project_tag | Optional multi-project tag |
| cached, provider, status, error_message | Cache flag (placeholder) / provider name / success or error / error text, if any |
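
As a rough illustration of how the cost fields in the table could be derived from token counts, here is a small sketch. The `Pricing` class, its field names, and the per-million-token unit are assumptions made for this example; the real pricing config may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens (hypothetical shape, placeholder values)."""
    input_per_m: float
    output_per_m: float


def token_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a number of tokens at a per-million rate."""
    return tokens * usd_per_million / 1_000_000


def request_costs(prompt_tokens: int, completion_tokens: int, pricing: Pricing) -> dict:
    """Compute the three basic cost fields recorded for a request."""
    prompt_cost = token_cost(prompt_tokens, pricing.input_per_m)
    completion_cost = token_cost(completion_tokens, pricing.output_per_m)
    return {
        "prompt_cost_usd": prompt_cost,
        "completion_cost_usd": completion_cost,
        "total_cost_usd": prompt_cost + completion_cost,
    }


# Placeholder rate of $1 per 1M tokens for both input and output.
example = request_costs(45, 120, Pricing(input_per_m=1.0, output_per_m=1.0))
```

With 45 prompt tokens and 120 completion tokens at the placeholder rate, the total comes to roughly $0.000165. Citation, search, and reasoning costs would be added the same way when the provider reports them.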
- Actual cost: Sum of `total_cost_usd` for all requests (using routed model pricing).
- If all sonar-pro: For the same token counts, the cost if every request had used sonar-pro pricing.
- Savings = (if all sonar-pro) − (actual). Exposed at `GET /analytics/cost-savings`.
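
The savings arithmetic can be written out as follows. This is a sketch under assumptions: the function name, the returned keys, and the baseline rates passed in are illustrative, and the baseline here covers input and output tokens only.

```python
def cost_savings(rows: list, baseline_in_per_m: float, baseline_out_per_m: float) -> dict:
    """Compare stored actual cost with a single-model baseline.

    rows: dicts with prompt_tokens, completion_tokens, total_cost_usd.
    baseline_*_per_m: baseline model rates in USD per one million tokens.
    """
    # Actual cost: what was recorded at routed-model pricing.
    actual = sum(r["total_cost_usd"] for r in rows)
    # Baseline: the same token counts re-priced at the baseline model's rates.
    baseline = sum(
        (r["prompt_tokens"] * baseline_in_per_m
         + r["completion_tokens"] * baseline_out_per_m) / 1_000_000
        for r in rows
    )
    return {
        "actual_cost_usd": actual,
        "baseline_cost_usd": baseline,
        "savings_usd": baseline - actual,
    }
```

Savings can be negative under this formula: if many requests are routed to a model that costs more than the baseline, actual spend exceeds the all-sonar-pro figure.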
Dashboard and demo totals are estimates from our configurable pricing (input, output, citation, search). They do not include reasoning tokens. Perplexity may bill deep-research under a different product name and charge for reasoning separately, so your real invoice can be higher. Use the router for relative comparison (savings, trends) and check Perplexity’s billing for actual spend.
- Implement the abstract `Provider` in `backend/services/providers/base.py`: implement `call(model, messages)` returning `content`, `usage`, and optional `citation_tokens` / `search_queries_count`.
- Add a new file under `backend/services/providers/` (e.g. `openai.py`) and implement the interface.
- Update config (e.g. model list and pricing) and wire the new provider in the chat router (e.g. via env or config key).
- No change to tracking or analytics: they remain provider-agnostic.
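
To make the steps above concrete, here is one possible shape for the interface and a toy implementation. Only the `Provider` name, `call(model, messages)`, and the returned fields come from this README. The `ProviderResult` dataclass, the async signature, and `EchoProvider` are assumptions; check `base.py` for the actual contract.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ProviderResult:
    """Hypothetical return type carrying the fields the router records."""
    content: str
    usage: dict
    citation_tokens: int = 0       # optional, 0 when not reported
    search_queries_count: int = 0  # optional, 0 when not reported


class Provider(ABC):
    """Abstract provider: one method that performs a chat call."""

    @abstractmethod
    async def call(self, model: str, messages: list) -> ProviderResult:
        ...


class EchoProvider(Provider):
    """Toy provider for local testing: echoes the last message back."""

    async def call(self, model: str, messages: list) -> ProviderResult:
        text = messages[-1]["content"]
        # Word counts stand in for real token counts here.
        prompt_tokens = sum(len(m["content"].split()) for m in messages)
        completion_tokens = len(text.split())
        return ProviderResult(
            content=text,
            usage={
                "prompt_tokens": prompt_tokens,
                "completion_tokens": completion_tokens,
                "total_tokens": prompt_tokens + completion_tokens,
            },
        )


result = asyncio.run(
    EchoProvider().call("echo-1", [{"role": "user", "content": "hello there"}])
)
```

A real provider would replace the body of `call` with an HTTP request to its API and map the response onto the same fields.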
| Layer | Technology |
|---|---|
| Backend | FastAPI, uvicorn |
| DB | PostgreSQL 16, SQLAlchemy 2 async, asyncpg |
| HTTP | httpx |
| Metrics | Prometheus, prometheus-client |
| Dashboards | Grafana (pre-provisioned) |
| Frontend | React 18, Vite, Tailwind, recharts |
| Tests | pytest, pytest-asyncio |
| Orchestration | Docker Compose |
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /v1/chat/completions | OpenAI-compatible chat (routed) |
| GET | /analytics/summary | Total cost, tokens, requests today, running totals |
| GET | /analytics/cost-by-model | Cost and count per model |
| GET | /analytics/cost-by-day?days=30 | Cost per day |
| GET | /analytics/cost-by-project | Cost per project_tag |
| GET | /analytics/cost-savings | Savings vs all sonar-pro |
| GET | /analytics/requests?limit=&offset= | Paginated request list |
| GET | /analytics/requests/:request_id | Full request detail |
| GET | /admin/models | List valid models |
| GET | /admin/pricing | Current pricing config |
| POST | /admin/pricing | Update pricing (body: pricing dict) |
| GET | /metrics | Prometheus metrics |
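
The chat endpoint in the table above can also be called from Python using only the standard library. The helper names below are hypothetical, and placing `model_override` at the top level of the request body is an assumption based on the override rule described earlier.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # backend address used elsewhere in this README


def build_chat_request(prompt: str, model_override: Optional[str] = None,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /v1/chat/completions without sending it."""
    body = {"messages": [{"role": "user", "content": prompt}], "stream": False}
    if model_override:
        # Assumption: the override is a top-level key in the JSON body.
        body["model_override"] = model_override
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt: str, model_override: Optional[str] = None) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_chat_request(prompt, model_override)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

Calling `send_chat("What is Python?")` requires the backend to be running; `build_chat_request` can be inspected offline.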
```
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is Python?"}],"stream":false}'
```

```json
{
  "id": "uuid",
  "model": "sonar",
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "usage": {"prompt_tokens": 45, "completion_tokens": 120, "total_tokens": 165},
  "cost": {
    "prompt_cost_usd": 0.000045,
    "completion_cost_usd": 0.00012,
    "total_cost_usd": 0.000165,
    "routing_tier": "simple",
    "routing_reason": "keyword:simple"
  }
}
```

React app on port 3000: Dashboard (summary cards, cost by day, cost by model), Requests (table and detail), Models, Projects, Cost Savings. Grafana on port 3001 with a pre-provisioned Prometheus datasource and an LLM Cost Overview dashboard (total cost, cost per day, requests/min, tokens by model, latency, cost by tier).
You need Docker and Docker Compose. Copy .env.example to .env and set PPLX_API_KEY for real LLM calls.
```
# Start all services (backend, frontend, postgres, prometheus, grafana)
make up
# Optional: seed 100 synthetic rows for the dashboard
make seed
# Optional: send 5 real prompts and print routing and cost (requires PPLX_API_KEY)
make demo
# Run tests (23 tests)
make test
```

URLs: React dashboard at http://localhost:3000, Grafana at http://localhost:3001 (admin / admin). Health check: `curl -s http://localhost:8000/health`.
Other commands: make status (list containers), make down (stop and remove volumes), make summary (analytics summary), make savings (cost savings). Use make clean-data to truncate stored requests so only new data appears.
The router's own work (classification, DB write, metrics) usually adds under 50 ms per request. Most of the latency comes from the LLM provider.
- Auth: Add API keys or JWT for `/v1/chat/completions` and `/analytics` / `/admin`.
- Multi-tenant: Use `project_tag` and filter analytics by tenant.
- Caching: The `cached` field is a placeholder. Add response caching to cut cost and latency.
- Budget alerts: Use Prometheus/Grafana alerts or a cron that checks `GET /analytics/summary` and notifies when thresholds are exceeded.
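
For the caching item above, one simple starting point is an in-memory cache keyed by a hash of the model and messages, with a time-to-live. This is a sketch only: the class name, the TTL default, and the choice of an in-process dict are assumptions, and nothing like it exists in the repo yet.

```python
import hashlib
import json
import time


class ResponseCache:
    """In-memory TTL cache for chat responses (illustrative, single process)."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, response)

    @staticmethod
    def key(model: str, messages: list) -> str:
        """Stable hash of the request; identical requests share a key."""
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, messages: list):
        """Return the cached response, or None if missing or expired."""
        k = self.key(model, messages)
        entry = self._entries.get(k)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[k]  # drop stale entry
            return None
        return response

    def put(self, model: str, messages: list, response) -> None:
        """Store a response until the TTL elapses."""
        self._entries[self.key(model, messages)] = (self._clock() + self._ttl, response)
```

On a hit, the router could skip the provider call and record the request with `cached` set. A shared store such as Redis would be needed once more than one backend process is running.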
```
.
├── LICENSE
├── Makefile
├── README.md
├── SECURITY.md
├── backend
│   ├── Dockerfile
│   ├── core.py
│   ├── main.py
│   ├── metrics
│   │   └── prometheus.py
│   ├── models
│   │   ├── database.py
│   │   └── schemas.py
│   ├── requirements.txt
│   ├── routers
│   │   ├── admin.py
│   │   ├── analytics.py
│   │   ├── chat.py
│   │   └── health.py
│   ├── scripts
│   │   ├── clear_data.py
│   │   ├── demo.py
│   │   ├── migrate_add_reasoning.py
│   │   └── seed_data.py
│   └── services
│       ├── providers
│       │   ├── base.py
│       │   └── perplexity.py
│       └── router_service.py
├── docker-compose.yml
├── docs
│   └── screenshots
│       ├── grafana-dashboard.png
│       ├── make-demo.png
│       ├── make-test.png
│       ├── react-cost-savings.png
│       ├── react-dashboard.png
│       ├── react-models.png
│       └── react-requests.png
├── frontend
│   ├── Dockerfile
│   ├── index.html
│   ├── package.json
│   ├── src
│   │   ├── App.jsx
│   │   ├── api.js
│   │   ├── components
│   │   │   ├── CostSavings.jsx
│   │   │   ├── Dashboard.jsx
│   │   │   ├── ModelStats.jsx
│   │   │   ├── ProjectStats.jsx
│   │   │   ├── RequestDetail.jsx
│   │   │   └── RequestsTable.jsx
│   │   ├── index.css
│   │   └── main.jsx
│   ├── tailwind.config.js
│   └── vite.config.js
├── grafana
│   ├── dashboards
│   │   └── llm-cost-overview.json
│   └── provisioning
│       ├── dashboards
│       │   └── dashboard.yml
│       └── datasources
│           └── datasources.yml
├── prometheus
│   └── prometheus.yml
├── scripts
│   ├── demo.py
│   └── seed_data.py
└── tests
    ├── conftest.py
    ├── test_analytics.py
    ├── test_chat.py
    ├── test_health.py
    └── test_router.py
```
| Variable | Description |
|---|---|
| DATABASE_URL | PostgreSQL URL with asyncpg (default in compose: postgresql+asyncpg://user:pass@postgres:5432/llmrouter) |
| PPLX_API_KEY | Perplexity API key (required for real calls) |
| PPLX_BASE_URL | Perplexity API base (default: https://api.perplexity.ai) |
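
A small sketch of how these variables might be read at startup, with the defaults from the table. The `Settings` class and `load_settings` function are hypothetical names for this example; the backend may load its config differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    pplx_api_key: Optional[str]  # None means real LLM calls are unavailable
    pplx_base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, applying documented defaults."""
    return Settings(
        database_url=env.get(
            "DATABASE_URL",
            "postgresql+asyncpg://user:pass@postgres:5432/llmrouter",
        ),
        # Treat an empty string the same as an unset key.
        pplx_api_key=env.get("PPLX_API_KEY") or None,
        pplx_base_url=env.get("PPLX_BASE_URL", "https://api.perplexity.ai").rstrip("/"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.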
This project is for learning and portfolio use. It is not production-ready. Use at your own risk. Pricing and model names follow the Perplexity reference (March 2026). Update config for your provider. See SECURITY.md.
License: MIT.






