LLM Router

A small middleware layer that sits between your app and an LLM provider. It routes each request to a Perplexity model based on prompt complexity (simple, medium, complex) and records every call in PostgreSQL: tokens, cost, latency, model, prompt, and response. One place to see what you spend and where.

This repo uses Perplexity as the reference provider. The design is provider-agnostic so you can plug in another API by implementing one interface and updating config.

Problem statement

Without a central layer you rarely know how much you spend per model or per day. Sending every request to the most capable model is wasteful. Here, simple queries go to a lighter model and complex ones to a research model. All usage is stored so you get a single view of cost and volume.

Provider and models

Perplexity models used: sonar (simple), sonar-pro (medium), sonar-deep-research (complex). Pricing is configurable in the backend. The provider interface lives in backend/services/providers/ so you can add another API without changing tracking or analytics.

Architecture

flowchart TB
  subgraph Client
    app[App Dashboard]
  end
  subgraph Router[LLM Router]
    classify[Classify prompt]
    choose[Choose model]
    store[Store request]
  end
  app --> classify
  classify --> choose
  choose --> store
  store --> app
  choose --> pplx[Perplexity API]
  store --> pg[(PostgreSQL)]
  store --> prom[Prometheus]
  prom --> grafana[Grafana]

Routing logic

Each prompt is classified into simple, medium, or complex:

Token count (estimated as the whitespace-separated word count, len(prompt.split())):
- < 50 tokens → lean simple
- 50–200 tokens → lean medium
- > 200 tokens → lean complex
Keyword signals (case-insensitive):
- Complex: "research", "analyze", "comprehensive", "detailed analysis", "in-depth", "investigate", "evaluate", "architecture", "explain in detail"
- Simple: "what is", "define", "hello", "hi", "thanks", "yes", "no", "list"
- Medium: "explain", "describe", "steps", "how would", "how do", "pros and cons", "compare", "structure", "deploy", and similar
Final tier: Keyword match overrides token count when a strong signal is present.
Route:
- simple → sonar
- medium → sonar-pro
- complex → sonar-deep-research
Override: If the request includes {"model_override": "sonar-pro"}, that model is used regardless of routing.

Tier	Model	Use case
simple	sonar	Simple queries
medium	sonar-pro	Reasoning tasks
complex	sonar-deep-research	Research/analysis
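
The rules above can be sketched as a small Python function. This is an illustration, not the code in backend/services/router_service.py. The names classify and route, the whole-word matching, the precedence among keyword groups (complex, then medium, then simple), treating any keyword hit as a strong signal, and the "tokens:<tier>" reason string are all assumptions. Only the keywords listed above are included.

```python
import re

# Keyword lists copied from the routing rules above.
KEYWORDS = {
    "complex": ["research", "analyze", "comprehensive", "detailed analysis",
                "in-depth", "investigate", "evaluate", "architecture",
                "explain in detail"],
    "medium": ["explain", "describe", "steps", "how would", "how do",
               "pros and cons", "compare", "structure", "deploy"],
    "simple": ["what is", "define", "hello", "hi", "thanks", "yes", "no", "list"],
}

MODEL_FOR_TIER = {
    "simple": "sonar",
    "medium": "sonar-pro",
    "complex": "sonar-deep-research",
}


def _has_keyword(text, phrases):
    # Whole-word match so "hi" does not fire inside "this" (assumption).
    return any(
        re.search(r"(?<!\w)" + re.escape(p) + r"(?!\w)", text) for p in phrases
    )


def classify(prompt):
    """Return (tier, reason) for a prompt."""
    text = prompt.lower()
    # Assumed precedence: the most demanding keyword group wins.
    for tier in ("complex", "medium", "simple"):
        if _has_keyword(text, KEYWORDS[tier]):
            return tier, f"keyword:{tier}"
    # No keyword signal: fall back to the word-count estimate.
    count = len(prompt.split())
    if count < 50:
        tier = "simple"
    elif count <= 200:
        tier = "medium"
    else:
        tier = "complex"
    return tier, f"tokens:{tier}"


def route(prompt, model_override=None):
    """Return (model, tier, reason); model_override bypasses the tier mapping."""
    tier, reason = classify(prompt)
    model = model_override or MODEL_FOR_TIER[tier]
    return model, tier, reason
```

Checking the complex group first means "explain in detail" is not swallowed by the shorter medium keyword "explain".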

Cost tracking

Every request records:

Field	Description
request_id	UUID
timestamp	UTC
prompt / response	Full text
model_requested / model_used	Routed model / actual model
prompt_tokens, completion_tokens, total_tokens	From API usage
prompt_cost_usd, completion_cost_usd, total_cost_usd	Calculated from config pricing
citation_tokens, search_queries_count	Deep-research only (else 0)
citation_cost_usd, search_cost_usd	Deep-research only
reasoning_tokens, reasoning_cost_usd	From API when present (e.g. reasoning-pro), $3/1M
latency_ms	Wall clock request→response
routing_tier, routing_reason	simple/medium/complex + short reason
project_tag	Optional multi-project tag
cached, provider, status, error_message	Placeholder / provider / success or error
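
As a rough sketch of how the three cost fields could be derived from token counts, assuming pricing is expressed in USD per 1M tokens: the Pricing class, the function names, and the example rate of $1 per 1M tokens are hypothetical. The rate was chosen only because it reproduces the figures in the sample response in this README. Citation, search, and reasoning costs are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-model pricing, in USD per 1M tokens."""
    input_per_1m: float
    output_per_1m: float


def token_cost(tokens, usd_per_1m):
    # Convert a token count to USD at a per-million rate.
    return tokens * usd_per_1m / 1_000_000


def request_cost(pricing, prompt_tokens, completion_tokens):
    """Compute the prompt, completion, and total cost fields for one request."""
    prompt_cost = token_cost(prompt_tokens, pricing.input_per_1m)
    completion_cost = token_cost(completion_tokens, pricing.output_per_1m)
    return {
        "prompt_cost_usd": prompt_cost,
        "completion_cost_usd": completion_cost,
        "total_cost_usd": prompt_cost + completion_cost,
    }


# Illustrative rate only; real values come from the pricing config.
example = request_cost(Pricing(input_per_1m=1.0, output_per_1m=1.0), 45, 120)
```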

Cost savings

Actual cost: Sum of total_cost_usd for all requests (using routed model pricing).
If all sonar-pro: The cost of the same token counts if every request had been billed at sonar-pro pricing.
Savings = (if all sonar-pro) − (actual). Exposed at GET /analytics/cost-savings.
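
The calculation can be sketched as follows. The function name, the row shape (dicts carrying prompt_tokens, completion_tokens, and total_cost_usd), and the baseline rates passed in the example are assumptions for illustration. The real endpoint reads its rates from the pricing config.

```python
def cost_savings(rows, baseline_input_per_1m, baseline_output_per_1m):
    """Compare actual spend with the cost of the same tokens at a baseline rate.

    rows: iterable of dicts with prompt_tokens, completion_tokens, total_cost_usd.
    Baseline rates are in USD per 1M tokens.
    """
    rows = list(rows)
    actual = sum(r["total_cost_usd"] for r in rows)
    baseline = sum(
        (r["prompt_tokens"] * baseline_input_per_1m
         + r["completion_tokens"] * baseline_output_per_1m) / 1_000_000
        for r in rows
    )
    return {
        "actual_cost_usd": actual,
        "if_all_sonar_pro_usd": baseline,
        "savings_usd": baseline - actual,
    }


# Illustrative rates only; substitute the sonar-pro values from your config.
result = cost_savings(
    [{"prompt_tokens": 45, "completion_tokens": 120, "total_cost_usd": 0.000165}],
    baseline_input_per_1m=3.0,
    baseline_output_per_1m=15.0,
)
```

The savings figure is negative whenever the actual total exceeds the baseline.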

Cost estimate vs Perplexity billing

Dashboard and demo totals are estimates from our configurable pricing (input, output, citation, search). They do not include reasoning tokens. Perplexity may bill deep-research under a different product name and charge for reasoning separately, so your real invoice can be higher. Use the router for relative comparison (savings, trends) and check Perplexity’s billing for actual spend.

Adding a new provider

1. Add a new file under backend/services/providers/ (e.g. openai.py).
2. In it, subclass the abstract Provider defined in backend/services/providers/base.py and implement call(model, messages), returning content, usage, and optional citation_tokens / search_queries_count.
3. Update config (e.g. model list and pricing) and wire the new provider in the chat router (e.g. via env or config key).

Tracking and analytics need no changes; they remain provider-agnostic.
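
A minimal sketch of what such an interface and a new provider might look like. The actual definitions in base.py may differ: the ProviderResult dataclass, the async signature, and the EchoProvider class are assumptions made for illustration. EchoProvider calls no real API; it echoes the last message and counts words as tokens.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ProviderResult:
    """Hypothetical return type: the fields the README says call() returns."""
    content: str
    usage: dict
    citation_tokens: int = 0
    search_queries_count: int = 0


class Provider(ABC):
    """Assumed shape of the abstract provider interface."""

    @abstractmethod
    async def call(self, model, messages):
        """Send messages to the given model and return a ProviderResult."""


class EchoProvider(Provider):
    """Toy provider for local testing: echoes the last message back."""

    async def call(self, model, messages):
        content = messages[-1]["content"] if messages else ""
        # Word counts stand in for real token counts here.
        prompt_tokens = sum(len(m["content"].split()) for m in messages)
        completion_tokens = len(content.split())
        usage = {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        }
        return ProviderResult(content=content, usage=usage)


result = asyncio.run(
    EchoProvider().call("echo-1", [{"role": "user", "content": "hello there"}])
)
```

Because tracking only reads content, usage, and the optional counters, a provider that fills these fields should slot in without touching the analytics code.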

Tech stack

Layer	Technology
Backend	FastAPI, uvicorn
DB	PostgreSQL 16, SQLAlchemy 2 async, asyncpg
HTTP	httpx
Metrics	Prometheus, prometheus-client
Dashboards	Grafana (pre-provisioned)
Frontend	React 18, Vite, Tailwind, recharts
Tests	pytest, pytest-asyncio
Orchestration	Docker Compose

API reference

Method	Endpoint	Description
GET	/health	Health check
POST	/v1/chat/completions	OpenAI-compatible chat (routed)
GET	/analytics/summary	Total cost, tokens, requests today, running totals
GET	/analytics/cost-by-model	Cost and count per model
GET	/analytics/cost-by-day?days=30	Cost per day
GET	/analytics/cost-by-project	Cost per project_tag
GET	/analytics/cost-savings	Savings vs all sonar-pro
GET	/analytics/requests?limit=&offset=	Paginated request list
GET	/analytics/requests/:request_id	Full request detail
GET	/admin/models	List valid models
GET	/admin/pricing	Current pricing config
POST	/admin/pricing	Update pricing (body: pricing dict)
GET	/metrics	Prometheus metrics

Sample request/response

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is Python?"}],"stream":false}'

{
  "id": "uuid",
  "model": "sonar",
  "choices": [{"message": {"role": "assistant", "content": "..."}}],
  "usage": {"prompt_tokens": 45, "completion_tokens": 120, "total_tokens": 165},
  "cost": {
    "prompt_cost_usd": 0.000045,
    "completion_cost_usd": 0.00012,
    "total_cost_usd": 0.000165,
    "routing_tier": "simple",
    "routing_reason": "keyword:simple"
  }
}
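
The same call can be made from Python using only the standard library. This is a sketch: the helper names build_payload, summarize, and chat are hypothetical, and it assumes the response has the shape shown above. The model_override field is the one described under Routing logic.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"


def build_payload(prompt, model_override=None):
    """Build the request body; model_override is optional."""
    payload = {"messages": [{"role": "user", "content": prompt}], "stream": False}
    if model_override is not None:
        payload["model_override"] = model_override
    return payload


def summarize(response):
    """One-line summary of the model, tier, tokens, and cost of a response."""
    cost = response["cost"]
    return (
        f'{response["model"]} [{cost["routing_tier"]}] '
        f'{response["usage"]["total_tokens"]} tokens, '
        f'${cost["total_cost_usd"]:.6f}'
    )


def chat(prompt, model_override=None, url=API_URL, timeout=60.0):
    """POST a prompt to the router and return the decoded JSON response."""
    body = json.dumps(build_payload(prompt, model_override)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the stack running, print(summarize(chat("What is Python?"))) prints one line with the routed model, tier, token count, and estimated cost.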

Dashboard

React app on port 3000: Dashboard (summary cards, cost by day, cost by model), Requests (table and detail), Models, Projects, Cost Savings. Grafana on port 3001 with a pre-provisioned Prometheus datasource and an LLM Cost Overview dashboard (total cost, cost per day, requests/min, tokens by model, latency, cost by tier).

How to run

You need Docker and Docker Compose. Copy .env.example to .env and set PPLX_API_KEY for real LLM calls.

# Start all services (backend, frontend, postgres, prometheus, grafana)
make up

# Optional: seed 100 synthetic rows for the dashboard
make seed

# Optional: send 5 real prompts and print routing and cost (requires PPLX_API_KEY)
make demo

# Run tests (23 tests)
make test

URLs: React dashboard at http://localhost:3000, Grafana at http://localhost:3001 (admin / admin). Health check: curl -s http://localhost:8000/health.

Other commands: make status (list containers), make down (stop and remove volumes), make summary (analytics summary), make savings (cost savings). Use make clean-data to truncate stored requests so only new data appears.

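Screenshots (in docs/screenshots/):
- React dashboard: react-dashboard.png
- Requests table: react-requests.png
- Grafana LLM Cost Overview: grafana-dashboard.png
- Demo output: make-demo.png
- Tests passing: make-test.png
- Models tab: react-models.png
- Cost Savings tab: react-cost-savings.png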
Performance

The router's own work (classification, DB write, metrics) usually adds under 50 ms per request. Most of the end-to-end latency comes from the LLM provider.

Production considerations

Auth: Add API keys or JWT for /v1/chat/completions and for the /analytics and /admin endpoints.
Multi-tenant: Use project_tag and filter analytics by tenant.
Caching: The cached field is a placeholder. Add response caching to cut cost and latency.
Budget alerts: Use Prometheus/Grafana alerts or a cron that checks GET /analytics/summary and notifies when thresholds are exceeded.
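
For the cron option, the threshold check itself could be as small as the sketch below. The function name and the total_cost_usd field are assumptions: this README does not document the response schema of GET /analytics/summary, so adjust cost_field to match the actual payload.

```python
def budget_alert(summary, limit_usd, cost_field="total_cost_usd"):
    """Return an alert message if spend has reached the limit, else None.

    summary: decoded JSON from GET /analytics/summary (field name assumed).
    """
    spent = float(summary.get(cost_field, 0.0))
    if spent < limit_usd:
        return None
    return f"LLM spend ${spent:.2f} has reached the ${limit_usd:.2f} budget"
```

A scheduled job would fetch the summary, call budget_alert, and forward any non-None message to email or chat.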

Project structure

.
├── LICENSE
├── Makefile
├── README.md
├── SECURITY.md
├── backend
│   ├── Dockerfile
│   ├── core.py
│   ├── main.py
│   ├── metrics
│   │   └── prometheus.py
│   ├── models
│   │   ├── database.py
│   │   └── schemas.py
│   ├── requirements.txt
│   ├── routers
│   │   ├── admin.py
│   │   ├── analytics.py
│   │   ├── chat.py
│   │   └── health.py
│   ├── scripts
│   │   ├── clear_data.py
│   │   ├── demo.py
│   │   ├── migrate_add_reasoning.py
│   │   └── seed_data.py
│   └── services
│       ├── providers
│       │   ├── base.py
│       │   └── perplexity.py
│       └── router_service.py
├── docker-compose.yml
├── docs
│   └── screenshots
│       ├── grafana-dashboard.png
│       ├── make-demo.png
│       ├── make-test.png
│       ├── react-cost-savings.png
│       ├── react-dashboard.png
│       ├── react-models.png
│       └── react-requests.png
├── frontend
│   ├── Dockerfile
│   ├── index.html
│   ├── package.json
│   ├── src
│   │   ├── App.jsx
│   │   ├── api.js
│   │   ├── components
│   │   │   ├── CostSavings.jsx
│   │   │   ├── Dashboard.jsx
│   │   │   ├── ModelStats.jsx
│   │   │   ├── ProjectStats.jsx
│   │   │   ├── RequestDetail.jsx
│   │   │   └── RequestsTable.jsx
│   │   ├── index.css
│   │   └── main.jsx
│   ├── tailwind.config.js
│   └── vite.config.js
├── grafana
│   ├── dashboards
│   │   └── llm-cost-overview.json
│   └── provisioning
│       ├── dashboards
│       │   └── dashboard.yml
│       └── datasources
│           └── datasources.yml
├── prometheus
│   └── prometheus.yml
├── scripts
│   ├── demo.py
│   └── seed_data.py
└── tests
    ├── conftest.py
    ├── test_analytics.py
    ├── test_chat.py
    ├── test_health.py
    └── test_router.py

Environment variables

Variable	Description
DATABASE_URL	PostgreSQL URL with asyncpg (default in compose: postgresql+asyncpg://user:pass@postgres:5432/llmrouter)
PPLX_API_KEY	Perplexity API key (required for real calls)
PPLX_BASE_URL	Perplexity API base (default: https://api.perplexity.ai)
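
A sketch of reading these variables with the defaults listed above. The Settings class and load_settings function are hypothetical, and the backend may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    pplx_api_key: Optional[str]
    pplx_base_url: str


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        database_url=env.get(
            "DATABASE_URL",
            "postgresql+asyncpg://user:pass@postgres:5432/llmrouter",
        ),
        # An empty string is treated the same as an unset key.
        pplx_api_key=env.get("PPLX_API_KEY") or None,
        pplx_base_url=env.get("PPLX_BASE_URL", "https://api.perplexity.ai").rstrip("/"),
    )
```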

This project is for learning and portfolio use. It is not production-ready. Use at your own risk. Pricing and model names follow the Perplexity reference (March 2026). Update config for your provider. See SECURITY.md.

License: MIT.
