Unified AI API Gateway — Manage all your LLM providers in one place.
One API key → OpenAI, Anthropic, Google, DeepSeek, Qwen, and more.
中文文档 · Quick Start · Features · Architecture · Docs · License
Managing multiple AI API providers is painful — scattered keys, no visibility into costs, no failover when a provider goes down. WebRouter gives you a single control plane for all your LLM traffic.
- Tired of hardcoding provider URLs? → One gateway endpoint, auto-routed to the best provider
- Worried about provider outages? → Automatic health checks, cooldowns, and failover
- No idea how much you're spending? → Per-model cost tracking, quotas, and billing reports
- Sharing API keys across the team? → Token management with per-member quotas and model whitelists
Set model: "auto" and WebRouter picks the optimal model based on request complexity — simple queries get fast/cheap models, complex reasoning gets powerful ones.
Automatic health checks with latency tracking. Dead providers enter cooldown; traffic shifts to healthy alternatives — no manual intervention needed.
Real-time cost accounting per model, per token, per team. Billing reports, quota management, and budget alerts built in.
Built-in desensitization engine strips PII (phone numbers, ID cards, emails) from requests before they reach upstream providers.
Invite team members, assign quotas, restrict model access. Each member gets a unique API key with scoped permissions.
The wr-proxy Go gateway handles request forwarding, retry with backoff, streaming, and metering — all with minimal latency overhead.
| Type | Description | Health | Latency | Cost |
|---|---|---|---|---|
direct |
Official APIs (OpenAI, Anthropic, Google...) | ✅ | ✅ | — |
aggregate |
Aggregator platforms (OhMyGPT, API2D...) | ✅ | ✅ | Manual |
litellm |
LiteLLM proxy | ✅ | ✅ | — |
custom |
Any OpenAI-compatible gateway | ✅ | ✅ | — |
Clients can use @recall or X-Recall-Session header to automatically recover and inject conversation history from the server — no manual context management needed.
Built-in enterprise-grade retrieval-augmented generation. Auto-captures conversations, extracts structured knowledge via LLM, and injects relevant context into every request.
Token compression, session compression, and dynamic content reordering reduce upstream token consumption and improve prompt cache hit rates automatically.
One-click export of environment variables and config for Claude Code, Codex, Cursor, Continue, and more.
Don't want to install anything? Try the live demo at demo.webrouter.tech (login: admin / admin123456).
- Python 3.8+
- Go 1.21+ (only if building wr-proxy from source; pre-built binaries included)
- 2 GB+ RAM
git clone https://github.com/<org>/webrouter.git
cd webrouter
bash deploy/install.shThe install script auto-detects your OS and architecture, sets up a virtual environment, installs dependencies, and starts both services.
open http://localhost:5050
# Default login: admin / admin123- Go to Providers → + Add
- Select type
direct, paste your OpenAI base URL and API key - Click 🔍 Check to verify connectivity
- Your gateway is ready at
http://localhost:5051/v1/chat/completions
cd webrouter
docker compose -f deploy/docker-compose.yml up -dFull documentation is available at webrouter.tech/docs/:
| Guide | Topics |
|---|---|
| Getting Started | Quick Start, Installation, Deployment |
| Core Concepts | Architecture, Providers, Tokens & Teams |
| Smart Routing | Auto model selection, fallback strategies |
| Memory & Knowledge | Session Recall, Knowledge Base & RAG |
| Operations | Monitoring, Alerting, Billing, Desensitization |
| API Reference | Full API documentation |
┌─────────────┐ HTTP ┌─────────────────┐
│ Browser/CLI │ ───────────→ │ WebRouter │
│ │ ←──────────── │ (Flask) │
└─────────────┘ │ :5050 │
└──────┬──────────┘
│
┌──────▼──────┐
│ wr-proxy │
│ (Go) :5051 │
└──────┬──────┘
│
┌───────────────┼───────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ direct │ │ aggregate │ │ custom │
│ (Official) │ │ (Aggregator)│ │ (Gateway) │
└─────────────┘ └─────────────┘ └─────────────┘
| Component | Stack | Role |
|---|---|---|
| WebRouter (backend) | Python Flask | Admin panel, REST API, database models, scheduler |
| wr-proxy | Go 1.22 | High-performance API proxy: routing, retry, desensitization, metering |
Both components share a SQLite database (MySQL/PostgreSQL also supported) for configuration and request logs.
webrouter/
├── backend/ # Flask backend
│ ├── app.py # Application factory
│ ├── config.py # Configuration
│ ├── models/ # Database models
│ ├── routes/ # 12 API blueprints (/api/*)
│ ├── services/ # Business logic
│ ├── static/ # Frontend SPA
│ │ ├── index.html
│ │ ├── js/ # 21 page modules
│ │ ├── css/
│ │ └── i18n/ # en.json, zh-CN.json
│ └── start.py # Process manager
├── wr-proxy/ # Go proxy gateway
│ ├── main.go
│ ├── proxy.go # HTTP forwarding
│ ├── smart_model.go # Smart routing
│ ├── retry.go # Retry with backoff
│ ├── desensitize.go # PII stripping
│ ├── meter.go # Cost tracking
│ └── ...
├── deploy/ # Deployment configs
│ ├── install.sh
│ ├── Dockerfile
│ ├── docker-compose.yml
│ └── nginx.conf
├── docs/ # Documentation
├── data/ # Runtime data
└── .env # Environment config (auto-generated)
All settings are managed via the .env file (auto-generated on first install):
| Variable | Description | Default |
|---|---|---|
SESSION_SECRET |
Flask session key | Auto-generated |
DATABASE_URI |
Database connection string | SQLite |
REDIS_URL |
Redis connection (optional, for caching) | — |
FLASK_ENV |
Runtime environment | production |
FLASK_HOST |
Listen address | 0.0.0.0 |
FLASK_PORT |
Flask port | 5050 |
WR_PROXY_PORT |
wr-proxy port | 5051 |
ENABLE_SCHEDULER |
Run health checks & alerts on schedule | 0 (off in debug) |
python3 backend/start.py start # Start all services
python3 backend/start.py stop # Stop all services
python3 backend/start.py restart # Restart
python3 backend/start.py status # Check status
python3 backend/start.py logs # Tail logsOr use the generated shell scripts:
./start.sh # Start
./stop.sh # Stop- Plugin SDK — Extensible plugin interface for EE modules
- SSO / SAML / OIDC — Enterprise single sign-on
- Audit logging — Tamper-proof operation audit trail
- Cluster mode — Multi-instance with shared state
- Cloud hosted version — Zero-ops managed service
- Advanced routing DSL — Custom routing rules by department, project, or tag
See LICENSING.md for the Community vs Enterprise edition feature matrix.
We welcome contributions! Before submitting a PR, please:
- Sign the Contributor License Agreement (CLA) — this grants us the right to re-license the project in the future
- Follow the existing code style
- Test your changes locally
See CONTRIBUTING.md for full guidelines.
WebRouter is available in two editions. See the full comparison on our website.
| Feature | Community | Enterprise |
|---|---|---|
| Price | Free | Custom |
| Max concurrent | 50 | Customizable |
| SSO / SAML / OIDC | — | ✅ |
| Cluster mode | — | ✅ |
| Audit logging | Basic | Custom rules |
| Knowledge Base & RAG | Basic | Advanced |
| License | BSL 1.1 → Apache 2.0 (2029) | Proprietary EULA |
See LICENSE for the full text and LICENSING.md for the dual-edition strategy.
WebRouter is built with:
- Flask — Python web framework
- modernc.org/sqlite — Pure-Go SQLite (no CGO)
- APScheduler — Job scheduling
- Font Awesome — Icons
One gateway. All AI APIs.




