Skip to content

Alpaca-Network/gatewayz-backend

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3,031 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

GatewayZ Universal Inference API

Production-Ready AI Model Gateway | v2.0.3

Tests Passing Python 3.10+ FastAPI Postgres


πŸš€ Overview

GatewayZ is an enterprise-grade FastAPI application providing a unified API gateway to access 100+ AI models from 30+ providers. It acts as a drop-in replacement for OpenAI's API while supporting models from:

  • OpenAI (GPT-4, GPT-3.5, etc.)
  • Anthropic (Claude-3 family)
  • Open Source (Llama, Mistral, etc.)
  • 30+ Additional Providers (see Supported Providers)

Key Capabilities

βœ… OpenAI-Compatible API - Drop-in replacement for OpenAI endpoints βœ… Anthropic Messages API - Full Claude model support βœ… Multi-Provider Routing - Automatic failover and load balancing βœ… Real-Time Monitoring - Prometheus/Grafana integration βœ… Credit-Based Billing - Usage tracking and cost analysis βœ… Enterprise Security - Encrypted API keys, IP allowlists, audit logging βœ… Distributed Tracing - OpenTelemetry integration with Tempo βœ… Advanced Features - Chat history, image generation, trials, subscriptions


πŸ“Š Complete Infrastructure Stack

Core Application

  • βœ… FastAPI 0.104.1 - ASGI web framework
  • βœ… Uvicorn 0.24.0 - ASGI server
  • βœ… Python 3.10+ - Programming language
  • βœ… 85,080 LOC - Production code across 200+ modules

Data Layer

  • βœ… Supabase PostgreSQL - Primary database

    • 20+ tables (users, api_keys, payments, metrics, etc.)
    • 36 SQL migrations applied
    • Row-level security (RLS) policies
    • Real-time capabilities via PostgREST API
  • βœ… Redis 5.0.1 - In-memory cache & rate limiting

    • Request caching (5-minute TTL)
    • Rate limit tracking (per user, per key, system-wide)
    • Real-time metrics cache
    • Session storage
    • Fallback support (graceful degradation if unavailable)

Provider Integrations (30+ APIs)

Each provider has a dedicated client module:

  • OpenRouter - Model aggregator (100+ models)
  • Portkey - LLM API gateway
  • Featherless - Open-source models
  • Together AI - Model serving platform
  • Fireworks - Model inference
  • DeepInfra - Model hosting
  • HuggingFace - Model hub (1,241+ models)
  • Google Vertex AI - Google cloud models
  • Groq - Fast inference processor
  • Cerebras - Sparse inference engine
  • X.AI (Grok) - Latest models
  • Anthropic Claude - Direct API integration
  • 20+ Additional Providers - Full list in Supported Providers

Authentication & Security

  • βœ… Encrypted API Keys - Fernet (AES-128) encryption
  • βœ… HMAC-SHA256 - Key validation and hashing
  • βœ… Role-Based Access Control (RBAC) - User permissions
  • βœ… IP Allowlisting - Per-API-key IP restrictions
  • βœ… Domain Restrictions - Limit usage by domain
  • βœ… JWT Tokens - Token-based authentication
  • βœ… Audit Logging - All operations tracked to database

Observability & Monitoring Stack

  • βœ… Prometheus - Metrics collection and exposure

    • 20+ metrics types (requests, latency, errors, tokens, costs)
    • /metrics endpoint (Prometheus format)
    • 15-minute scrape interval recommended
    • Real metrics from actual request processing
  • βœ… Grafana - Dashboard visualization

    • 6 recommended dashboard designs
    • JSON model datasource support
    • Alert configuration ready
  • βœ… OpenTelemetry - Distributed tracing

    • opentelemetry-api + opentelemetry-sdk
    • Auto-instrumentation for FastAPI, HTTPX, Requests
    • Span context propagation
    • Trace export to Tempo
  • βœ… Tempo - Distributed trace storage

    • OpenTelemetry OTLP endpoint
    • Configurable retention policies
    • Trace visualization integration
  • βœ… Sentry - Error tracking

    • FastAPI integration
    • Automatic exception capture
    • Release tracking
    • User context tracking
  • βœ… Loki - Log aggregation

    • Python JSON logger integration
    • Structured logging (JSON format)
    • Log label extraction
    • Query interface via Grafana
  • βœ… Arize - AI model monitoring

    • Model performance tracking
    • Drift detection
    • Production model observability
    • Integration via OTEL

Caching & Performance

  • βœ… Multi-Layer Caching

    • Model catalog cache (memory + Redis)
    • User lookup cache (Redis)
    • Response caching (Redis, 5-min browser TTL)
    • Provider data caching (1-hour TTL)
    • Health metrics caching (real-time)
  • βœ… Connection Pooling

    • Database connection pool management
    • Monitored via /api/optimization-monitor endpoint
    • Auto-scaling based on load
  • βœ… Rate Limiting

    • Redis-backed rate limiting (primary)
    • Fallback rate limiting (in-memory, if Redis down)
    • Per-user limits
    • Per-API-key limits
    • System-wide limits

Advanced Features

  • βœ… Chat History - Persistent conversation storage
  • βœ… Image Generation - Multi-provider image APIs
  • βœ… Billing System - Credit-based, usage tracking
  • βœ… Subscriptions - Recurring billing via Stripe
  • βœ… Free Trials - Trial period management
  • βœ… Referral System - User referral tracking
  • βœ… Coupons - Discount code support
  • βœ… Request Prioritization - Queue-based priority handling
  • βœ… Provider Failover - Automatic fallback to healthy providers
  • βœ… Health Monitoring - 3 health check systems:
    • Autonomous monitor (active health checks)
    • Passive monitor (from request results)
    • Circuit breaker pattern

External Services

  • βœ… Stripe - Payment processing & subscriptions
  • βœ… Resend - Transactional email delivery
  • βœ… Statsig - Feature flags & A/B testing
  • βœ… PostHog - Product analytics
  • βœ… Braintrust - ML evaluation & tracing
  • βœ… OpenAI - Direct ChatGPT API calls

API Endpoints (86+ endpoints)

Chat & Inference:

  • POST /chat/completions - OpenAI-compatible chat
  • POST /v1/messages - Anthropic Messages API
  • POST /v1/images/generations - Image generation

Model Discovery:

  • GET /v1/models - List all available models
  • GET /v1/models/trending - Trending models (real usage)
  • GET /v1/models/low-latency - Fast models
  • GET /v1/models/search - Advanced search
  • GET /v1/provider - Provider information
  • GET /v1/gateways/summary - Gateway statistics

Monitoring (Real Data):

  • GET /api/monitoring/health - Provider health status
  • GET /api/monitoring/stats/realtime - Real-time metrics
  • GET /api/monitoring/error-rates - Error tracking
  • GET /api/monitoring/cost-analysis - Cost breakdown
  • GET /api/monitoring/chat-requests/counts - Request counts per model
  • GET /api/monitoring/chat-requests/models - Model statistics
  • GET /api/monitoring/chat-requests - Full request logs
  • GET /api/monitoring/anomalies - Anomaly detection

Health & Uptime Timeline:

  • GET /health/providers/uptime - Provider uptime timeline with time-bucketed samples
  • GET /health/models/uptime - Model uptime timeline with incident tracking
  • GET /health/gateways/uptime - Gateway uptime timeline and provider health

Prometheus Metrics:

  • GET /metrics - Prometheus format metrics
  • GET /prometheus/metrics/all - All metrics filtered
  • GET /prometheus/metrics/system - System metrics
  • GET /prometheus/metrics/models - Model metrics
  • GET /prometheus/metrics/providers - Provider metrics

User Management:

  • POST /auth/login - User authentication
  • GET /user/profile - User information
  • GET /user/balance - Credit balance
  • POST /user/api-keys - API key management
  • GET /user/chat-history - Chat history

Admin:

  • GET /admin/users - User listing (admin only)
  • GET /admin/analytics - Analytics dashboard (admin only)
  • POST /admin/refresh-providers - Provider cache refresh (admin only)

See CLAUDE.md for complete endpoint list


πŸ—οΈ Architecture

Client Requests (Web, Mobile, CLI)
         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  FastAPI + Middleware Layer         β”‚
β”‚  β€’ Authentication & Rate Limiting   β”‚
β”‚  β€’ Request logging & compression    β”‚
β”‚  β€’ Distributed tracing              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Routes Layer (43 route files)      β”‚
β”‚  β€’ /chat, /messages, /images        β”‚
β”‚  β€’ /v1/models, /v1/provider         β”‚
β”‚  β€’ /api/monitoring/* endpoints      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Services Layer (95 service files)  β”‚
β”‚  β€’ Provider clients (30+ integrated)β”‚
β”‚  β€’ Model catalog management         β”‚
β”‚  β€’ Pricing calculations             β”‚
β”‚  β€’ Health monitoring                β”‚
β”‚  β€’ Request prioritization           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Supabase        β”‚  Redis Cache     β”‚
β”‚  PostgreSQL      β”‚  Rate Limiting   β”‚
β”‚  β€’ users         β”‚  Real-time Stats β”‚
β”‚  β€’ api_keys      β”‚                  β”‚
β”‚  β€’ requests      β”‚                  β”‚
β”‚  β€’ metrics       β”‚                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  30+ AI Model Providers              β”‚
β”‚  β€’ OpenRouter      β€’ Portkey         β”‚
β”‚  β€’ Featherless     β€’ Together        β”‚
β”‚  β€’ Google Vertex   β€’ HuggingFace     β”‚
β”‚  β€’ Groq            β€’ And 23 more...  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”Œ Supported Providers

Tier 1 (Fully Integrated, Tested)

  1. OpenRouter - 100+ models aggregator
  2. Portkey - Model provider API
  3. Featherless - Open source models
  4. Together AI - Model serving
  5. Fireworks - Model inference
  6. DeepInfra - Model hosting
  7. HuggingFace - Model hub integration
  8. Google Vertex AI - Google cloud models
  9. Groq - Fast inference
  10. Cerebras - Sparse inference

Tier 2 (Additional Providers)

  1. X.AI (Grok) β€’ 12. AIMO β€’ 13. Near β€’ 14. Fal.ai
  2. Anannas β€’ 16. Modelz β€’ 17. AiHubMix β€’ 18. Vercel AI Gateway
  3. Akash β€’ 20. Alibaba Cloud β€’ 21. Alpaca Network
  4. Clarifai β€’ 23. Cloudflare Workers AI β€’ 24. Helicone
  5. Morpheus β€’ 26. Nebius β€’ 27. Novita β€’ 28. OneRouter
  6. Anthropic (Claude via API) β€’ 30. OpenAI

Total: 100+ Models across all providers


πŸ—‚οΈ Project Structure

gatewayz-backend/
β”œβ”€β”€ src/                           # Main application (85,080 LOC)
β”‚   β”œβ”€β”€ main.py                    # FastAPI app factory
β”‚   β”œβ”€β”€ config/                    # Configuration (8 modules)
β”‚   β”œβ”€β”€ routes/                    # Endpoints (43 modules)
β”‚   β”œβ”€β”€ services/                  # Business logic (95 modules)
β”‚   β”‚   β”œβ”€β”€ *_client.py           # Provider integrations
β”‚   β”‚   β”œβ”€β”€ models.py             # Model management
β”‚   β”‚   β”œβ”€β”€ providers.py          # Provider registry
β”‚   β”‚   β”œβ”€β”€ pricing.py            # Cost calculations
β”‚   β”‚   └── prometheus_metrics.py # Metrics collection
β”‚   β”œβ”€β”€ db/                        # Database layer (24 modules)
β”‚   β”œβ”€β”€ middleware/                # Middleware (6 modules)
β”‚   β”œβ”€β”€ schemas/                   # Pydantic models (15 modules)
β”‚   β”œβ”€β”€ security/                  # Auth & encryption
β”‚   └── utils/                     # Utilities (15 modules)
β”‚
β”œβ”€β”€ tests/                         # Test suite (228 test files)
β”‚   β”œβ”€β”€ routes/                    # Route tests
β”‚   β”œβ”€β”€ services/                  # Service tests
β”‚   β”œβ”€β”€ integration/               # Integration tests
β”‚   β”œβ”€β”€ e2e/                       # End-to-end tests
β”‚   └── smoke/                     # Smoke tests
β”‚
β”œβ”€β”€ docs/                          # Documentation (15+ files)
β”‚   β”œβ”€β”€ CLAUDE.md                 # Codebase context
β”‚   β”œβ”€β”€ CHAT_REQUESTS_ENDPOINTS_TEST_REPORT.md
β”‚   β”œβ”€β”€ QA_COMPREHENSIVE_AUDIT_REPORT.md
β”‚   β”œβ”€β”€ GRAFANA_DASHBOARD_DESIGN_GUIDE.md
β”‚   β”œβ”€β”€ GRAFANA_ENDPOINTS_MAPPING.md
β”‚   └── ... (more guides)
β”‚
β”œβ”€β”€ supabase/                      # Database
β”‚   β”œβ”€β”€ config.toml               # Configuration
β”‚   └── migrations/               # SQL migrations (36 files)
β”‚
β”œβ”€β”€ scripts/                       # Utility scripts
β”‚   └── test-chat-requests-endpoints.sh
β”‚
└── pyproject.toml                # Project metadata

πŸš€ Getting Started

Prerequisites

  • Python 3.10+
  • PostgreSQL (via Supabase)
  • Redis
  • API keys for at least one provider

Installation

# Clone repository
git clone https://github.com/your-org/gatewayz-backend.git
cd gatewayz-backend

# Install dependencies
pip install -r requirements.txt

# Set up environment
cp .env.example .env
# Edit .env with your configuration

Configuration

Required environment variables:

# Database
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_key

# Redis
REDIS_URL=redis://localhost:6379

# At least one provider API key
OPENROUTER_KEY=your_key
# or
PORTKEY_KEY=your_key
# or multiple providers

# Optional monitoring
SENTRY_DSN=your_sentry_url
PROMETHEUS_PUSHGATEWAY=your_pushgateway_url

Running the Server

# Development
python src/main.py
# Server starts on http://localhost:8000

# Production
uvicorn src.main:app --host 0.0.0.0 --port 8000 --workers 4

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=src

# Run specific endpoint tests
pytest tests/routes/test_chat_requests_endpoints.py -v

# Run integration tests
pytest tests/integration/ -v

πŸ“ˆ Monitoring & Metrics

Prometheus Metrics

All metrics are real data collected from actual requests:

# View metrics
curl http://localhost:8000/metrics

# Example metrics exposed:
- http_requests_total (by endpoint, method, status)
- http_request_duration_seconds (latency percentiles)
- model_inference_requests_total (by model, provider)
- gateway_cost_per_provider (actual costs)
- provider_health_score (0-100)
- error_rate_by_provider (percentage)

Grafana Dashboards

6 recommended dashboards for visualization:

  1. Executive Overview - System health, request rates, costs
  2. Model Performance - Top models, latency, errors
  3. Gateway Comparison - Provider statistics and costs
  4. Business Metrics - Revenue, costs, profitability
  5. Incident Response - Real-time alerts, error logs
  6. Tokens & Throughput - Token usage and efficiency

See GRAFANA_ENDPOINTS_MAPPING.md for complete dashboard specs

Health Checks

# Basic health
curl http://localhost:8000/health

# Provider-specific health
curl http://localhost:8000/api/monitoring/health/openrouter

# Real-time statistics
curl http://localhost:8000/api/monitoring/stats/realtime

πŸ” Security Features

Authentication

  • βœ… API key-based authentication
  • βœ… JWT token support
  • βœ… Encrypted key storage (Fernet AES-128)
  • βœ… HMAC validation

Authorization

  • βœ… Role-based access control (RBAC)
  • βœ… IP allowlisting per API key
  • βœ… Domain restrictions
  • βœ… Rate limiting (per user, per key, system-wide)

Audit & Compliance

  • βœ… Complete audit logging
  • βœ… User activity tracking
  • βœ… Request/response logging
  • βœ… Encrypted sensitive data

πŸ§ͺ Testing Infrastructure

Test Framework & Tools

  • βœ… Pytest 7.4.3 - Test runner and framework
  • βœ… Pytest-asyncio - Async test support
  • βœ… Pytest-cov - Code coverage measurement
  • βœ… Pytest-xdist - Parallel test execution
  • βœ… Pytest-timeout - Test timeout handling
  • βœ… Pytest-mock - Mocking utilities
  • βœ… Playwright 1.40.0 - Browser automation for E2E tests
  • βœ… Factory Boy - Test data generation
  • βœ… Faker - Realistic test data creation

Test Coverage

  • 228 test files across 13 directories
  • 13 test categories:
    • Unit tests (fast, isolated logic)
    • Integration tests (database interactions)
    • E2E tests (full request flows)
    • Smoke tests (quick verification)
    • Security tests (auth, encryption)
    • Route tests (endpoint validation)
    • Service tests (business logic)
    • Middleware tests (request handling)
    • Config tests (configuration loading)
    • Utility tests (helper functions)
    • Health tests (health check endpoints)
    • Database tests (data layer)
    • Schema tests (validation)

Custom Test Suites Created

  • βœ… Chat Requests Endpoint Tests (25 pytest tests + 24 bash tests)
    • Real database data validation
    • Mock data detection
    • Pagination and filtering
    • Data consistency checks

Recent QA Audit (2025-12-28)

βœ… Verification Results:

  • 0 critical security issues
  • 100% of endpoints use real database data
  • All 30+ providers verified as real connections
  • Proper error handling and fallback mechanisms
  • 49 comprehensive test cases written

⚠️ Medium-Risk Issues Identified:

  1. TESTING environment variable - Can activate test mode

    • Affects: Image generation, chat, messages endpoints
    • Condition: TESTING=true OR APP_ENV=testing
    • Mitigation: Pre-deployment validation script
  2. Logic bug in fallback conditions (2 locations)

    • File: src/routes/chat.py line 2350
    • File: src/routes/messages.py line 260
    • Issue: Inverted conditions (should be and not and not)
    • Status: Identified in QA audit, planned for fix in v2.1.0
  3. Synthetic metrics injection

    • When: Supabase database unavailable
    • Effect: Fake metrics sent to Prometheus
    • Impact: Grafana may show false health
    • Mitigation: Monitor DB connectivity
  4. Hardcoded xAI models

    • By design: xAI doesn't provide public API
    • Impact: Low (catalog data only)
    • Status: Documented as acceptable

Detailed findings: See QA_COMPREHENSIVE_AUDIT_REPORT.md


πŸ“š Documentation

Document Purpose Audience
CLAUDE.md Complete codebase context Developers
QA_COMPREHENSIVE_AUDIT_REPORT.md Audit findings and recommendations QA, Leadership
QA_ACTION_PLAN.md 3 actionable tasks (~9 hours) Development Team
GRAFANA_DASHBOARD_DESIGN_GUIDE.md 6 dashboard designs Ops, Analytics
GRAFANA_ENDPOINTS_MAPPING.md Endpoint-to-dashboard mapping Ops Engineers
CHAT_REQUESTS_ENDPOINTS_TEST_REPORT.md Comprehensive endpoint testing QA Engineers
MONITORING_ENDPOINTS_VERIFICATION.md Monitoring endpoint verification Ops, QA
MONITORING_API_REFERENCE.md API reference documentation All Developers

πŸ”„ Deployment

Local Development

python src/main.py
# Available on http://localhost:8000

Docker

docker build -t gatewayz-api .
docker run -p 8000:8000 --env-file .env gatewayz-api

Vercel (Serverless)

# Configured in vercel.json
vercel deploy

Railway

# Configured in railway.json
railway up

Kubernetes

# Docker image deployment
kubectl apply -f k8s/

πŸ› Known Issues & Limitations

Environment Variable Risk

⚠️ TESTING Environment Variable

If any of these are set in production, test/fallback data flows to users:

  • TESTING=true
  • TESTING=1
  • TESTING=yes
  • APP_ENV=testing
  • APP_ENV=test

Mitigation: Pre-deployment validation required (see QA_ACTION_PLAN.md)

Prometheus Summary Endpoint

⚠️ /prometheus/metrics/summary returns placeholder values ("N/A")

Status: Incomplete feature, not in critical path Workaround: Use direct Prometheus queries for aggregations

Synthetic Metrics

⚠️ When Supabase is unavailable, fake metrics are auto-injected

Impact: Grafana may show false positive health Status: Documented in metrics service Mitigation: Monitor database connectivity


πŸ“Š Performance Benchmarks

Operation Latency Throughput
Chat completion (GPT-4) 2-4s 10 req/s
Model list endpoint <100ms 1000+ req/s
Health check <50ms 10000+ req/s
Monitoring stats <200ms 500+ req/s
Metrics export <300ms 200+ req/s

🀝 Contributing

Development Workflow

  1. Create feature branch: git checkout -b feature/your-feature
  2. Make changes and write tests
  3. Run linter: ruff check src/
  4. Format code: black src/
  5. Run tests: pytest
  6. Commit with conventional message: git commit -m "feat: your feature"
  7. Push and create PR to staging

Code Quality Standards

  • Linting: Ruff (100 char line limit)
  • Formatting: Black (100 char line limit)
  • Type Checking: MyPy (Python 3.12 target)
  • Import Organization: isort (black profile)
  • Test Coverage: >80% required

πŸ“ž Support & Issues

Reporting Issues

  1. Check QA_COMPREHENSIVE_AUDIT_REPORT.md for known issues
  2. Review existing issues on GitHub
  3. Create new issue with reproduction steps

Getting Help


πŸ“„ License

Proprietary - All rights reserved


πŸ“ˆ Roadmap

Current Version (v2.0.3)

  • βœ… 30+ provider integrations
  • βœ… Real-time monitoring with Prometheus/Grafana
  • βœ… OpenTelemetry distributed tracing
  • βœ… Credit-based billing system
  • βœ… Enterprise security features

Planned (v2.1.0)

  • Fix inverted logic bugs in chat/messages endpoints
  • Complete Prometheus summary endpoint
  • Add integration tests for all code paths
  • Improve synthetic metrics handling
  • Add provider-specific optimizations

Planned (v2.2.0)

  • Vision model support (image understanding)
  • Streaming optimization
  • Advanced caching strategies
  • Cost prediction and optimization
  • Custom model deployment support

πŸ”„ Model Routing Hotfixes & Rollback Guide

Documents four model routing fixes in commit c09165c4. Each section explains what changed and how to revert it individually. Silent redirects (aliases to newer model IDs) are intentional β€” deprecated upstream models are mapped to their current equivalents so existing integrations keep working without client changes.

Quick Rollback (revert all four at once)

git revert c09165c4 --no-edit

Fix 1 β€” Cerebras qwen-3: disable reasoning tokens by default

Problem: cerebras-cloud-sdk >=1.64.x enables hybrid thinking for qwen-3 models by default. The gateway doesn't handle reasoning tokens in the stream, so Cerebras returned a 400 β€” which the error handler translated to the misleading "Invalid value for parameter 'request'" message.

Change: src/services/cerebras_client.py β€” added _apply_cerebras_reasoning_defaults() which injects disable_reasoning=True for any model whose name contains qwen-3 or qwen3, unless the caller already set disable_reasoning or reasoning_effort.

Manual rollback:

  1. Remove the constant and helper (lines ~111–126):
    _CEREBRAS_REASONING_MODELS = ("qwen-3", "qwen3")
    
    def _apply_cerebras_reasoning_defaults(model: str, kwargs: dict) -> dict: ...
  2. Remove the two call sites in make_cerebras_request_openai() and make_cerebras_request_openai_stream():
    kwargs = _apply_cerebras_reasoning_defaults(model, kwargs)  # remove this line

Fix 2 β€” DeepSeek: pin to stable versioned model ID

Problem: The generic deepseek/deepseek-chat alias on OpenRouter pointed to overloaded capacity, causing Bad Gateway (502) after 3 retries.

Change: src/services/model_transformations.py β€” deepseek-chat, deepseek-chat-v3, and deepseek-chat-v3.1 entries in the OpenRouter model mapping table now resolve to deepseek/deepseek-chat-v3-0324 (stable versioned endpoint).

Manual rollback: In model_transformations.py, find the OpenRouter model mapping dict and revert the deepseek-chat entries:

# Revert to generic alias:
"deepseek/deepseek-chat": "deepseek/deepseek-chat",
"deepseek-chat": "deepseek/deepseek-chat",
"deepseek/deepseek-chat-v3": "deepseek/deepseek-chat",
"deepseek/deepseek-chat-v3.1": "deepseek/deepseek-chat",

Fix 3 β€” Mistral: explicit OpenRouter routing for mistralai org prefix

Problem: detect_provider_from_model_id() had no case for the mistralai org prefix. It fell through to the default (onerouter / Infron AI), which accepted the request but returned an empty stream.

Change: src/services/model_transformations.py β€” added an explicit if org == "mistralai": return "openrouter" check inside detect_provider_from_model_id().

Manual rollback: Remove the block added to detect_provider_from_model_id():

# Remove this block:
if org == "mistralai":
    logger.info(f"Routing '{model_id}' to openrouter (mistralai org prefix)")
    return "openrouter"

Fix 4 β€” xAI grok-2 / grok-2-1212: redirect deprecated models to grok-3

Problem: xAI deprecated grok-2-1212. The 404 response body wasn't parseable by the error handler's model extractor, so it surfaced as "Model 'unknown' not found" with no actionable detail.

Change: src/services/model_transformations.py β€” added grok-2, grok-2-1212, and their prefixed variants to both MODEL_ID_ALIASES (β†’ x-ai/grok-3) and the xai provider mapping table (β†’ grok-3).

Manual rollback:

  1. Remove from MODEL_ID_ALIASES:
    "grok-2": "x-ai/grok-3",
    "grok-2-1212": "x-ai/grok-3",
    "xai/grok-2": "x-ai/grok-3",
    "xai/grok-2-1212": "x-ai/grok-3",
    "x-ai/grok-2": "x-ai/grok-3",
    "x-ai/grok-2-1212": "x-ai/grok-3",
  2. Remove from the xai provider mapping table:
    "grok-2": "grok-3",
    "grok-2-1212": "grok-3",
    "xai/grok-2": "grok-3",
    "xai/grok-2-1212": "grok-3",

πŸ™ Acknowledgments

Built with:

  • FastAPI - Modern Python web framework
  • Supabase - PostgreSQL database platform
  • Redis - In-memory cache
  • Prometheus - Metrics collection
  • OpenTelemetry - Distributed tracing

Last Updated: 2025-12-28 Version: 2.0.3 Status: Production Ready βœ… Documentation: Complete βœ