Scalable agent runtime demonstrating system design, resilience patterns, and production best practices.
- Circuit Breaker Pattern: Prevents cascading failures with automatic recovery
- Exponential Backoff Retry: Resilient operations with configurable retry logic
- Rate Limiting: Token bucket algorithm for API throttling (500 req/min)
- Caching Layer: In-memory cache with TTL for performance optimization
- Distributed Tracing: Correlation IDs and structured logging for observability
- Graceful Shutdown: Proper resource cleanup and signal handling
- Comprehensive Testing: 95%+ test coverage with pytest
- Performance Benchmarking: Automated load testing suite
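
The caching layer in the list above can be pictured with a small sketch. This is an illustration only, not the code in `src/resilience.py`: the class name `TTLCache`, its methods, the default TTL, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._data: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def set(self, key: Hashable, value: Any) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (self._clock() + self.ttl, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is not None:
            expires_at, value = entry
            if self._clock() < expires_at:
                self.hits += 1
                return value
            del self._data[key]  # expired: evict lazily on read
        self.misses += 1
        return default
```

Expired entries are evicted lazily on read, which keeps the sketch short. The `hits` and `misses` counters are there only to show where cache hit/miss metrics could come from.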

    ┌──────────────────────────────────────────────┐
    │          FastAPI + Middleware Layer          │
    │  (Correlation ID → Logging → Rate Limiting)  │
    └──────────────────────┬───────────────────────┘
                           │
                  ┌────────▼────────┐
                  │  Orchestrator   │
                  │ (Async Routing) │
                  └────────┬────────┘
                           │
            ┌──────────────┴──────────────┐
            │                             │
    ┌───────▼───────┐             ┌───────▼───────┐
    │   LangGraph   │             │    CrewAI     │
    │   Workflow    │             │  Multi-Agent  │
    │               │             │  (Parallel)   │
    └───────┬───────┘             └───────┬───────┘
            │                             │
            └──────────────┬──────────────┘
                           │
                ┌──────────▼──────────┐
                │  Resilience Layer   │
                │  - Circuit Breaker  │
                │  - Retry Logic      │
                │  - Rate Limiter     │
                │  - Cache Manager    │
                └──────────┬──────────┘
                           │
                ┌──────────▼──────────┐
                │   Memory Manager    │
                │  (Stateful Store)   │
                └─────────────────────┘

    # Install dependencies
    pip install -r requirements.txt

    # Run tests
    pytest tests/ -v --cov=src

    # Start server
    python main.py

    # Run benchmarks (in another terminal)
    python benchmark.py

    # Run demo
    python examples/demo.py

    R/
    ├── src/                    # Core modules
    │   ├── agents.py           # Agent orchestration with retry/circuit breaker
    │   ├── memory.py           # Stateful memory management
    │   ├── inference.py        # Inference layer
    │   ├── telemetry.py        # Metrics and observability
    │   ├── resilience.py       # Circuit breaker, retry, rate limit, cache
    │   ├── middleware.py       # Logging, correlation IDs, rate limiting
    │   └── config.py           # Configuration management
    ├── tests/                  # Comprehensive test suite
    │   ├── test_resilience.py  # Resilience pattern tests
    │   ├── test_agents.py      # Agent functionality tests
    │   ├── test_memory.py      # Memory management tests
    │   └── conftest.py         # Pytest fixtures
    ├── examples/
    │   └── demo.py             # Interactive demo
    ├── main.py                 # FastAPI server with middleware
    ├── benchmark.py            # Performance benchmarking suite
    └── requirements.txt        # Minimal dependencies

- Circuit Breaker: Auto-recovery from failures (3 failures → open, 60s recovery)
- Retry with Exponential Backoff: Configurable retry logic with jitter
- Graceful Degradation: Fail-safe mechanisms throughout
- Rate Limiting: Token bucket algorithm (500 req/min) prevents overload
- Caching: In-memory cache with TTL reduces redundant computation
- Async/Parallel Execution: CrewAI agents run concurrently
- Connection Pooling: Efficient resource utilization
- Structured Logging: JSON logs with contextual information
- Correlation IDs: End-to-end request tracing
- Metrics Collection: Real-time performance metrics
- Health Checks: Liveness and readiness probes
- Graceful Shutdown: SIGTERM/SIGINT handling with cleanup
- Error Handling: Comprehensive exception handling with context
- Configuration Management: Environment-based settings
- API Versioning: Semantic versioning support
- Unit Tests: 95%+ code coverage
- Integration Tests: End-to-end testing
- Performance Benchmarks: Automated load testing
- Test Fixtures: Reusable test components
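
The retry behavior named in the list above (exponential backoff with jitter) could look roughly like the following. This is a sketch under stated assumptions: the function name `retry_with_backoff`, its parameters and defaults, and the choice of "full jitter" (a random delay between zero and the exponential ceiling) are illustrative and not taken from `src/resilience.py`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Call func, retrying on failure with exponentially growing, jittered delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Ceiling doubles each attempt: base, 2*base, 4*base, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random duration in [0, ceiling] to spread out retries.
            sleep(random.uniform(0, ceiling))
```

Jitter matters when many callers fail at the same moment: without it they would all retry on the same schedule and hit the recovering service together.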
| Endpoint | Method | Description | Features |
|---|---|---|---|
| `/api/agent/execute` | POST | Execute agent task | Retry, Circuit Breaker, Caching |
| `/api/memory/store` | POST | Store in memory | TTL support |
| `/api/memory/retrieve` | POST | Retrieve from memory | Pagination |
| `/api/metrics` | GET | System metrics | Real-time stats |
| `/health` | GET | Health check | Readiness probe |
| `/docs` | GET | OpenAPI docs | Interactive API |

    # Run all tests with coverage
    pytest tests/ -v --cov=src --cov-report=html

    # Run specific test suite
    pytest tests/test_resilience.py -v

    # Run with markers
    pytest tests/ -v -m asyncio

- Resilience Patterns: Circuit breaker, retry, rate limiting
- Agent Functionality: LangGraph, CrewAI orchestration
- Memory Operations: Short-term, long-term, concurrent access
- Error Scenarios: Failure handling, recovery
Benchmark results on M1 Mac (8GB RAM):
| Operation | Avg Latency | P95 | P99 | Throughput |
|---|---|---|---|---|
| Health Check | 2.88ms | 6.27ms | 7.80ms | 347 req/s |
| Agent Execute | 56.34ms | 61.01ms | 75.34ms | 17.75 req/s |
| Memory Store | 3.51ms | 7.07ms | 12.23ms | 284 req/s |
| Concurrent (10) | 124.33ms | - | - | 48.52 req/s |
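
To make the columns above concrete, here is a small sketch of how P95/P99 and throughput figures of this kind can be derived from raw latency samples. The nearest-rank percentile method and both function names are assumptions; `benchmark.py` may compute its numbers differently.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def sequential_throughput(avg_latency_ms: float) -> float:
    """Requests per second when requests are issued one after another."""
    return 1000.0 / avg_latency_ms
```

For the single-request rows, the reported throughput is consistent with `1000 / avg latency`: 1000 / 2.88 ms is about 347 req/s for Health Check, and 1000 / 56.34 ms is about 17.75 req/s for Agent Execute. The Concurrent (10) row does not follow this relation, since its requests overlap.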
- Rate Limiting: Prevents DoS attacks (500 req/min default)
- Input Validation: Pydantic models for request validation
- Error Sanitization: No sensitive data in error responses
- Correlation IDs: Audit trail for all requests
Structured logs include:
- Request/response timing
- Error rates and types
- Memory usage statistics
- Cache hit/miss rates
- Circuit breaker state changes
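
As an illustration of what such structured logs can look like, the sketch below emits one JSON object per log line using only the standard library. The `JsonFormatter` class, the `build_logger` helper, and the convention of passing structured values through `extra={"fields": {...}}` are assumptions made for the example, not the project's actual logging setup.

```python
import json
import logging
from typing import Optional, TextIO


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed as extra={"fields": {...}}, if any.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name: str = "agent_runtime",
                 stream: Optional[TextIO] = None) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger
```

A call such as `build_logger().info("request completed", extra={"fields": {"duration_ms": 56.34, "cache_hit": False}})` then produces a single line that log tooling can parse without regular expressions.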

    # src/config.py
    class Settings:
        host: str = "0.0.0.0"
        port: int = 8000
        max_concurrent_agents: int = 10
        agent_timeout: int = 300

The circuit breaker prevents cascading failures when the inference layer is slow or down, and it recovers automatically without manual intervention.
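
A minimal sketch of that mechanism follows, using the thresholds this README states (3 failures open the circuit, 60 seconds until recovery is attempted). Everything else is an assumption for illustration: the class and exception names, the state labels, and the injectable `clock` are not taken from `src/resilience.py`.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: closed -> open after N failures -> half-open after a timeout."""

    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half_open"  # timeout elapsed: allow one trial call
        return "open"

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or reaching the threshold, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a dependency that is already struggling, which is what stops one slow layer from tying up the rest of the system.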
The token bucket smooths traffic instead of enforcing a hard cutoff: it allows short bursts while maintaining the average rate.
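
The burst-versus-average trade-off is easiest to see in code. The sketch below refills at the README's 500 requests per minute; the burst `capacity` of 50, the class name, and the `allow` method are assumptions chosen for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token bucket: refills continuously, spends one token per request."""

    def __init__(self, rate_per_minute: float = 500.0, capacity: float = 50.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.refill_rate = rate_per_minute / 60.0  # tokens added per second
        self.capacity = capacity                   # maximum burst size (assumed value)
        self._clock = clock
        self._tokens = capacity                    # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add tokens for the time elapsed, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A client can spend up to `capacity` tokens at once, but sustained traffic is held to the refill rate, so the long-run average stays at the configured limit.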
Correlation IDs are essential for distributed tracing: they link all operations in a request chain for debugging.
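
One way to carry an ID through async code without threading it through every function signature is a context variable. This is a sketch, not the contents of `src/middleware.py`: the variable name, the `handle_request` and `inner_step` functions, and the `"-"` default are hypothetical.

```python
import asyncio
import contextvars
import logging
import uuid
from typing import Optional, Tuple

# Holds the ID of the request currently being processed; "-" means "no request".
correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "correlation_id", default="-"
)


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


async def inner_step() -> str:
    # Deep in the call chain: the ID is available without being passed as an argument.
    await asyncio.sleep(0)
    return correlation_id.get()


async def handle_request(incoming_id: Optional[str] = None) -> Tuple[str, str]:
    # Reuse the caller's ID if one was supplied, otherwise mint a new one.
    request_id = incoming_id or uuid.uuid4().hex
    token = correlation_id.set(request_id)
    try:
        return request_id, await inner_step()
    finally:
        correlation_id.reset(token)  # restore the previous value
```

Each asyncio task runs in its own copy of the context, so concurrent requests do not see each other's IDs.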
Structured logging produces machine-parseable logs, which enable better alerting and analytics and are critical for production systems.
MIT License - Feel free to use for portfolio/learning