A modern, intelligent AI Agent Service framework built with FastAPI & FastMCP that demonstrates how to implement agent tool management, prompt handling, and multi-provider AI integration from scratch. This project showcases a production-ready implementation of agent-specific tool filtering, dynamic system prompts, and unified provider interfaces without relying on abstraction frameworks. Built with Docker, comprehensive logging, and enterprise-grade features. The service now includes comprehensive streaming support across all providers and API endpoints, enabling real-time response delivery and enhanced user experience.
- Framework Design - Complete implementation showing how to build AI agents from the ground up
- FastAPI Framework - Modern, fast web framework with automatic API documentation
- AI Agent Capabilities - Intelligent automation and decision-making
- Health Check Endpoints - Built-in monitoring and status endpoints
- Multi-Provider AI Support - Azure OpenAI, Ollama, OpenRouter with unified interface
- MCP Integration - Model Context Protocol for external tools using fastmcp library
- OpenAI-Compatible API - Full OpenAI protocol compliance with streaming support
- Streaming Support - Real-time response streaming across all providers and API endpoints
- Tool Filtering - Agent-specific permissions and authorization
- Prompt Management - Dynamic system prompts with tool integration
- Agent Resource Manager - Per-agent resource access control and automatic resource creation
- Resource Management - Global resource lifecycle management with agent-specific filtering
- Model Configuration - Flexible model selection and parameter management
- CLI Parameter Overrides - Runtime model and setting customization
- Environment Configuration - Flexible settings with environment variable support
- Hot Reload - Development mode with automatic code reloading
- Memory Persistence - PostgreSQL-based conversation history with automatic cleanup
- Memory Compression - Intelligent conversation history management with AI-powered summarization
- Knowledge Base - Vector-based RAG system with document ingestion, semantic search, and reranking
- Vector Storage - Pluggable vector provider architecture with PGVector implementation
- Document Chunking - Multiple strategies for optimal document processing (simple, semantic, token-aware)
- Response Processing - Automatic response cleaning and formatting for memory storage
- Agent Performance Assessment - DeepEval integration for comprehensive evaluation
- Synthetic Test Generation - Create golden datasets for consistent testing
- Metric-Based Evaluation - Tool correctness, hallucination detection, answer relevancy
- Docker Support - Multi-stage builds with development and production targets
- Structured Logging - Comprehensive logging setup for debugging and monitoring
- Type Safety - Full type hints throughout the codebase
- Auto-Generated Docs - Interactive API documentation with Swagger UI and ReDoc
- Rate Limit Resilience - Automatic retry with exponential backoff for API rate limits
# Clone the repository
git clone https://github.com/ScottRBK/ai-agent-service
cd ai-agent-service
# Run in development mode
cd docker
docker-compose --profile dev up --buildThe service will be available at:
- API: http://localhost:8001
- Health Check: http://localhost:8001/health
- API Docs: http://localhost:8001/docs
- ReDoc: http://localhost:8001/redoc
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the application
python -m app.mainai-agent-service/
├── app/
│ ├── api/
│ │ └── routes/
│ │ ├── __init__.py
│ │ ├── health.py # Health check endpoints
│ │ ├── agents.py # Agent management API
│ │ └── openai_compatible.py # OpenAI-compatible API with streaming
│ ├── core/
│ │ ├── agents/
│ │ │ ├── base_agent.py # Base agent class with direct resource composition
│ │ │ ├── agent_tool_manager.py # Agent tool filtering with fastmcp
│ │ │ ├── prompt_manager.py # System prompt management
│ │ │ ├── cli_agent.py # CLI agent implementation (inherits from BaseAgent)
│ │ │ ├── api_agent.py # API agent implementation with streaming (inherits from BaseAgent)
│ │ │ └── memory_compression_agent.py # Memory compression agent (inherits from BaseAgent)
│ │ ├── providers/
│ │ │ ├── base.py # Base provider interface
│ │ │ ├── azureopenapi.py # Azure OpenAI (Responses API) with streaming and retry logic
│ │ │ ├── azureopenapi_cc.py # Azure OpenAI (Chat Completions) with streaming and retry logic
│ │ │ ├── ollama.py # Ollama provider with streaming
│ │ │ └── openrouter.py # OpenRouter provider with OpenAI-compatible API and retry logic
│ │ ├── resources/
│ │ │ ├── base.py # Base resource interface
│ │ │ ├── memory.py # PostgreSQL memory resource
│ │ │ ├── memory_compression_manager.py # Memory compression logic
│ │ │ ├── knowledge_base.py # Vector-based knowledge base with RAG capabilities
│ │ │ ├── vector_providers/ # Vector storage providers
│ │ │ │ ├── base.py # Vector provider interface
│ │ │ │ └── pgvector_provider.py # PostgreSQL + pgvector implementation
│ │ │ └── chunking/ # Document chunking strategies
│ │ │ ├── base.py # Chunking strategy interface
│ │ │ ├── simple.py # Basic text splitting
│ │ │ ├── semantic.py # AI-powered semantic boundaries
│ │ │ ├── token_aware.py # Token-conscious splitting
│ │ │ └── document_specific.py # Format-aware chunking
│ │ └── tools/
│ │ ├── tool_registry.py # Tool management
│ │ └── function_calls/ # Built-in tools
│ ├── config/
│ │ └── settings.py # Application configuration
│ ├── models/
│ │ ├── agents.py # Agent API models
│ │ └── resources/
│ │ ├── memory.py # Memory data models
│ │ └── knowledge_base.py # Knowledge base data models
│ └── utils/
│ ├── logging.py # Logging configuration
│ ├── chat_utils.py # Response cleaning utilities
│ └── retry_utils.py # Rate limit retry with exponential backoff
├── tests/
│ ├── test_core/
│ │ ├── test_agents/ # Agent unit tests
│ │ ├── test_providers/ # Provider tests with streaming
│ │ ├── test_resources/ # Resource tests
│ │ └── test_tools/ # Tool tests
│ ├── test_api/
│ │ ├── test_agents.py # Agent API tests
│ │ └── test_openai_compatible_integration.py # OpenAI API tests with streaming
│ └── test_integration/ # End-to-end tests
├── app/
│ ├── evaluation/
│ │ ├── config.py # Evaluation configuration models
│ │ ├── runner.py # Evaluation execution engine
│ │ ├── dataset.py # Golden dataset management
│ │ ├── evaluation_utils.py # Result analysis utilities
│ │ └── evals/ # Agent-specific evaluations
│ │ ├── cli_agent.py # CLI agent evaluation example
│ │ ├── simple_eval.py # Basic evaluation with tool correctness
│ │ └── temporal_awareness.py # Time-based information evaluation
├── examples/
│ └── run_agent.py # CLI agent runner
├── docker/
│ ├── Dockerfile # Multi-stage Docker build
│ └── docker-compose.yml # Development environment
├── agent_config.json # Agent configurations
├── prompts/ # System Prompt Files per agent
├── mcp.json # MCP server config
└── requirements.txt
The service uses environment-based configuration with sensible defaults. Key configuration areas include:
- Environment Variables: Service settings, ports, logging levels
- Agent Configuration: Tool permissions, model settings, resources
- MCP Server Authorization: Secure token management with
${VARIABLE_NAME}substitution - Docker Volume Mounts: Flexible configuration file management
For detailed configuration options, environment variables, and examples, see Deployment Guide.
The service includes a comprehensive memory system providing:
- PostgreSQL-based persistence - Conversation history with session isolation
- AI-powered compression - Intelligent summarization when conversations exceed token thresholds
- Cross-session context - Automatic retrieval of relevant context from past conversations
- Knowledge base archival - Compressed conversations archived for enhanced context awareness
- Graceful error handling - Tool iteration limits handled without exceptions
For detailed memory configuration, compression settings, and usage examples, see Memory Documentation.
The service provides comprehensive REST and OpenAI-compatible APIs:
- Agent Management: Create, configure, and interact with AI agents
- OpenAI Compatibility: Standard
/v1/chat/completionsand/v1/modelsendpoints - Memory Management: Conversation history and session management
- Streaming Support: Real-time response streaming across all endpoints
For detailed API documentation with examples, see API Reference.
# Development mode
cd docker
docker-compose --profile dev up --build# Production build
docker build -f docker/Dockerfile --target production -t ai-agent-service:latest .
docker run -p 8000:8000 ai-agent-service:latestFor comprehensive deployment instructions, environment configuration, and production setup, see Deployment Guide.
The service provides a comprehensive agent framework with:
- Multi-Provider Support: Azure OpenAI, Ollama, OpenRouter with unified interface
- Agent-Specific Tool Filtering: Granular control over tool access per agent
- MCP Integration: HTTP and command-based Model Context Protocol servers
- Memory Management: PostgreSQL-based conversation persistence with compression
- Knowledge Base System: Vector-based RAG with document ingestion, semantic search, and reranking
- Direct Resource Composition: Simplified architecture with agents managing their own resources
- Dynamic Configuration: Runtime model and parameter overrides
- Rate Limit Resilience: Automatic retry logic with exponential backoff
For detailed agent configuration, MCP server setup, and provider information, see Usage Examples.
# Run research agent with web search capabilities
python examples/run_agent.py research_agent azure_openai_cc
# Run CLI agent with full tool access and memory
python examples/run_agent.py cli_agent azure_openai_cc
# Override model and settings
python examples/run_agent.py cli_agent ollama --model qwen3:4b --setting temperature 0.7- research_agent: Web research with search tools
- cli_agent: Interactive CLI with full tool access
- api_agent: Optimized for web API usage
- mcp_agent: MCP tools only
For comprehensive usage examples, agent configurations, and detailed CLI options, see Usage Examples.
For troubleshooting common issues, configuration problems, and deployment guidance, see Deployment Guide.
- 285+ comprehensive tests covering all core functionality
- Unit Testing: Agent architecture, tool filtering, memory management
- Integration Testing: End-to-end workflows, MCP servers, streaming
- Performance Testing: Memory compression, response times
- DeepEval Integration: Agent performance assessment
- Tool Correctness: Validates appropriate tool usage
- Hallucination Detection: Factual accuracy measurement
- RAG Metrics: Faithfulness, Contextual Relevancy, Contextual Recall, Contextual Precision
- Custom Metrics: GEval with observability and tracing
# CLI agent evaluation
python app/evaluation/evals/cli_agent.py --generate # Generate golden test cases
python app/evaluation/evals/cli_agent.py --verbose # Run evaluation with detailed output
# Knowledge agent evaluation with RAG metrics
python app/evaluation/evals/knowledge_agent.py --generate # Generate RAG test cases
python app/evaluation/evals/knowledge_agent.py --verbose # Run RAG evaluation# Run all tests
pytest tests/
# Run with coverage
pytest tests/ --cov=app --cov-report=htmlFor detailed testing examples and evaluation framework documentation, see Evaluation Framework.
The service uses a modular architecture with clean separation of concerns:
- BaseAgent Architecture: Unified agent foundation with direct resource composition
- Provider Abstraction: Support for multiple AI providers with embedding and reranking capabilities
- Resource System: Direct resource management with memory and knowledge base support
- Tool System: Plugin-based MCP and function tools
- Vector Storage: Extensible vector provider architecture with PostgreSQL integration
- Hot Reload: Development mode with automatic code reloading
- Retry Patterns: Built-in exponential backoff for rate limit handling
For detailed development patterns, custom agent creation, and code examples, see Usage Examples.
- FastAPI - Modern async web framework with automatic API documentation
- FastMCP - Model Context Protocol integration with HTTP and command support
- PostgreSQL - Conversation memory persistence, vector storage, and knowledge base management
- PGVector - PostgreSQL extension for high-performance vector operations
- DeepEval - AI agent evaluation and performance assessment
- OpenRouter - Access to diverse AI models through unified API
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or issues:
- Check the API documentation when running locally
- Review the logs:
docker logs <container-name> - Open an issue in this repository
Happy coding! 🚀
