Add support for rate limiting
Overview
Add rate limiting for agents, covering both model calls and tool calls. This lets users control costs, stop runaway agents, and enforce usage constraints at several levels (per invocation, per user, per time window). Rate limiting is essential for production deployments and multi-tenant environments.
Requirements
R1: Model Call Rate Limiting
Users should be able to configure rate limiting for model calls by setting token budgets. This includes:
Token budget limits per invocation (input + output tokens)
Token budget limits per user (daily/monthly)
Token budget limits per agent (daily/monthly)
Graceful handling when token budgets are exceeded
Real-time token usage tracking and enforcement
Different limits for different model providers (see issue #74, "Add support for alternate LLM providers")
Warning thresholds before hitting hard limits
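The per-invocation budget and warning threshold described above could be tracked roughly as follows. This is a minimal sketch, not the project's implementation: `TokenBudget`, `TokenBudgetExceeded`, and `charge` are hypothetical names, and the 80% default mirrors the warning threshold suggested later in this issue.

```python
from dataclasses import dataclass


class TokenBudgetExceeded(Exception):
    """Raised when a charge would push usage past the hard limit."""


@dataclass
class TokenBudget:
    limit: int                 # hard cap on input + output tokens
    warning_percent: int = 80  # warn once usage reaches this share of the limit
    used: int = 0

    def charge(self, input_tokens: int, output_tokens: int) -> bool:
        """Record usage; return True once the warning threshold is reached."""
        total = self.used + input_tokens + output_tokens
        if total > self.limit:
            # Reject the charge and leave recorded usage unchanged.
            raise TokenBudgetExceeded(
                f"Token budget exceeded ({total:,} / {self.limit:,})"
            )
        self.used = total
        return self.used * 100 >= self.limit * self.warning_percent


budget = TokenBudget(limit=10_000)
print(budget.charge(3_000, 1_000))  # False: 40% of the budget used
print(budget.charge(3_000, 1_500))  # True: 85% of the budget used
```

Returning the warning flag from `charge` keeps the soft threshold and the hard limit in one place; callers decide how to surface the warning.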
R2: Tool Call Rate Limiting
Users should be able to configure rate limiting for tool calls, constraining the number of tool calls that are made. This includes:
Maximum tool calls per invocation
Maximum tool calls per user (daily/hourly)
Maximum tool calls per agent (daily/hourly)
Per-tool-type limits (e.g., limit expensive operations like web fetches)
Graceful handling when tool call limits are exceeded
Real-time tool call counting and enforcement
Circuit breaker patterns for repeated failures
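One possible shape for the circuit breaker mentioned in the last item is sketched below. It is deliberately simplified: after the cooldown the breaker closes fully instead of entering a half-open state. The class name, the thresholds, and the injected `clock` parameter are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Blocks calls to a tool after too many consecutive failures."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so tests can control time
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call may proceed."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: close the circuit and start counting afresh.
            self.opened_at = None
            self.consecutive_failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        """Record the outcome of a call that was allowed."""
        if success:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = self.clock()


now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, cooldown_seconds=30, clock=lambda: now[0])
breaker.record(False)
breaker.record(False)
print(breaker.allow())  # False: circuit is open
now[0] = 31.0
print(breaker.allow())  # True: cooldown has elapsed
```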
Technical Considerations
Rate Limit Types
Per-Invocation Limits: Hard caps on a single agent invocation
Token budget (input + output)
Tool call count
Time limit (max execution time)
Per-User Limits: Quotas across all invocations by a user
Daily/monthly token budget
Hourly/daily tool call limits
Concurrent invocation limits
Per-Agent Limits: Quotas for a specific agent
Daily/monthly token budget for that agent
Tool call limits specific to agent behavior
Per-Tool Limits: Constraints on specific tools
Web fetch limits (prevent scraping abuse)
Bash execution limits (prevent resource exhaustion)
File operation limits
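To make the interaction between the per-invocation tool call cap and the per-tool caps concrete, here is a small sketch. `ToolCallLimiter` and the tool names `web_fetch` and `bash` are hypothetical; the check order (overall cap first, then per-tool cap) is one reasonable choice, not a requirement of this issue.

```python
from collections import Counter
from typing import Dict, Optional


class ToolCallLimiter:
    """Counts tool calls within one invocation and applies both caps."""

    def __init__(self, max_calls: int, per_tool_limits: Optional[Dict[str, int]] = None) -> None:
        self.max_calls = max_calls
        self.per_tool_limits = per_tool_limits or {}
        self.calls: Counter = Counter()

    def check(self, tool_name: str) -> Optional[str]:
        """Return None if the call is allowed, otherwise a denial message."""
        total = sum(self.calls.values())
        if total >= self.max_calls:
            return f"Tool call limit reached ({total} / {self.max_calls})."
        tool_limit = self.per_tool_limits.get(tool_name)
        if tool_limit is not None and self.calls[tool_name] >= tool_limit:
            return f"Limit for tool '{tool_name}' reached ({self.calls[tool_name]} / {tool_limit})."
        return None

    def record(self, tool_name: str) -> None:
        """Count a tool call that was allowed and executed."""
        self.calls[tool_name] += 1


limiter = ToolCallLimiter(max_calls=3, per_tool_limits={"web_fetch": 1})
limiter.record("web_fetch")
print(limiter.check("web_fetch"))  # Limit for tool 'web_fetch' reached (1 / 1).
limiter.record("bash")
limiter.record("bash")
print(limiter.check("bash"))       # Tool call limit reached (3 / 3).
```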
Configuration Management
Update etc/environment.sh to include rate limit configuration:
Default token budget per invocation
Default tool call limit per invocation
Per-user daily/monthly limits
Per-agent limits
Warning threshold percentages (e.g., warn at 80% of limit)
Support environment-specific limits (dev vs production)
Store user-specific overrides in database (RDS)
Allow admin users to configure custom limits per user/agent
Backend Changes
Rate Limit Store
Token Budget Tracking: Track token usage in real-time
Store current usage in memory (Redis/ElastiCache for distributed)
Persist to database for historical tracking
Reset counters at appropriate intervals (daily/monthly)
Tool Call Tracking: Track tool call counts
Per-invocation counter (in-memory)
Per-user counter (Redis/database)
Per-tool-type counters
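The reset behaviour for daily counters could look like the sketch below, which keeps state in process memory only (a distributed deployment would keep it in Redis, as noted above). `WindowCounter` and `next_daily_reset` are hypothetical names, and resetting at midnight UTC is an assumption; the field names follow the `rate_limits` columns proposed in this issue.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Return the first midnight after `now` (assumes `now` is in UTC)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


@dataclass
class WindowCounter:
    limit_value: int
    reset_at: datetime
    current_usage: int = 0

    def add(self, amount: int, now: datetime) -> bool:
        """Add usage if it fits; start a new window first if the old one expired."""
        if now >= self.reset_at:
            self.current_usage = 0
            self.reset_at = next_daily_reset(now)
        if self.current_usage + amount > self.limit_value:
            return False
        self.current_usage += amount
        return True


start = datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)
counter = WindowCounter(limit_value=100, reset_at=next_daily_reset(start))
print(counter.add(80, start))                       # True
print(counter.add(30, start))                       # False: would exceed 100
print(counter.add(30, start + timedelta(hours=2)))  # True: new daily window
```

Passing `now` explicitly rather than reading the clock inside `add` makes the time-boundary cases easy to test.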
Enforcement Layer
Middleware: Create rate limiting middleware that:
Checks limits before processing requests
Updates counters during execution
Enforces hard limits and rejects requests when exceeded
Returns clear error messages with limit details
Streaming Support: For streaming responses, enforce limits mid-stream
Stop generation when token budget is reached
Append truncation message to response
Tool Call Gating: Before executing any tool call:
Check if tool call limit would be exceeded
Check if specific tool has per-tool limits
Allow or deny with clear error message
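Mid-stream enforcement, as described under Streaming Support, might be implemented as a generator wrapped around the model's chunk stream. The sketch below is illustrative only: `enforce_stream_budget` and `TRUNCATION_NOTICE` are hypothetical names, and `count_tokens` is a whitespace-based placeholder, not a real provider tokenizer.

```python
from typing import Iterable, Iterator

TRUNCATION_NOTICE = "\n[Response truncated: token budget reached]"


def count_tokens(text: str) -> int:
    """Placeholder tokenizer: counts whitespace-separated words."""
    return len(text.split())


def enforce_stream_budget(chunks: Iterable[str], budget: int) -> Iterator[str]:
    """Yield chunks until the next one would exceed the budget, then stop."""
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            # Stop generation and tell the user why the response ends here.
            yield TRUNCATION_NOTICE
            return
        used += cost
        yield chunk


stream = ["one two ", "three four ", "five six"]
print(list(enforce_stream_budget(stream, budget=4)))
# ['one two ', 'three four ', '\n[Response truncated: token budget reached]']
```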
Database Schema
New tables or columns in existing tables:
rate_limits table:
user_id (or agent_id)
limit_type (token_daily, token_monthly, tool_calls_hourly, etc.)
current_usage
limit_value
reset_at (timestamp for when counter resets)
created_at, updated_at
Update invocations table:
tokens_used (input + output)
tool_calls_count
rate_limited (boolean flag)
limit_exceeded_type (which limit was hit)
Frontend Changes
Usage Dashboard: New page or section showing:
Current token usage vs limits (progress bars)
Current tool call usage vs limits
Historical usage trends (charts)
Cost implications of usage
Invocation Details: Show rate limit information:
Tokens used / Token budget
Tool calls made / Tool call limit
Warning indicators when approaching limits
Rate Limit Errors: User-friendly error messages:
"Token budget exceeded (5,000 / 5,000). Increase your limit or try a shorter conversation."
"Tool call limit reached (20 / 20). Wait 1 hour before retrying."
Admin Panel: For managing user/agent limits (if applicable)
Monitoring & Alerting
CloudWatch Metrics:
Rate limit hits by type
Token usage by user/agent
Tool call counts by type
Limit breach attempts
Alerts:
Notify users when approaching limits (80%, 90%)
Alert admins of unusual usage patterns
Track cost implications of usage
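To notify users once per threshold rather than on every request above 80%, the alerting code can compare usage before and after each update. A minimal sketch, with `crossed_thresholds` as a hypothetical helper name:

```python
from typing import List, Sequence


def crossed_thresholds(
    previous_usage: int,
    current_usage: int,
    limit: int,
    thresholds: Sequence[int] = (80, 90),
) -> List[int]:
    """Return the percentage thresholds crossed by this usage update."""
    return [
        t
        for t in thresholds
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if previous_usage * 100 < limit * t <= current_usage * 100
    ]


print(crossed_thresholds(7_000, 8_500, 10_000))  # [80]
print(crossed_thresholds(8_500, 9_500, 10_000))  # [90]
print(crossed_thresholds(9_500, 9_800, 10_000))  # []
```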
Security Considerations
Prevent Abuse: Rate limits prevent:
Cost runaway from buggy or malicious agents
Resource exhaustion attacks
Unintentional infinite loops in agent logic
Fair Usage: Multi-tenant environments need per-user isolation
Admin Overrides: Support emergency limit increases for critical use cases
Acceptance Criteria
AC1: Token Budget Limits
AC2: Tool Call Limits
AC3: Configuration
AC4: Frontend
AC5: Database & Persistence
AC6: Testing
Implementation Notes
Suggested Approach
Phase 1: Infrastructure & Configuration (Week 1)
Design database schema for rate limit tracking
Create migration scripts using SQLAlchemy
Add configuration parameters to etc/environment.sh
Implement in-memory counter service (consider Redis for distributed)
Create rate limit middleware framework
Phase 2: Token Budget Enforcement (Week 1-2)
Implement token tracking in model invocation layer
Add token budget checks before/during invocations
Handle streaming response truncation
Update cost tracking to include limit information
Add per-user token limit enforcement
Unit tests for token budget enforcement
Phase 3: Tool Call Enforcement (Week 2)
Implement tool call counter in tool execution layer
Add tool call limit checks before tool execution
Implement per-tool-type limits
Add circuit breaker for repeated failures
Unit tests for tool call limit enforcement
Phase 4: Frontend Integration (Week 2-3)
Create usage dashboard component
Add progress bars for token and tool call usage
Update invocation detail pages with limit info
Implement error message UI for limit breaches
Add real-time usage updates
Phase 5: Monitoring & Admin Tools (Week 3)
Add CloudWatch metrics for rate limit events
Create alert rules for unusual usage
Build admin API for limit management (if needed)
Add usage analytics and reporting
Documentation for operators and users
Key Files to Modify/Create
Backend:
backend/middleware/ (new directory):
rate_limiter.py: Core rate limiting logic
token_budget.py: Token budget tracking
tool_call_limiter.py: Tool call limiting
backend/models/rate_limit.py: Database models for limits
backend/services/:
counter_service.py: In-memory counter management
limit_enforcer.py: Enforcement logic
backend/api/: Update endpoints to check limits
backend/database/migrations/: Add rate_limits table migration
Configuration:
etc/environment.sh: Add rate limit configuration
backend/config.py: Load and validate rate limit settings
Frontend:
frontend/src/components/UsageDashboard.tsx: Usage visualization
frontend/src/components/RateLimitProgress.tsx: Progress bar component
frontend/src/components/RateLimitError.tsx: Error message component
frontend/src/pages/InvocationDetailPage.tsx: Add limit information
frontend/src/api/rate-limits.ts: API client for rate limit data
Infrastructure:
iac/: SAM templates for ElastiCache Redis (if using for distributed counters)
Tests:
backend/tests/middleware/test_rate_limiter.py
backend/tests/services/test_counter_service.py
backend/tests/integration/test_rate_limits.py
Example Configuration
# etc/environment.sh additions
# Per-invocation limits
export DEFAULT_TOKEN_BUDGET=10000
export DEFAULT_TOOL_CALL_LIMIT=50
export DEFAULT_INVOCATION_TIMEOUT=300 # seconds
# Per-user limits
export USER_DAILY_TOKEN_LIMIT=100000
export USER_MONTHLY_TOKEN_LIMIT=1000000
export USER_HOURLY_TOOL_CALL_LIMIT=200
# Warning thresholds
export TOKEN_BUDGET_WARNING_THRESHOLD=80 # percent
export TOOL_CALL_WARNING_THRESHOLD=80
# Per-tool limits
export WEB_FETCH_LIMIT_PER_INVOCATION=10
export BASH_EXEC_LIMIT_PER_INVOCATION=20
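On the backend, `backend/config.py` would need to load and validate these variables. The sketch below shows one way to do that for a subset of them; `RateLimitConfig`, `load_rate_limit_config`, and the fallback defaults (copied from the example values above) are assumptions, not existing code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitConfig:
    default_token_budget: int
    default_tool_call_limit: int
    user_daily_token_limit: int
    token_budget_warning_threshold: int  # percent


def _positive_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read `name` from env as a positive integer, falling back to `default`."""
    raw = env.get(name, str(default))
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def load_rate_limit_config(env: Mapping[str, str] = os.environ) -> RateLimitConfig:
    """Build the config from environment variables, failing fast on bad values."""
    threshold = _positive_int(env, "TOKEN_BUDGET_WARNING_THRESHOLD", 80)
    if threshold > 100:
        raise ValueError("TOKEN_BUDGET_WARNING_THRESHOLD must be at most 100")
    return RateLimitConfig(
        default_token_budget=_positive_int(env, "DEFAULT_TOKEN_BUDGET", 10000),
        default_tool_call_limit=_positive_int(env, "DEFAULT_TOOL_CALL_LIMIT", 50),
        user_daily_token_limit=_positive_int(env, "USER_DAILY_TOKEN_LIMIT", 100000),
        token_budget_warning_threshold=threshold,
    )


config = load_rate_limit_config({"DEFAULT_TOKEN_BUDGET": "5000"})
print(config.default_token_budget)     # 5000
print(config.default_tool_call_limit)  # 50
```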
Error Response Format
{
  "error": "RateLimitExceeded",
  "message": "Token budget exceeded",
  "details": {
    "limit_type": "token_budget_per_invocation",
    "current_usage": 10000,
    "limit_value": 10000,
    "reset_at": null
  }
}
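A helper that builds this payload keeps the format consistent across endpoints. The sketch below is illustrative: `rate_limit_error` is a hypothetical name, and serializing `reset_at` as an ISO 8601 string is an assumption, since the example above only shows `null`.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def rate_limit_error(
    message: str,
    limit_type: str,
    current_usage: int,
    limit_value: int,
    reset_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build the error body for a rejected request."""
    return {
        "error": "RateLimitExceeded",
        "message": message,
        "details": {
            "limit_type": limit_type,
            "current_usage": current_usage,
            "limit_value": limit_value,
            # Per-invocation limits never reset, so reset_at stays null.
            "reset_at": reset_at.isoformat() if reset_at else None,
        },
    }


payload = rate_limit_error(
    "Token budget exceeded", "token_budget_per_invocation", 10000, 10000
)
print(json.dumps(payload, indent=2))
```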
Database Schema Example
CREATE TABLE rate_limits (
    id SERIAL PRIMARY KEY,
    user_id VARCHAR(255),
    agent_id VARCHAR(255),
    limit_type VARCHAR(50) NOT NULL,
    current_usage INTEGER DEFAULT 0,
    limit_value INTEGER NOT NULL,
    reset_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_rate_limits_user_id ON rate_limits(user_id);
CREATE INDEX idx_rate_limits_agent_id ON rate_limits(agent_id);
CREATE INDEX idx_rate_limits_reset_at ON rate_limits(reset_at);
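Under concurrent load, checking the limit and incrementing the counter should happen in a single statement so that two requests cannot both pass the check. The sketch below demonstrates the idea with an in-memory SQLite database and a trimmed-down version of the table, purely so that it runs without external services; the production schema above targets PostgreSQL, and `try_consume` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rate_limits (
        id INTEGER PRIMARY KEY,
        user_id TEXT,
        limit_type TEXT NOT NULL,
        current_usage INTEGER DEFAULT 0,
        limit_value INTEGER NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO rate_limits (user_id, limit_type, limit_value) VALUES (?, ?, ?)",
    ("user-1", "token_daily", 100),
)


def try_consume(conn: sqlite3.Connection, user_id: str, limit_type: str, amount: int) -> bool:
    """Atomically add `amount` to the counter unless it would exceed the limit."""
    cursor = conn.execute(
        "UPDATE rate_limits SET current_usage = current_usage + ? "
        "WHERE user_id = ? AND limit_type = ? AND current_usage + ? <= limit_value",
        (amount, user_id, limit_type, amount),
    )
    conn.commit()
    # No row is updated when the limit would be exceeded.
    return cursor.rowcount == 1


print(try_consume(conn, "user-1", "token_daily", 80))  # True
print(try_consume(conn, "user-1", "token_daily", 30))  # False
print(try_consume(conn, "user-1", "token_daily", 20))  # True
```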
Testing Strategy
Unit Tests: Mock counters and test enforcement logic
Integration Tests: Test with real database and counters
Load Tests: Verify limits under concurrent load
Boundary Tests: Test counter resets at time boundaries
Streaming Tests: Verify mid-stream truncation works correctly
Multi-Tenant Tests: Verify user isolation of limits
Cost Tracking Integration
Rate limiting directly impacts costs. Update the existing cost tracking to:
Show usage as percentage of limits
Project monthly costs based on current usage
Alert when costs approach budget limits
Maintain consistency with formatCost functions in:
CostDashboardPage.tsx
InvocationDetailPage.tsx
LatencySummary.tsx
InvocationTable.tsx
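The first two items in the list above reduce to simple arithmetic. The sketch below assumes a linear extrapolation from month-to-date spend, which is only one possible projection method; `usage_percent` and `project_monthly_cost` are hypothetical names.

```python
import calendar
from datetime import date


def usage_percent(current_usage: int, limit_value: int) -> float:
    """Usage as a percentage of the limit."""
    return current_usage * 100 / limit_value


def project_monthly_cost(cost_so_far: float, today: date) -> float:
    """Extrapolate month-to-date cost linearly to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return cost_so_far * days_in_month / today.day


print(usage_percent(8_500, 10_000))                    # 85.0
print(project_monthly_cost(12.0, date(2024, 6, 10)))   # 36.0
```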
References
Dependencies
Priority
High - Critical for cost control and production deployments
Estimated Effort
Large (3 weeks) - Requires middleware, database changes, frontend work, and comprehensive testing