[ENHANCEMENT] Add retry logic with exponential backoff for LLM API calls #190

@harikapadia999

Description

Problem

LLM API calls can fail due to:

  • Rate limiting (429 errors)
  • Temporary network issues
  • Service unavailability (503 errors)
  • Timeout errors

Currently, these failures cause immediate workflow termination without retry attempts.
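Not every failure in the list above should be retried. A 400 or 401 will fail the same way on every attempt, while 429s, 503s, timeouts, and dropped connections are usually transient. The sketch below shows that classification. It assumes the client exposes either an HTTP status code or a builtin exception. `is_retryable` and `RETRYABLE_STATUS_CODES` are hypothetical names, not part of flo_ai.

```python
from typing import Optional

# Hypothetical policy: status codes treated as transient.
# 429 = rate limited, 503 = service unavailable (the two codes named in this issue).
# Adding 500/502/504 is a common policy choice, but the issue does not specify it.
RETRYABLE_STATUS_CODES = frozenset({429, 503})


def is_retryable(status_code: Optional[int] = None,
                 exc: Optional[BaseException] = None) -> bool:
    """Return True if a failed call looks transient and is worth retrying."""
    if exc is not None and isinstance(exc, (TimeoutError, ConnectionError)):
        # Builtin timeout and network errors (ConnectionResetError, etc.)
        return True
    # Client errors such as 400/401/404 are deterministic, so never retry them
    return status_code in RETRYABLE_STATUS_CODES
```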

Proposed Solution

Implement retry logic with exponential backoff for transient failures:

```python
# flo_ai/llm/retry.py
import logging
import random
import time
from functools import wraps
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar('T')


# Placeholder exception types (assumed): each LLM client should translate its
# SDK's rate-limit / 503 errors into these so the decorator can recognize them.
class RateLimitError(Exception):
    """Raised when the provider returns HTTP 429."""


class ServiceUnavailableError(Exception):
    """Raised when the provider returns HTTP 503."""


class RetryConfig:
    def __init__(
        self,
        max_retries: int = 3,
        initial_delay: float = 1.0,
        max_delay: float = 60.0,
        exponential_base: float = 2.0,
        jitter: bool = True
    ):
        self.max_retries = max_retries
        self.initial_delay = initial_delay
        self.max_delay = max_delay
        self.exponential_base = exponential_base
        self.jitter = jitter


def with_retry(config: Optional[RetryConfig] = None):
    """Decorator for retrying LLM API calls with exponential backoff"""
    if config is None:
        config = RetryConfig()

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @wraps(func)
        def wrapper(*args, **kwargs) -> T:
            last_exception = None

            # One initial attempt plus up to max_retries retries
            for attempt in range(config.max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except (RateLimitError, TimeoutError, ServiceUnavailableError) as e:
                    last_exception = e

                    # Out of retries: re-raise the original exception
                    if attempt == config.max_retries:
                        raise

                    # Calculate delay with exponential backoff, capped at max_delay
                    delay = min(
                        config.initial_delay * (config.exponential_base ** attempt),
                        config.max_delay
                    )

                    # Add jitter (scale into [0.5, 1.0) of the delay) to prevent thundering herd
                    if config.jitter:
                        delay *= (0.5 + random.random() * 0.5)

                    logger.warning(
                        f"Attempt {attempt + 1}/{config.max_retries + 1} failed: {e}. "
                        f"Retrying in {delay:.2f}s..."
                    )
                    time.sleep(delay)

            # Not reached: the final attempt re-raises inside the loop
            raise last_exception

        return wrapper
    return decorator
```
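To make the backoff formula in `with_retry` concrete, the sketch below pulls it out into a pure function and lists the sleeps taken between attempts. `backoff_delay` and `schedule` are hypothetical helpers written for illustration. The optional `rng` parameter exists only so the jittered values can be reproduced.

```python
import random
from typing import List, Optional


def backoff_delay(
    attempt: int,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    exponential_base: float = 2.0,
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay after failed attempt number `attempt` (0-based), same formula as with_retry."""
    delay = min(initial_delay * (exponential_base ** attempt), max_delay)
    if jitter:
        r = rng if rng is not None else random
        # Scale into [0.5 * delay, 1.0 * delay) so concurrent clients spread out
        delay *= 0.5 + r.random() * 0.5
    return delay


def schedule(max_retries: int = 3, **kwargs) -> List[float]:
    """Sleeps taken between the max_retries + 1 attempts (none after the last)."""
    return [backoff_delay(a, **kwargs) for a in range(max_retries)]


print(schedule(jitter=False))                 # [1.0, 2.0, 4.0]
print(schedule(max_retries=8, jitter=False))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With the defaults, a call that keeps failing waits at most 1 + 2 + 4 = 7 s in total before the final exception is raised, and less when jitter is on. This is worth weighing against any per-request or workflow-level timeout.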

Usage

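```python
# In LLM client
class OpenAIClient:
    @with_retry(RetryConfig(max_retries=3, initial_delay=1.0))
    def generate(self, prompt: str) -> str:
        # Note: the OpenAI SDK raises its own types (e.g. openai.RateLimitError);
        # they must be translated to the retry module's exceptions to be retried.
        response = self.client.chat.completions.create(...)
        # create() returns a completion object, not a str
        return response.choices[0].message.content
```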
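If the LLM clients are async (an assumption, since the example above is synchronous), calling `time.sleep` inside the decorator would block the whole event loop for the entire backoff. The sketch below is an async variant that uses the same formula but awaits `asyncio.sleep`. `with_async_retry` and its `retry_on` parameter are hypothetical names. The default exception tuple is a stand-in for whatever the provider SDKs actually raise.

```python
import asyncio
import functools
import logging
import random
from typing import Any, Awaitable, Callable, Tuple, Type, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def with_async_retry(
    max_retries: int = 3,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    exponential_base: float = 2.0,
    jitter: bool = True,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> Callable[[Callable[..., Awaitable[T]]], Callable[..., Awaitable[T]]]:
    """Async counterpart of with_retry: awaits asyncio.sleep so other tasks keep running."""

    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> T:
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except retry_on as e:
                    if attempt == max_retries:
                        raise  # out of retries
                    delay = min(initial_delay * exponential_base ** attempt, max_delay)
                    if jitter:
                        delay *= 0.5 + random.random() * 0.5
                    logger.warning("Attempt %d/%d failed: %s. Retrying in %.2fs...",
                                   attempt + 1, max_retries + 1, e, delay)
                    await asyncio.sleep(delay)  # yields to the event loop
            raise AssertionError("unreachable")  # loop always returns or raises

        return wrapper

    return decorator


# Usage: a fake call that times out twice, then succeeds
calls = {"n": 0}


@with_async_retry(initial_delay=0.01)
async def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


print(asyncio.run(flaky()))  # ok
```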

YAML Configuration

```yaml
agents:
  - name: "my_agent"
    model: "gpt-4"
    retry:
      enabled: true
      max_retries: 3
      initial_delay: 1.0
      max_delay: 60.0
      exponential_base: 2.0
      jitter: true
```
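The sketch below shows one way the agent's `retry:` block could be turned into a `RetryConfig`. It works on the already-parsed YAML (a plain dict) and restates the `RetryConfig` fields as a dataclass so the snippet runs on its own. `retry_config_from_agent` is a hypothetical name. Two behaviors are assumptions, because the issue does not specify them: a missing `retry:` block means retries are off, and `enabled` defaults to true when the block is present. Unknown keys raise an error so that a typo such as `max_retry` does not silently fall back to defaults.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class RetryConfig:
    # Same fields and defaults as the proposed RetryConfig
    max_retries: int = 3
    initial_delay: float = 1.0
    max_delay: float = 60.0
    exponential_base: float = 2.0
    jitter: bool = True


def retry_config_from_agent(agent: Mapping[str, Any]) -> Optional[RetryConfig]:
    """Build a RetryConfig from an agent's `retry:` block; None means retries are off."""
    retry = agent.get("retry")
    if not retry or not retry.get("enabled", True):
        return None
    known = {f.name for f in fields(RetryConfig)}
    unknown = set(retry) - known - {"enabled"}
    if unknown:
        raise ValueError(f"unknown retry options: {sorted(unknown)}")
    return RetryConfig(**{k: v for k, v in retry.items() if k in known})


# The YAML example above, as it looks after parsing
agent = {
    "name": "my_agent",
    "model": "gpt-4",
    "retry": {"enabled": True, "max_retries": 3, "initial_delay": 1.0,
              "max_delay": 60.0, "exponential_base": 2.0, "jitter": True},
}
config = retry_config_from_agent(agent)
```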

Benefits

  1. ✅ Improved reliability for production workflows
  2. ✅ Automatic recovery from transient failures
  3. ✅ Configurable per-agent
  4. ✅ Stops a single transient error from aborting the entire workflow
  5. ✅ Better user experience (no manual retries)

Implementation Checklist

  • Create retry decorator with exponential backoff
  • Add retry configuration to agent schema
  • Integrate with all LLM clients (OpenAI, Anthropic, Gemini)
  • Add retry metrics/logging
  • Update documentation
  • Add tests for retry logic
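For the last checklist item, here is a test sketch that uses only the standard library. It patches `time.sleep` with `unittest.mock.patch`, so retries run instantly and the requested delays can be asserted. `retry` and `FlakyCall` are hypothetical stand-ins: `retry` mirrors `with_retry` with jitter turned off and is included only so the file runs on its own. In the real suite the tests would target `with_retry` from `flo_ai/llm/retry.py`.

```python
import time
import unittest
from unittest import mock


def retry(max_retries=3, initial_delay=1.0, exponential_base=2.0,
          retry_on=(TimeoutError, ConnectionError)):
    """Stand-in for with_retry (no jitter) so this test file is self-contained."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries:
                        raise
                    time.sleep(initial_delay * exponential_base ** attempt)
        return wrapper
    return decorator


class FlakyCall:
    """Fake LLM call: raises `exc` for the first `failures` calls, then returns 'ok'."""

    def __init__(self, failures, exc=TimeoutError):
        self.failures, self.exc, self.calls = failures, exc, 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise self.exc("simulated")
        return "ok"


class RetryTests(unittest.TestCase):
    @mock.patch("time.sleep")  # no real waiting; the mock records requested delays
    def test_recovers_after_transient_failures(self, fake_sleep):
        call = FlakyCall(failures=2)
        self.assertEqual(retry()(call)(), "ok")
        self.assertEqual(call.calls, 3)
        self.assertEqual([c.args[0] for c in fake_sleep.call_args_list], [1.0, 2.0])

    @mock.patch("time.sleep")
    def test_gives_up_after_max_retries(self, fake_sleep):
        call = FlakyCall(failures=10)
        with self.assertRaises(TimeoutError):
            retry(max_retries=3)(call)()
        self.assertEqual(call.calls, 4)  # 1 initial attempt + 3 retries

    @mock.patch("time.sleep")
    def test_does_not_retry_other_errors(self, fake_sleep):
        call = FlakyCall(failures=1, exc=ValueError)
        with self.assertRaises(ValueError):
            retry()(call)()
        self.assertEqual(call.calls, 1)
        fake_sleep.assert_not_called()
```

Run it with `python -m unittest`. pytest also collects `unittest.TestCase` classes, so it works too. Patching `time.sleep` on the module works because the decorator looks up `time.sleep` at call time rather than binding it at import.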

Related Issues

Labels

enhancement, help wanted
