feat: add retry with backoff and job timeouts to event worker #38

## Summary

Add retry logic and timeout handling to the event processing worker. This builds on top of #35 (event processing redesign) to improve reliability.

## Features

### 1. Retry with Exponential Backoff

Failed events should automatically retry with increasing delays:

```
Attempt 1: fails → wait 1 min → retry
Attempt 2: fails → wait 5 min → retry
Attempt 3: fails → wait 30 min → retry
Attempt N: fails → move to dead letter (or stop retrying)
```

**Schema changes:**
```sql
ALTER TABLE events ADD COLUMN attempts INTEGER DEFAULT 0;
ALTER TABLE events ADD COLUMN max_attempts INTEGER DEFAULT 5;
ALTER TABLE events ADD COLUMN next_retry_at TIMESTAMP WITH TIME ZONE;
```

**Query change:**
```sql
-- fetch_pending() becomes:
SELECT * FROM events
WHERE delivery_status = 'pending'
   OR (delivery_status = 'failed' AND next_retry_at < now() AND attempts < max_attempts)
LIMIT :batch_size
FOR UPDATE SKIP LOCKED
```

**Backoff calculation:**
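```python
from datetime import UTC, datetime, timedelta

def calculate_next_retry(attempts: int) -> datetime:
    # `attempts` = failures recorded before this one (0 on the first failure), so the
    # schedule is 1 min, 5 min, 30 min, 2 h, then 12 h for every later retry.
    delays = [60, 300, 1800, 7200, 43200]
    delay = delays[min(attempts, len(delays) - 1)]
    return datetime.now(UTC) + timedelta(seconds=delay)
```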

### 2. Job Timeouts

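Jobs that run longer than `job_timeout` should be cancelled and marked as failed, which puts them on the normal retry path. `asyncio.timeout()` can only cancel the handler while it is suspended at an `await`, so a handler stuck in blocking synchronous code (CPU-bound loops, blocking I/O) cannot be interrupted this way.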

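```python
import asyncio

async def _dispatch(self, event: Event) -> None:
    # `listener` and `outbox` are assumed to come from the #35 worker; asyncio.timeout() needs Python 3.11+.
    try:
        async with asyncio.timeout(self._job_timeout):
            await listener.handle(event)
    except TimeoutError:  # asyncio.TimeoutError is an alias of TimeoutError on 3.11+
        await outbox.mark_failed(event.id, f"Job timed out after {self._job_timeout}s")
```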

**Configuration:**
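```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkerConfig:
    job_timeout: int = 300  # seconds (5 minutes)
    max_attempts: int = 5   # matches the events.max_attempts column default
```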

### 3. Dead Letter Handling

Once an event's `attempts` reaches `max_attempts`, it should be moved to a dead-letter state instead of being scheduled for another retry:

```python
if event.attempts >= event.max_attempts:
    await outbox.mark_dead_letter(event.id)
```

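This could be a new `delivery_status = 'dead_letter'` value or a separate table. Either way, dead-lettered events should stay available for inspection. Note that `fetch_pending()` already skips failed events with `attempts >= max_attempts`. The dead-letter state therefore mainly makes exhausted events visible; it does not change what the worker fetches.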

## Tasks

- [ ] Add `attempts`, `max_attempts`, `next_retry_at` columns (migration)
- [ ] Update `fetch_pending()` to include retriable failed events
- [ ] Add `calculate_next_retry()` backoff logic
- [ ] Update `mark_failed()` to set `next_retry_at` and increment `attempts`
- [ ] Add timeout wrapper in worker dispatch
- [ ] Add dead letter status/handling
- [ ] Add configuration for timeout and max_attempts
- [ ] Update tests

## Depends On

- #35 (event processing redesign) - should be implemented first

## Acceptance Criteria

- [ ] Failed events automatically retry up to max_attempts
- [ ] Retry delays increase exponentially
- [ ] Jobs exceeding `job_timeout` are cancelled and marked failed
- [ ] Events reaching max_attempts stop retrying (dead letter)
- [ ] Retry behavior is configurable
