Problem Statement
The health check (api/src/health/health.controller.ts) only verifies database connectivity. There's no distinction between liveness (is the process alive?) and readiness (can it serve traffic?). There are no checks for WebSocket gateway health, memory usage, or upstream dependency status.
Evidence
// api/src/health/health.controller.ts: single endpoint
@Get()
async check(): Promise<HealthCheckResponseDto> {
  // Only the database is checked; memory, event loop and WebSocket health are not
  await this.healthCheckService.check([
    async () => this.databaseHealthIndicator.isHealthy("database"),
  ])
  return { status: "ok", timestamp: new Date().toISOString() }
}
Impact
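Kubernetes cannot distinguish between a process that is alive but temporarily unable to serve traffic (e.g., the DB connection is lost) and a healthy process. With one endpoint serving both probes, a transient DB outage can trigger pod restarts instead of only pausing traffic to the pod, and downtime detection is delayed.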
Proposed Solution
- Add GET /health/live: liveness probe (always returns ok while the process is running)
- Update GET /health: readiness probe (checks all dependencies: DB, memory threshold, event loop lag)
- Add a memory usage check and an event loop lag check as NestJS health indicators
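The split between the two probes can be sketched framework-agnostically as below. This is a minimal sketch under stated assumptions, not the project's code: `liveness`, `readiness`, `Check` and `ProbeResponse` are hypothetical names, and in the service these bodies would presumably sit behind the NestJS routes for GET /health/live and GET /health. The property that matters is that liveness never touches a dependency, while readiness turns any failed check into a 503.

```typescript
// Hypothetical types for illustration; not part of NestJS or @nestjs/terminus.
type CheckStatus = "up" | "down";

interface ProbeResponse {
  httpStatus: 200 | 503;
  body: {
    status: "ok" | "error";
    timestamp: string;
    checks?: Record<string, { status: CheckStatus; error?: string }>;
  };
}

// A named dependency check: resolves if healthy, rejects if not.
type Check = () => Promise<void>;

// GET /health/live: makes no dependency calls, so a DB outage never fails it.
function liveness(): ProbeResponse {
  return {
    httpStatus: 200,
    body: { status: "ok", timestamp: new Date().toISOString() },
  };
}

// GET /health: runs every check in parallel and reports each one by name.
// Any failed check makes the whole response a 503.
async function readiness(checks: Record<string, Check>): Promise<ProbeResponse> {
  const names = Object.keys(checks);
  // Promise.resolve().then(...) also converts a synchronous throw into a rejection.
  const settled = await Promise.allSettled(
    names.map((name) => Promise.resolve().then(() => checks[name]())),
  );

  const results: Record<string, { status: CheckStatus; error?: string }> = {};
  settled.forEach((outcome, i) => {
    results[names[i]] =
      outcome.status === "fulfilled"
        ? { status: "up" }
        : { status: "down", error: String(outcome.reason) };
  });

  const allUp = Object.values(results).every((r) => r.status === "up");
  return {
    httpStatus: allUp ? 200 : 503,
    body: {
      status: allUp ? "ok" : "error",
      timestamp: new Date().toISOString(),
      checks: results,
    },
  };
}
```

In Kubernetes the livenessProbe would then target /health/live and the readinessProbe /health, so a failing readiness check removes the pod from Service endpoints without restarting it.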
Acceptance Criteria
File Map
api/src/health/health.controller.ts — add /live and enhanced /health
api/src/health/memory.health-indicator.ts — new
api/src/health/health.module.ts — update providers
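One possible shape for the new memory.health-indicator.ts, together with the event loop lag check, using only Node built-ins (`process.memoryUsage()` and `monitorEventLoopDelay()` from `node:perf_hooks`). The names `checkHeap` and `EventLoopLagMonitor` and both thresholds are assumptions for illustration. `@nestjs/terminus` also ships a `MemoryHealthIndicator` (`checkHeap`, `checkRSS`) that may cover the memory half without custom code; the lag check would likely still need a custom indicator.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Hypothetical thresholds; tune them against observed heap and latency baselines.
const DEFAULT_HEAP_LIMIT_BYTES = 300 * 1024 * 1024; // 300 MiB
const DEFAULT_LAG_LIMIT_MS = 200;

// Memory check: compares the V8 heap currently in use against a fixed ceiling.
// Throws on failure so a caller (e.g. a readiness probe) can treat it as "down".
export function checkHeap(limitBytes = DEFAULT_HEAP_LIMIT_BYTES): { heapUsed: number } {
  const { heapUsed } = process.memoryUsage();
  if (heapUsed > limitBytes) {
    throw new Error(`heapUsed ${heapUsed} B exceeds limit ${limitBytes} B`);
  }
  return { heapUsed };
}

// Event loop lag check: a histogram that samples event loop delay in the
// background. Node reports the samples in nanoseconds.
export class EventLoopLagMonitor {
  private readonly histogram = monitorEventLoopDelay({ resolution: 20 }); // sample every 20 ms

  start(): void {
    this.histogram.enable();
  }

  stop(): void {
    this.histogram.disable();
  }

  // p99 delay in ms since the previous call; resetting scopes each reading
  // to the interval between two probes.
  p99LagMs(): number {
    const p99Ms = this.histogram.percentile(99) / 1e6;
    this.histogram.reset();
    return p99Ms;
  }

  // Throws when the p99 lag exceeds the limit, mirroring checkHeap.
  check(limitMs = DEFAULT_LAG_LIMIT_MS): { p99LagMs: number } {
    const lag = this.p99LagMs();
    if (lag > limitMs) {
      throw new Error(`event loop p99 lag ${lag.toFixed(1)} ms exceeds ${limitMs} ms`);
    }
    return { p99LagMs: lag };
  }
}
```

The monitor would be started once at boot (for example when the health module initializes) and stopped on shutdown, so each readiness probe only reads and resets the histogram instead of measuring lag on the request path.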
Labels: observability, infrastructure
Priority: Medium | Difficulty: Beginner | Estimated Effort: 1d
Backlog ID: REPO-035