jusso-dev/GoUpTime

UpTime

A self-hosted uptime and synthetic monitoring platform built with Go, Gin, PostgreSQL, Redis, worker pools, a first-party web console, and incident notifications.

UpTime started as a small uptime-check API. This rebuild turns the same idea into a backend-first portfolio project with real persistence, scheduler/worker separation, incident handling, API-key auth, metrics, and Docker Compose.

Features

  • First-party React console at GET /app for monitors, incidents, agents, status pages, on-call, runbooks, and check events
  • Gin REST API with GET /health, legacy GET /health-check, and legacy POST /ping-endpoint
  • HTTP, API assertion, keyword, TCP, UDP, DNS, TLS, domain-expiry, ping, and heartbeat checks
  • HTTP timing details through httptrace: DNS, TCP connect, TLS handshake, first byte, total duration
  • PostgreSQL tables for monitors, check results, incidents, notification channels, API keys, and audit logs
  • Worker process with goroutines, channels, context cancellation, duplicate-check avoidance, and graceful shutdown
  • Incident lifecycle with acknowledgement, investigation, severity, impact, timeline evidence, comments, post-mortems, and action items
  • Alert-quality controls for regional quorum, dependency suppression, grouping, flapping cooldown, and maintenance suppression
  • Webhook, Slack, push, SMTP, chat, Twilio SMS/voice, and AWS SNS SMS notification channels for incident events
  • Monitor tags, services, maintenance windows, public status pages, and uptime reports
  • Status page subscribers, public announcements, and automatic incident updates
  • Remote/private agents that poll assigned checks and submit regional results
  • On-call schedules, overrides, escalation policies, runbooks, and browser synthetic artifacts
  • API key authentication with hashed stored keys and a bootstrap admin key
  • Built-in event stream from checks, incidents, workers, agents, and status communication
  • Optional scrape metrics for teams that want an external metrics stack
  • Docker Compose stack with API, worker, Postgres, Redis, and optional browser worker

Screenshots

These screenshots were captured from the live Docker Compose stack.

Health endpoint showing PostgreSQL and Redis status

Authenticated monitor API returning monitors updated by the worker

Legacy ping endpoint returning HTTP tracing timings

Prometheus scraping the API and worker metrics endpoints

Grafana dashboard provisioned for UpTime metrics

Architecture

flowchart LR
  User[User / API Client] --> API[Go Gin API]
  Console[React Console] --> API
  API --> Postgres[(PostgreSQL)]
  API --> Redis[(Redis)]
  Worker[Go Worker Pool] --> Postgres
  Worker --> Redis
  Worker --> Targets[Websites / TCP / DNS / TLS Targets]
  Worker --> Notify[Webhook Notifications]
  Browser[Browser Worker] --> Redis
  Browser --> Artifacts[(Artifacts)]

Tech Stack

  • Go 1.22+
  • Gin
  • PostgreSQL via GORM
  • Redis
  • React and Vite for the embedded console
  • Optional Prometheus client library
  • Structured logging with slog
  • Docker Compose

Local Setup

Run the full stack:

make docker-up

API: http://localhost:8008

Console: http://localhost:8008/app

Optional external metrics stack:

make metrics-up

Prometheus: http://localhost:9090

Grafana: http://localhost:3000 (default login: admin / admin)

Run the Go processes without Docker:

export DATABASE_URL='postgres://uptime:uptime@localhost:5432/uptime?sslmode=disable'
export REDIS_URL='redis://localhost:6379/0'
export UPTIME_BOOTSTRAP_API_KEY='dev_admin_key'

make migrate # runs GORM-managed schema migration
go run ./cmd/api
go run ./cmd/worker

Run the console dev server:

make console-dev

The Vite server proxies API requests to http://localhost:8008.

Environment

| Variable | Default | Description |
| --- | --- | --- |
| APP_ENV | development | Runtime environment (production enforces stricter defaults) |
| APP_PORT | 8008 | API port |
| METRICS_PORT | 8009 | Worker Prometheus metrics port |
| DATABASE_URL | local Postgres | PostgreSQL connection string (postgres:// or postgresql://) |
| REDIS_URL | local Redis | Redis connection string (redis:// or rediss://) |
| UPTIME_BOOTSTRAP_API_KEY | dev_admin_key (dev only) | Bootstrap bearer token. Required in production; must be ≥ 16 chars |
| ALLOW_PRIVATE_TARGETS | false | Allow localhost/private targets for checks/webhooks (forbidden in production) |
| CHECK_WORKER_COUNT | 10 | Worker goroutine count (1–1024) |
| DEFAULT_CHECK_TIMEOUT_SECONDS | 10 | Default check timeout (1–300) |
| SCHEDULER_TICK_SECONDS | 5 | How often the scheduler polls for due monitors (1–60) |
| LOG_LEVEL | info | debug, info, warn, or error |
| TLS_EXPIRY_WARN_DAYS | 14 | Days before expiry that TLS checks report degraded |
| WEBHOOK_SIGNING_SECRET | empty | If set, webhook bodies are HMAC-SHA256 signed in X-UpTime-Signature |
| WEBHOOK_TIMEOUT_SECONDS | 10 | Per-attempt webhook timeout |
| WEBHOOK_MAX_RETRIES | 3 | Additional webhook attempts after the first failure (0–10) |
| SHUTDOWN_TIMEOUT_SECONDS | 15 | Graceful shutdown deadline |
| API_READ_HEADER_TIMEOUT_SECONDS | 5 | API http.Server read header timeout |
| API_WRITE_TIMEOUT_SECONDS | 30 | API http.Server write timeout |
| MAX_REQUEST_BODY_BYTES | 1048576 | Maximum accepted request body size in bytes |
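A webhook receiver can use WEBHOOK_SIGNING_SECRET to confirm that a delivery came from UpTime. The sketch below shows one way to verify an HMAC-SHA256 signature in Go. It assumes the X-UpTime-Signature header carries the hex-encoded digest of the raw request body with no prefix; this README does not specify the encoding, so check an actual delivery before relying on it. The function names, secret, and payload are illustrative only.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signBody returns the hex-encoded HMAC-SHA256 of body under secret.
// Hex encoding is an assumption; the README does not state the format.
func signBody(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifySignature recomputes the MAC over the raw body and compares it
// with the header value in constant time.
func verifySignature(secret, body []byte, header string) bool {
	got, err := hex.DecodeString(header)
	if err != nil {
		return false // malformed header
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")            // placeholder secret
	body := []byte(`{"event":"example-payload"}`) // placeholder payload
	sig := signBody(secret, body)

	fmt.Println(verifySignature(secret, body, sig))               // true
	fmt.Println(verifySignature(secret, []byte("tampered"), sig)) // false
}
```

Verify against the exact bytes received, before any JSON decoding, since re-serialising the payload can change the digest.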

API Examples

Health:

curl http://localhost:8008/health

Manual legacy check:

curl -X POST http://localhost:8008/ping-endpoint \
  -H "Content-Type: application/json" \
  -d '{"endpoint":"https://example.com"}'

Create a monitor:

curl -X POST http://localhost:8008/api/v1/monitors \
  -H "Authorization: Bearer dev_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Example Website",
    "type": "http",
    "target": "https://example.com",
    "method": "GET",
    "expectedStatus": 200,
    "timeoutSeconds": 10,
    "intervalSeconds": 60,
    "failureThreshold": 3,
    "enabled": true
  }'

Run a monitor now:

curl -X POST http://localhost:8008/api/v1/monitors/00000000-0000-0000-0000-000000000101/check-now \
  -H "Authorization: Bearer dev_admin_key"

Create an API key:

curl -X POST http://localhost:8008/api/v1/api-keys \
  -H "Authorization: Bearer dev_admin_key" \
  -H "Content-Type: application/json" \
  -d '{"name":"local dev"}'

Scheduler And Worker

cmd/worker periodically loads enabled monitors from PostgreSQL. It schedules checks by intervalSeconds, skips monitors already in flight, and fans jobs out to a fixed goroutine pool. Each job uses context timeouts, stores a check result, updates monitor status, and applies incident rules.

Redis is part of the local stack and health reporting. The current worker uses local in-process scheduling; Redis-backed distributed locks/queues are a natural next step for multiple worker replicas.

Check Types

  • http: validates the URL, blocks private targets by default, supports GET and HEAD with an expected status, does not follow redirects, and records body snippets and timing breakdowns.
  • api: HTTP check with methods, headers, body, bearer/basic auth, and JSON assertion config.
  • keyword: HTTP check plus expected keyword matching.
  • tcp: checks host:port reachability with net.Dialer.
  • udp: sends a datagram payload and can validate an expected response snippet.
  • dns: resolves a hostname with Go's resolver.
  • tls: connects to a TLS endpoint and marks certificates near expiry as degraded.
  • domain: checks domain expiration through RDAP.
  • ping: TCP reachability ping for environments where raw ICMP is not available.
  • heartbeat: records inbound pings and opens incidents when check-ins are late or missing.
  • browser: submits a Playwright transaction job to the optional browser worker sidecar and records screenshots, console errors, network failures, and artifact references.

Incident Lifecycle

Checks are stored in check_results. A monitor opens an incident only when all of the following hold: it has failed failureThreshold consecutive checks, any configured regional quorum is met, no parent dependency is already down, and the monitor is not flapping. A succeeding check resolves the active incident.

Incidents support open, acknowledged, investigating, identified, monitoring, and resolved states, plus severity (info, warning, minor, major, critical) and impact (none, degraded, partial_outage, full_outage). Timeline events capture state changes, check evidence, comments, escalation decisions, and recovery context with sensitive keys redacted.

Post-mortems can be attached to resolved incidents and exported as Markdown. Action items track owner, due date, and completion state.

Communication And Response

Status pages can collect confirmed subscribers, publish announcements, and auto-publish incident updates for affected components. Subscriber confirmation and unsubscribe links use hashed one-time tokens.

Remote/private agents are provisioned with scoped tokens. Agents call /api/v1/agent/jobs for assigned checks, submit results to /api/v1/agent/results, and heartbeat through /api/v1/agent/heartbeat.

On-call schedules rotate participants from a timezone-aware handoff time, support temporary overrides, and expose current/upcoming shift APIs. Escalation policies can route by monitor, service, tag, severity, and impact, and incident timelines record the selected policy.

Browser Transactions

Browser monitors use a Redis job contract so the Go worker remains lightweight. Run the optional Playwright sidecar with:

docker compose --profile browser up browser-worker

The sidecar executes saved scripts in an isolated Playwright context, captures failure screenshots, console errors, and network failures, and emits artifact metadata. Artifact records include retention timestamps and authenticated download endpoints.

Observability

The first-party console uses API data directly: monitor state, check results, incident timelines, worker heartbeats, private agents, status announcements, and on-call schedules. GET /app is the default operational view.

GET /metrics exposes API metrics. The worker exposes metrics on :8009/metrics.

The external metrics stack is optional. Run make metrics-up when you want Prometheus and Grafana in addition to the built-in console.

Worker dashboard

A minimal job UI is served by the API at GET /workers. It polls GET /api/v1/workers/status every 2 seconds and shows, per worker instance: host, started/last-seen, active and queued jobs, in-flight monitor IDs, and the most recent 50 check results. Workers write their state into worker_heartbeats every 5 seconds, so the same view also reflects crashed or restarting instances (rows older than ~20 seconds are flagged stale).

The HTML page is unauthenticated; it prompts for an API key client-side and uses it as a Bearer token for the protected status XHR.

Security

  • /api/v1/* endpoints require an Authorization: Bearer <key> header or an X-API-Key header
  • Raw generated API keys are shown once; only SHA-256 hashes are stored
  • URLs and webhooks block localhost/private/link-local targets unless ALLOW_PRIVATE_TARGETS=true
  • Checks use context timeouts and bounded response snippets
  • Logs avoid raw API keys and webhook payload secrets

Testing

make test
make check

The test suite covers HTTP checker success, timeout, expected-status mismatch, SSRF blocking, TCP success/failure, DNS success/failure, TLS expiry classification, API key hashing, incident open/resolve rules, regional quorum, dependency suppression, flapping suppression, SMS payload construction, and on-call rotation math.

Roadmap

  • API/UI polish for the growing response workflows
  • Terraform, Helm, and CLI automation
  • OIDC SSO, RBAC hardening, encrypted secrets, and audit UI
  • OpenTelemetry export and long-term artifact storage
