temporal-server: drop hardcoded postgres IP, survive HAProxy/patroni failover #145

## Where
- `nomad/jobs/infrastructure/temporal/files/temporal-env.tpl` - uses consul-template to read `service/munchbox-postgres/leader`, regex-extracts the leader's hostname from its patroni member-JSON `conn_url`, and feeds that to `POSTGRES_SEEDS`. Restarts temporal on every leader change (`change_mode = restart` on the template).
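For reference, the leader-host extraction comes down to pulling the host out of `conn_url`. This is a Python sketch, not the actual template. The sample member JSON and its hostnames are made up, and the regex approximates what the `.tpl` does rather than copying it. A URL parser gives the same answer with less regex fragility.

```python
import json
import re
from typing import Optional
from urllib.parse import urlsplit

# Made-up example of a patroni member value as stored in Consul KV;
# only conn_url matters for the extraction.
SAMPLE_MEMBER = json.dumps({
    "conn_url": "postgres://pg-node-2.node.consul:5432/postgres",
    "api_url": "http://pg-node-2.node.consul:8008/patroni",
})

# Approximation of the template's regex: capture everything after "//"
# up to the first ":" or "/" inside the conn_url value.
CONN_URL_HOST_RE = re.compile(r'"conn_url":\s*"postgres(?:ql)?://([^:/"]+)')


def leader_host_regex(member_json: str) -> Optional[str]:
    """Regex route, roughly what the .tpl does."""
    match = CONN_URL_HOST_RE.search(member_json)
    return match.group(1) if match else None


def leader_host_parsed(member_json: str) -> Optional[str]:
    """Parser route: decode the JSON, then let urlsplit find the host."""
    conn_url = json.loads(member_json).get("conn_url")
    return urlsplit(conn_url).hostname if conn_url else None


print(leader_host_regex(SAMPLE_MEMBER))   # pg-node-2.node.consul
print(leader_host_parsed(SAMPLE_MEMBER))  # pg-node-2.node.consul
```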

## Issue
Two layered problems:

1. **HAProxy is silently bypassed.** The env template sets `POSTGRES_PORT=5433` (the HAProxy port), but the `temporalio/server:1.29.1` config template reads `DB_PORT`, not `POSTGRES_PORT`. The port therefore falls through to the default `5432`, and temporal connects straight to the patroni-managed postgres. HAProxy's role-change detection, stale-TCP teardown, and transparent reconnect (introduced in commit 7cb39eb for exactly this case) are never engaged.

2. **The leader-tracking workaround is what bypasses it.** Because the temporal config template takes only a single `POSTGRES_SEEDS` value (no multi-host parsing), the operator can't point it at `haproxy-postgres.service.consul` directly, since that name returns multiple A records. The current workaround instead picks the leader's host via the KV regex dance and restarts temporal on every patroni promotion. Each restart costs 30-60s of downtime per failover, where a transparent reconnect through HAProxy would cost 2-5s.
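The port fallthrough in item 1 can be reproduced by mimicking the config template's lookup. This is a hypothetical emulation: `connect_addr` and the exact lookup are assumptions based on the behaviour described above, not the `temporalio/server` template source.

```python
from typing import Mapping

# Assumed fallback, per the behaviour described above: the port comes from
# DB_PORT and defaults to 5432 when DB_PORT is unset or empty.
DEFAULT_DB_PORT = "5432"


def connect_addr(env: Mapping[str, str]) -> str:
    """Emulate a `default .Env.DB_PORT "5432"` style lookup (sketch only)."""
    seeds = env.get("POSTGRES_SEEDS", "")
    # POSTGRES_PORT is never consulted, which is the whole bug.
    port = env.get("DB_PORT") or DEFAULT_DB_PORT
    return f"{seeds}:{port}"


# Current env: POSTGRES_PORT is ignored, so temporal goes straight to postgres.
print(connect_addr({"POSTGRES_SEEDS": "pg-node-2", "POSTGRES_PORT": "5433"}))  # pg-node-2:5432
# After the rename: traffic goes through HAProxy.
print(connect_addr({"POSTGRES_SEEDS": "pg-node-2", "DB_PORT": "5433"}))  # pg-node-2:5433
```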

The visible failover symptom: temporal stays pinned to whichever IP the template last rendered. When a patroni promotion demotes that node to replica, writes fail until the template re-renders and the restart+reconnect cycle completes.

## Fix direction
Smallest viable:
- Rename `POSTGRES_PORT=5433` to `DB_PORT=5433` in `temporal-env.tpl` so temporal actually hits HAProxy on the leader's node.
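- Once HAProxy is in the path, patroni failover becomes transparent to temporal. The KV-leader extraction (and its restart-on-change) can then be simplified or dropped, depending on appetite: any healthy HAProxy node routes to the current primary, so a static HAProxy IP is enough as long as you trust HAProxy to route and that host stays up.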
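To keep the rename from regressing, a small check on the rendered env file could run in CI or as a prestart step. A hypothetical sketch: the function names, the simplified `KEY=VALUE` parsing (no quoting rules), and where it runs are all assumptions; only the 5433 HAProxy port comes from this issue.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (simplified, no quoting)."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def check_temporal_env(text: str, haproxy_port: str = "5433") -> List[str]:
    """Return problems that would let temporal bypass HAProxy (empty list means OK)."""
    env = parse_env(text)
    problems: List[str] = []
    if "POSTGRES_PORT" in env and "DB_PORT" not in env:
        problems.append("POSTGRES_PORT is set but DB_PORT is not; the config template ignores POSTGRES_PORT")
    if env.get("DB_PORT") != haproxy_port:
        problems.append(f"DB_PORT is {env.get('DB_PORT')!r}, expected HAProxy port {haproxy_port}")
    return problems
```

Run against the current render it reports both problems; against the fixed render it returns an empty list.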

Full HA (separate follow-up):
- Add a keepalived VIP that follows whichever node has a healthy HAProxy alloc, and point temporal at the VIP. This covers the case where the HAProxy host itself dies, not just a patroni primary flip.
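For the follow-up, keepalived decides whether a node may hold the VIP by running a `vrrp_script` and reading its exit status (0 means healthy). A minimal check, assuming HAProxy listens on 5433 on each node; the function names, the localhost address, and the 1s timeout are illustrative assumptions.

```python
import socket


def haproxy_accepting(host: str = "127.0.0.1", port: int = 5433, timeout: float = 1.0) -> bool:
    """True if a TCP connect to the HAProxy listener succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: all count as "HAProxy not healthy here".
        return False


def main() -> int:
    """Exit status for a keepalived vrrp_script: 0 = healthy, non-zero = failed."""
    return 0 if haproxy_accepting() else 1
```

A small wrapper calling `sys.exit(main())` would be what `vrrp_script` points at. A TCP connect only proves HAProxy is accepting on that node; routing to the current primary stays HAProxy's job via its own role detection.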

## Related
#35 - prior temporal DNS cleanup, different scope.
