Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 63 additions & 0 deletions .agents/rules/ssh-keys-register-once-across-scopes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# Env-scoped SSH keys must register once across all scopes

`program.Run` builds one `registry.BuildRegistry` — and therefore one
`hetzner.HetznerCompute` — per realization scope (the region-less global slice plus one per
region), all sharing the same Pulumi `ctx`. Hetzner SSH keys are **account-global** and named with no
region slug (`wardnet-<env>-key-user` / `wardnet-<env>-key-deploy`, via `naming.GlobalResource`). If
each scope's compute provider registers them independently, Pulumi fails the whole preview/up with
`Duplicate resource URN '…hcloud:index/sshKey:SshKey::wardnet-<env>-key-user'`.

This is the sibling of `registry-provider-names-are-region-scoped.md`. There the fix was to make each
provider URN unique per scope (every scope legitimately needs its own provider). Here it is the
opposite: the SSH keys are a single env-scoped resource pair that must be created **exactly once
total**, not once per scope.

So the SSH key cache must be **shared across every `HetznerCompute` of one program run**:
`program.Run` creates one `registry.NewSSHKeyCache()` and threads it through both `BuildRegistry`
calls → `registry.Compute` → `hetzner.NewCompute`. `ensureSshKeys` dedups against that shared
`*hetzner.SSHKeyCache`, so the first scope to provision a server registers the keys and every other
scope reuses the same key objects.

### Owning provider must be scope-order-independent

Which scope wins the dedup race depends on realization order (global slice first when it provisions a
server, else the sorted-first region) — so the keys must **not** be registered under that winning
scope's region-scoped provider (`hcloud-<region>`). If they were, adding a region that sorts earlier,
removing the first region, or adding a compute-bearing global slice later would move the keys' provider
reference between runs, and Pulumi can replace an account-global resource every server depends on when
its provider changes. Instead `ensureSshKeys` registers them under a **dedicated, fixed-name** provider
(`hcloud-ssh-keys`, `SSHKeyCache.provider`), created once on the cache. Because the cache already
dedups key creation to a single caller, this provider is registered exactly once and never collides on
URN the way a per-scope `hcloud-<region>` provider would — it is the deliberate exception to
`registry-provider-names-are-region-scoped.md`, valid **only** because it is account-global and
single-registration.

This assumes **one Hetzner account per env**: the keys' env-scoped names carry no account dimension,
and the dedicated provider is created with a single token (the winning scope's). `regions.yaml` can
technically set a different `hetzner.apiToken` per region, but that is unsupported — the key would be
created in only one account and other-account servers would fail to reference it by name.

## Applies to

- `providers/hetzner/compute.go` — `SSHKeyCache` (incl. its dedicated `provider`), `NewSSHKeyCache`,
`HetznerCompute.sshKeys`, `ensureSshKeys`, `sshKeyProvider`. The keys carry no region-slug or
container label (`tags.HetznerLabels(..., "", "", ...)`) since they are env-scoped, so the single
shared key is labelled identically regardless of which scope creates it. Per-instance maps
(`firewalls`, `instanceCounters`) stay per-instance: firewalls are region-scoped (their names embed
the slug) so they never collide across scopes.
- `internal/registry/registry.go` — `BuildRegistry`'s `sshKeys` parameter and the `NewSSHKeyCache`
wrapper, threaded into `Compute` → `hetzner.NewCompute`.
- `program/program.go` — the single `registry.NewSSHKeyCache()` passed to every `BuildRegistry`.

When adding any new account-global (region-less, env-scoped) Hetzner resource, give it the same
treatment: a shared cache so it registers once, a fixed (non-region-scoped) name, and a
scope-order-independent owning provider. Do **not** give it a region-scoped name like a regional
resource.

## Why

A single-scope config (one region, or global-only) never exercised the collision — one
`HetznerCompute`, one registration. The first config to provision a server in **both** the global
slice and a region — or in two regions — registered the env-scoped SSH keys twice under one URN and
failed at preview before any resource was touched. This was hit by the first real cross-scope
deployment (`wardnet-infrastructure` PR #31).
19 changes: 17 additions & 2 deletions internal/registry/registry.go
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,9 @@ type registry struct {
dns *regions.DnsAuthority
ssh types.SSHConfig
regionTable regions.Table
// sshKeys is the cross-scope SSH key cache shared by every registry of one
// program run so env-scoped Hetzner SSH keys register under a single URN.
sshKeys *hetzner.SSHKeyCache

hetznerProviderOnce sync.Once
hetznerProvider *hcloud.Provider
Expand Down Expand Up @@ -76,8 +79,10 @@ type registry struct {
// to label cloud resources; eph carries the ADR-0028 ephemeral-env labels (zero
// value for a static env). ctx is stored and used lazily when provider objects
// are first constructed — it must be the context passed to the Pulumi program's
// run function.
func BuildRegistry(ctx *pulumi.Context, config map[string]map[string]any, dns *regions.DnsAuthority, ssh types.SSHConfig, regionTable regions.Table, project, env, region string, eph tags.Ephemeral) ProviderRegistry {
// run function. sshKeys is the cross-scope SSH key cache shared by every
// registry of one program run (see NewSSHKeyCache); pass nil for a standalone
// registry (e.g. tests) and hetzner.NewCompute allocates a private cache.
func BuildRegistry(ctx *pulumi.Context, config map[string]map[string]any, dns *regions.DnsAuthority, ssh types.SSHConfig, regionTable regions.Table, project, env, region string, eph tags.Ephemeral, sshKeys *hetzner.SSHKeyCache) ProviderRegistry {
slug, _ := regionTable.Slug(region) // already validated by loader
return &registry{
ctx: ctx,
Expand All @@ -90,9 +95,18 @@ func BuildRegistry(ctx *pulumi.Context, config map[string]map[string]any, dns *r
dns: dns,
ssh: ssh,
regionTable: regionTable,
sshKeys: sshKeys,
}
}

// NewSSHKeyCache returns the shared SSH key cache program.Run threads into every
// BuildRegistry call so env-scoped Hetzner SSH keys register exactly once across
// all realization scopes. It wraps hetzner.NewSSHKeyCache so callers need not
// import the provider package directly.
func NewSSHKeyCache() *hetzner.SSHKeyCache {
return hetzner.NewSSHKeyCache()
}

// hetznerProv lazily creates the shared hcloud.Provider for the Hetzner
// provider implementations. Subsequent calls return the cached instance.
func (r *registry) hetznerProv() *hcloud.Provider {
Expand Down Expand Up @@ -141,6 +155,7 @@ func (r *registry) Compute(name string) (types.ComputeProvider, error) {
r.slug,
r.eph,
realizations,
r.sshKeys,
)
})
return r.hetznerComp, nil
Expand Down
131 changes: 129 additions & 2 deletions internal/registry/registry_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ func TestProviderResourceNamesAreScopePerRegistry(t *testing.T) {
// Mirror program.Run: a global-slice registry and a regional registry over
// the same context. Distinct region strings → distinct scopes.
for _, region := range []string{"global", "us-east-1"} {
reg := BuildRegistry(ctx, config, &regions.DnsAuthority{Zone: "z"}, types.SSHConfig{}, regions.Table{}, "proj", "prd", region, tags.Ephemeral{})
reg := BuildRegistry(ctx, config, &regions.DnsAuthority{Zone: "z"}, types.SSHConfig{}, regions.Table{}, "proj", "prd", region, tags.Ephemeral{}, nil)
// Compute("hetzner") forces hcloud.NewProvider; DNS("cloudflare") forces
// cf.NewProvider. Both are the singleton provider resources that collide.
if _, cerr := reg.Compute("hetzner"); cerr != nil {
Expand All @@ -90,6 +90,133 @@ func TestProviderResourceNamesAreScopePerRegistry(t *testing.T) {
}
}

// sshKeyMocks records, for every SSH key resource registered, its name and the
// provider reference it was registered under (args.Provider, whose URN embeds the
// provider resource name). NewResource also returns inline the IDs/outputs the
// compute Create path needs (numeric firewall ID, server IPv4), but those are not
// retained.
type sshKeyMocks struct {
mu sync.Mutex
sshNames []string
sshProviders []string
}

func (m *sshKeyMocks) Call(pulumi.MockCallArgs) (resource.PropertyMap, error) {
return resource.PropertyMap{}, nil
}

func (m *sshKeyMocks) NewResource(args pulumi.MockResourceArgs) (string, resource.PropertyMap, error) {
switch args.TypeToken {
case "hcloud:index/sshKey:SshKey":
m.mu.Lock()
m.sshNames = append(m.sshNames, args.Name)
m.sshProviders = append(m.sshProviders, args.Provider)
m.mu.Unlock()
return args.Name + "-id", resource.PropertyMap{
"name": resource.NewStringProperty(args.Name),
"publicKey": resource.NewStringProperty("ssh-ed25519 AAAA test"),
}, nil
case "hcloud:index/firewall:Firewall":
// Firewall ID is parsed as an int in compute.Create.
return "42", resource.PropertyMap{"name": resource.NewStringProperty(args.Name)}, nil
case "hcloud:index/server:Server":
return args.Name + "-id", resource.PropertyMap{
"name": resource.NewStringProperty(args.Name),
"ipv4Address": resource.NewStringProperty("1.2.3.4"),
}, nil
}
return args.Name + "-id", resource.PropertyMap{"name": resource.NewStringProperty(args.Name)}, nil
}

// TestSshKeysRegisterOncePerEnvAcrossScopes guards the sibling of the
// provider-URN regression (TestProviderResourceNamesAreScopePerRegistry): Hetzner
// SSH keys are account-global and named with no region slug
// (`wardnet-<env>-key-{user,deploy}`, via naming.GlobalResource). program.Run
// builds one registry — and one HetznerCompute — per scope (the region-less global
// slice plus each region) over the same Pulumi context. If each scope's compute
// provider registered the keys independently, the second registration would reuse
// the URN and the real engine would fail with
// `Duplicate resource URN '…SshKey::wardnet-<env>-key-user'` — exactly the failure
// the first real cross-scope deployment hit.
//
// program.Run threads ONE shared SSH key cache into every BuildRegistry call. This
// drives two registries over one context through the real Compute().Create() path
// and asserts each env-scoped key is registered exactly once. (The mock monitor
// does not reject duplicate URNs, so we count registrations rather than rely on
// RunErr returning an error — same approach as the provider-name test above.)
//
// Reverting to a per-instance cache makes each scope register the keys again, so
// each name's count becomes 2 and this test fails.
func TestSshKeysRegisterOncePerEnvAcrossScopes(t *testing.T) {
config := map[string]map[string]any{
"hetzner": {
"apiToken": "t",
"location": "ash",
"network_zone": "us-east",
"serverTypes": map[string]any{"SMALL": "cx23"},
"images": map[string]any{"ubuntu-24.04": "ubuntu-24.04"},
},
}
// "global" is absent from the table → empty slug (region-less global slice),
// exactly as program.Run resolves it; "us-east-1" → "use1".
regionTable := regions.Table{"us-east-1": {Slug: "use1"}}
mocks := &sshKeyMocks{}

net := types.NetworkOutputs{
NetworkID: pulumi.String("99").ToStringOutput(),
SubnetID: pulumi.String("12345").ToStringOutput(),
}
spec := types.ComputeSpec{
Name: "bridge",
Kind: "vm",
Container: "vpc",
Provider: "hetzner",
Network: "vpc",
Size: "SMALL",
Image: "ubuntu-24.04",
InstanceCount: 1,
}

err := pulumi.RunErr(func(ctx *pulumi.Context) error {
// One shared cache, threaded into every registry exactly as program.Run does.
cache := NewSSHKeyCache()
for _, region := range []string{"global", "us-east-1"} {
reg := BuildRegistry(ctx, config, nil, types.SSHConfig{}, regionTable, "proj", "prd", region, tags.Ephemeral{}, cache)
comp, cerr := reg.Compute("hetzner")
if cerr != nil {
return cerr
}
// abstractRegion == the scope key, so ExtractRegionConfigs/ResolveRegion
// resolve the realization under that key. Distinct slugs keep the server
// and firewall names unique; only the env-scoped SSH keys collide.
if _, err := comp.Create(ctx, spec, net, "prd", region, "bridge.example.com", "", types.FirewallPorts{}); err != nil {
return err
}
}
return nil
}, pulumi.WithMocks("proj", "stack", mocks))
require.NoError(t, err)

counts := map[string]int{}
for _, n := range mocks.sshNames {
counts[n]++
}
assert.Equal(t, 1, counts["wardnet-prd-key-user"],
"user SSH key must register once across scopes, got %v (duplicate-URN regression)", mocks.sshNames)
assert.Equal(t, 1, counts["wardnet-prd-key-deploy"],
"deploy SSH key must register once across scopes, got %v (duplicate-URN regression)", mocks.sshNames)

// Each key must be registered under the dedicated, scope-independent provider
// (hcloud-ssh-keys) — NOT the winning scope's region-scoped provider
// (hcloud-global / hcloud-us-east-1). Pinning to a fixed provider name keeps
// the account-global key's owning provider stable when the scope set or
// realization order changes, avoiding a provider-change replacement.
for _, p := range mocks.sshProviders {
assert.Contains(t, p, "hcloud-ssh-keys",
"SSH keys must register under the dedicated hcloud-ssh-keys provider, got %q", p)
}
}

func dedupe(in []string) []string {
seen := map[string]struct{}{}
var out []string
Expand All @@ -105,7 +232,7 @@ func dedupe(in []string) []string {
func TestRegistryUnknownProvider(t *testing.T) {
// nil ctx + nil regionTable: providers are built lazily, so construction is
// not triggered during this test. Only the "unknown provider" paths are hit.
r := BuildRegistry(nil, map[string]map[string]any{"hetzner": {"apiToken": "x"}}, nil, types.SSHConfig{}, nil, "test-project", "test", "us-east-1", tags.Ephemeral{})
r := BuildRegistry(nil, map[string]map[string]any{"hetzner": {"apiToken": "x"}}, nil, types.SSHConfig{}, nil, "test-project", "test", "us-east-1", tags.Ephemeral{}, nil)

// "hetzner" is now a known network provider — must succeed.
np, err := r.Network("hetzner")
Expand Down
2 changes: 1 addition & 1 deletion program/gate_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -168,7 +168,7 @@ func TestTLSAndServiceShareOneGate(t *testing.T) {
err := pulumi.RunErr(func(ctx *pulumi.Context) error {
reg := registry.BuildRegistry(ctx,
map[string]map[string]any{"hetzner": {"apiToken": "x"}},
nil, types.SSHConfig{DeployPrivateKey: "priv"}, nil, "proj", "prd", "us-east-1", tags.Ephemeral{})
nil, types.SSHConfig{DeployPrivateKey: "priv"}, nil, "proj", "prd", "us-east-1", tags.Ephemeral{}, nil)
res := types.Resources{
Compute: []types.ComputeSpec{
{Name: "bridge", Provider: "hetzner", InstanceCount: 1, DeployUser: &types.DeployUserSpec{Name: "deploy"}},
Expand Down
4 changes: 2 additions & 2 deletions program/global_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,7 @@ func TestCreateInfraGlobalScope(t *testing.T) {
},
"neon": {"apiKey": "k", "region": "aws-us-east-2"},
}
reg := registry.BuildRegistry(ctx, config, nil, types.SSHConfig{}, regions.Table{}, "proj", "prd", globalScope, tags.Ephemeral{})
reg := registry.BuildRegistry(ctx, config, nil, types.SSHConfig{}, regions.Table{}, "proj", "prd", globalScope, tags.Ephemeral{}, nil)

res := types.Resources{
Network: []types.NetworkSpec{{
Expand Down Expand Up @@ -174,7 +174,7 @@ func TestGlobalServiceRealizesRegionLess(t *testing.T) {
reg := registry.BuildRegistry(ctx,
map[string]map[string]any{"hetzner": {"apiToken": "x"}},
&regions.DnsAuthority{Provider: "cloudflare", Zone: "z"},
types.SSHConfig{DeployPrivateKey: "priv"}, nil, "proj", "prd", globalScope, tags.Ephemeral{})
types.SSHConfig{DeployPrivateKey: "priv"}, nil, "proj", "prd", globalScope, tags.Ephemeral{}, nil)
res := types.Resources{
Compute: []types.ComputeSpec{
{Name: "tenants", Provider: "hetzner", InstanceCount: 1, DeployUser: &types.DeployUserSpec{Name: "deploy"}},
Expand Down
10 changes: 8 additions & 2 deletions program/program.go
Original file line number Diff line number Diff line change
Expand Up @@ -155,9 +155,15 @@ func Run(ctx *pulumi.Context) error {
}
ctx.Export("appDeployDescriptor", pulumi.Any(appDesc))

// Env-scoped Hetzner SSH keys (wardnet-<env>-key-{user,deploy}) carry no
// region slug, so every scope's compute provider would register the same URN.
// Share one cache across all registries this run builds so the keys register
// exactly once (see .agents/rules/ssh-keys-register-once-across-scopes.md).
sshKeyCache := registry.NewSSHKeyCache()

registries := make(map[string]registry.ProviderRegistry, len(regionNames))
for _, region := range regionNames {
registries[region] = registry.BuildRegistry(ctx, regionTable[region].Providers, regionTable[region].Dns, vars.SSH, regionTable, ctx.Project(), env, region, eph)
registries[region] = registry.BuildRegistry(ctx, regionTable[region].Providers, regionTable[region].Dns, vars.SSH, regionTable, ctx.Project(), env, region, eph, sshKeyCache)
}

// networkOutputs: region → specName+"/"+subnetName → NetworkOutputs. The
Expand Down Expand Up @@ -202,7 +208,7 @@ func Run(ctx *pulumi.Context) error {
if authority == nil && len(derivedRecords(globalRes, env, "", vars.BaseDomain, ephemeralSlug)) > 0 {
return fmt.Errorf("global slice realizes DNS records but its placementRegion %q has no dns: authority — declare a dns: block on that region in regions.yaml", globalBlock.PlacementRegion)
}
globalReg := registry.BuildRegistry(ctx, globalBlock.Providers, authority, vars.SSH, regionTable, ctx.Project(), env, globalScope, eph)
globalReg := registry.BuildRegistry(ctx, globalBlock.Providers, authority, vars.SSH, regionTable, ctx.Project(), env, globalScope, eph, sshKeyCache)
scopes = append(scopes, scope{key: globalScope, slug: "", reg: globalReg, authority: authority, res: globalRes})
}
for _, region := range regionNames {
Expand Down
Loading