Skip to content

cloud-cluster: clean Headscale nodes on deletion #43

Merged
samcday merged 2 commits into
mainfrom
tailnet-node-cleaner
May 7, 2026
Merged

cloud-cluster: clean Headscale nodes on deletion #43
samcday merged 2 commits into
mainfrom
tailnet-node-cleaner

Conversation

@samcday

@samcday samcday commented May 7, 2026

Copy link
Copy Markdown
Owner

No description provided.

@samcday

samcday commented May 7, 2026

Copy link
Copy Markdown
Owner Author

@claude review

@claude

claude Bot commented May 7, 2026

Copy link
Copy Markdown

Claude finished @samcday's task in 2m 27s —— View job


PR Review: cloud-cluster: clean Headscale nodes on deletion

  • Gather context and read changed files
  • Review Dockerfile
  • Review Kubernetes manifest (headscale-node-cleaner.yaml)
  • Review kustomization and Flux image config changes
  • Review CI workflow changes
  • Post final review

Overall: This is a solid implementation of a finalizer-based cleanup pattern for Headscale nodes. The shell script logic is well-structured, the RBAC is appropriately scoped, and the shell-operator approach fits cleanly into the existing patterns in the repo. A few things worth looking at below.


Dockerfile

apps/headscale-node-cleaner/Dockerfile

The multi-stage build copying the headscale binary into the shell-operator image is clean. One minor point: binaries in container images from ko-app/ are already executable, so the RUN chmod +x is a no-op. Harmless, but could be dropped.


headscale-node-cleaner.yaml — Shell Script

plaintext: true against port 50443

The headscale CLI config sets plaintext: true while the target address is headscale-generic.headscale.svc.cluster.hub.internal:50443. Port 50443 is typically Headscale's gRPC+TLS port. With plaintext: true, the CLI will attempt an unencrypted gRPC connection and will likely fail if the server expects TLS. If TLS is terminated at a mesh/proxy layer before hitting the gRPC listener, this makes sense — but it's worth a sanity-check. Fix this →

Extra Headscale API call when no nodes found

In cleanup_headscale, when ids is empty the function logs a message, skips the for loop, then still calls matching_headscale_ids a second time to populate remaining. Since no deletions ran, this verification call is unnecessary. A short-circuit after the echo would avoid the extra round-trip:

if [[ ${#ids[@]} -eq 0 ]]; then
  echo "node $node_name: no matching Headscale node found"
  return 0
fi

server_url: http://127.0.0.1 in the CLI config

This field is required by the headscale config schema but unused in CLI mode. The current value is fine as a placeholder but http://127.0.0.1 might raise eyebrows — http://localhost or a comment would make the intent clearer (though functionally identical).

noise.private_key_path: /tmp/headscale/noise_private.key

The /tmp volume is an emptyDir, so this key is regenerated on every pod restart. Headscale CLI operations should work fine with an ephemeral noise key (it's used for the encrypted channel with the server, not for node identity), but if the Headscale server version is strict about key continuity this could be worth verifying.

RBAC looks correct. The ClusterRole scopes to resourceNames: [admin-apikey], and hub_kubectl uses env -u KUBECONFIG kubectl to fall back to in-cluster config — a clean pattern for the two-cluster setup.


Kubernetes Deployment

The Recreate strategy is the right call — shell-operator should not run as multiple replicas. Resources look reasonable (requests 20m/128Mi, limit 256Mi memory, no CPU limit).

defaultMode: 493 (= 0755) on the hooks ConfigMap is required for shell-operator to execute the hook. defaultMode: 256 (= 0400) on the kubeconfig secret is appropriately restrictive.


Flux / Image automation

The additions to image-repositories.yaml and image-policies.yaml follow the existing pattern exactly. The CI matrix entry in images.yaml is correct. No issues here.


Summary

Blocker Verify plaintext: true vs TLS on port 50443
Minor Short-circuit cleanup_headscale when ids is empty
Nit chmod +x in Dockerfile is redundant

@samcday samcday merged commit f476f3c into main May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant