feat: machineconfig-backup capability + PKI kubeconfig fix (session/24) #40
Merged
…rd, tenant machineConfigPaths, lineage status
Governor directive (session/21): CODEBASE.md eliminated from all repos. The graphify knowledge graph at ~/ontai/graphify-out/graph.json is the sole authoritative source for codebase understanding. See root CONTEXT.md and CLAUDE.md for the Graphify Source of Truth Protocol.
…tarts

Two bugs in hardeningApplyHandler that could destroy cluster nodes or take down the VIP:

1. VIP filtering in EndpointsFromTalosconfig (adapters.go)
Adds clusterEndpoint field to talosConfigCtx. When set, the VIP is excluded from the endpoint fallback list before returning. Without this, the VIP address was included in the per-node iteration, causing GetMachineConfig to read from the VIP-holding node and ApplyConfiguration to apply only to that node -- silently skipping all other control-plane nodes. If the talosconfig contains only the VIP after filtering, an error is returned rather than an empty list that would silently skip all nodes.

2. Stabilization wait between nodes (platform_security.go)
After applying machineconfig patches to a node, waitForNodeStable polls Health() until the node is responsive before proceeding to the next node. No-reboot applies can briefly restart kubelet or other services. Without the wait, sequential rapid application across all control-plane nodes can produce overlapping restarts, losing etcd quorum and taking down the VIP. The wait is skipped after the last node.

New tests: TestEndpointsFromTalosconfig_ClusterEndpointFiltered, TestEndpointsFromTalosconfig_ClusterEndpointOnlyReturnsError, TestHardeningApply_StabilizationWaitBetweenNodes.
Execute mode dispatches via Resolve; agent mode uses RegisteredNames for the capability manifest only and never calls Execute. One registry keeps the manifest and implementation set in sync by construction.
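A minimal sketch of the single-registry idea, assuming a map-backed registry. Only the names Resolve and RegisteredNames come from the commit message; the Registry type, the Handler signature, NewRegistry, and Register are hypothetical and chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Handler is an assumed signature for a capability implementation.
type Handler func(args map[string]string) error

// Registry (hypothetical) is the one place where capabilities are recorded.
type Registry struct {
	handlers map[string]Handler
}

func NewRegistry() *Registry {
	return &Registry{handlers: map[string]Handler{}}
}

// Register adds or replaces the handler for a capability name.
func (r *Registry) Register(name string, h Handler) {
	r.handlers[name] = h
}

// Resolve is what execute mode would call to dispatch a capability.
func (r *Registry) Resolve(name string) (Handler, error) {
	h, ok := r.handlers[name]
	if !ok {
		return nil, fmt.Errorf("capability %q is not registered", name)
	}
	return h, nil
}

// RegisteredNames is what agent mode would call to build the manifest.
// It reads the same map, so it cannot list a name that has no handler.
func (r *Registry) RegisteredNames() []string {
	names := make([]string, 0, len(r.handlers))
	for name := range r.handlers {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	reg := NewRegistry()
	reg.Register("machineconfig-backup", func(map[string]string) error { return nil })

	fmt.Println(reg.RegisteredNames())

	h, err := reg.Resolve("machineconfig-backup")
	if err != nil {
		panic(err)
	}
	if err := h(nil); err != nil {
		panic(err)
	}
}
```

Because both methods read the same map, the manifest and the set of executable handlers stay identical without any separate bookkeeping.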
Replace /tmp/envtest-bins/1.35.0 (ephemeral, stale version) with the canonical ontai root Makefile target: make envtest-setup && export KUBEBUILDER_ASSETS=$(make -s envtest-path). Pinned to K8s 1.32.x.
…session/24)
- machineconfig-backup named capability: iterates all cluster nodes via
EndpointsFromTalosconfig + NodeContext, reads GetMachineConfig per node,
uploads to S3 at {cluster}/machineconfigs/{TIMESTAMP}/{hostname}.yaml.
Hostname extracted from config YAML; sanitized node IP as fallback.
5 unit tests cover nil clients, missing CR, success, upload failure, no-hostname.
CapabilityMachineConfigBackup constant added to runnerlib.
- PKI rotation kubeconfig: removed secondary target-cluster-kubeconfig write
from pkiRotateHandler (upsertKubeconfigSecret now writes only
seam-mc-{cluster}-kubeconfig per PLATFORM-BL-KUBECONFIG-CANONICAL).
Summary
- machineconfig-backup named capability in platform_machineconfig.go. Iterates nodes via EndpointsFromTalosconfig + NodeContext, calls GetMachineConfig per node, uploads to s3://{bucket}/{cluster}/machineconfigs/{TIMESTAMP}/{hostname}.yaml. Hostname extracted from config YAML with sanitized-IP fallback.
- CapabilityMachineConfigBackup = "machineconfig-backup" added to runnerlib/constants.go. Registered in stubs.go.
- pkiRotateHandler now writes only seam-mc-{cluster}-kubeconfig; removed secondary target-cluster-kubeconfig write.
Test plan
- go test ./... -- all tests green
- TalosMachineConfigBackup live: Conductor executor Job completes and uploads per-node YAML to S3 (requires ccs-mgmt recovery)