fix: stackUpgradeHandler constructs OCI image ref from talosVersion #42
Merged
talosImage was being set to the raw version string from the UpgradePolicy (e.g., "v1.12.7") and passed directly to TalosClient.Upgrade, which then tried to pull "docker.io/library/v1.12.7:latest". talosUpgradeHandler correctly builds "ghcr.io/siderolabs/installer:<version>"; stack handler now follows the same pattern. Rename talosImage to talosVersion when reading from the UpgradePolicy, then compute talosImage := "ghcr.io/siderolabs/installer:" + talosVersion. Discovered during live ccs-dev stack upgrade (session/25d).
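The version-to-image step described above can be sketched as follows. This is a minimal illustration, not the handler's actual code: the helper name installerImageRef and its validation rules are hypothetical, and only the image base and the talosVersion/talosImage naming come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// installerImageBase is the image base quoted in the commit message.
const installerImageBase = "ghcr.io/siderolabs/installer"

// installerImageRef is a hypothetical helper. It turns a bare Talos version
// such as "v1.12.7" into a full OCI image reference.
func installerImageRef(talosVersion string) (string, error) {
	v := strings.TrimSpace(talosVersion)
	if v == "" {
		return "", fmt.Errorf("talosVersion is empty")
	}
	// Assumption: a value containing "/" or ":" is already an image
	// reference (or malformed), so reject it rather than guess.
	if strings.ContainsAny(v, "/:") {
		return "", fmt.Errorf("talosVersion %q is not a bare version tag", v)
	}
	return installerImageBase + ":" + v, nil
}

func main() {
	// talosVersion is what the UpgradePolicy carries; talosImage is derived.
	talosVersion := "v1.12.7"
	talosImage, err := installerImageRef(talosVersion)
	if err != nil {
		panic(err)
	}
	fmt.Println(talosImage)
}
```

Keeping the version and the image in separately named variables makes it harder to pass one where the other is expected, which is the mistake this fix addresses.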
Stage=true left both Talos and kubelet changes sitting on disk indefinitely; nodes required manual reboots to apply them. New behaviour mirrors talosUpgradeHandler: per node, stage the kubelet image (staged mode so it co-applies on the Talos reboot), then trigger the Talos upgrade with stage=false (immediate reboot), then wait for recovery before moving to the next node. Drop talosconfig-path node enumeration in favour of TalosClient.Nodes() (same source; cleaner and already tested via the stub). Require at least one node (validation failure otherwise).
Tests: replace TestStackUpgrade_RunsBothUpgradeSteps with two tests:
- TestStackUpgrade_NoNodesReturnsValidationFailure
- TestStackUpgrade_RollingUpgrade_AllNodes (verifies per-node loop, upgradeCallCount == node count, all ApplyConfiguration calls use staged mode)
…stry ghcr.io is not accessible from lab nodes. The docker.io registry mirror (docker.io → 10.20.0.1:5000) is the only configured mirror. Using docker.io/ image references allows Talos to resolve installer and kubelet images through the local registry mirror during node upgrades. Affects talos-upgrade, kube-upgrade, and stack-upgrade capabilities.
All imports of github.com/ontai-dev/conductor/pkg/runnerlib updated to github.com/ontai-dev/conductor-sdk/runnerlib across 37 files. Internal pkg/runnerlib deleted. go.mod updated with replace directive pointing to ../conductor-sdk and require entry. go mod tidy completed. All unit tests pass: go build ./... and go test ./test/unit/... green before deletion.
…patcher types Update all GVR references, scheme registrations, and import paths in conductor to consume the migrated dispatcher types from wrapper/api/seam: PackDelivery (was InfrastructureClusterPack), PackExecution, PackInstalled (was InfrastructurePackInstance), PackReceipt, PackLog (was PackOperationResult). packDeliveryRef field replaces clusterPackRef in pack_receipt_drift_loop.go and all associated tests. compileLaunchBundle now embeds wrapper CRDs via wrappercrd.FS so agents receive the seam.ontai.dev CRD bundle at startup.
Updates all dynamic-client GVR references from infrastructure.ontai.dev/infrastructuretalosclusters to seam.ontai.dev/talosclusters. Updates kind strings from InfrastructureTalosCluster to TalosCluster. Updates pack execution GVR to seam.ontai.dev/packexecutions. All tests updated to match.
…day-2 operation records
Replace seam-core -> seam and wrapper -> dispatcher in go.mod replace/require. Update all Go import paths accordingly. Add seam-sdk replace + require. Update conductor RunnerConfigSpec references and compile_launch.go/test assertions for post-MIGRATION-3.8 CRD names (lineagerecords, runnerconfigs under seam.ontai.dev).
…tories Replace ../seam-core with ../seam and ../wrapper with ../dispatcher following the seam-core -> seam and wrapper -> dispatcher filesystem renames. Module paths were already updated in Phase 4.
…onductor
Update all guardian.ontai.dev API group references in conductor:
- compile_enable.go, compile_launch.go: enable bundle apiVersion strings, webhook names
- catalog.go and all 5 catalog YAML entries: apiVersion strings in rendered RBACProfiles
- capability/guardian.go, adapters.go: GVR Group fields for snapshot/profile/policy
- agent pull loops (rbacpolicy, rbacprofile, receipt, signing): GVR Group fields
- All unit, integration, and e2e test fixtures: GVR/GVK Group strings and apiVersion values
…ductor-sdk
- Dockerfile.compiler/execute/agent: seam-core/ -> seam/, wrapper/ -> dispatcher/
- Add COPY conductor-sdk/ and seam-sdk/ to all three builder stages
- cmd/conductor/main.go: fix stale "seam-core scheme" panic message to "seam"
- docs/conductor-schema.md: update InfrastructureRunnerConfig -> RunnerConfig, infrastructure.ontai.dev -> seam.ontai.dev throughout
Steps 6.1, 6.3, 6.4 were already complete (single binary entrypoint at cmd/conductor/, single build target, go.mod already imports conductor-sdk).
Fresh documentation from current codebase. runner.ontai.dev claim removed (conductor owns no API group). pkg/runnerlib replaced with conductor-sdk reference. seam-core replaced with seam. All three image modes documented accurately. Capability table rebuilt from conductor-sdk/runnerlib/constants.go.
…am-sdk/conductor-sdk); fix integration test GVR and CRD for RunnerConfig post-migration
Summary
- stackUpgradeHandler was passing the raw version string (e.g., "v1.12.7") directly to TalosClient.Upgrade as the image reference
- That string resolved to docker.io/library/v1.12.7:latest, which does not exist
- talosUpgradeHandler correctly builds ghcr.io/siderolabs/installer:<version>; this fix applies the same pattern to stackUpgradeHandler
- talosImage is renamed to talosVersion at the point of reading the UpgradePolicy; talosImage is then computed from it

Root cause
Both talosUpgradeHandler and stackUpgradeHandler read a target version string from the UpgradePolicy CR. The talos-upgrade handler always appended the version to the installer image base; the stack-upgrade handler read the value from the CR directly into a variable named talosImage without constructing the full OCI reference. A naming accident allowed the bug to hide in plain sight.

Test plan
- stackUpgradeHandler unit tests pass (go test ./internal/capability/...)
- status=Succeeded, TCOR records Succeeded, UpgradePolicy reached Ready=True