feat: add EtcdBackup CR for on-demand snapshots to object storage (S3/GCS, pluggable)#386
feat: add EtcdBackup CR for on-demand snapshots to object storage (S3/GCS, pluggable)#386xrl wants to merge 9 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: xrl The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @xrl. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Implements the roadmap item "Create on-demand backup of a cluster" via a
new namespaced EtcdBackup CRD that snapshots a referenced EtcdCluster and
uploads the snapshot to an object store.
Architecture:
- EtcdBackup CRD (operator.etcd.io/v1alpha1): destination {s3|gcs} with
bucket/prefix/region/secretRef, optional retention; status reports phase,
snapshotLocation, size, completionTime and a Succeeded condition.
- A dedicated controller execs `etcdctl snapshot save` in a member pod and
streams it through an io.Pipe into the uploader (bounded memory).
- The cloud SDKs (aws-sdk-go-v2, cloud.google.com/go/storage) are isolated
behind pkg/objectstore.Store with a Factory registry; S3 (incl.
S3-compatible endpoints) and GCS are implemented, others register
themselves. The controller imports no cloud SDK directly, so the code is
liftable into a standalone cmd/backup-manager later if desired.
- Credentials default to ambient (IRSA / Workload Identity); secretRef is
optional.
Testing:
- Unit tests for the provider interface (mocked stores, no cloud creds),
the snapshot/upload orchestration, credential resolution, retention, and
CRD-to-destination mapping.
- A MinIO-backed e2e (compile-only, gated by ETCD_E2E_BACKUP) plus sample CRs.
Refs etcd-io#385
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Lange <xrlange@gmail.com>
…rt deterministically Two retention data-loss bugs in the S3 and GCS List implementations: - The list prefix was a raw string prefix, so listing cluster "etcd-a"'s snapshots also matched (and retention then deleted) objects under "etcd-a-2" and any other cluster whose name has "etcd-a" as a string prefix. Append a trailing "/" after JoinKey (which trims trailing slashes) so the prefix matches a directory boundary per cluster. - sort.Slice is unstable and keyed only on LastModified, whose S3 granularity is one second. On a modtime tie the just-uploaded snapshot could be ordered into the objs[RetainCount:] deletion set. Use sort.SliceStable with a descending-key tiebreaker; keys embed a sortable UTC timestamp so the newest is deterministically first. Add regression tests: an S3 list across etcd-a/etcd-a-2 proving the boundary, an all-equal-modtime list proving the stable order, and a full Reconcile with RetainCount=1 proving the just-uploaded snapshot survives. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>
The in-pod snapshot command redirected etcdctl's output with
">/dev/null 2>&1", discarding all diagnostics. On failure the carefully
captured exec stderr buffer was empty, so every failure (TLS, missing
endpoint, etc.) collapsed to the opaque "empty snapshot produced
(stderr: )". Redirect etcdctl's own output to fd 2 ("1>&2") instead:
etcdctl writes the snapshot to the temp file, not stdout, so only its
progress/error lines move to stderr where they are captured, while the
cat'd snapshot bytes still flow to stdout.
Also correct two misleading comments: the snapshotter doc claimed
"snapshot save /dev/stdout" and "never lands on the operator's disk",
and the controller claimed the snapshot "never lands on the operator's
disk" — true for GCS streaming but false for S3, which spools the stream
to an ephemeral temp file in the operator pod (PutObject needs a known
length).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Lange <xrlange@gmail.com>
EtcdBackupStatus.SnapshotRevision was declared (and surfaced in the CRD) but never written by any code path, so operators always read 0 — which reads as "revision 0", not "unknown". Wiring it properly needs parsing `etcdctl snapshot status`, which is out of scope for this PR. Remove the always-zero v1alpha1 status field before it ships rather than maintain a misleading one; it can be reintroduced wired and tested later. Regenerated the CRD via `make manifests`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>
…reds Completes the snapshot backup story with restore and a security pass over the whole backup+restore object-store path. EtcdRestore: - New CRD: restore a snapshot into a NEW/empty EtcdCluster, sourcing it either by a completed EtcdBackup ref or an explicit object-store location. Status phases (Pending/Downloading/Restoring/Completed/Failed), a Succeeded condition, and Events. - New Restorer seam (exec etcdctl snapshot restore into the genesis member, stamping the member identity to match the bootstrapping pod) mirrors the Snapshotter design so orchestration is unit-testable with no cloud/etcd. - Empty-target guarantee: creates the target cluster fresh (owned by the restore) or refuses to restore over a cluster with ready members. objectstore: - Add Download to the Store interface with streaming S3 (GetObject) and GCS (NewReader) impls; missing objects map to a new ErrNotFound sentinel. Security hardening (backup + restore share one seam): - resolveStoreCredentials reads creds ONLY from secretRef (or ambient identity), emits an audit log line with the secret NAME only, never logs or error-echoes credential values, and validates required keys. - validateDestination / validateRestoreSpec reject misconfiguration before any I/O. Tests: restore orchestration (mock store+restorer) covering both sources, empty-target rejection, not-found/download/restore errors and terminal no-op; a cred-never-logged guarantee test (funcr log capture); Download tests for s3/gcs/mem. Sample CRs, docs (restore + security + least-priv IAM), RBAC, and a compile-only restore e2e. Deferred (noted in docs): cron scheduling, encryption-at-rest. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>
The exec-based snapshotter shelled out with `sh -c "etcdctl snapshot save ...
| cat ..."` inside the member pod. The etcd images the operator deploys
(gcr.io/etcd-development/etcd:v3.6.x) are distroless and ship no shell, so the
backup snapshot exec failed at runtime with:
OCI runtime exec failed: exec: "sh": executable file not found in $PATH
i.e. EtcdBackup could never reach Completed against the operator's own
clusters. Caught by the object-storage e2e (ETCD_E2E_BACKUP=true).
Replace the exec snapshotter with a clientv3 Maintenance Snapshot() stream from
the operator process: it reaches the member over the client port the operator
can already address in-cluster and works regardless of what binaries the member
image carries. The Snapshotter interface is unchanged, so the fake and all
backup/restore unit tests are unaffected.
The restorer still execs `sh -c` (a deeper redesign is needed to write the
member data dir without a shell); that path remains gated/skipped in e2e and is
documented as a known gap.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Lange <xrlange@gmail.com>
Live-validated against kind (ETCD_E2E_BACKUP=true). Improvements:
Backup (now genuinely passes, ~43s):
- Independently verify the snapshot really landed in the bucket: `mc stat`
the object and assert its size == status.SnapshotSizeBytes, plus a minimum
-size floor. Previously only the controller's self-reported status fields
were checked, so a controller that set them without uploading would pass.
- Seed a 50-key content-addressable keyspace (single source of truth shared
with the restore proof) instead of one key.
Reliability / flakiness (standing rule: readiness watches must catch failure):
- Pod waits detect CrashLoopBackOff/ImagePullBackOff/Failed and surface pod
logs instead of spinning to the timeout with no diagnosis.
- Never call t.Fatalf inside a wait.For poll closure (racey from the poller
goroutine); return an error and fail from the calling goroutine.
- Fix the shared-singleton bucket pod collision ("pods mc-mkbucket already
exists"): name the bootstrap pod per-MinIO-instance and clear any stale one.
- Add feature.Teardown to both tests (MinIO, secret, clusters, mc pod) so the
shared namespace does not leak between runs.
- Pin MinIO/mc images to RELEASE tags instead of :latest.
- Tighten poll intervals (5s -> 2-3s).
Restore: the data-path is known-incomplete against distroless etcd images
(restorer execs `sh -c` with no shell; restores to /var/run/etcd not
/var/lib/etcd; no member restart). Gate the restore round-trip behind a second
ETCD_E2E_RESTORE=true flag and document the three product gaps, so the working
backup e2e stays green while the test remains an executable spec. When enabled
it now asserts the FULL keyspace round-trips with exact values, polls for it,
and adds a negative control (a never-written key must be absent) to rule out a
vacuous pass on an empty member.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Lange <xrlange@gmail.com>
…ackup e2e Enable hermetic, credential-free GCS testing and add a comprehensive dual-provider object-storage backup e2e that proves a real, intact etcd snapshot lands in each backend. GCS emulator endpoint support (mirrors the existing S3 endpoint seam): - api: add GCSDestinationSpec.Endpoint (+optional) so a destination can target a GCS-compatible emulator (fake-gcs-server / storage testbench). - objectstore: relax Destination.validate() to accept Endpoint for GCS while still rejecting the S3-only ForcePathStyle. - helpers: map GCS.Endpoint through toObjectStoreDestination. - gcs.go: when Endpoint is set, build the client with option.WithEndpoint + option.WithoutAuthentication (the documented unauthenticated-emulator recipe), leaving the real-Google credential paths untouched. - regenerated CRDs (controller-gen v0.21.0); deepcopy unchanged (scalar field). - unit: validate() now accepts a GCS emulator endpoint and still rejects forcePathStyle. Comprehensive backup e2e (test/e2e, gated by ETCD_E2E_BACKUP=true): - Factor the MinIO blueprint into a backupBackend seam and run the SAME full backup cycle against both providers: MinIO (S3) and an in-cluster fake-gcs-server (GCS), neither needing real cloud creds. - Seed a 53-key keyspace: 50 content-addressable keys (value=hash(key)) plus a ~100 KiB large value and a multibyte UTF-8 value, and a pre-backup provenance sentinel. - Prove the object is present-and-valid independently of controller status: stat the object directly on the backend (size == status), apply the 16 KiB real-snapshot floor, then download the bytes and run `etcdutl snapshot status`, asserting it is an intact snapshot whose totalKey == seeded count. - Adversarial safety guards (live-green; fire before/independent of the known-incomplete restore data-path): restore into a populated cluster is rejected and the existing data is preserved; restore from a missing object fails cleanly within a bounded deadline (no hang). A corrupt-snapshot rejection test is added but honestly gated behind ETCD_E2E_RESTORE, since it cannot be distinguished from the broken restorer until the restore data-path is fixed. - jsonInt64Field tolerates the GCS JSON API's quoted "size" string as well as mc's bare integer. Verified live on kind (ETCD_E2E_BACKUP=true), two independent runs: both provider cycles pass with totalKey=53, plus the populated-target and missing-object guards. go build/vet and unit+envtest are green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>
…und-trip) The previous execRestorer was dead-on-arrival three ways: it shelled into a distroless etcd image (no `sh`), restored to /var/run/etcd instead of the member's real /var/lib/etcd data dir, and exec'd a live member with no force-new-cluster semantics. Replace it with an init-container restore. Architecture: - EtcdRestore controller now stamps an `operator.etcd.io/restore-source` annotation (JSON: snapshot addressing + creds Secret *name* + operator image + a per-snapshot generation token) onto the target EtcdCluster, then watches the genesis member boot. Completed now truthfully means "a member booted from the restored data" (gated on StatefulSet ReadyReplicas>0), not "spec injected". Missing-object is verified eagerly (terminal, prompt failure); a wedged restore init-container (corrupt snapshot, bad creds) is detected and marked Failed. - The cluster controller injects, ONLY when that annotation is present, a single restore init-container running the operator image as `manager restore-localize`. A normal cluster's pod template is byte-identical (zero init-containers), so the existing cluster e2e is unaffected and no fleet-wide rolling restart is triggered. - restore-localize (shell-free, single image): ordinal-0 guard, per-generation idempotency marker (a pod restart never re-wipes-and-restores), downloads the snapshot via pkg/objectstore, then runs etcd's snapshot-restore library in-process into /var/lib/etcd matching the member identity etcd advertises. A corrupt snapshot fails the library's hash check -> exit non-zero (corrupt rejection now fails for the RIGHT reason). Creds reach this init-container only, via optional secretKeyRefs, never the long-running etcd container. - Dockerfile/Makefile build the whole ./cmd package (not just main.go) so the subcommand is linked; OPERATOR_IMAGE plumbed via flag/env + e2e patch. Validated live on kind (both providers), ungating the full round-trip: - S3 (MinIO) + GCS (fake-gcs-server): 50-key keyspace restored byte-identical, never-snapshotted key absent. - corrupt-snapshot, populated-target, missing-object guards green. go build/vet + unit/envtest (incl. cluster bring-up) green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>
e0d4484 to
ff3ef84
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: xrl The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Rebased onto current main; converting to draft while #363 lands. |
Refs #385
What
Adds a new namespaced
EtcdBackupCRD that takes a point-in-time snapshot of a referencedEtcdClusterand uploads it to an object store. This implements the roadmap item "Create on-demand backup of a cluster" (docs/roadmap.md).Architecture decision (delegated) — and rationale
The task was to choose between a standalone backup operator and growing the core operator, with a recommendation to isolate the heavy cloud SDKs. I chose a dedicated controller + an isolated provider package, wired into the existing operator binary — not a second binary, for now.
Two seams keep the cloud SDKs and exec machinery out of the core reconcile path and make everything unit-testable without cloud creds or a live cluster:
Snapshotter(internal/controller/snapshotter.go) produces the snapshot byte stream; the default execsetcdctl snapshot savein a member pod via the exec subresource and streams stdout.pkg/objectstore.Store(interface.go+s3.go+gcs.go) is the minimalUpload/List/Deletesurface. AFactoryregistry dispatches on the provider name, so a new backend is a new file callingobjectstore.Register. The AWS and GCP SDKs are imported only from this package — the controller and API types pull in zero cloud SDK.The snapshot is streamed snapshotter→uploader through an
io.Pipefor backpressure/bounded memory.Why in-binary rather than
cmd/backup-manager: because the SDKs are already quarantined behind the interface and the controller is a self-contained file set, the code is trivially liftable into a separate binary later if the cloud-SDK CVE surface becomes a problem for the core operator. Doing it in-binary now keeps this one independently-mergeable PR with one Deployment / one RBAC set / one image (the standard kubebuilder multi-controller layout), and avoids paying the operational cost of a second binary before there's a reason to. The trade-off (core image gains the AWS/GCP SDK dependency tree) is documented and reversible. Seedocs/snapshot-backups.md.Credentials
secretRefis optional. Absent it, providers use their ambient chain (IRSA on AWS, Workload Identity on GCP) — the recommended production posture. Present, the keys areaccessKeyID/secretAccessKey(/sessionToken) for s3 andserviceAccountJSONfor gcs.Testing
pkg/objectstore):JoinKey/destination validation, factory dispatch + duplicate-register panic, and the S3 and GCS stores exercised against in-memory fakes of the narrowed SDK interfaces — no cloud credentials required.internal/controller): full reconcile happy-path, missing-cluster / no-ready-members / snapshot-error / upload-error →Failed, terminal no-op, snapshot timeout, credential resolution (s3/gcs/ambient/missing-secret), retention pruning, and CRD→destination mapping — all via a fakeSnapshotter+ fakeStore.test/e2e/backup_test.go): a MinIO-backed (S3-compatible, no real cloud) end-to-end test, compile-verified and gated behindETCD_E2E_BACKUP=trueso it doesn't run in the default kind matrix. All its helpers are local to the file to avoid colliding with other in-flight e2e PRs.config/samples/operator_v1alpha1_etcdbackup.yaml.go build ./...,go vet ./...,golangci-lint run, the objectstore + controller (envtest) suites, andgo test -c ./test/e2e/all pass locally;go mod tidyis a no-op.Notes for reviewers
operator.etcd.io_etcdclusters.yaml(v0.20.1), so that file shows a one-line version-annotation bump — a side effect of using the pinned tool, not a hand edit.new cluster from a backup) and scheduled backups (EtcdBackupSchedule) are intentionally out of scope; this CRD + provider interface are the foundation for both.🤖 Generated with Claude Code
PR series — operability fixes & TLS
Small single-purpose PRs from live kind-cluster testing of the operator. Each stands alone unless an After is listed. → = this PR.
events.k8s.ioRBAC so operator Events are actually recordedmembers[]/leaderIDfrom one health snapshot (consistent leader)altNames.ipAddressesinto certificatesvalidityDuration(365d,100d12h) as documentedDegradedcondition (was empty status)--max-concurrent-reconciles)spec.tls.{peer,client}surfaces (breaking alpha API)TLSReadycondition + TLS lifecycle EventsPeerCANotShared/metricsendpointEtcdBackupCR → object storage (S3/GCS)🟢 ready · ⚪ draft