feat: add EtcdBackup CR for on-demand snapshots to object storage (S3/GCS, pluggable) by xrl · Pull Request #386 · etcd-io/etcd-operator

xrl · 2026-06-18T23:38:03Z

What

Adds a new namespaced EtcdBackup CRD that takes a point-in-time snapshot of a referenced EtcdCluster and uploads it to an object store. This implements the roadmap item "Create on-demand backup of a cluster" (docs/roadmap.md).

apiVersion: operator.etcd.io/v1alpha1
kind: EtcdBackup
metadata:
  name: nightly
spec:
  clusterRef: my-cluster
  destination:
    provider: s3                 # s3 | gcs
    prefix: etcd/backups
    secretRef: { name: s3-creds } # optional; omit for IRSA / Workload Identity
    s3: { bucket: my-backups, region: us-east-1 }
  retention: { retainCount: 7 }
status:
  phase: Completed               # Pending|Snapshotting|Uploading|Completed|Failed
  snapshotLocation: s3://my-backups/etcd/backups/my-cluster/nightly-20260617T010203Z.db
  snapshotSizeBytes: 20480
  completionTime: "2026-06-17T01:02:05Z"

Architecture decision (delegated) — and rationale

The task was to choose between a standalone backup operator and growing the core operator, with a recommendation to isolate the heavy cloud SDKs. I chose a dedicated controller + an isolated provider package, wired into the existing operator binary — not a second binary, for now.

Two seams keep the cloud SDKs and exec machinery out of the core reconcile path and make everything unit-testable without cloud creds or a live cluster:

Snapshotter (internal/controller/snapshotter.go) produces the snapshot byte stream; the default execs etcdctl snapshot save in a member pod via the exec subresource and streams stdout.
pkg/objectstore.Store (interface.go + s3.go + gcs.go) is the minimal Upload/List/Delete surface. A Factory registry dispatches on the provider name, so a new backend is a new file calling objectstore.Register. The AWS and GCP SDKs are imported only from this package — the controller and API types pull in zero cloud SDK.

The snapshot is streamed snapshotter→uploader through an io.Pipe for backpressure/bounded memory.

Why in-binary rather than cmd/backup-manager: because the SDKs are already quarantined behind the interface and the controller is a self-contained file set, the code is trivially liftable into a separate binary later if the cloud-SDK CVE surface becomes a problem for the core operator. Doing it in-binary now keeps this one independently-mergeable PR with one Deployment / one RBAC set / one image (the standard kubebuilder multi-controller layout), and avoids paying the operational cost of a second binary before there's a reason to. The trade-off (core image gains the AWS/GCP SDK dependency tree) is documented and reversible. See docs/snapshot-backups.md.

Credentials

secretRef is optional. Absent it, providers use their ambient chain (IRSA on AWS, Workload Identity on GCP) — the recommended production posture. Present, the keys are accessKeyID/secretAccessKey(/sessionToken) for s3 and serviceAccountJSON for gcs.

Testing

Unit (pkg/objectstore): JoinKey/destination validation, factory dispatch + duplicate-register panic, and the S3 and GCS stores exercised against in-memory fakes of the narrowed SDK interfaces — no cloud credentials required.
Unit/envtest (internal/controller): full reconcile happy-path, missing-cluster / no-ready-members / snapshot-error / upload-error → Failed, terminal no-op, snapshot timeout, credential resolution (s3/gcs/ambient/missing-secret), retention pruning, and CRD→destination mapping — all via a fake Snapshotter + fake Store.
e2e (test/e2e/backup_test.go): a MinIO-backed (S3-compatible, no real cloud) end-to-end test, compile-verified and gated behind ETCD_E2E_BACKUP=true so it doesn't run in the default kind matrix. All its helpers are local to the file to avoid colliding with other in-flight e2e PRs.
Sample CRs in config/samples/operator_v1alpha1_etcdbackup.yaml.

go build ./..., go vet ./..., golangci-lint run, the objectstore + controller (envtest) suites, and go test -c ./test/e2e/ all pass locally; go mod tidy is a no-op.

Notes for reviewers

CRDs/deepcopy/RBAC were regenerated with the repo-pinned controller-gen. That pinned version (v0.21.0) is newer than the one that last generated operator.etcd.io_etcdclusters.yaml (v0.20.1), so that file shows a one-line version-annotation bump — a side effect of using the pinned tool, not a hand edit.
Restore (new cluster from a backup) and scheduled backups (EtcdBackupSchedule) are intentionally out of scope; this CRD + provider interface are the foundation for both.

🤖 Generated with Claude Code

PR series — operability fixes & TLS

Small single-purpose PRs from live kind-cluster testing of the operator. Each stands alone unless an After is listed. → = this PR.

	PR	Lands	After
	Reviewable now — small, any order
🟢	#374	Requeue instead of swallowing the client-cert provisioning error	—
🟢	#391	Grant `events.k8s.io` RBAC so operator Events are actually recorded	—
🟢	#392	Correlate `members[]`/`leaderID` from one health snapshot (consistent leader)	—
🟢	#393	Propagate user-supplied `altNames.ipAddresses` into certificates	—
🟢	#394	Accept day-suffix `validityDuration` (`365d`, `100d12h`) as documented	—
🟢	#395	Surface early reconcile errors as a `Degraded` condition (was empty status)	—
🟢	#379	Configurable reconcile worker pool (`--max-concurrent-reconciles`)	—
🟢	#369	kind-based stress/scale e2e harness (1/3/7 members, churn, quorum watcher)	—
	TLS stack — in order
🟢	#376	Independent `spec.tls.{peer,client}` surfaces (breaking alpha API)	—
⚪	#377	`TLSReady` condition + TLS lifecycle Events	#376
⚪	#378	Multi-member TLS quorum e2e + `PeerCANotShared`	#377
	Parked as drafts pending #363
⚪	#382	Per-cluster domain metrics on the operator `/metrics` endpoint	—
⚪	#384	EtcdCluster admission webhooks (consolidating with #328)	#363
⚪→	#386	`EtcdBackup` CR → object storage (S3/GCS)	#363
⚪	#387	Automatic quorum-loss disaster recovery (bootstrap-latch guarded)	#363

🟢 ready · ⚪ draft

k8s-ci-robot · 2026-06-18T23:38:07Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xrl
Once this PR has been reviewed and has the lgtm label, please assign jberkus for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2026-06-18T23:38:14Z

Hi @xrl. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Implements the roadmap item "Create on-demand backup of a cluster" via a new namespaced EtcdBackup CRD that snapshots a referenced EtcdCluster and uploads the snapshot to an object store. Architecture: - EtcdBackup CRD (operator.etcd.io/v1alpha1): destination {s3|gcs} with bucket/prefix/region/secretRef, optional retention; status reports phase, snapshotLocation, size, completionTime and a Succeeded condition. - A dedicated controller execs `etcdctl snapshot save` in a member pod and streams it through an io.Pipe into the uploader (bounded memory). - The cloud SDKs (aws-sdk-go-v2, cloud.google.com/go/storage) are isolated behind pkg/objectstore.Store with a Factory registry; S3 (incl. S3-compatible endpoints) and GCS are implemented, others register themselves. The controller imports no cloud SDK directly, so the code is liftable into a standalone cmd/backup-manager later if desired. - Credentials default to ambient (IRSA / Workload Identity); secretRef is optional. Testing: - Unit tests for the provider interface (mocked stores, no cloud creds), the snapshot/upload orchestration, credential resolution, retention, and CRD-to-destination mapping. - A MinIO-backed e2e (compile-only, gated by ETCD_E2E_BACKUP) plus sample CRs. Refs etcd-io#385 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>

…rt deterministically Two retention data-loss bugs in the S3 and GCS List implementations: - The list prefix was a raw string prefix, so listing cluster "etcd-a"'s snapshots also matched (and retention then deleted) objects under "etcd-a-2" and any other cluster whose name has "etcd-a" as a string prefix. Append a trailing "/" after JoinKey (which trims trailing slashes) so the prefix matches a directory boundary per cluster. - sort.Slice is unstable and keyed only on LastModified, whose S3 granularity is one second. On a modtime tie the just-uploaded snapshot could be ordered into the objs[RetainCount:] deletion set. Use sort.SliceStable with a descending-key tiebreaker; keys embed a sortable UTC timestamp so the newest is deterministically first. Add regression tests: an S3 list across etcd-a/etcd-a-2 proving the boundary, an all-equal-modtime list proving the stable order, and a full Reconcile with RetainCount=1 proving the just-uploaded snapshot survives. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>

The in-pod snapshot command redirected etcdctl's output with ">/dev/null 2>&1", discarding all diagnostics. On failure the carefully captured exec stderr buffer was empty, so every failure (TLS, missing endpoint, etc.) collapsed to the opaque "empty snapshot produced (stderr: )". Redirect etcdctl's own output to fd 2 ("1>&2") instead: etcdctl writes the snapshot to the temp file, not stdout, so only its progress/error lines move to stderr where they are captured, while the cat'd snapshot bytes still flow to stdout. Also correct two misleading comments: the snapshotter doc claimed "snapshot save /dev/stdout" and "never lands on the operator's disk", and the controller claimed the snapshot "never lands on the operator's disk" — true for GCS streaming but false for S3, which spools the stream to an ephemeral temp file in the operator pod (PutObject needs a known length). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>

EtcdBackupStatus.SnapshotRevision was declared (and surfaced in the CRD) but never written by any code path, so operators always read 0 — which reads as "revision 0", not "unknown". Wiring it properly needs parsing `etcdctl snapshot status`, which is out of scope for this PR. Remove the always-zero v1alpha1 status field before it ships rather than maintain a misleading one; it can be reintroduced wired and tested later. Regenerated the CRD via `make manifests`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>

…reds Completes the snapshot backup story with restore and a security pass over the whole backup+restore object-store path. EtcdRestore: - New CRD: restore a snapshot into a NEW/empty EtcdCluster, sourcing it either by a completed EtcdBackup ref or an explicit object-store location. Status phases (Pending/Downloading/Restoring/Completed/Failed), a Succeeded condition, and Events. - New Restorer seam (exec etcdctl snapshot restore into the genesis member, stamping the member identity to match the bootstrapping pod) mirrors the Snapshotter design so orchestration is unit-testable with no cloud/etcd. - Empty-target guarantee: creates the target cluster fresh (owned by the restore) or refuses to restore over a cluster with ready members. objectstore: - Add Download to the Store interface with streaming S3 (GetObject) and GCS (NewReader) impls; missing objects map to a new ErrNotFound sentinel. Security hardening (backup + restore share one seam): - resolveStoreCredentials reads creds ONLY from secretRef (or ambient identity), emits an audit log line with the secret NAME only, never logs or error-echoes credential values, and validates required keys. - validateDestination / validateRestoreSpec reject misconfiguration before any I/O. Tests: restore orchestration (mock store+restorer) covering both sources, empty-target rejection, not-found/download/restore errors and terminal no-op; a cred-never-logged guarantee test (funcr log capture); Download tests for s3/gcs/mem. Sample CRs, docs (restore + security + least-priv IAM), RBAC, and a compile-only restore e2e. Deferred (noted in docs): cron scheduling, encryption-at-rest. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>

The exec-based snapshotter shelled out with `sh -c "etcdctl snapshot save ... | cat ..."` inside the member pod. The etcd images the operator deploys (gcr.io/etcd-development/etcd:v3.6.x) are distroless and ship no shell, so the backup snapshot exec failed at runtime with: OCI runtime exec failed: exec: "sh": executable file not found in $PATH i.e. EtcdBackup could never reach Completed against the operator's own clusters. Caught by the object-storage e2e (ETCD_E2E_BACKUP=true). Replace the exec snapshotter with a clientv3 Maintenance Snapshot() stream from the operator process: it reaches the member over the client port the operator can already address in-cluster and works regardless of what binaries the member image carries. The Snapshotter interface is unchanged, so the fake and all backup/restore unit tests are unaffected. The restorer still execs `sh -c` (a deeper redesign is needed to write the member data dir without a shell); that path remains gated/skipped in e2e and is documented as a known gap. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>

Live-validated against kind (ETCD_E2E_BACKUP=true). Improvements: Backup (now genuinely passes, ~43s): - Independently verify the snapshot really landed in the bucket: `mc stat` the object and assert its size == status.SnapshotSizeBytes, plus a minimum -size floor. Previously only the controller's self-reported status fields were checked, so a controller that set them without uploading would pass. - Seed a 50-key content-addressable keyspace (single source of truth shared with the restore proof) instead of one key. Reliability / flakiness (standing rule: readiness watches must catch failure): - Pod waits detect CrashLoopBackOff/ImagePullBackOff/Failed and surface pod logs instead of spinning to the timeout with no diagnosis. - Never call t.Fatalf inside a wait.For poll closure (racey from the poller goroutine); return an error and fail from the calling goroutine. - Fix the shared-singleton bucket pod collision ("pods mc-mkbucket already exists"): name the bootstrap pod per-MinIO-instance and clear any stale one. - Add feature.Teardown to both tests (MinIO, secret, clusters, mc pod) so the shared namespace does not leak between runs. - Pin MinIO/mc images to RELEASE tags instead of :latest. - Tighten poll intervals (5s -> 2-3s). Restore: the data-path is known-incomplete against distroless etcd images (restorer execs `sh -c` with no shell; restores to /var/run/etcd not /var/lib/etcd; no member restart). Gate the restore round-trip behind a second ETCD_E2E_RESTORE=true flag and document the three product gaps, so the working backup e2e stays green while the test remains an executable spec. When enabled it now asserts the FULL keyspace round-trips with exact values, polls for it, and adds a negative control (a never-written key must be absent) to rule out a vacuous pass on an empty member. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>

…ackup e2e Enable hermetic, credential-free GCS testing and add a comprehensive dual-provider object-storage backup e2e that proves a real, intact etcd snapshot lands in each backend. GCS emulator endpoint support (mirrors the existing S3 endpoint seam): - api: add GCSDestinationSpec.Endpoint (+optional) so a destination can target a GCS-compatible emulator (fake-gcs-server / storage testbench). - objectstore: relax Destination.validate() to accept Endpoint for GCS while still rejecting the S3-only ForcePathStyle. - helpers: map GCS.Endpoint through toObjectStoreDestination. - gcs.go: when Endpoint is set, build the client with option.WithEndpoint + option.WithoutAuthentication (the documented unauthenticated-emulator recipe), leaving the real-Google credential paths untouched. - regenerated CRDs (controller-gen v0.21.0); deepcopy unchanged (scalar field). - unit: validate() now accepts a GCS emulator endpoint and still rejects forcePathStyle. Comprehensive backup e2e (test/e2e, gated by ETCD_E2E_BACKUP=true): - Factor the MinIO blueprint into a backupBackend seam and run the SAME full backup cycle against both providers: MinIO (S3) and an in-cluster fake-gcs-server (GCS), neither needing real cloud creds. - Seed a 53-key keyspace: 50 content-addressable keys (value=hash(key)) plus a ~100 KiB large value and a multibyte UTF-8 value, and a pre-backup provenance sentinel. - Prove the object is present-and-valid independently of controller status: stat the object directly on the backend (size == status), apply the 16 KiB real-snapshot floor, then download the bytes and run `etcdutl snapshot status`, asserting it is an intact snapshot whose totalKey == seeded count. - Adversarial safety guards (live-green; fire before/independent of the known-incomplete restore data-path): restore into a populated cluster is rejected and the existing data is preserved; restore from a missing object fails cleanly within a bounded deadline (no hang). A corrupt-snapshot rejection test is added but honestly gated behind ETCD_E2E_RESTORE, since it cannot be distinguished from the broken restorer until the restore data-path is fixed. - jsonInt64Field tolerates the GCS JSON API's quoted "size" string as well as mc's bare integer. Verified live on kind (ETCD_E2E_BACKUP=true), two independent runs: both provider cycles pass with totalKey=53, plus the populated-target and missing-object guards. go build/vet and unit+envtest are green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>

…und-trip) The previous execRestorer was dead-on-arrival three ways: it shelled into a distroless etcd image (no `sh`), restored to /var/run/etcd instead of the member's real /var/lib/etcd data dir, and exec'd a live member with no force-new-cluster semantics. Replace it with an init-container restore. Architecture: - EtcdRestore controller now stamps an `operator.etcd.io/restore-source` annotation (JSON: snapshot addressing + creds Secret *name* + operator image + a per-snapshot generation token) onto the target EtcdCluster, then watches the genesis member boot. Completed now truthfully means "a member booted from the restored data" (gated on StatefulSet ReadyReplicas>0), not "spec injected". Missing-object is verified eagerly (terminal, prompt failure); a wedged restore init-container (corrupt snapshot, bad creds) is detected and marked Failed. - The cluster controller injects, ONLY when that annotation is present, a single restore init-container running the operator image as `manager restore-localize`. A normal cluster's pod template is byte-identical (zero init-containers), so the existing cluster e2e is unaffected and no fleet-wide rolling restart is triggered. - restore-localize (shell-free, single image): ordinal-0 guard, per-generation idempotency marker (a pod restart never re-wipes-and-restores), downloads the snapshot via pkg/objectstore, then runs etcd's snapshot-restore library in-process into /var/lib/etcd matching the member identity etcd advertises. A corrupt snapshot fails the library's hash check -> exit non-zero (corrupt rejection now fails for the RIGHT reason). Creds reach this init-container only, via optional secretKeyRefs, never the long-running etcd container. - Dockerfile/Makefile build the whole ./cmd package (not just main.go) so the subcommand is linked; OPERATOR_IMAGE plumbed via flag/env + e2e patch. Validated live on kind (both providers), ungating the full round-trip: - S3 (MinIO) + GCS (fake-gcs-server): 50-key keyspace restored byte-identical, never-snapshotted key absent. - corrupt-snapshot, populated-target, missing-object guards green. go build/vet + unit/envtest (incl. cluster bring-up) green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Xavier Lange <xrlange@gmail.com>

kubernetes-prow · 2026-07-02T02:56:08Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xrl
Once this PR has been reviewed and has the lgtm label, please assign jberkus for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

xrl · 2026-07-02T02:56:12Z

Rebased onto current main; converting to draft while #363 lands.

k8s-ci-robot added needs-ok-to-test size/XXL labels Jun 18, 2026

xrl marked this pull request as draft July 2, 2026 02:25

kubernetes-prow Bot added the do-not-merge/work-in-progress label Jul 2, 2026

xrl and others added 9 commits July 1, 2026 22:27

xrl force-pushed the pr/etcd-backup-snapshot branch from e0d4484 to ff3ef84 Compare July 2, 2026 02:56

This was referenced Jul 3, 2026

feat: implement and register EtcdCluster admission webhooks #384

Draft

feat: automatic quorum-loss disaster recovery #387

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add EtcdBackup CR for on-demand snapshots to object storage (S3/GCS, pluggable)#386

feat: add EtcdBackup CR for on-demand snapshots to object storage (S3/GCS, pluggable)#386
xrl wants to merge 9 commits into
etcd-io:mainfrom
xrl:pr/etcd-backup-snapshot

xrl commented Jun 18, 2026 •

edited

Loading

Uh oh!

k8s-ci-robot commented Jun 18, 2026

Uh oh!

k8s-ci-robot commented Jun 18, 2026

Uh oh!

kubernetes-prow Bot commented Jul 2, 2026

Uh oh!

xrl commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

xrl commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Architecture decision (delegated) — and rationale

Credentials

Testing

Notes for reviewers

PR series — operability fixes & TLS

Uh oh!

k8s-ci-robot commented Jun 18, 2026

Uh oh!

k8s-ci-robot commented Jun 18, 2026

Uh oh!

kubernetes-prow Bot commented Jul 2, 2026

Uh oh!

xrl commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xrl commented Jun 18, 2026 •

edited

Loading