feat(helm): opt-in PodDisruptionBudget and probes for CNPG plugin sidecars #384
Draft
WentingWu666666 wants to merge 1 commit into
PodDisruptionBudget (M9 from documentdb#381)
============================================

Add an optional PodDisruptionBudget for the operator, gated by podDisruptionBudget.enabled (default: false). It is disabled by default because the operator currently ships with replicaCount: 1, and a PDB on a single-replica deployment blocks node drains rather than improving availability. Users running multiple replicas with leader election should enable it.

Plugin probes (M12 from documentdb#381)
=======================================

The sidecar-injector and wal-replica deployments are gRPC servers on port 9090 that previously had no probes: pods were marked Ready as soon as the container started, regardless of whether the gRPC endpoint was actually serving. Add tcpSocket readiness and liveness probes on port 9090, gated by pluginProbes.enabled (default: true), with tunable initialDelaySeconds, periodSeconds, and failureThreshold.

A TCP socket probe is used because the plugins do not expose an HTTP health endpoint. The probe verifies that the gRPC server is bound and accepting connections.

Verified locally on kind:
- helm template renders the PDB only when enabled and renders the probes by default.
- helm upgrade applies the PDB; a second upgrade with --set ... enabled=false removes it as expected.
- The sidecar-injector pod becomes Ready (the TCP probe passes against the running gRPC server).
- helm lint is clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Auto-triaged by documentdb-triage-tool. Applied: Reasoningeffort from diff stats (70+0 LOC, 4 files); LLM: Additive opt-in Helm chart improvements (PodDisruptionBudget and plugin probes) as part of GA-readiness audit, touching manifests/templates with no schema or cross-component changes. If a label is wrong, remove it manually and ping
What this PR does
Two small, additive, opt-in improvements for pod availability.
PodDisruptionBudget (M9)
Adds templates/11_pdb.yaml, gated by podDisruptionBudget.enabled (default: false).

Disabled by default on purpose: the operator currently runs at replicaCount: 1. A PDB with minAvailable: 1 (or maxUnavailable: 0) on a single-replica deployment blocks node drains indefinitely, which is worse than having no PDB. Users running multiple replicas with leader election should set podDisruptionBudget.enabled=true. Once the operator chart ships multi-replica defaults, this default can be flipped.

Plugin probes (M12)
The sidecar-injector and wal-replica deployments are gRPC servers on port 9090 with no probes today, so pods are marked Ready as soon as the container starts, regardless of whether the gRPC endpoint is actually serving.
Adds tcpSocket readiness and liveness probes on port 9090, gated by pluginProbes.enabled (default: true), with tunable initialDelaySeconds, periodSeconds, and failureThreshold.

A TCP socket probe is used because the plugins don't expose an HTTP health endpoint. It verifies that the gRPC server has bound the port and is accepting connections. That is better than nothing, but weaker than a real gRPC health check, which is deferred until the plugins implement grpc.health.v1.

Local verification
- helm lint clean.
- helm template renders the PDB only when podDisruptionBudget.enabled=true.
- helm template renders TCP probes by default and omits them with pluginProbes.enabled=false.
- helm upgrade on kind with --set podDisruptionBudget.enabled=true creates the PDB; a subsequent --set podDisruptionBudget.enabled=false removes it cleanly.
- helm upgrade (Helm default behavior; documented in PR for reviewer awareness).

Tracking