Skip to content

helm: allow overriding probe timeoutSeconds/failureThreshold#679

Open
ChenZiHong-Gavin wants to merge 1 commit into
NVIDIA:mainfrom
ChenZiHong-Gavin:helm-probe-timeout-failurethreshold
Open

helm: allow overriding probe timeoutSeconds/failureThreshold#679
ChenZiHong-Gavin wants to merge 1 commit into
NVIDIA:mainfrom
ChenZiHong-Gavin:helm-probe-timeout-failurethreshold

Conversation

@ChenZiHong-Gavin

Copy link
Copy Markdown

What

Expose timeoutSeconds and failureThreshold (plus readinessProbe.periodSeconds)
as optional values on the liveness and readiness probes in the DaemonSet chart.

Why

The chart currently only templatizes initialDelaySeconds and (for liveness)
periodSeconds. It never sets timeoutSeconds or failureThreshold, so the
kubelet defaults apply: timeoutSeconds: 1, failureThreshold: 3.

The liveness probe hits /health over HTTP. On busy nodes that endpoint can take
longer than 1s to respond; three consecutive >1s responses trip the default
failureThreshold: 3 and the kubelet kills an otherwise-healthy container
(graceful exit 0, not OOM), which then restarts and can repeat — a false-kill /
crashloop. Today there is no values-based way to loosen the probe; the only
options are vendoring the chart or post-rendering the manifest.

This adds the missing knobs so operators can set e.g. timeoutSeconds: 5 /
failureThreshold: 5 (see #612 for a report of the same symptom).

How

The new fields are rendered with {{- with ... }}, so when they are unset the
template emits nothing and the kubelet defaults continue to apply. Existing
installations see no change
unless they explicitly set the new values.

Validation

  • helm template with no overrides: rendered probes are byte-for-byte unchanged
    (no timeoutSeconds / failureThreshold emitted).
  • helm template with the new values set: fields render correctly on both the
    httpGet and basicAuth tcpSocket probe branches.
  • helm lint: passes.

Signed-off-by: Zihong Chen <522023320011@smail.nju.edu.cn>
@ChenZiHong-Gavin

Copy link
Copy Markdown
Author

@nccurry @daphne2337 @jay-mckay @glowkey @guptaNswati Please review this PR. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant