feat: add cascade delete observability metrics #1113
Open
ishtoo1 wants to merge 1 commit into feat/cascade-delete-children
Go Coverage Report (Bazel): Total Coverage: 63.4%
Summary:

Intent:
- Add observability for pipeline cascade delete operations

Changes:
- Add 3 counters (started, completed, error) and 1 gauge (active_children) for cascade delete
- Wire metrics into handleDeletion at each phase transition and error path
- Register new metrics in RegisterPipelineMetrics

Test Plan:
- go test ./components/pipeline/... -v -count=1 (all tests pass)
- go build ./components/pipeline/... (builds successfully)

Revert Plan: Revert this PR via git revert.

Jira Issues:
What type of PR is this? (check all applicable)
What changed?
- New metrics in `go/components/pipeline/metrics.go`:
  - `pipeline_cascade_delete_started_total{namespace, pipeline}`
  - `pipeline_cascade_delete_completed_total{namespace, pipeline}`
  - `pipeline_cascade_delete_error_total{namespace, pipeline, reason}` (reasons: `list_error`, `delete_error`, `update_error`, `kill_timeout`)
  - `pipeline_cascade_delete_active_children{namespace, pipeline, kind}` (kinds: `trigger_run`, `pipeline_run`)
- Metrics are wired into `handleDeletion` at each phase transition and error path.
- A `cascade-delete-started-at` annotation so the "started" counter fires exactly once per cascade (not once per requeue).
- `cascadeDeleteKillTimeout = 30m`. After the timeout, the controller stops killing and forcefully deletes child CRs, recording `reason="kill_timeout"` on the error counter, so the Pipeline CR never stays stuck in `Terminating` forever.
- Tests: `TestCascadeDelete_KillTimeout` and `TestCascadeDelete_CounterIncrementsOncePerCascade`.

Why?
Addressing #1091.
How did you test it?
- `bazel test //go/components/pipeline/...`: all tests pass.
- `bazel build //go/...`: no build errors.
- End-to-end behavior of the full cascade-delete stack is verified on a sandbox cluster; results are attached on "docs: document pipeline cascade delete behavior" #1114.
Potential risks
- Cardinality: `namespace × pipeline × (reason|kind)`. For typical deployments this is bounded; for operators managing thousands of pipelines the gauge may grow. Uses the same pattern as `go/components/pipelinerun/metrics.go`.
- Kill timeout: after 30 minutes the controller force-deletes child CRs rather than leaving the Pipeline CR `Terminating` indefinitely. Operators can investigate stuck workflows via logs and the `kill_timeout` metric.

Release notes
New metrics emitted under the `pipeline_cascade_delete_*` prefix. No user-facing behavior change beyond the 30-minute kill timeout fallback.

Documentation Changes
N/A.
Stacked on top of #1112.
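As a rough aid for the cardinality risk noted under "Potential risks", the upper bound on time series can be worked out from the label sets listed in the description. This is a back-of-the-envelope sketch, not code from the PR: `maxSeries` is a hypothetical helper, and it assumes every reason and kind value is eventually emitted for every pipeline.

```go
package main

import "fmt"

// maxSeries returns an upper bound on the number of time series for a given
// number of distinct (namespace, pipeline) pairs, based on the label sets
// listed in the PR description.
func maxSeries(pipelines int) int {
	const (
		startedSeries   = 1 // started_total{namespace, pipeline}
		completedSeries = 1 // completed_total{namespace, pipeline}
		errorReasons    = 4 // list_error, delete_error, update_error, kill_timeout
		childKinds      = 2 // trigger_run, pipeline_run
	)
	perPipeline := startedSeries + completedSeries + errorReasons + childKinds
	return pipelines * perPipeline
}

func main() {
	fmt.Println(maxSeries(5000)) // 40000
}
```

Under these assumptions each pipeline contributes at most 8 series, so growth is linear in the number of pipelines.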