OCPBUGS-90708: Fix NTP-failover dual-publisher for CLOCK_REALTIME#710
Conversation
|
@vitus133: This pull request references Jira Issue OCPBUGS-90708, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Warning Review limit reached
More reviews will be available in 50 minutes and 33 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthrough
ChangesNTP Failover Guard and Tests
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
plugins/ptp_operator/metrics/ntp_test.go (1)
112-123: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert the Prometheus gauge on this path too.
This test covers the only branch that now calls
UpdateSyncStateMetrics(..., CLOCK_REALTIME, FREERUN), but it never verifies the gauge value. Without that assertion, the stale-metric regression this PR fixes can return unnoticed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/ptp_operator/metrics/ntp_test.go` around lines 112 - 123, This test path already checks the sync state and emitted event, but it does not verify the Prometheus gauge updated by UpdateSyncStateMetrics for CLOCK_REALTIME. Add an assertion in ntp_test.go alongside the existing ptpStats[metrics.ClockRealTime].LastSyncState check to confirm the gauge reflects FREERUN on the phc2sys process-down path when chronyd is not enabled, using the existing mock metrics object and CLOCK_REALTIME symbol.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/ptp_operator/metrics/ntp_test.go`:
- Around line 24-27: The test in ntp_test.go is using repeated string literals
that trigger the linter, so move the shared values into file-level constants.
Introduce constants for the profile name, config name, and the repeated
chrony/PTP argument strings, then replace the inline literals in the relevant
test setup and assertions. Update the ntp test helpers around initPubSubTypes
and metrics.NewPTPEventManager to use those constants consistently so the
repeated-literal lint warning is removed.
---
Nitpick comments:
In `@plugins/ptp_operator/metrics/ntp_test.go`:
- Around line 112-123: This test path already checks the sync state and emitted
event, but it does not verify the Prometheus gauge updated by
UpdateSyncStateMetrics for CLOCK_REALTIME. Add an assertion in ntp_test.go
alongside the existing ptpStats[metrics.ClockRealTime].LastSyncState check to
confirm the gauge reflects FREERUN on the phc2sys process-down path when chronyd
is not enabled, using the existing mock metrics object and CLOCK_REALTIME
symbol.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 6cd242d2-0b47-44dc-b989-71326c3c4c95
📒 Files selected for processing (2)
plugins/ptp_operator/metrics/metrics.goplugins/ptp_operator/metrics/ntp_test.go
In NTP-failover profiles (chronyd + phc2sys configured), phc2sys
process-down was emitting a spurious OsClockSyncStateChange FREERUN
event for CLOCK_REALTIME, conflicting with chronyd which is the sole
E3 authority in NTP mode. This caused:
- Double CLOCK_REALTIME values in os-clock-sync-state events
- Stale openshift_ptp_clock_state{process="phc2sys"} metric
Fix: skip the OsClockSyncStateChange emission in processDownEvent when
ChronydEnabled() is true for the profile. Also update the Prometheus
gauge on the non-chronyd path to prevent stale metrics.
Co-authored-by: Cursor <cursoragent@cursor.com>
b46ae31 to
8a79a22
Compare
|
/jira refresh |
|
@vitus133: This pull request references Jira Issue OCPBUGS-90708, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@openshift-ci-robot: GitHub didn't allow me to request PR reviews from the following users: klaskosk. Note that only redhat-cne members and repo collaborators can review this PR, and authors cannot review their own PRs. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cherrypick release-4.22 |
|
@vitus133: once the present PR merges, I will cherry-pick it on top of DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: nocturnalastro, vitus133 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@vitus133: Jira Issue OCPBUGS-90708: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-90708 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@vitus133: new pull request created: #711 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
In NTP-failover profiles (chronyd + phc2sys configured), phc2sys process-down was emitting a spurious OsClockSyncStateChange FREERUN event for CLOCK_REALTIME, conflicting with chronyd which is the sole E3 authority in NTP mode. This caused:
Fix: skip the OsClockSyncStateChange emission in processDownEvent when ChronydEnabled() is true for the profile. Also update the Prometheus gauge on the non-chronyd path to prevent stale metrics.