CI Doctor: failure fingerprint grouping, slim doctor prose by pmtk · Pull Request #215 · openshift-eng/edge-tooling

pmtk · 2026-07-03T14:43:20Z

Depends on #214

extract-evidence.py condenses a failed job's downloaded artifacts into one structured evidence file per job: failed steps, test and phase failures, journal alerts, and container restart counts — every entry stamped with its timestamp and merged into a single time-sorted failure timeline. doctor.sh gains an `evidence` phase that runs it for every downloaded job, and the lvms-ci/microshift-ci plugins symlink the shared script. The evidence pack becomes the single starting point for analysis agents instead of each agent re-scanning raw artifacts.
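To illustrate the merged timeline, here is a minimal sketch of flattening several evidence categories into one time-sorted list. It is not the code in extract-evidence.py: the category names, the `ts`/`what` keys, and the sample entries are all hypothetical.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def build_timeline(evidence: dict) -> list:
    """Flatten every evidence category into one list sorted by timestamp."""
    entries = []
    for category, items in evidence.items():
        for item in items:
            entries.append({"category": category, **item})
    # sorted() is stable, so entries with equal timestamps keep their input order.
    return sorted(entries, key=lambda e: parse_ts(e["ts"]))


# Hypothetical sample data, for illustration only.
evidence = {
    "failed_steps": [{"ts": "2026-07-03T10:05:00Z", "what": "step failed"}],
    "journal_alerts": [{"ts": "2026-07-03T10:01:30Z", "what": "journal alert"}],
    "container_restarts": [{"ts": "2026-07-03T10:03:10Z", "what": "restart count 4"}],
}
timeline = build_timeline(evidence)
```

Tagging each entry with its category keeps the origin visible after the merge, which is what lets an agent read the timeline top to bottom without going back to the raw artifacts.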

Replace the prow-job skill's inline RCA instructions with a dedicated analyze-evidence agent that starts from the evidence pack and consults the MicroShift CI artifact primer (moved under agents/references/) and a structured-summary contract with tightened causal-chain rules. The doctor skill launches the same agent for its per-job analyses; prow-job becomes a thin wrapper that downloads artifacts, extracts evidence, and spawns the agent. validate-reports.py checks every agent report against the structured summary contract, and the doctor skill re-launches fix agents for reports that fail; parse.py sanitizes structured summaries before parsing.

The validator previously only checked that 'evidence' looked like a path — a hallucinated-but-plausible citation passed. It now resolves each citation against the job's downloaded artifacts (build dir derived from the entry's job_url), checks the file exists, the line is in range, and the quote actually appears near the cited line (timestamps stripped, whitespace normalized). Error messages include where the quote really is so fix agents can re-ground citations instead of guessing. Fix agents are no longer told to delete unsupported links to pass validation — they must re-ground each link or move the claim to analysis_gaps and downgrade confidence, then re-run the validator on their own output. Evidence packs now record the source file for every rf/boot_and_run/ journal alert entry (journal alerts from multiple files are merged, so line numbers alone were ambiguous). Drop missing_patterns from the agent contract: nothing consumed it — parse.py discarded it at aggregation — so it was pure token cost.
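The citation check described above could look roughly like the sketch below. It is an illustration under assumptions, not the code in validate-reports.py: the function names, the timestamp regex, the `window` size, and the error wording are all hypothetical.

```python
import re
from pathlib import Path

# Assumption: log lines may start with an ISO-like timestamp.
TS_RE = re.compile(
    r"^\s*\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?\s*"
)


def normalize(text):
    """Strip a leading timestamp and collapse whitespace runs."""
    return " ".join(TS_RE.sub("", text).split())


def check_citation(build_dir, rel_path, line_no, quote, window=3):
    """Return None if the citation is grounded, else an error message."""
    path = Path(build_dir) / rel_path
    if not path.is_file():
        return f"{rel_path}: file not found"
    lines = path.read_text(errors="replace").splitlines()
    if not 1 <= line_no <= len(lines):
        return f"{rel_path}:{line_no}: line out of range (file has {len(lines)} lines)"
    needle = normalize(quote)
    if not needle:
        return f"{rel_path}:{line_no}: empty quote"
    # 1-based line numbers of every line containing the normalized quote.
    hits = [i + 1 for i, line in enumerate(lines) if needle in normalize(line)]
    if any(abs(h - line_no) <= window for h in hits):
        return None
    if hits:
        # Report where the quote really is so the citation can be re-grounded.
        return f"{rel_path}:{line_no}: quote not near cited line; found at line(s) {hits}"
    return f"{rel_path}:{line_no}: quote not found in file"
```

Returning the real location in the error is the part that matters for fix agents: they can correct the line number instead of dropping the claim.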

Grouping and cross-release dedup previously keyed on LLM-authored text (raw_error + root_cause) with a 0.5 token-similarity threshold. That demanded cross-run determinism a sampled model cannot give, while the truly deterministic key (which step/tests/scenarios failed) already sat in the evidence pack. extract-evidence.py now computes a failure fingerprint from artifact facts only (job type, failed step, failing test names, phase failures, timeout cascade, greenboot verdict, infra indicator labels, first build error; all normalized, no job names/builds/timestamps).

New doctor.sh plan/fanout phases (plan-analysis.py):

- plan groups all failed jobs (releases + PRs) by fingerprint, writes template verdicts for pure-infrastructure and no-failure groups (no agent at all), and renders one fully substituted agent prompt file per remaining group
- ONE agent analyzes each distinct failure instead of one per job: cross-release verdicts are consistent by construction, and fewer agents run against the CI session's 45-minute budget
- fanout explodes each validated group report into the per-job report files aggregate.py/search-bugs.py/create-report.py already consume, patching job fields and injecting 'fingerprint' (+ entry ordinal so independent failures stay separate issues)

parse.py groups by fingerprint when present; token similarity remains as fallback for legacy reports. The validator resolves citations against all group members' build dirs. The analyze-evidence agent template is now group-native; prow-job renders it as a group of one. lvms-ci symlinks the new shared plan-analysis.py so its doctor flow resolves it too.

Verified on a synthetic workdir: 5 jobs → 3 groups (1 agent), two consecutive runs produce byte-identical grouping.
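A minimal sketch of the fingerprint idea, assuming a small subset of the facts listed above. The field names, the hashing scheme, and the job layout are hypothetical and are not taken from extract-evidence.py or plan-analysis.py.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(facts):
    """Hash normalized artifact facts; no job names, builds, or timestamps."""
    normalized = {
        "job_type": facts.get("job_type", ""),
        "failed_step": facts.get("failed_step", ""),
        # Sort and dedupe so discovery order cannot change the key.
        "failing_tests": sorted(set(facts.get("failing_tests", []))),
        "phase_failures": sorted(set(facts.get("phase_failures", []))),
    }
    blob = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()[:16]


def group_jobs(jobs):
    """Group job names by fingerprint with a deterministic ordering."""
    groups = defaultdict(list)
    for job in jobs:
        groups[fingerprint(job["facts"])].append(job["name"])
    # Sorting keys and members keeps the output stable across runs.
    return {fp: sorted(names) for fp, names in sorted(groups.items())}
```

Because the key is built only from sorted, normalized facts, two runs over the same artifacts produce the same groups regardless of the order in which jobs or tests were discovered.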

fanout used to exit 0 even when group reports were missing or unparseable, merely listing them in its JSON — easy for the orchestrating session to ignore, silently dropping every job in those groups. Now it exits 3 and emits a retry_groups array with each group's prompt_file (null for deterministic groups) so the orchestrator can re-launch the failed analysis agents directly.
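On the orchestrator side, handling the new exit code might look like the sketch below. It assumes fanout prints its JSON on stdout and that each retry_groups element is an object with a prompt_file key; the function name and that JSON shape are hypothetical.

```python
import json

FANOUT_RETRY_EXIT = 3  # exit code named in the description above


def groups_to_retry(returncode, stdout):
    """Return the prompt files to re-launch, or [] when fanout succeeded."""
    if returncode == 0:
        return []
    if returncode != FANOUT_RETRY_EXIT:
        raise RuntimeError(f"fanout failed with unexpected exit code {returncode}")
    result = json.loads(stdout)
    # Deterministic groups carry prompt_file = null: there is no agent to re-launch.
    return [
        group["prompt_file"]
        for group in result["retry_groups"]
        if group["prompt_file"] is not None
    ]
```

A non-zero exit makes the failure impossible to overlook, and the prompt_file list gives the orchestrator everything it needs to re-launch without rebuilding prompts.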

The group-first flow made most of the orchestration prose obsolete: the orchestrator no longer reads job JSON fields or builds agent prompts, so the field-name warnings, evidence-content inventories, duplicate examples, and step-restating notes are gone (318 → ~200 lines). The '-p mode' turn-keeping scaffolding stays until the CI step takes over the deterministic phases.

In CI the deterministic phases (prepare, graphs, evidence, fetch-previous, finalize) burn the Claude session's 45-minute wall clock while the model just waits on downloads. With --prepared the CI step runs them in bash around the session, and the skill covers only what needs a model: planning-driven agent launches, validation, fan-out, and bug correlation. Interactive use without the flag is unchanged.

openshift-ci · 2026-07-03T14:43:23Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2026-07-03T14:43:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [pmtk]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-07-03T14:43:29Z

Warning

Review limit reached

@pmtk, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 59 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.

Review details

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 98cbbd3c-5563-45f0-91f3-c3cbabae0662

📥 Commits

Reviewing files that changed from the base of the PR and between 54f8bc8 and 6a14923.

📒 Files selected for processing (16)

plugins/lvms-ci/scripts/extract-evidence.py
plugins/lvms-ci/scripts/plan-analysis.py
plugins/microshift-ci/agents/analyze-evidence.md
plugins/microshift-ci/agents/references/microshift-ci-primer.md
plugins/microshift-ci/agents/references/structured-summary.md
plugins/microshift-ci/scripts/extract-evidence.py
plugins/microshift-ci/scripts/plan-analysis.py
plugins/microshift-ci/scripts/validate-reports.py
plugins/microshift-ci/skills/doctor/SKILL.md
plugins/microshift-ci/skills/prow-job/SKILL.md
plugins/microshift-ci/skills/prow-job/references/microshift-ci-primer.md
plugins/shared/scripts/doctor.sh
plugins/shared/scripts/extract-evidence.py
plugins/shared/scripts/parse.py
plugins/shared/scripts/plan-analysis.py
plugins/shared/scripts/validate-reports.py

pmtk added 7 commits July 3, 2026 11:58

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 3, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 3, 2026

pmtk wants to merge 7 commits into openshift-eng:main from pmtk:ci-doctor-shiftweek-2a
