CI Doctor: failure fingerprint grouping, slim doctor prose #215

Draft
pmtk wants to merge 7 commits into openshift-eng:main from pmtk:ci-doctor-shiftweek-2a

pmtk (Contributor) commented Jul 3, 2026

Depends on #214

pmtk added 7 commits July 3, 2026 11:58
extract-evidence.py condenses a failed job's downloaded artifacts into
one structured evidence file per job: failed steps, test and phase
failures, journal alerts, and container restart counts — every entry
stamped with its timestamp and merged into a single time-sorted
failure timeline. doctor.sh gains an `evidence` phase that runs it for
every downloaded job, and the lvms-ci/microshift-ci plugins symlink
the shared script.

The evidence pack becomes the single starting point for analysis
agents instead of each agent re-scanning raw artifacts.
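
The timeline merge described above could look roughly like the following. This is a minimal sketch, not the code in extract-evidence.py: the `Entry` fields, the `build_timeline` helper, and the sample file names are hypothetical, and it assumes every entry carries an ISO 8601 timestamp with a consistent timezone.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain


@dataclass(frozen=True)
class Entry:
    # Hypothetical shape; the real evidence file may use different fields.
    timestamp: str  # ISO 8601, e.g. "2026-07-03T10:05:00+00:00"
    kind: str       # e.g. "failed_step", "test_failure", "journal_alert"
    source: str     # artifact file the entry was extracted from
    message: str


def build_timeline(*groups: list[Entry]) -> list[Entry]:
    """Merge entries from every extractor into one time-sorted list."""
    return sorted(chain.from_iterable(groups),
                  key=lambda e: datetime.fromisoformat(e.timestamp))


# Illustrative input only; these entries are made up.
steps = [Entry("2026-07-03T10:05:00+00:00", "failed_step", "build-log.txt", "step failed")]
alerts = [Entry("2026-07-03T10:01:30+00:00", "journal_alert", "journal.log", "oom-kill")]
timeline = build_timeline(steps, alerts)
```

Sorting on the parsed timestamp rather than the raw string keeps the order correct even if different sources use different UTC offsets.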

Replace the prow-job skill's inline RCA instructions with a dedicated
analyze-evidence agent that starts from the evidence pack and consults
the MicroShift CI artifact primer (moved under agents/references/) and
a structured-summary contract with tightened causal-chain rules. The
doctor skill launches the same agent for its per-job analyses;
prow-job becomes a thin wrapper that downloads artifacts, extracts
evidence, and spawns the agent.

validate-reports.py checks every agent report against the structured
summary contract, and the doctor skill re-launches fix agents for
reports that fail; parse.py sanitizes structured summaries before
parsing.

The validator previously only checked that 'evidence' looked like a
path, so a hallucinated-but-plausible citation passed. It now resolves
each citation against the job's downloaded artifacts (build dir derived
from the entry's job_url) and checks that the file exists, that the
line is in range, and that the quote actually appears near the cited
line (timestamps stripped, whitespace normalized). Error messages
include where the quote really is so fix agents can re-ground citations
instead of guessing.
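
A minimal sketch of such a citation check is shown below. It is an illustration under assumptions, not the logic in validate-reports.py: the `check_citation` signature, the three-line search window, and the timestamp pattern are all hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed timestamp shape (ISO 8601 prefix); real logs may use other formats.
_TS = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?"
)


def normalize(text: str) -> str:
    """Strip timestamps and collapse whitespace runs to single spaces."""
    return " ".join(_TS.sub(" ", text).split())


def check_citation(build_dir: Path, rel_path: str, line: int, quote: str,
                   window: int = 3) -> Optional[str]:
    """Return None when the citation is grounded, else an error message."""
    path = build_dir / rel_path
    if not path.is_file():
        return f"{rel_path}: file not found"
    lines = path.read_text(errors="replace").splitlines()
    if not 1 <= line <= len(lines):
        return f"{rel_path}:{line}: line out of range (file has {len(lines)} lines)"
    needle = normalize(quote)
    lo, hi = max(0, line - 1 - window), min(len(lines), line + window)
    if needle in normalize(" ".join(lines[lo:hi])):
        return None
    # Report where the quote really is so a fix agent can re-ground it.
    for number, text in enumerate(lines, start=1):
        if needle in normalize(text):
            return f"{rel_path}:{line}: quote not near cited line; found at line {number}"
    return f"{rel_path}:{line}: quote not found in file"
```

Returning a message that names the real location, rather than a bare pass/fail, is what lets a fix agent correct the line number instead of dropping the claim.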

Fix agents are no longer told to delete unsupported links to pass
validation — they must re-ground each link or move the claim to
analysis_gaps and downgrade confidence, then re-run the validator on
their own output.

Evidence packs now record the source file for every rf, boot_and_run,
and journal alert entry (journal alerts from multiple files are merged,
so line numbers alone were ambiguous).

Drop missing_patterns from the agent contract: nothing consumed it —
parse.py discarded it at aggregation — so it was pure token cost.

Grouping and cross-release dedup previously keyed on LLM-authored text
(raw_error + root_cause) with a 0.5 token-similarity threshold. That
demanded cross-run determinism a sampled model cannot give, while the
truly deterministic key (which step/tests/scenarios failed) already sat
in the evidence pack.

extract-evidence.py now computes a failure fingerprint from artifact
facts only (job type, failed step, failing test names, phase failures,
timeout cascade, greenboot verdict, infra indicator labels, first build
error — all normalized, no job names/builds/timestamps).
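
One way to derive a stable key from such facts is to canonicalize them and hash the result. The sketch below is an assumption-laden illustration, not the fingerprint code in extract-evidence.py: the field names, the lowercase normalization, and the 16-character SHA-256 prefix are all hypothetical choices.

```python
import hashlib
import json


def fingerprint(facts: dict) -> str:
    """Hash normalized artifact facts into a short, order-independent key."""
    normalized = {}
    for key, value in facts.items():
        if isinstance(value, (list, set, tuple)):
            # Sort and dedupe so artifact ordering cannot change the key.
            normalized[key] = sorted({str(v).strip().lower() for v in value})
        elif isinstance(value, str):
            normalized[key] = value.strip().lower()
        else:
            normalized[key] = value
    # sort_keys gives one canonical serialization for equal fact sets.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Because job names, build IDs, and timestamps never enter `facts`, two jobs that fail the same way hash to the same key regardless of release or run.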

New doctor.sh plan/fanout phases (plan-analysis.py):
- plan groups all failed jobs (releases + PRs) by fingerprint, writes
  template verdicts for pure-infrastructure and no-failure groups (no
  agent at all), and renders one fully substituted agent prompt file
  per remaining group
- ONE agent analyzes each distinct failure instead of one per job, so
  cross-release verdicts are consistent by construction and fewer
  agents run against the CI session's 45-minute budget
- fanout explodes each validated group report into the per-job report
  files aggregate.py/search-bugs.py/create-report.py already consume,
  patching job fields and injecting 'fingerprint' (+ entry ordinal so
  independent failures stay separate issues)
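
The plan and fanout phases listed above can be sketched as follows. This is a hedged illustration only: the `plan` and `fanout` function names, the `entries` field, and the `fingerprint#ordinal` format are hypothetical and may not match plan-analysis.py.

```python
import copy
from collections import defaultdict


def plan(jobs: list[dict]) -> dict[str, list[dict]]:
    """Group jobs by fingerprint, with deterministic group and member order."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job["fingerprint"]].append(job)
    return {fp: sorted(members, key=lambda j: j["job_url"])
            for fp, members in sorted(groups.items())}


def fanout(group_report: dict, members: list[dict], fp: str) -> list[dict]:
    """Copy one group report into a per-job report for every member."""
    reports = []
    for job in members:
        report = copy.deepcopy(group_report)
        report["job_url"] = job["job_url"]  # patch job-specific fields
        for ordinal, entry in enumerate(report.get("entries", [])):
            # The ordinal keeps independent failures as separate issues.
            entry["fingerprint"] = f"{fp}#{ordinal}"
        reports.append(report)
    return reports
```

Sorting both the groups and their members is what makes two consecutive runs over the same workdir produce identical grouping.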

parse.py groups by fingerprint when present; token similarity remains
as fallback for legacy reports. The validator resolves citations
against all group members' build dirs. The analyze-evidence agent
template is now group-native; prow-job renders it as a group of one.
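
The fingerprint-first grouping with a similarity fallback might be sketched like this. The commit message does not say which similarity measure parse.py uses, so Jaccard similarity over word tokens is an assumption here, as are the `token_similarity` and `same_group` helper names.

```python
import re


def token_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word tokens (assumed measure)."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def same_group(r1: dict, r2: dict, threshold: float = 0.5) -> bool:
    """Prefer the deterministic fingerprint; fall back to text similarity."""
    fp1, fp2 = r1.get("fingerprint"), r2.get("fingerprint")
    if fp1 and fp2:
        return fp1 == fp2
    # Legacy reports carry no fingerprint, so compare LLM-authored text.
    text1 = f'{r1.get("raw_error", "")} {r1.get("root_cause", "")}'
    text2 = f'{r2.get("raw_error", "")} {r2.get("root_cause", "")}'
    return token_similarity(text1, text2) >= threshold
```

When both reports carry a fingerprint the text is never consulted, so differently worded analyses of the same failure still land in one group.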

lvms-ci symlinks the new shared plan-analysis.py so its doctor flow
resolves it too.

Verified on a synthetic workdir: 5 jobs → 3 groups (1 agent), two
consecutive runs produce byte-identical grouping.

fanout used to exit 0 even when group reports were missing or
unparseable, merely listing them in its JSON — easy for the
orchestrating session to ignore, silently dropping every job in those
groups. Now it exits 3 and emits a retry_groups array with each group's
prompt_file (null for deterministic groups) so the orchestrator can
re-launch the failed analysis agents directly.
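
The exit-code contract described above could be sketched as follows. Only the exit code 3, the `retry_groups` array, and the nullable `prompt_file` come from the commit message; the `fanout_status` helper, the `id` and `fanned_out` fields, and the idea that unparseable reports are simply absent from `reports` are assumptions for illustration.

```python
EXIT_RETRY = 3  # exit code named in the commit message


def fanout_status(groups: list[dict], reports: dict[str, dict]) -> tuple[int, dict]:
    """Return (exit_code, summary); groups without a usable report are retried."""
    retry = [
        # prompt_file is None for deterministic groups (serialized as JSON null).
        {"group": g["id"], "prompt_file": g.get("prompt_file")}
        for g in groups
        if g["id"] not in reports
    ]
    summary = {"fanned_out": len(groups) - len(retry), "retry_groups": retry}
    return (EXIT_RETRY if retry else 0), summary
```

A caller would print the summary as JSON and pass the code to `sys.exit`, so an orchestrator that only checks the exit status still notices the dropped groups.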

The group-first flow made most of the orchestration prose obsolete:
the orchestrator no longer reads job JSON fields or builds agent
prompts, so the field-name warnings, evidence-content inventories,
duplicate examples, and step-restating notes are gone (318 → ~200
lines). The '-p mode' turn-keeping scaffolding stays until the CI step
takes over the deterministic phases.

In CI the deterministic phases (prepare, graphs, evidence,
fetch-previous, finalize) burn the Claude session's 45-minute wall
clock while the model just waits on downloads. With --prepared the CI
step runs them in bash around the session, and the skill covers only
what needs a model: planning-driven agent launches, validation,
fan-out, and bug correlation. Interactive use without the flag is
unchanged.

openshift-ci Bot commented Jul 3, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci Bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Jul 3, 2026

openshift-ci Bot commented Jul 3, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pmtk

openshift-ci Bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Jul 3, 2026

coderabbitai Bot (Contributor) commented Jul 3, 2026

Warning

Review limit reached

@pmtk, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 59 minutes

Review details
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 98cbbd3c-5563-45f0-91f3-c3cbabae0662

📥 Commits

Reviewing files that changed from the base of the PR and between 54f8bc8 and 6a14923.

📒 Files selected for processing (16)
  • plugins/lvms-ci/scripts/extract-evidence.py
  • plugins/lvms-ci/scripts/plan-analysis.py
  • plugins/microshift-ci/agents/analyze-evidence.md
  • plugins/microshift-ci/agents/references/microshift-ci-primer.md
  • plugins/microshift-ci/agents/references/structured-summary.md
  • plugins/microshift-ci/scripts/extract-evidence.py
  • plugins/microshift-ci/scripts/plan-analysis.py
  • plugins/microshift-ci/scripts/validate-reports.py
  • plugins/microshift-ci/skills/doctor/SKILL.md
  • plugins/microshift-ci/skills/prow-job/SKILL.md
  • plugins/microshift-ci/skills/prow-job/references/microshift-ci-primer.md
  • plugins/shared/scripts/doctor.sh
  • plugins/shared/scripts/extract-evidence.py
  • plugins/shared/scripts/parse.py
  • plugins/shared/scripts/plan-analysis.py
  • plugins/shared/scripts/validate-reports.py