CI Doctor: evidence packs, prow job analysis agent, double check causal chains#214
CI Doctor: evidence packs, prow job analysis agent, double check causal chains#214pmtk wants to merge 3 commits into
Conversation
extract-evidence.py condenses a failed job's downloaded artifacts into one structured evidence file per job: failed steps, test and phase failures, journal alerts, and container restart counts — every entry stamped with its timestamp and merged into a single time-sorted failure timeline. doctor.sh gains an `evidence` phase that runs it for every downloaded job, and the lvms-ci/microshift-ci plugins symlink the shared script. The evidence pack becomes the single starting point for analysis agents instead of each agent re-scanning raw artifacts.
Replace the prow-job skill's inline RCA instructions with a dedicated analyze-evidence agent that starts from the evidence pack and consults the MicroShift CI artifact primer (moved under agents/references/) and a structured-summary contract with tightened causal-chain rules. The doctor skill launches the same agent for its per-job analyses; prow-job becomes a thin wrapper that downloads artifacts, extracts evidence, and spawns the agent. validate-reports.py checks every agent report against the structured summary contract, and the doctor skill re-launches fix agents for reports that fail; parse.py sanitizes structured summaries before parsing.
The validator previously only checked that 'evidence' looked like a path — a hallucinated-but-plausible citation passed. It now resolves each citation against the job's downloaded artifacts (build dir derived from the entry's job_url), checks the file exists, the line is in range, and the quote actually appears near the cited line (timestamps stripped, whitespace normalized). Error messages include where the quote really is so fix agents can re-ground citations instead of guessing. Fix agents are no longer told to delete unsupported links to pass validation — they must re-ground each link or move the claim to analysis_gaps and downgrade confidence, then re-run the validator on their own output. Evidence packs now record the source file for every rf/boot_and_run/ journal alert entry (journal alerts from multiple files are merged, so line numbers alone were ambiguous). Drop missing_patterns from the agent contract: nothing consumed it — parse.py discarded it at aggregation — so it was pure token cost.
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pmtk The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
WalkthroughAdds a shared evidence-extraction pipeline (extract-evidence.py, validate-reports.py) under plugins/shared/scripts, wires a new ChangesEvidence Extraction and Validation Pipeline
Estimated code review effort: 4 (Complex) | ~60 minutes Possibly related PRs
Suggested reviewers: 🚥 Pre-merge checks | ✅ 5 | ❌ 6❌ Failed checks (6 inconclusive)
✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 markdownlint-cli2 (0.22.1)plugins/microshift-ci/agents/analyze-evidence.mdmarkdownlint-cli2 v0.22.1 (markdownlint v0.40.0) ... [truncated 1314 characters] ... node:internal/modules/esm/resolve:271:11) plugins/microshift-ci/agents/references/microshift-ci-primer.mdmarkdownlint-cli2 v0.22.1 (markdownlint v0.40.0) ... [truncated 1314 characters] ... node:internal/modules/esm/resolve:271:11) plugins/microshift-ci/agents/references/structured-summary.mdmarkdownlint-cli2 v0.22.1 (markdownlint v0.40.0) ... [truncated 1314 characters] ... node:internal/modules/esm/resolve:271:11)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (2)
plugins/shared/scripts/validate-reports.py (2)
222-223: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueNested
errclosure captures loop variables (ruff B023).
errcloses oversig,li, andcause, and is redefined every iteration. It works today because it's called synchronously within the same iteration, but ruff flags it (B023) and the file must pass ruff. Hoist it to a module/nested helper taking explicit args.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/shared/scripts/validate-reports.py` around lines 222 - 223, The nested err closure in validate-reports.py is capturing loop variables sig, li, and cause, which triggers ruff B023. Replace the inner closure with a helper that takes sig, li, and cause as explicit parameters, and update the call site inside the surrounding loop so the formatted error string is built from passed-in values instead of closed-over state.Source: Linters/SAST tools
170-195: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winDuplicated STRUCTURED SUMMARY extraction/sanitization — consume the canonical parser instead.
This block re-implements the exact marker regex + tab/control-char sanitization +
json.loadsthat already lives inparse.py'sparse_structured_summary. Keeping two copies means the sanitization rules (e.g. the tab handling above) drift between the two consumers.Per CONTRIBUTING.md Review Principles — "Single source of truth — no derived state": extract a shared helper (e.g.
_extract_summary_json(content)inparse.py) and call it from both, so validation and parsing stay in lock-step.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/shared/scripts/validate-reports.py` around lines 170 - 195, The `_load_entries` logic is duplicating STRUCTURED SUMMARY parsing and sanitization that already exists in `parse.py`’s `parse_structured_summary`, so make the parser the single source of truth. Extract the shared summary extraction/sanitization into a helper in `parse.py` (for example, a function that returns the parsed entries or the normalized JSON text) and have `_load_entries` call that helper instead of redoing the marker regex, tab/control-character cleanup, and `json.loads`. Keep the existing fallback behavior for missing or malformed summaries, but ensure both consumers use the same canonical implementation.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/microshift-ci/skills/doctor/SKILL.md`:
- Around line 121-129: Step 2 uses undefined output filename placeholders for
JOB_ID and JOB_NAME_SUFFIX, so update the placeholder mapping in SKILL.md to use
only fields introduced from the prepare script JSON or rename them to
already-defined placeholders. Make sure the OUTPUT_FILE examples in the step
reference concrete values derived from job, url, and build_id, and that any
PR-specific suffix logic is expressed using the existing JOB_NAME / PR
placeholders rather than new undefined symbols.
In `@plugins/microshift-ci/skills/prow-job/SKILL.md`:
- Around line 26-60: The workflow in SKILL.md is missing inline stop conditions
for failure cases and the agent completion check. Update the numbered steps
around the artifact download, evidence extraction, and analyze stages to
explicitly say what to do if gsutil fails, extract-evidence.py fails, or the
spawned agent does not return DONE, using the existing workflow structure and
the analyze-evidence.md-driven agent step as the anchor. Keep the guard
conditions adjacent to their relevant step, and make the steps actionable with
clear fallback/abort behavior rather than deferring these checks to a separate
section.
- Around line 22-33: The workdir setup in the SKILL workflow can fail on a clean
run because mktemp -d assumes the parent directory already exists. Update the
setup steps in SKILL.md so the URL path branch explicitly creates the parent
workdir first with mkdir -p before calling mktemp -d, and keep the existing
WORKDIR/TMP flow intact.
- Around line 17-32: The `SKILL.md` workflow currently only treats `/`-prefixed
inputs as local in the artifact setup logic, so relative directories will be
misclassified as URLs. Update the `<ARGUMENTS>` contract and the Step 1 handling
in the Prow job workflow to either explicitly require an absolute local
artifacts path or accept existing relative directories as local inputs, and keep
the guidance aligned with the `Work Directory` and `Workflow` sections.
In `@plugins/shared/scripts/extract-evidence.py`:
- Line 330: The script has Ruff violations that will fail validation: rename the
ambiguous comprehension variable `l` in `extract-evidence.py` to a clearer name
in the `context_lines` assignment and the other similar spot, prefix the unused
unpacked `infra_fail_count` with `_` where it is assigned, and split the chained
statements separated by semicolons into separate lines in the affected block.
Update the relevant logic in `extract-evidence.py` around `context_lines`,
`infra_fail_count`, and the code near the multiple-statement lines so the file
passes Ruff.
- Around line 182-192: The refs selection logic in extract-evidence.py can still
index into an empty extra_refs array and raise IndexError in single-mode jobs.
Refactor the refs assignment near the prowjob handling to check whether
extra_refs is present and non-empty before accessing its first element, and keep
the existing fallback behavior when spec.refs is missing. Use the refs, prowjob,
and extra_refs lookup path to locate the fix.
In `@plugins/shared/scripts/parse.py`:
- Line 51: The pre-parse normalization in parse.py is escaping tabs in the whole
JSON payload, which breaks valid tab-indented JSON before json.loads() runs.
Update the parse flow in the json_text handling to stop rewriting all tab
characters globally; instead, only handle control characters inside string
values or remove the tab replacement entirely while keeping the existing JSON
parsing path intact.
---
Nitpick comments:
In `@plugins/shared/scripts/validate-reports.py`:
- Around line 222-223: The nested err closure in validate-reports.py is
capturing loop variables sig, li, and cause, which triggers ruff B023. Replace
the inner closure with a helper that takes sig, li, and cause as explicit
parameters, and update the call site inside the surrounding loop so the
formatted error string is built from passed-in values instead of closed-over
state.
- Around line 170-195: The `_load_entries` logic is duplicating STRUCTURED
SUMMARY parsing and sanitization that already exists in `parse.py`’s
`parse_structured_summary`, so make the parser the single source of truth.
Extract the shared summary extraction/sanitization into a helper in `parse.py`
(for example, a function that returns the parsed entries or the normalized JSON
text) and have `_load_entries` call that helper instead of redoing the marker
regex, tab/control-character cleanup, and `json.loads`. Keep the existing
fallback behavior for missing or malformed summaries, but ensure both consumers
use the same canonical implementation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: a046fc16-c033-4567-a1dd-f6c0b098029e
📒 Files selected for processing (13)
plugins/lvms-ci/scripts/extract-evidence.pyplugins/microshift-ci/agents/analyze-evidence.mdplugins/microshift-ci/agents/references/microshift-ci-primer.mdplugins/microshift-ci/agents/references/structured-summary.mdplugins/microshift-ci/scripts/extract-evidence.pyplugins/microshift-ci/scripts/validate-reports.pyplugins/microshift-ci/skills/doctor/SKILL.mdplugins/microshift-ci/skills/prow-job/SKILL.mdplugins/microshift-ci/skills/prow-job/references/microshift-ci-primer.mdplugins/shared/scripts/doctor.shplugins/shared/scripts/extract-evidence.pyplugins/shared/scripts/parse.pyplugins/shared/scripts/validate-reports.py
💤 Files with no reviewable changes (1)
- plugins/microshift-ci/skills/prow-job/references/microshift-ci-primer.md
| Substitute these placeholders from the prepare script's JSON output (`job`, `url`, `build_id` fields): | ||
|
|
||
| **For release jobs:** | ||
| | Placeholder | Value | | ||
| |---|---| | ||
| | `{EVIDENCE_PACK}` | `<WORKDIR>/evidence/evidence-<BUILD_ID>.json` | | ||
| | `{JOB_NAME}` | `job` field (for PR jobs, append a space and `(PR #<PR>)`) | | ||
| | `{JOB_URL}` | `url` field | | ||
| | `{OUTPUT_FILE}` | Release: `<WORKDIR>/jobs/release-<RELEASE>-job-<N>-<JOB_ID>.txt`. PR: `<WORKDIR>/jobs/prs-job-<N>-pr<PR>-<JOB_NAME_SUFFIX>.txt` | | ||
|
|
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/usr/bin/bash
set -euo pipefail
printf '--- file outline ---\n'
ast-grep outline plugins/microshift-ci/skills/doctor/SKILL.md || true
printf '\n--- search for JOB_ID / JOB_NAME_SUFFIX / build_id ---\n'
rg -n '\bJOB_ID\b|\bJOB_NAME_SUFFIX\b|\bbuild_id\b|\bjob\b|\burl\b' plugins/microshift-ci/skills/doctor/SKILL.md
printf '\n--- surrounding lines 110-140 ---\n'
sed -n '110,140p' plugins/microshift-ci/skills/doctor/SKILL.mdRepository: openshift-eng/edge-tooling
Length of output: 255
🏁 Script executed:
#!/usr/bin/bash
set -euo pipefail
printf '--- broader search in plugin docs ---\n'
rg -n '\bJOB_ID\b|\bJOB_NAME_SUFFIX\b|JOB_NAME_SUFFIX|JOB_ID' plugins/microshift-ci -g '!**/node_modules/**' || trueRepository: openshift-eng/edge-tooling
Length of output: 255
🏁 Script executed:
#!/usr/bin/bash
set -euo pipefail
printf '--- relevant section with line numbers ---\n'
cat -n plugins/microshift-ci/skills/doctor/SKILL.md | sed -n '118,132p'Repository: openshift-eng/edge-tooling
Length of output: 255
🏁 Script executed:
#!/usr/bin/bash
set -euo pipefail
echo '--- relevant section ---'
sed -n '118,132p' plugins/microshift-ci/skills/doctor/SKILL.md
echo
echo '--- placeholder search in file ---'
rg -n '\bJOB_ID\b|\bJOB_NAME_SUFFIX\b|\bbuild_id\b|\bjob\b|\burl\b' plugins/microshift-ci/skills/doctor/SKILL.md
echo
echo '--- broader search in plugin docs ---'
rg -n '\bJOB_ID\b|\bJOB_NAME_SUFFIX\b' plugins/microshift-ci -g '!**/node_modules/**' || trueRepository: openshift-eng/edge-tooling
Length of output: 4967
🏁 Script executed:
#!/usr/bin/bash
set -euo pipefail
python3 - <<'PY'
from pathlib import Path
path = Path('plugins/microshift-ci/skills/doctor/SKILL.md')
text = path.read_text()
for needle in ['JOB_ID', 'JOB_NAME_SUFFIX', 'build_id', 'job', 'url']:
print(f'-- {needle} --')
for i, line in enumerate(text.splitlines(), 1):
if needle in line:
print(f'{i}: {line}')
PYRepository: openshift-eng/edge-tooling
Length of output: 5417
Define the output filename placeholders. JOB_ID and JOB_NAME_SUFFIX are not defined anywhere in Step 2, while the surrounding table only maps job, url, and build_id. Map them to concrete fields or rename them to placeholders already introduced.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/microshift-ci/skills/doctor/SKILL.md` around lines 121 - 129, Step 2
uses undefined output filename placeholders for JOB_ID and JOB_NAME_SUFFIX, so
update the placeholder mapping in SKILL.md to use only fields introduced from
the prepare script JSON or rename them to already-defined placeholders. Make
sure the OUTPUT_FILE examples in the step reference concrete values derived from
job, url, and build_id, and that any PR-specific suffix logic is expressed using
the existing JOB_NAME / PR placeholders rather than new undefined symbols.
| `<ARGUMENTS>`: Prow URL, GCS web URL, or local artifacts directory. | ||
|
|
||
| - `<TMP>/build-log.txt`: Log containing prow job output and most likely place to identify AWS infra related or hypervisor related errors. | ||
| - `<STEP>/build-log.txt`: Each step in the CI job is individually logged in a build-log.txt file. | ||
| - `<TMP>/artifacts/<TEST_NAME>/openshift-microshift-infra-sos-aws/artifacts/sosreport-*.tar.xz`: Compressed archive containing select portions of the test host's filesystem, relevant logs, and system configurations. `<TEST_NAME>` varies by job (e.g., `e2e-aws-tests`, `e2e-aws-ovn-ocp-conformance-arm64`). | ||
| - `<TMP>/artifacts/<TEST_NAME>/openshift-microshift-e2e-origin-conformance/build-log.txt`: Step-specific build log for origin conformance tests. | ||
|
|
||
| ## Important Links | ||
|
|
||
| **Step Diagram URL** (found at the end of the main build-log): | ||
|
|
||
| ```text | ||
| https://steps.ci.openshift.org/job?org=openshift&repo=microshift&branch=release-4.19&test=e2e-aws-tests-bootc-nightly&variant=periodics | ||
| ``` | ||
|
|
||
| This link provides a diagram of the steps that make up the test. Think about reading this diagram when identifying step failures because not all fatal errors cause the current step to fail but may cause the next step to fail. | ||
|
|
||
| **SOS Report** (contains pod/container logs and cluster-scoped resources) | ||
|
|
||
| **Journals:** use the plain-text `journal_*.log` files next to the sosreport tarballs (e.g., `scenario-info/<scenario>/vms/host1/sos/journal_*.log`). These are readable directly with Read/Grep and contain the journal evidence you need (service failures, x509 errors, OOM kills, microshift unit logs). | ||
|
|
||
| **Pod logs, cluster state, inspect outputs:** extract a specific sosreport tarball when you need pod logs (container crashes, restarts, probe failures). The extraction script pulls pod logs, inspect outputs, and cluster-scoped resources. | ||
|
|
||
| **When to extract a sosreport:** when the journal shows `CrashLoopBackOff`, `Back-off restarting`, repeated `Created container` events, or probe failures after readiness. Pod and container logs — in particular `previous.log`, the only record of WHY a dead container exited — exist exclusively inside the sosreport tarball. | ||
|
|
||
| **How to extract:** find the tarball for the scenario, then run the extraction script on that single tarball: | ||
|
|
||
| ```bash | ||
| # Find sosreport tarballs for the scenario | ||
| find <scenario-dir>/.. -name 'sosreport-*.tar.xz' | ||
|
|
||
| # Extract only pod logs, inspect outputs, and cluster-scoped resources | ||
| bash plugins/shared/scripts/extract-sosreport.sh <tarball-path> | ||
| ``` | ||
|
|
||
| The script prints the extraction directory to stdout. Extracted files land in `<tarball-parent>/sos-extracted/<sosreport-name>/`. The extraction is idempotent — running it again on the same tarball is a no-op. Inside the extracted tree: | ||
|
|
||
| - `sos_commands/microshift/namespaces/<namespace>/pods/<pod>/<container>/<container>/logs/{current,previous}.log` — container logs | ||
| - `sos_commands/microshift/namespaces/<namespace>/core/{pods.yaml,events.yaml}` — pod status and events | ||
| - `sos_commands/microshift/cluster-scoped-resources/` — nodes, CRDs, webhooks | ||
| - `sos_commands/*/inspect_*` — component command outputs | ||
|
|
||
| **There may be several sosreports for a single scenario**: the test framework's sos-on-failure listener (`test/resources/sos-on-failure-listener.py` in openshift/microshift) captures a sosreport at the moment of each test failure, in addition to the one collected at the end of the scenario. **Prefer the on-failure sosreport when investigating a specific test failure**: it contains the pods and container logs of the namespaces created specifically for that test (suite), which are absent from the end-of-scenario sosreport because they have already been cleaned up by then. Match a sosreport to its test failure by capture time. | ||
|
|
||
| Correlate journal entries with the failure timestamp recorded during the Characterize phase. | ||
|
|
||
| ## Performance Graphs | ||
|
|
||
| When the input is a local artifacts directory of the form `<WORKDIR>/artifacts/<BUILD_ID>` (the doctor workflow), pre-generated PCP performance graphs may exist in the sibling directory: | ||
|
|
||
| ```text | ||
| <WORKDIR>/graphs/<BUILD_ID>/ | ||
| 1_cpu_usage.png — CPU usage (user, system, I/O wait) | ||
| 2_mem_usage.png — Memory usage (used, cached) | ||
| 3_disk_io.png — Disk I/O (read/write OPS, await) | ||
| 4_disk_usage.png — Disk usage by partition (% fill) | ||
| ``` | ||
|
|
||
| Use the Read tool to view these PNGs during the drill-down phase whenever the failure involves a timeout, slowness, readiness/health-check expiry, eviction, OOM, or any resource-related error. Look for CPU saturation, memory exhaustion, or disk I/O stalls overlapping the failure window. If the directory does not exist (e.g., standalone URL invocation), skip graph correlation — do not attempt to generate graphs. | ||
| URL formats — periodic: `.../logs/{JOB_NAME}/{JOB_ID}`, presubmit: `.../pr-logs/pull/openshift_microshift/{PR}/{JOB_NAME}/{JOB_ID}`. | ||
| Hosts: `prow.ci.openshift.org/view/gs/test-platform-results/...` or `gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/...`. | ||
|
|
||
| ## Work Directory | ||
|
|
||
| Compute once at the start by running `date +%y%m%d` and substituting into the path below. In all commands, replace `<WORKDIR>` with the computed path — do not store the work directory in a shell variable. | ||
|
|
||
| ```text | ||
| /tmp/microshift-ci-claude-workdir.<YYMMDD> | ||
| ``` | ||
|
|
||
| ## Common Commands | ||
|
|
||
| Scan the build log for arbitrary text: | ||
|
|
||
| ```bash | ||
| grep '${SOME_TEXT}' ${GREP_OPTS} ${TMP}/build-log.txt | ||
| ``` | ||
|
|
||
| Download all prow job artifacts (only needed when given a URL, not a local path): | ||
|
|
||
| ```bash | ||
| GCS_PATH=$(echo "${PROW_URL}" | sed -e 's|https://prow.ci.openshift.org/view/gs/|gs://|' -e 's|https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/|gs://|') | ||
| gsutil -q -m cp -r "${GCS_PATH}/" ${TMP}/ | ||
| ``` | ||
| `/tmp/microshift-ci-claude-workdir.<YYMMDD>` — compute `<YYMMDD>` once via `date +%y%m%d`. | ||
|
|
||
| ## Workflow | ||
|
|
||
| The user argument is: `<ARGUMENTS>` | ||
|
|
||
| 0. **Determine input type and set up artifacts directory**: | ||
| - If `<ARGUMENTS>` is a **local directory path** (starts with `/` and contains `build-log.txt`): set `TMP` to that directory. Skip step 1. | ||
| - If `<ARGUMENTS>` is a **URL** (starts with `http`): create a temporary working directory with `mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX`, set `TMP` to that directory, and proceed to step 1. | ||
| 1. **Set up artifacts**: | ||
| - Local path (starts with `/`): use it as `<TMP>`. Skip step 2. | ||
| - URL: create `<TMP>` with `mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX`. |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
git ls-files plugins/microshift-ci/skills/prow-job/SKILL.md plugins/docs/SKILL-GUIDELINES.md CONTRIBUTING.md
echo
echo "=== plugins/microshift-ci/skills/prow-job/SKILL.md (selected lines) ==="
nl -ba plugins/microshift-ci/skills/prow-job/SKILL.md | sed -n '1,120p'
echo
echo "=== Search for local-path / absolute-path wording in related skill docs ==="
rg -n "local path|absolute path|starts with /|artifacts directory|TMP|mktemp -d|prow-job" plugins/microshift-ci/skills plugins/docs CONTRIBUTING.mdRepository: openshift-eng/edge-tooling
Length of output: 373
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== plugins/microshift-ci/skills/prow-job/SKILL.md ==="
cat -n plugins/microshift-ci/skills/prow-job/SKILL.md | sed -n '1,140p'
echo
echo "=== Relevant matches ==="
rg -n "local path|absolute path|starts with /|artifacts directory|TMP|mktemp -d|existing directory|directory" plugins/microshift-ci/skills/prow-job/SKILL.md plugins/microshift-ci/skills -g 'SKILL.md'Repository: openshift-eng/edge-tooling
Length of output: 7340
Require an absolute local-path contract. <ARGUMENTS> says “local artifacts directory,” but step 1 only treats /-prefixed inputs as local. Relative paths will go down the URL branch. Document the absolute-path requirement or accept existing directories.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/microshift-ci/skills/prow-job/SKILL.md` around lines 17 - 32, The
`SKILL.md` workflow currently only treats `/`-prefixed inputs as local in the
artifact setup logic, so relative directories will be misclassified as URLs.
Update the `<ARGUMENTS>` contract and the Step 1 handling in the Prow job
workflow to either explicitly require an absolute local artifacts path or accept
existing relative directories as local inputs, and keep the guidance aligned
with the `Work Directory` and `Workflow` sections.
| ## Work Directory | ||
|
|
||
| Compute once at the start by running `date +%y%m%d` and substituting into the path below. In all commands, replace `<WORKDIR>` with the computed path — do not store the work directory in a shell variable. | ||
|
|
||
| ```text | ||
| /tmp/microshift-ci-claude-workdir.<YYMMDD> | ||
| ``` | ||
|
|
||
| ## Common Commands | ||
|
|
||
| Scan the build log for arbitrary text: | ||
|
|
||
| ```bash | ||
| grep '${SOME_TEXT}' ${GREP_OPTS} ${TMP}/build-log.txt | ||
| ``` | ||
|
|
||
| Download all prow job artifacts (only needed when given a URL, not a local path): | ||
|
|
||
| ```bash | ||
| GCS_PATH=$(echo "${PROW_URL}" | sed -e 's|https://prow.ci.openshift.org/view/gs/|gs://|' -e 's|https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/|gs://|') | ||
| gsutil -q -m cp -r "${GCS_PATH}/" ${TMP}/ | ||
| ``` | ||
| `/tmp/microshift-ci-claude-workdir.<YYMMDD>` — compute `<YYMMDD>` once via `date +%y%m%d`. | ||
|
|
||
| ## Workflow | ||
|
|
||
| The user argument is: `<ARGUMENTS>` | ||
|
|
||
| 0. **Determine input type and set up artifacts directory**: | ||
| - If `<ARGUMENTS>` is a **local directory path** (starts with `/` and contains `build-log.txt`): set `TMP` to that directory. Skip step 1. | ||
| - If `<ARGUMENTS>` is a **URL** (starts with `http`): create a temporary working directory with `mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX`, set `TMP` to that directory, and proceed to step 1. | ||
| 1. **Set up artifacts**: | ||
| - Local path (starts with `/`): use it as `<TMP>`. Skip step 2. | ||
| - URL: create `<TMP>` with `mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX`. | ||
|
|
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Create the parent workdir before mktemp -d.
mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX assumes <WORKDIR> already exists, so a clean run can fail before download/extraction starts. Add an explicit mkdir -p "<WORKDIR>" first.
Suggested fix
1. **Set up artifacts**:
+ - Ensure `<WORKDIR>` exists: `mkdir -p "<WORKDIR>"`.
- Local path (starts with `/`): use it as `<TMP>`. Skip step 2.
- URL: create `<TMP>` with `mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX`.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## Work Directory | |
| Compute once at the start by running `date +%y%m%d` and substituting into the path below. In all commands, replace `<WORKDIR>` with the computed path — do not store the work directory in a shell variable. | |
| ```text | |
| /tmp/microshift-ci-claude-workdir.<YYMMDD> | |
| ``` | |
| ## Common Commands | |
| Scan the build log for arbitrary text: | |
| ```bash | |
| grep '${SOME_TEXT}' ${GREP_OPTS} ${TMP}/build-log.txt | |
| ``` | |
| Download all prow job artifacts (only needed when given a URL, not a local path): | |
| ```bash | |
| GCS_PATH=$(echo "${PROW_URL}" | sed -e 's|https://prow.ci.openshift.org/view/gs/|gs://|' -e 's|https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/|gs://|') | |
| gsutil -q -m cp -r "${GCS_PATH}/" ${TMP}/ | |
| ``` | |
| `/tmp/microshift-ci-claude-workdir.<YYMMDD>` — compute `<YYMMDD>` once via `date +%y%m%d`. | |
| ## Workflow | |
| The user argument is: `<ARGUMENTS>` | |
| 0. **Determine input type and set up artifacts directory**: | |
| - If `<ARGUMENTS>` is a **local directory path** (starts with `/` and contains `build-log.txt`): set `TMP` to that directory. Skip step 1. | |
| - If `<ARGUMENTS>` is a **URL** (starts with `http`): create a temporary working directory with `mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX`, set `TMP` to that directory, and proceed to step 1. | |
| 1. **Set up artifacts**: | |
| - Local path (starts with `/`): use it as `<TMP>`. Skip step 2. | |
| - URL: create `<TMP>` with `mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX`. | |
| ## Work Directory | |
| `/tmp/microshift-ci-claude-workdir.<YYMMDD>` — compute `<YYMMDD>` once via `date +%y%m%d`. | |
| ## Workflow | |
| The user argument is: `<ARGUMENTS>` | |
| 1. **Set up artifacts**: | |
| - Ensure `<WORKDIR>` exists: `mkdir -p "<WORKDIR>"`. | |
| - Local path (starts with `/`): use it as `<TMP>`. Skip step 2. | |
| - URL: create `<TMP>` with `mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX`. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/microshift-ci/skills/prow-job/SKILL.md` around lines 22 - 33, The
workdir setup in the SKILL workflow can fail on a clean run because mktemp -d
assumes the parent directory already exists. Update the setup steps in SKILL.md
so the URL path branch explicitly creates the parent workdir first with mkdir -p
before calling mktemp -d, and keep the existing WORKDIR/TMP flow intact.
| ## Workflow | ||
|
|
||
| The user argument is: `<ARGUMENTS>` | ||
|
|
||
| 0. **Determine input type and set up artifacts directory**: | ||
| - If `<ARGUMENTS>` is a **local directory path** (starts with `/` and contains `build-log.txt`): set `TMP` to that directory. Skip step 1. | ||
| - If `<ARGUMENTS>` is a **URL** (starts with `http`): create a temporary working directory with `mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX`, set `TMP` to that directory, and proceed to step 1. | ||
| 1. **Set up artifacts**: | ||
| - Local path (starts with `/`): use it as `<TMP>`. Skip step 2. | ||
| - URL: create `<TMP>` with `mktemp -d <WORKDIR>/openshift-ci-analysis-XXXX`. | ||
|
|
||
| 1. **Download all artifacts** (skip if using pre-downloaded artifacts from step 0): | ||
| Download all prow job artifacts using `gsutil -q -m cp -r` into the temporary working directory. Derive the GCS path by stripping the web prefix from the job URL (handles both Prow and GCS web URL formats): | ||
| 2. **Download** (URL only): | ||
|
|
||
| ```bash | ||
| GCS_PATH=$(echo "${PROW_URL}" | sed -e 's|https://prow.ci.openshift.org/view/gs/|gs://|' -e 's|https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/|gs://|') | ||
| gsutil -q -m cp -r "${GCS_PATH}/" ${TMP}/ | ||
| GCS_PATH=$(echo "<URL>" | sed -e 's|https://prow.ci.openshift.org/view/gs/|gs://|' \ | ||
| -e 's|https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/|gs://|') | ||
| gsutil -q -m cp -r "${GCS_PATH}/" <TMP>/ | ||
| ``` | ||
|
|
||
| This works for both periodic (`logs/...`) and presubmit PR (`pr-logs/pull/...`) job URLs, and for both Prow and GCS web URL formats. | ||
| This makes all build logs, step logs, and SOS reports available locally for analysis. | ||
|
|
||
| 2. **Localize — identify the failed step and the anchor error**: | ||
| - Scan the top level `build-log.txt` to determine the step where the failure occurred (the last `Running step ...` line before the container logs is a quick anchor — see Tips), then open that step's own `build-log.txt`. | ||
| - Record each candidate error with its filepath, line number, and timestamp. Read 50 lines before and 50 lines after each to separate the fatal error from setup/teardown noise. | ||
| - Select the **anchor error**: the first fatal error that caused the step to fail. This becomes `raw_error` in the report. | ||
| - **The anchor identifies the failure for deduplication — it is NOT the conclusion of the investigation. The first error found is rarely the root cause.** | ||
|
|
||
| 3. **Characterize — establish exactly WHAT failed before asking why**: | ||
| - For test steps with scenarios: enumerate the failing tests from `scenario-info/<scenario>/junit.xml` under the step's artifacts, then read the failing scenario's `rf-debug.log` and `phase_*/` logs (Robot Framework marks failures with `| FAIL |`). Record the failing scenario name(s) — the top-level `testsuite name` in each junit.xml — they populate the `scenarios` field in the report. | ||
| - For each failing scenario, check the plain-text `journal_*.log` files (next to the sosreport tarballs) for fatal patterns (panics, OOM kills, `leader election lost`, container exits). If the journal shows container crashes or restarts, extract the specific sosreport tarball with `bash plugins/shared/scripts/extract-sosreport.sh <tarball>` and read the pod logs (see SOS Report section). | ||
| - For conformance steps: extract the failing test names and their failure output from the step's `build-log.txt`. | ||
| - For build/infra steps: extract the failing command and its complete error output from the step log. | ||
| - Record the failure timestamp(s) — they drive the journal and graph correlation in the next phase. | ||
| - When the MicroShift source checkout is available — check with Glob for `<WORKDIR>/src/microshift-release-<RELEASE>/` (release jobs) or `<WORKDIR>/src/microshift/` (main) — read the failing test's source: Robot Framework suites under `test/suites/`, scenario definitions under `test/scenarios*/`. Its assertions, timeouts, and setup are how you distinguish a test bug from a product bug. If the checkout is absent, note `"source checkout not available"` in `analysis_gaps` and continue. | ||
| - Decide the stack layer: cloud infra, ci-config, hypervisor, or a legitimate test failure — and for test failures, the stage: setup, testing, teardown. | ||
| 3. **Extract evidence**: | ||
|
|
||
| 4. **Drill down — iterate hypothesis → evidence until the cause is actionable**: | ||
| Repeat this loop until you reach a cause that is **actionable** (a specific code, configuration, test, or infrastructure problem someone can act on) or until the available evidence is exhausted: | ||
| - State a hypothesis for WHY the error in hand occurred. | ||
| - Seek confirming or refuting evidence ONE LAYER DEEPER than the current log: | ||
| - **Journal** — ALWAYS check the plain-text `journal_*.log` files for the scenario (see SOS Report section). Correlate with the failure timestamp (entries within ±5 minutes) and scan for OOM kills, segfaults, service restarts, and disk pressure. | ||
| - **Sosreport** — when the journal shows container crashes or restarts, extract the specific sosreport tarball with `bash plugins/shared/scripts/extract-sosreport.sh <tarball>` (see SOS Report section for how to pick the right one when several exist). Read the pod/container logs of the failing workload. | ||
| - **Performance graphs** — when the failure involves a timeout, slowness, readiness/health-check expiry, eviction, or any resource error, Read the PNGs (see Performance Graphs section) and look for saturation overlapping the failure window. | ||
| - Treat restating errors as symptoms: an error like "timed out waiting for X" is NOT a root cause — explain why X was slow or absent, or explicitly record that the evidence ran out. | ||
| - **A test-layer fix is never the bottom when a product component misbehaved.** When the failure involves a product component that was unavailable, not ready, crashed, or slow ("no endpoints available", "connection refused", "not ready", "CrashLoopBackOff", probe failures), you MUST reconstruct that component's story from the journal and its pod logs before concluding. Build an exact timestamped timeline: when was the pod created, when did each container start, when did it become ready, did probes fail afterwards, did it restart, and why. Only then attribute the failure: | ||
| - **Product defect** — the component became ready and later flapped, crashed, or stopped serving (e.g., readiness flips back to not-ready, liveness probe connection refused after startup, container exits and restarts). Report the product mechanism as the root cause even if a test-side wait would also "fix" the symptom. | ||
| - **Test defect** — the component was still starting up normally and the test simply ran too early against a documented startup sequence. | ||
| - **Always check for container restarts.** Grep the journal for repeated `Created container`/`Started container` (crio) and `RemoveContainer`/PLEG events (kubelet) for the same pod. Two container instances for one pod means the first one DIED — a single startup story is the wrong narrative. Extract the sosreport (`bash plugins/shared/scripts/extract-sosreport.sh <tarball>`) and read the dead container's log at `sos_commands/microshift/namespaces/<namespace>/pods/<pod>/<container>/<container>/logs/previous.log` (`current.log` is the running instance). The last ~20 lines of `previous.log` usually state the exit reason (fatal error, leader election lost, panic, OOM). | ||
| - Record every accepted hop as a causal-chain link with its evidence file and line — these become `causal_chain` in the report. Discarded hypotheses do not go into the chain. | ||
| ```bash | ||
| python3 plugins/shared/scripts/extract-evidence.py --artifacts-dir <TMP> --workdir <WORKDIR> | ||
| ``` | ||
|
|
||
| 5. **Corroborate — cross-check the explanation**: | ||
| - When the source checkout is available, list commits from the last month that could be related: | ||
| Produces `<WORKDIR>/evidence/evidence-<BUILD_ID>.json`. The `<BUILD_ID>` is the last path component of `<TMP>`. | ||
|
|
||
| ```text | ||
| bash plugins/microshift-ci/scripts/repo-log.sh <SRC_DIR> --since <1_MONTH_BEFORE_FINISHED> --until <FINISHED_DATE> --paths test/ | ||
| ``` | ||
| 4. **Analyze**: Read `plugins/microshift-ci/agents/analyze-evidence.md`. Substitute placeholders: | ||
|
|
||
| Derive `FINISHED_DATE` from the job's `finished.json` timestamp. Drop `--paths` to see all changes. Name candidate commits in the causal chain when their timing and touched paths match the failure. | ||
| - If multiple scenarios in this job failed, decide cascade vs independent using the **timeline** (which failed first; did the earlier failure poison shared state?), not just error-text similarity. | ||
| | Placeholder | Value | | ||
| |---|---| | ||
| | `{EVIDENCE_PACK}` | `<WORKDIR>/evidence/evidence-<BUILD_ID>.json` | | ||
| | `{JOB_NAME}` | job name extracted from URL or directory path | | ||
| | `{JOB_URL}` | the original URL (or reconstruct from artifacts path) | | ||
| | `{OUTPUT_FILE}` | `<WORKDIR>/report-<BUILD_ID>.txt` | | ||
|
|
||
| 6. **Produce a report**: Create a concise report of the failure. The report MUST specify: | ||
| - Where in the pipeline the error occurred | ||
| - The specific step the error occurred in | ||
| - Whether the test failure was legitimate (i.e., a test failed) or due to an infrastructure failure (i.e., build image was not found, AWS infra failed due to quota, hypervisor failed to create test host VM, etc.) | ||
| - The causal chain from the observed symptom to the root cause, each link backed by evidence (file and line) | ||
| - A confidence rating for the root cause (see the field rules below) | ||
| Spawn the agent with the substituted content. When it replies `DONE`, read the output file and present the report to the user. | ||
|
|
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Restore inline stop conditions for the agent workflow.
This compact rewrite no longer says what to do when gsutil, evidence extraction, or the agent step fails, or when the agent never returns DONE. Per CONTRIBUTING.md: “Steps/workflow must be numbered and actionable with clear stop conditions” and “Orchestrator skills should include output structure plus Edge Cases and guard checks for sub-agent output.” Based on learnings, keep those guards inline with the relevant step rather than moving them to a separate section.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/microshift-ci/skills/prow-job/SKILL.md` around lines 26 - 60, The
workflow in SKILL.md is missing inline stop conditions for failure cases and the
agent completion check. Update the numbered steps around the artifact download,
evidence extraction, and analyze stages to explicitly say what to do if gsutil
fails, extract-evidence.py fails, or the spawned agent does not return DONE,
using the existing workflow structure and the analyze-evidence.md-driven agent
step as the anchor. Keep the guard conditions adjacent to their relevant step,
and make the steps actionable with clear fallback/abort behavior rather than
deferring these checks to a separate section.
Sources: Path instructions, Learnings
| refs = prowjob.get("spec", {}).get("refs") or prowjob.get("spec", {}).get("extra_refs", [{}])[0] if prowjob else {} | ||
| if isinstance(refs, dict): | ||
| org = refs.get("org", "") | ||
| repo = refs.get("repo", "") | ||
| if job_name and build_id: | ||
| pulls = refs.get("pulls", []) | ||
| if pulls: | ||
| pr_num = pulls[0].get("number", "") | ||
| job_url = f"https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/{org}_{repo}/{pr_num}/{job_name}/{build_id}" | ||
| else: | ||
| job_url = f"https://prow.ci.openshift.org/view/gs/test-platform-results/logs/{job_name}/{build_id}" |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="plugins/shared/scripts/extract-evidence.py"
printf '\n== Context around lines 182-192 ==\n'
nl -ba "$FILE" | sed -n '165,205p'
printf '\n== Search for refs/extra_refs usage ==\n'
rg -n "extra_refs|refs =" "$FILE"
printf '\n== Small Python probe for the expression shape ==\n'
python3 - <<'PY'
cases = [
{"prowjob": None},
{"prowjob": {"spec": {}}},
{"prowjob": {"spec": {"refs": {"org": "o"}}}},
{"prowjob": {"spec": {"extra_refs": []}}},
{"prowjob": {"spec": {"extra_refs": [{}]}}},
{"prowjob": {"spec": {"extra_refs": [{"org": "o"}]}}},
]
for case in cases:
prowjob = case["prowjob"]
try:
refs = prowjob.get("spec", {}).get("refs") or prowjob.get("spec", {}).get("extra_refs", [{}])[0] if prowjob else {}
print(case, "=>", refs)
except Exception as e:
print(case, "=>", type(e).__name__, e)
PYRepository: openshift-eng/edge-tooling
Length of output: 240
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="plugins/shared/scripts/extract-evidence.py"
printf '\n== Context around lines 175-205 ==\n'
awk 'NR>=175 && NR<=205 { printf "%d:%s\n", NR, $0 }' "$FILE"
printf '\n== Exact refs/extra_refs expression ==\n'
grep -n "extra_refs\|refs =" "$FILE" || true
printf '\n== Python probe for expression behavior ==\n'
python3 - <<'PY'
cases = [
("no prowjob", None),
("empty spec", {"spec": {}}),
("refs present", {"spec": {"refs": {"org": "o"}}}),
("extra_refs empty", {"spec": {"extra_refs": []}}),
("extra_refs one empty dict", {"spec": {"extra_refs": [{}]}}),
("extra_refs one entry", {"spec": {"extra_refs": [{"org": "o"}]}}),
]
for label, prowjob in cases:
try:
refs = prowjob.get("spec", {}).get("refs") or prowjob.get("spec", {}).get("extra_refs", [{}])[0] if prowjob else {}
print(label, "=>", refs)
except Exception as e:
print(label, "=>", type(e).__name__, str(e))
PYRepository: openshift-eng/edge-tooling
Length of output: 2026
Guard extra_refs before indexing
A present-but-empty extra_refs: [] still reaches [0] and raises IndexError in single-mode jobs. Split this into two steps so the fallback checks extra_refs before indexing.
Suggested guard
- refs = prowjob.get("spec", {}).get("refs") or prowjob.get("spec", {}).get("extra_refs", [{}])[0] if prowjob else {}
+ spec = prowjob.get("spec", {}) if prowjob else {}
+ extra_refs = spec.get("extra_refs") or []
+ refs = spec.get("refs") or (extra_refs[0] if extra_refs else {})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| refs = prowjob.get("spec", {}).get("refs") or prowjob.get("spec", {}).get("extra_refs", [{}])[0] if prowjob else {} | |
| if isinstance(refs, dict): | |
| org = refs.get("org", "") | |
| repo = refs.get("repo", "") | |
| if job_name and build_id: | |
| pulls = refs.get("pulls", []) | |
| if pulls: | |
| pr_num = pulls[0].get("number", "") | |
| job_url = f"https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/{org}_{repo}/{pr_num}/{job_name}/{build_id}" | |
| else: | |
| job_url = f"https://prow.ci.openshift.org/view/gs/test-platform-results/logs/{job_name}/{build_id}" | |
| spec = prowjob.get("spec", {}) if prowjob else {} | |
| extra_refs = spec.get("extra_refs") or [] | |
| refs = spec.get("refs") or (extra_refs[0] if extra_refs else {}) | |
| if isinstance(refs, dict): | |
| org = refs.get("org", "") | |
| repo = refs.get("repo", "") | |
| if job_name and build_id: | |
| pulls = refs.get("pulls", []) | |
| if pulls: | |
| pr_num = pulls[0].get("number", "") | |
| job_url = f"https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/{org}_{repo}/{pr_num}/{job_name}/{build_id}" | |
| else: | |
| job_url = f"https://prow.ci.openshift.org/view/gs/test-platform-results/logs/{job_name}/{build_id}" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/shared/scripts/extract-evidence.py` around lines 182 - 192, The refs
selection logic in extract-evidence.py can still index into an empty extra_refs
array and raise IndexError in single-mode jobs. Refactor the refs assignment
near the prowjob handling to check whether extra_refs is present and non-empty
before accessing its first element, and keep the existing fallback behavior when
spec.refs is missing. Use the refs, prowjob, and extra_refs lookup path to
locate the fix.
| if anchor_line_idx >= 0: | ||
| start = max(0, anchor_line_idx - 5) | ||
| end = min(len(lines), anchor_line_idx + 6) | ||
| context_lines = [l.rstrip() for l in lines[start:end]] |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Ruff violations — the file will fail ruff validation.
- Lines 330 & 679: ambiguous variable name
l(E741). - Line 535:
infra_fail_countunpacked but unused (RUF059); prefix with_. - Lines 1006, 1008, 1010, 1012, 1014: multiple statements per line via
;(E702).
Per CONTRIBUTING.md Code Standards: "Python — Must pass ruff".
Also applies to: 535-535, 679-679, 1006-1014
🧰 Tools
🪛 Ruff (0.15.20)
[error] 330-330: Ambiguous variable name: l
(E741)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/shared/scripts/extract-evidence.py` at line 330, The script has Ruff
violations that will fail validation: rename the ambiguous comprehension
variable `l` in `extract-evidence.py` to a clearer name in the `context_lines`
assignment and the other similar spot, prefix the unused unpacked
`infra_fail_count` with `_` where it is assigned, and split the chained
statements separated by semicolons into separate lines in the affected block.
Update the relevant logic in `extract-evidence.py` around `context_lines`,
`infra_fail_count`, and the code near the multiple-statement lines so the file
passes Ruff.
Sources: Coding guidelines, Linters/SAST tools
| json_text = m.group(1) | ||
| # LLM agents sometimes copy raw control characters (tabs, CRs) from | ||
| # build logs into JSON string values. Sanitize before parsing. | ||
| json_text = json_text.replace('\t', '\\t').replace('\r', '\\r') |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
sed -n '1,140p' plugins/shared/scripts/parse.py
printf '\n---\n'
rg -n "parse\.py|json_text\.replace|replace\('\\t'" -S plugins tests . -g '!**/.git/**'Repository: openshift-eng/edge-tooling
Length of output: 5882
🏁 Script executed:
python3 - <<'PY'
import json
original = '{\n\t"a": 1,\n\t"b": 2\n}'
sanitized = original.replace('\t', '\\t').replace('\r', '\\r')
print("original parses:", json.loads(original))
try:
json.loads(sanitized)
print("sanitized parses: yes")
except json.JSONDecodeError as e:
print("sanitized parses: no")
print(type(e).__name__, e.msg)
PYRepository: openshift-eng/edge-tooling
Length of output: 285
Avoid escaping tabs before parsing. json_text.replace('\t', '\\t') rewrites valid JSON whitespace, so tab-indented payloads become invalid and json.loads() fails. Escape control chars only inside string values, or remove the pre-parse rewrite.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/shared/scripts/parse.py` at line 51, The pre-parse normalization in
parse.py is escaping tabs in the whole JSON payload, which breaks valid
tab-indented JSON before json.loads() runs. Update the parse flow in the
json_text handling to stop rewriting all tab characters globally; instead, only
handle control characters inside string values or remove the tab replacement
entirely while keeping the existing JSON parsing path intact.
Summary by CodeRabbit
New Features
Documentation
Bug Fixes