USHIFT-6967: Use deterministic workflow for CI Doctor analysis parallelization #192
base: main
Changes from all commits
4bdb88b
1ac3d17
93c4df0
058246e
3491f0a
14c86b4
04c2b97
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../shared/scripts/doctor-analyze.js |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ../../shared/scripts/doctor-analyze.js |
|
Member
This essentially locks your skills to Claude Code, which may be fine if it's significantly improving your results. This doesn't strike me as a substantial workflow, though -- what is this getting you that driving Claude via openshift-eng/ai-helpers#545 should make the unstable version of Claude available in ai-helpers if you'd like to use it (I'd like to keep stable because unstable Claude Code has broken the payload agent several times....)
Contributor
Author
I used the latest Claude version in a draft PR, openshift/release#80503, and it works fine - thank you. As for the reasoning behind this change: using Sub-Agents is not reliable for 2 main reasons:
Both of these scenarios sound borderline, but they happen with surprising consistency once the job runs "frequently enough".
Contributor
Author
Besides using Sub-Agents, we're considering rewriting the orchestration as a deterministic script that runs Claude as sub-processes (this is probably what you mean by "claude -p"). This approach has its own downsides, but it is probably the only way to make the flow reliable. We will start working on this soon.
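For illustration only, here is a minimal sketch of what such a deterministic orchestrator might look like in Node.js. It is not code from this PR. The helper names `runClaude` and `analyzeAll`, the concurrency limit, and the buffer size are all hypothetical, and the sketch assumes the `claude` CLI is on `PATH` and that `claude -p <prompt>` prints its response to stdout. A failed job is recorded as `null` so that one bad run does not abort the others.

```javascript
const { execFile } = require('node:child_process')

// Hypothetical helper: run one analysis as a `claude -p` sub-process and
// resolve with its stdout. Rejects if the process fails or exits non-zero.
function runClaude(prompt) {
  return new Promise(function (resolve, reject) {
    execFile('claude', ['-p', prompt], { maxBuffer: 64 * 1024 * 1024 }, function (err, stdout) {
      if (err) reject(err)
      else resolve(stdout)
    })
  })
}

// Hypothetical helper: run every job through `runOne`, at most `limit` at a
// time (limit >= 1). The result array keeps job order; failures become null.
async function analyzeAll(jobs, runOne, limit) {
  const results = new Array(jobs.length).fill(null)
  let next = 0

  async function worker() {
    while (next < jobs.length) {
      const i = next++ // safe: no await between the check and the increment
      try {
        results[i] = await runOne(jobs[i])
      } catch (err) {
        results[i] = null
      }
    }
  }

  const workers = []
  for (let w = 0; w < Math.min(limit, jobs.length); w++) {
    workers.push(worker())
  }
  await Promise.all(workers)
  return results
}

// Example wiring (not executed here):
//   analyzeAll(jobs, function (job) { return runClaude('Analyze ' + job.artifacts_dir) }, 4)
```

Because the loop and the concurrency limit live in plain code rather than in a model-driven agent, the number of sub-processes started is fixed by the input, which is the reliability property the comment above is after.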
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,56 @@ | ||
| export const meta = { | ||
| name: 'doctor-analyze', | ||
| description: 'Analyze CI jobs in parallel via per-job prow-job skill invocations', | ||
| phases: [ | ||
| { title: 'Analyze', detail: 'Per-job root cause analysis' }, | ||
| ], | ||
| } | ||
|
|
||
| // args: { | ||
| // jobs: [{ artifacts_dir: string, output_path: string, label: string }], | ||
| // prow_job_skill: string, // e.g. "/lvms-ci:prow-job" or "/microshift-ci:prow-job" | ||
| // } | ||
|
|
||
| // Defend against the model passing args as a JSON string instead of an object | ||
| const a = typeof args === 'string' ? JSON.parse(args) : args | ||
|
|
||
| if (!a || !Array.isArray(a.jobs)) { | ||
| log('ERROR: args.jobs is missing or not an array') | ||
| return { analyzed: 0, failed: 0, total: 0, error: 'args.jobs is missing or not an array' } | ||
| } | ||
| if (!a.prow_job_skill) { | ||
| log('ERROR: args.prow_job_skill is missing') | ||
| return { analyzed: 0, failed: 0, total: 0, error: 'args.prow_job_skill is missing' } | ||
| } | ||
|
|
||
| phase('Analyze') | ||
| log('Analyzing ' + a.jobs.length + ' jobs in parallel...') | ||
|
|
||
|
|
||
| const results = await parallel(a.jobs.map(function(job) { | ||
| return function() { | ||
| return agent( | ||
| 'Analyze this Prow job and save the report:\n' + | ||
| '1. Run ' + a.prow_job_skill + ' ' + job.artifacts_dir + '\n' + | ||
| '2. After the analysis completes, save the FULL report output' + | ||
| ' (including the --- STRUCTURED SUMMARY --- block) to:\n' + | ||
| ' ' + job.output_path + '\n' + | ||
| ' Use the Write tool to save the file.' + | ||
| ' The file must contain the complete analysis report.', | ||
| { label: job.label, phase: 'Analyze' } | ||
| ) | ||
| } | ||
| })) | ||
|
|
||
| const analyzed = results.filter(function(r) { return r != null }).length | ||
| const failed = results.length - analyzed | ||
| if (failed > 0) { | ||
| log('Analysis complete: ' + analyzed + '/' + results.length + ' jobs analyzed, ' + failed + ' failed') | ||
| } else { | ||
| log('Analysis complete: all ' + analyzed + ' jobs analyzed') | ||
| } | ||
|
|
||
| return { | ||
| analyzed: analyzed, | ||
| failed: failed, | ||
| total: results.length, | ||
| } | ||
|
|
||
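The success and failure counts in the script above depend on how the workflow runtime's `parallel` reports a failed task. The `r != null` filter suggests that a failed `agent` call comes back as `null` (or `undefined`), but the diff does not confirm this. The sketch below uses a stand-in `parallel` built on `Promise.allSettled` to make that assumption explicit; both `parallel` (as defined here) and `summarize` are hypothetical and are not the runtime's actual implementation.

```javascript
// Stand-in for the runtime's `parallel` (assumption: each task is a function
// returning a promise, and a task that throws or rejects yields null).
async function parallel(tasks) {
  const settled = await Promise.allSettled(
    // Promise.resolve().then(task) also captures synchronous throws.
    tasks.map(function (task) { return Promise.resolve().then(task) })
  )
  return settled.map(function (s) {
    return s.status === 'fulfilled' ? s.value : null
  })
}

// Same counting logic as the script: any non-null result counts as analyzed.
function summarize(results) {
  const analyzed = results.filter(function (r) { return r != null }).length
  return {
    analyzed: analyzed,
    failed: results.length - analyzed,
    total: results.length,
  }
}
```

Under this assumption, a sub-agent that returns normally but never writes its report file is still counted as analyzed, so a caller that needs stronger guarantees may want to check that each `output_path` exists afterwards.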