ci(trigger): tolerate per-call HTTP 401 in cleanup step (fixes #5811) by renecannao · Pull Request #5812 · sysown/proxysql

renecannao · 2026-05-23T15:46:30Z

Summary

The "Trigger workflow_run[completed]" step in .github/workflows/ci-trigger.yml opportunistically deletes skipped runs as housekeeping. The original implementation:

gh run list -s skipped -L 100 --json databaseId -q '.[].databaseId' \
    | xargs -r -n1 gh run delete

xargs -r -n1 propagates the first non-zero exit. Some gh run delete calls return HTTP 401: Bad credentials (typically on runs from forked PRs, other branches, or runs the PR-scoped GITHUB_TOKEN doesn't have rights to). One 401 → xargs exits 123 → step fails → CI-trigger job concludes failure → every downstream test workflow gated on workflow_run.conclusion == 'success' (legacy-g*, mysql84-*, mysql84-gr-*, set_parser_algorithm_3-g1, …) silently skips.

This has been today's recurring "why is CI failing on my PR" — hit at least PR #5803, PR #5809 (multiple times), and is documented with the smoking-gun log lines in #5811.

The fix

Replace xargs with a while read loop so per-call failures are caught individually, logged, and don't abort the step:

-        gh run list -s skipped -L 100 --json databaseId -q '.[].databaseId' \
-            | xargs -r -n1 gh run delete
+        gh run list -s skipped -L 100 --json databaseId -q '.[].databaseId' \
+            | while read -r id; do
+                if ! gh run delete "$id" 2>&1; then
+                    echo "  skip run $id (no rights / already gone)"
+                fi
+            done

What changes

Cleanup scope: unchanged (still repo-wide, still the most recent 100 skipped runs).
Error handling: per-call failures (auth, already-deleted, etc.) are now caught with a single log line each, instead of aborting the entire step.
Job conclusion: CI-trigger now reflects whether the trigger worked, not whether housekeeping was completely clean.

Test plan

Diff is minimal — only the error-handling shape of the cleanup pipeline.
Verified by next PR push that exercises CI-trigger — the cleanup step should run cleanly (or log "skip" lines for runs it can't delete) and the job should conclude success instead of failure.

Alternatives considered (from #5811)

Scope cleanup to current branch — cleaner long-term (eliminates the 401s instead of tolerating them), but changes behavior. Worth doing as a follow-up.
continue-on-error: true on the step — cheapest, but loses signal entirely.
Delete the cleanup step — relies on GitHub's retention. Worth considering if the housekeeping isn't actually needed.

Going with the minimal-behavior-change option (per-call tolerance) here. (1) can land as a separate PR if desired.

Summary by CodeRabbit

Chores
- Enhanced CI workflow robustness by improving the housekeeping process for skipped workflow runs with more graceful error handling.

`CI-trigger`'s "Trigger workflow_run[completed]" step does opportunistic housekeeping by listing every skipped run in the repo and trying to delete each one. Some of those runs (typically from forked PRs, other branches, or older retention windows) are not deletable by the PR-scoped `GITHUB_TOKEN`, which surfaces as `HTTP 401: Bad credentials` on the individual `gh run delete` call. The original pipeline used `xargs -r -n1 gh run delete`, which propagates the first non-zero exit and fails the whole step. That in turn marks the `CI-trigger` job conclusion as `failure`, and every downstream test workflow that gates on `workflow_run.conclusion == 'success'` (legacy-g*, mysql84-*, mysql84-gr-*, set_parser_algorithm_3-g1, ...) is silently skipped. Net result: a non-critical housekeeping failure blocks the entire test matrix for the PR. It's been the recurring cause of "why is CI failing on my PR" today, biting PR #5803, PR #5809, and others. Switch from `xargs` to a `while read` loop so per-call failures are caught individually, logged for visibility, and don't abort the step. Cleanup scope is unchanged (still repo-wide, still capped at the most recent 100 skipped runs) -- only the error-handling behavior changes.

gemini-code-assist · 2026-05-23T15:46:36Z

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

coderabbitai · 2026-05-23T15:46:44Z

📝 Walkthrough

Walkthrough

The CI-trigger workflow's housekeeping cleanup step has been refactored to tolerate permission errors when deleting skipped workflow runs. The single xargs pipeline is replaced with an explicit loop that handles deletion failures gracefully per run instead of failing the entire step.

Changes

CI Housekeeping Robustness

Layer / File(s)	Summary
Skipped run deletion error handling `.github/workflows/ci-trigger.yml`	Skipped-run deletion logic is reworked from a compact `xargs` pipeline into a guarded per-run deletion loop with explanatory comments. Individual deletion failures (e.g., HTTP 401 due to insufficient token permissions) are caught and logged as "skip run" messages instead of causing the entire cleanup step to fail.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

A cleanup that stumbled on each stubborn run,
Now loops through with grace when delete won't play nice,
No more 401 blocking the fun,
The CI pipeline finally rolls the dice. 🐰✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and accurately describes the main change: replacing xargs with a loop that tolerates HTTP 401 failures in the cleanup step, directly addressing the linked issue.
Linked Issues check	✅ Passed	The PR implements the second preferred fix from issue `#5811`: tolerating per-call auth failures by replacing xargs with a while-read loop that catches failures and continues deletion.
Out of Scope Changes check	✅ Passed	The PR is narrowly scoped to the cleanup step in ci-trigger.yml; no unrelated changes or ancillary modifications are present.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch ci-trigger-tolerate-cleanup-401

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (2)

.github/workflows/ci-trigger.yml (2)
50-55: ⚡ Quick win

Consider protecting against list command failures for complete robustness.

The current implementation correctly tolerates individual gh run delete failures via the if ! ... then construct. However, with GitHub Actions' default set -eo pipefail, if the gh run list command itself fails (due to transient API errors, network issues, etc.), the pipeline will still fail and mark the CI-trigger job as failed.

Since the PR describes this as "opportunistic housekeeping" that must not block downstream workflows, consider adding full protection:
🛡️ Option 1: Add `|| true` to tolerate any pipeline failure
 gh -R ${{ github.repository }} run list -s skipped -L 100 --json databaseId -q '.[].databaseId' \
     | while read -r id; do
         if ! gh -R ${{ github.repository }} run delete "$id" 2>&1; then
             echo "  skip run $id (no rights / already gone)"
         fi
-    done
+    done || true
🛡️ Option 2: Disable exit-on-error for the entire step
 echo "Cleanup skipped ..."
+set +e
 # Housekeeping only: opportunistically delete skipped runs. Per-call
Both approaches ensure that even if listing fails, the CI-trigger job succeeds and downstream test workflows can proceed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-trigger.yml around lines 50 - 55, The gh run list
pipeline can fail and cause the step to exit under set -eo pipefail; modify the
step so a failing list does not break the job by making the pipeline tolerate
failure (e.g., ensure the gh -R ... run list ... | while ... command returns
success even if the list fails by appending a safe-fallback like "|| true" to
the pipeline or temporarily disabling exit-on-error around the sequence with
"set +e" before and "set -e" after), keeping the existing per-id deletion check
(the if ! gh -R ... run delete "$id" ... then) intact; target the gh run list
invocation and the surrounding shell step to implement this resilience.
52-52: 💤 Low value

Optional: Consider suppressing GitHub CLI error output for cleaner logs.

The 2>&1 redirection merges stderr to stdout, so both the GitHub CLI error message and your custom skip message will be printed for each failed deletion:
Error: HTTP 401: Bad credentials (https://api.github.com/...)
  skip run 12345 (no rights / already gone)
If you prefer less verbose output, you could suppress the CLI's error and show only your message:
if ! gh -R ${{ github.repository }} run delete "$id" 2>/dev/null; then
However, keeping the actual error visible can be helpful for debugging, so the current approach is reasonable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci-trigger.yml at line 52, The gh CLI delete command
currently redirects stderr into stdout (`2>&1`) which causes both the CLI error
and your custom skip message to appear; to suppress the CLI error and show only
your skip message, change the redirection on the `gh -R ${{ github.repository }}
run delete "$id"` invocation from `2>&1` to `2>/dev/null` so stderr is
discarded, leaving your custom skip message as the sole output when deletion
fails.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/ci-trigger.yml:
- Around line 50-55: The gh run list pipeline can fail and cause the step to
exit under set -eo pipefail; modify the step so a failing list does not break
the job by making the pipeline tolerate failure (e.g., ensure the gh -R ... run
list ... | while ... command returns success even if the list fails by appending
a safe-fallback like "|| true" to the pipeline or temporarily disabling
exit-on-error around the sequence with "set +e" before and "set -e" after),
keeping the existing per-id deletion check (the if ! gh -R ... run delete "$id"
... then) intact; target the gh run list invocation and the surrounding shell
step to implement this resilience.
- Line 52: The gh CLI delete command currently redirects stderr into stdout
(`2>&1`) which causes both the CLI error and your custom skip message to appear;
to suppress the CLI error and show only your skip message, change the
redirection on the `gh -R ${{ github.repository }} run delete "$id"` invocation
from `2>&1` to `2>/dev/null` so stderr is discarded, leaving your custom skip
message as the sole output when deletion fails.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e90193c2-48be-4f2f-8044-f44b29bd8c59

📥 Commits

Reviewing files that changed from the base of the PR and between 4b53979 and 4f041de.

📒 Files selected for processing (1)

.github/workflows/ci-trigger.yml

📜 Review details

🔇 Additional comments (1)

.github/workflows/ci-trigger.yml (1)

43-49: LGTM!

sonarqubecloud · 2026-05-23T15:53:41Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

PR #5812 (merged into GH-Actions) fixes the recurring HTTP 401 from CI-trigger.yml's cleanup step. This commit exists only to force a fresh CI run so PR #5809 picks up the new ci-trigger.yml and stops flipping the whole test matrix to red because one housekeeping API call lost the auth lottery. No source changes.

renecannao mentioned this pull request May 23, 2026

CI-trigger.yml: cleanup step fails with HTTP 401 on gh run delete, blocking downstream test workflows #5811

Open

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

renecannao merged commit 350c2b4 into GH-Actions May 23, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci(trigger): tolerate per-call HTTP 401 in cleanup step (fixes #5811)#5812

ci(trigger): tolerate per-call HTTP 401 in cleanup step (fixes #5811)#5812
renecannao merged 1 commit into
GH-Actionsfrom
ci-trigger-tolerate-cleanup-401

renecannao commented May 23, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

gemini-code-assist Bot commented May 23, 2026

Uh oh!

coderabbitai Bot commented May 23, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

renecannao commented May 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

The fix

What changes

Test plan

Alternatives considered (from #5811)

Summary by CodeRabbit

Uh oh!

gemini-code-assist Bot commented May 23, 2026

Uh oh!

coderabbitai Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 23, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

renecannao commented May 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 23, 2026 •

edited

Loading