fix(ci): prevent spurious CI Summary failures after force-push [OS-616]#813

Merged
galligan merged 1 commit into main from os-616-ci-summary-job-fails-spuriously-after-force-push-due-to on Mar 26, 2026

Conversation

@galligan galligan commented Mar 25, 2026

Contributor

Summary

CI Summary job fails intermittently even when all individual jobs pass. Observed 3-4 times in a single session across PRs #806, #810, #811, #812.

Root cause: When gt submit force-pushes a branch, GitHub Actions cancels the in-flight run. The ci-summary job checks contains(needs.*.result, 'cancelled'), which catches cancelled jobs from the superseded run — not actual failures.

Fixes https://linear.app/outfitter/issue/OS-616/ci-summary-job-fails-spuriously-after-force-push-due-to-cancelled-job

What changed

.github/workflows/ci.yml:

  1. Dropped cancelled from the failure check — only failure triggers the exit. Cancelled jobs from superseded runs aren't real failures.

  2. Added concurrency group: ci-${{ github.ref }} with cancel-in-progress: true ensures only one CI run per branch exists at a time, preventing the race structurally.

Test plan

  • Workflow YAML is syntactically valid
  • The concurrency group scopes to the branch ref, so main and PR branches don't cancel each other
  • cancel-in-progress: true only cancels runs on the same ref (safe for stacked PRs on different branches)
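
For illustration, a minimal sketch of the concurrency block described above, assuming it sits at the top level of .github/workflows/ci.yml. Only the group name and the cancel-in-progress value come from this description; the placement and comments are assumptions, not a copy of the diff.

```yaml
# Sketch only: assumed to be placed at the workflow top level, next to `on:` and `jobs:`.
concurrency:
  # One group per ref, so runs on different refs never cancel each other.
  group: ci-${{ github.ref }}
  # A newer run on the same ref cancels the one still in progress.
  cancel-in-progress: true
```

Scoping the group to github.ref is what keeps the cancellation local to one branch or PR.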

🤘🏻 In-collaboration-with: Claude Code

@linear

linear Bot commented Mar 25, 2026


galligan commented Mar 25, 2026

Contributor Author

@galligan galligan marked this pull request as ready for review March 25, 2026 23:42

greptile-apps Bot commented Mar 25, 2026

Contributor

Greptile Summary

This PR fixes intermittent CI Summary false failures caused by a race between gt submit force-pushes and the ci-summary job's cancelled-result check. It applies two complementary fixes to .github/workflows/ci.yml:

  • Concurrency group (ci-${{ github.ref }}, cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}): structurally ensures only one CI run is alive per PR branch at a time, cancelling the old run before it can reach ci-summary.
  • !cancelled() guard on the "Check for failures" step: a belt-and-suspenders layer that prevents ci-summary from emitting a failure status in the edge case where the job is itself being cancelled mid-execution.

Key observations:

  • The concurrency expression correctly excludes main (refs/heads/main) so post-merge CI always runs to completion.
  • Stacked PRs on different branches are unaffected because each PR has a unique refs/pull/{number}/merge ref.
  • Genuine timeout failures (which surface as cancelled on dependency jobs) are still correctly caught, because contains(needs.*.result, 'cancelled') remains in the condition — !cancelled() only suppresses the step when ci-summary itself is cancelled.
  • The PR description's first bullet ("Dropped cancelled from the failure check") is factually incorrect; cancelled was not removed from the dependency-result check — only !cancelled() was added as the current-job guard.
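
As a rough sketch of the guard described in these observations, the "Check for failures" step might look like the excerpt below. The step name and the two conditions come from this review; the job-level `if`, the `needs` list, the runner, and the echo message are hypothetical placeholders, so treat this as an illustration rather than the merged code.

```yaml
# Hypothetical excerpt of .github/workflows/ci.yml; not the actual diff.
jobs:
  ci-summary:
    runs-on: ubuntu-latest
    # Assumed: the summary runs even when upstream jobs fail or are cancelled.
    if: ${{ always() }}
    needs: [lint, test] # assumed dependency job names
    steps:
      - name: Check for failures
        # !cancelled() skips this step when the run itself is being cancelled
        # (for example by the concurrency group). Dependency results are still
        # inspected, so a dependency that ended as 'cancelled' or 'failure' in a
        # live run still fails the summary.
        if: ${{ !cancelled() && (contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')) }}
        run: |
          echo "One or more required jobs failed or were cancelled"
          exit 1
```

The expression is wrapped in `${{ }}` because a bare leading `!` would be parsed as a YAML tag.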

Confidence Score: 4/5

Safe to merge — logic is sound, all failure scenarios are correctly handled, and the concurrency group protects main.

The implementation correctly addresses the root cause via the concurrency group and adds a valid belt-and-suspenders guard with !cancelled(). Genuine failures (test failures, timeouts) continue to be caught. The only issue is a factual inaccuracy in the PR description that could mislead future maintainers.

No files require special attention.

Important Files Changed

Filename | Overview
.github/workflows/ci.yml | Adds a concurrency group to cancel in-progress PR runs on force-push (protecting main), and adds !cancelled() to the "Check for failures" step so a cancelled ci-summary job doesn't emit a false failure status.

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub Actions
    participant OldRun as Old CI Run
    participant NewRun as New CI Run
    participant Summary as ci-summary

    Note over Dev,Summary: After fix — force-push with concurrency group

    Dev->>GH: gt submit (force-push)
    GH->>NewRun: Trigger new run
    GH->>OldRun: Cancel old run (concurrency group)
    OldRun-->>Summary: cancelled() = true
    Note over Summary: !cancelled() = false<br/>exit 1 suppressed ✅
    GH->>NewRun: New run proceeds normally
    NewRun->>Summary: All deps succeed
    Note over Summary: !cancelled() = true<br/>No failures → passes ✅

    Note over Dev,Summary: Genuine failure (timeout or test failure)

    NewRun->>Summary: dep result = cancelled (timeout) or failure
    Note over Summary: !cancelled() = true<br/>exit 1 fires ✅ (real failure caught)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: .github/workflows/ci.yml
Line: 333-334

Comment:
**PR description vs. actual change**

The first bullet in the PR description says "**Dropped `cancelled` from the failure check**," but the code still includes `contains(needs.*.result, 'cancelled')`. What was actually added is the `!cancelled()` guard, which is a meaningfully different fix: it suppresses the failure step only when `ci-summary` *itself* is being cancelled (e.g. by the concurrency group), while still catching genuine `cancelled` results from dependency jobs like timeouts.

This distinction matters for future maintainers — a reader following the description might mistakenly "restore" the `cancelled` clause thinking it was accidentally lost. Worth correcting the description to reflect the real change.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (10): Last reviewed commit: "fix(ci): prevent spurious CI Summary fai..."

Comment thread .github/workflows/ci.yml Outdated
@galligan galligan force-pushed the os-616-ci-summary-job-fails-spuriously-after-force-push-due-to branch from fcc26b6 to 68a5c67 Compare March 25, 2026 23:55
@galligan galligan force-pushed the os-605-surface-layer-output-handling-make-buildclicommands-handle branch from 8cbabff to 7754919 Compare March 26, 2026 00:08
@galligan galligan force-pushed the os-616-ci-summary-job-fails-spuriously-after-force-push-due-to branch 2 times, most recently from e4e4a27 to 9afd33e Compare March 26, 2026 02:35
@galligan galligan force-pushed the os-605-surface-layer-output-handling-make-buildclicommands-handle branch from 7754919 to 13e0b46 Compare March 26, 2026 02:35
@galligan galligan force-pushed the os-616-ci-summary-job-fails-spuriously-after-force-push-due-to branch from 9afd33e to ab962e7 Compare March 26, 2026 15:41
@galligan galligan force-pushed the os-605-surface-layer-output-handling-make-buildclicommands-handle branch from 13e0b46 to 70904d5 Compare March 26, 2026 15:41
@galligan galligan force-pushed the os-616-ci-summary-job-fails-spuriously-after-force-push-due-to branch from ab962e7 to 35f2b34 Compare March 26, 2026 16:20
@galligan galligan force-pushed the os-605-surface-layer-output-handling-make-buildclicommands-handle branch from 70904d5 to 6859c0f Compare March 26, 2026 16:20
@galligan galligan force-pushed the os-616-ci-summary-job-fails-spuriously-after-force-push-due-to branch from 35f2b34 to f219b3e Compare March 26, 2026 21:34

galligan commented Mar 26, 2026

Contributor Author

Merge activity

  • Mar 26, 11:48 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Mar 26, 11:54 PM UTC: Graphite rebased this pull request as part of a merge.
  • Mar 26, 11:55 PM UTC: @galligan merged this pull request with Graphite.

@galligan galligan changed the base branch from os-605-surface-layer-output-handling-make-buildclicommands-handle to graphite-base/813 March 26, 2026 23:49
@galligan galligan changed the base branch from graphite-base/813 to main March 26, 2026 23:53
@galligan galligan force-pushed the os-616-ci-summary-job-fails-spuriously-after-force-push-due-to branch from f219b3e to 32b711e Compare March 26, 2026 23:54
@galligan galligan merged commit abc8459 into main Mar 26, 2026
12 checks passed
@galligan galligan deleted the os-616-ci-summary-job-fails-spuriously-after-force-push-due-to branch March 26, 2026 23:55
Labels

area/ci-cd CI/CD pipelines area/infra Infrastructure
