feat(ppl): optimize init-job checkout (blobless + sparse) behind a feature flag #1063
Merged
… checks The pipeline initialization (compilation) job only needs the pipeline YAML and the Git history (trees/commits, used by `change_in`) to compile the pipeline, not the full repository working tree. For large repositories the full checkout dominates the init job runtime (cloning hundreds of MiB and materializing tens of thousands of files). When there are no pre-flight checks, instruct `checkout` (via the SEMAPHORE_GIT_PARTIAL_CLONE_FILTER and SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS env vars) to perform a blobless partial clone with a sparse working tree limited to the pipeline directory. When pre-flight checks are configured, their custom commands run after the predefined ones and may rely on the full working tree, so the standard full checkout is kept. Note: this relies on the corresponding toolbox `checkout` support and the spc `commands_file` on-demand fetch. Per-organization feature-flag gating is added in a follow-up commit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
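To make the gating concrete, here is a minimal POSIX shell sketch of what the init job's command generation could emit before calling `checkout`. The function name `init_checkout_exports` and its `true`/`false` argument convention are hypothetical; the real logic lives in plumber (Elixir), and this only mirrors the rule described above.

```sh
#!/bin/sh
# Hypothetical sketch: print the env exports the init job would emit before
# running `checkout`. Not plumber's code; names are illustrative.
#   $1: pipeline working directory, e.g. ".semaphore" (assumed whitespace-free)
#   $2: "true" if pre-flight checks are configured, otherwise "false"
init_checkout_exports() {
  working_dir=$1
  has_preflight=$2

  if [ "$has_preflight" = "true" ]; then
    # Pre-flight commands may need the whole working tree: emit nothing,
    # so `checkout` keeps its standard full clone.
    return 0
  fi

  # Blobless partial clone + sparse working tree limited to the pipeline dir.
  printf 'export SEMAPHORE_GIT_PARTIAL_CLONE_FILTER=%s\n' "blob:none"
  printf 'export SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS=%s\n' "$working_dir"
}
```

A generated job script could then run `eval "$(init_checkout_exports .semaphore false)"` ahead of `checkout`. Emitting nothing in the pre-flight case leaves `checkout` on its default full-clone path, which is what keeps the change backwards compatible.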
Wire the FeatureProvider/FeatureHub stack into plumber and gate the blobless + sparse init-job checkout behind the per-organization `sparse_checkout_init_job` feature flag. The optimization now applies only when both hold: there are no pre-flight checks AND the feature is enabled for the organization. The check fails closed (missing org id or unreachable Feature service => standard full checkout).

- proto: generate InternalApi.Feature stubs; add INTERNAL_API_LOCAL_PATH support to the proto Makefile so they can be regenerated from a local internal_api checkout, plus a feature.proto generation step.
- ppl: add feature_provider dep (pin yaml_elixir ~> 1.3 via override since definition_validator requires 1.x and we only use the FeatureHub provider).
- Ppl.FeatureClient: gRPC client for the Feature service (INTERNAL_API_URL_FEATURE).
- Ppl.FeatureHubProvider: FeatureProvider.Provider backed by the Feature service.
- Ppl.Features: thin, fail-closed wrapper exposing sparse_checkout_init_job_enabled?/1.
- Ppl.Application: init FeatureProvider and start the :feature_cache Cachex.
- config: FeatureHub provider with CachexCache (no cache in test env).
- tests: Feature gRPC mock + coverage for the gate decision and provider path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
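The fail-closed rule can be sketched as a single predicate. This is a shell illustration, not `Ppl.Features`: the helper `sparse_checkout_allowed` is hypothetical, and it assumes the Feature service result has already been reduced to `enabled`, `disabled`, or an empty string when the service could not be reached.

```sh
#!/bin/sh
# Hypothetical fail-closed gate (illustration only; the real check is Elixir).
#   $1: "true" if pre-flight checks are configured
#   $2: organization id, possibly empty
#   $3: flag status: "enabled", "disabled", or "" if the service was unreachable
# Returns 0 only when the blobless + sparse checkout should be used.
sparse_checkout_allowed() {
  # Pre-flight checks may rely on the full working tree.
  if [ "$1" = "true" ]; then
    return 1
  fi
  # No org id: the flag cannot be evaluated, so fail closed.
  if [ -z "$2" ]; then
    return 1
  fi
  # Only an explicit "enabled" opts in; "disabled" and "" both fail closed.
  [ "$3" = "enabled" ]
}
```

Every branch that lacks positive evidence returns non-zero, so a Feature service outage can only cost performance (a full checkout), never correctness.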
…e_provider feature_provider lives at the repository root, outside the plumber/ tree, so it was not reachable from ppl's previous Docker build context (plumber/), breaking `mix deps.get` in the image build. Move ppl's build context to the repository root (matching front/zebra/secrethub) and prefix the Dockerfile COPY paths with plumber/, plus COPY feature_provider into the image. Update the Makefile build path and docker-compose context/dockerfile accordingly. Validated locally by building both the dev (mix compile) and prod (mix release) image targets. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The integration test restored INTERNAL_API_URL_ARTIFACTHUB / _PROJECT in on_exit via System.put_env/2, which raises when the previous value was nil (the vars are not set globally; they depend on test ordering/sharding). Guard the restore so a nil previous value deletes the env var instead of crashing. Surfaced after a new test file shifted integration sharding. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
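In the Elixir test the guard presumably pairs `System.put_env/2` for a real previous value with `System.delete_env/1` for `nil`. The same save/restore discipline looks like this in shell, where "unset" plays the role of `nil`. `save_env` and `restore_env` are hypothetical helpers written for illustration, and they assume the variable name is a valid shell identifier.

```sh
#!/bin/sh
# Hypothetical helpers: remember whether a variable was set, then restore it
# without turning "was unset" into "set to empty".

# Print "set:<value>" or "unset" for variable $1.
# (Command substitution drops trailing newlines from the value.)
save_env() {
  eval "_was_set=\${$1+x}"
  if [ -n "$_was_set" ]; then
    eval "printf 'set:%s' \"\$$1\""
  else
    printf 'unset'
  fi
}

# Restore variable $1 from state $2 produced by save_env.
restore_env() {
  case $2 in
    unset) unset "$1" ;;                       # the nil case: delete, don't write
    set:*) eval "export $1=\"\${2#set:}\"" ;;
  esac
}
```

Recording "was unset" explicitly is what makes the restore independent of test ordering: whichever test ran first, the variable ends up exactly as it was found.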
This was referenced Jun 15, 2026
The feature flag check runs in the pipeline initialization hot path (the compile task is built and awaited inside a deadline-bounded looper step). Harden it so that a slow or unavailable Feature service cannot inflate init latency:

- Ppl.FeatureClient: add a 1s per-call gRPC deadline and tighten the Wormhole backstop to 1.5s, so the call fails closed fast instead of blocking the init path for seconds when feature-hub is slow/unreachable.
- Ppl.Features: memoize the (fail-closed) boolean result for 30s in the :feature_cache. FeatureProvider only caches successful responses, so without this a feature-hub blip would make every init pay the timeout.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
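The two hardening measures compose as follows: bound each lookup, then cache whatever came back, including a fail-closed `false`. This is a hypothetical shell sketch (`check_flag`, `lookup_with_deadline` and `FLAG_ENABLED` are illustrative names) that uses coreutils `timeout` in place of the gRPC deadline and shell variables in place of `:feature_cache`; it assumes the lookup command prints `true` when the flag is on.

```sh
#!/bin/sh
# Hypothetical sketch of a deadline-bounded, memoized, fail-closed flag lookup.
FEATURE_TTL_SECONDS=30
_flag_value=""     # cached "true"/"false"; empty means nothing cached yet
_flag_cached_at=0  # epoch seconds when _flag_value was stored

# Run the lookup command with a 1s deadline. Anything but a timely "true"
# (error, timeout, other output) counts as "false".
lookup_with_deadline() {
  if _out=$(timeout 1 "$@" 2>/dev/null) && [ "$_out" = "true" ]; then
    echo true
  else
    echo false
  fi
}

# Set FLAG_ENABLED to "true"/"false". Both outcomes are cached for the TTL,
# so a slow or failing service costs at most one timeout per window.
# Call it directly (not in $(...)), or the cache update is lost in a subshell.
check_flag() {
  _now=$(date +%s)
  if [ -z "$_flag_value" ] || [ $((_now - _flag_cached_at)) -ge "$FEATURE_TTL_SECONDS" ]; then
    _flag_value=$(lookup_with_deadline "$@")
    _flag_cached_at=$_now
  fi
  FLAG_ENABLED=$_flag_value
}
```

Caching the failure is the key design choice: if only successful responses were cached, which the commit notes is FeatureProvider's behavior, every init during a feature-hub outage would wait out the full deadline.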
skipi added a commit to semaphoreci/toolbox that referenced this pull request Jun 23, 2026
…#542)

## What

Adds an opt-in "optimized" checkout path to the `checkout` toolbox command, for callers that only need the Git history (commits/trees) and a subset of the working tree rather than the full repository. It is controlled by two env vars and is fully backwards compatible: when neither is set, `checkout` behaves exactly as before.

```
SEMAPHORE_GIT_PARTIAL_CLONE_FILTER   e.g. "blob:none"   -> git clone --filter=...
SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS  e.g. ".semaphore"  -> cone-mode sparse checkout
```

When either is set, `checkout` performs a `--no-checkout` (optionally filtered) clone and a cone-mode sparse checkout limited to the requested paths, handling push/branch, pull-request, and tag refs. For large repositories this avoids downloading blob content and materializing tens of thousands of files when only a small subset is needed.

## Changes

- `libcheckout`: new `checkout::optimized` / `checkout::configure_sparse` paths, dispatched from `checkout()` only when the new env vars are present. Existing `shallow` / `refbased` / `use-cache` paths are untouched.
- **Graceful fallback:** `git sparse-checkout` (cone mode) requires git >= 2.25. On older clients the optimized path detects the missing subcommand and falls back to a standard checkout, so the request degrades into a correct (non-optimized) clone instead of a full checkout from a blobless clone.
- `tests/libcheckout.bats`: capability-aware tests for push/branch, PR and tag, asserting the sparse working tree where supported and the full-tree fallback where not.
- CI: the macOS `xcode26 arm` block's prologue ran `brew upgrade ruby-build`, which newer Homebrew turns into an interactive `[y/n]` prompt that blocked on stdin and stopped the block (also broken on `master`). Pipe `yes` so it proceeds non-interactively.

## Testing

- bats `libcheckout` suite green on Docker, Linux, Ubuntu 24.04, and Alpine 3.9 (git 2.20, which exercises the fallback), plus `shellcheck -s bash libcheckout`.
> Pairs with a consumer change (a pipeline initialization job that sets these env
> vars) and a related compiler change; the env-var interface is additive and safe
> to merge independently.

## Related PRs

- semaphoreio/semaphore#1063
- semaphoreci/spc#59

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Mikołaj Kutryj <mikolaj.kutryj@gmail.com>
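For readers unfamiliar with partial clones, the sequence behind the optimized path is roughly the following. This is a hypothetical POSIX shell sketch, not `checkout::optimized`: `optimized_checkout` and `version_ge` are illustrative names, `$3` is treated as a branch or tag name (pull-request refs need an extra fetch, omitted here), `SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS` is assumed to be whitespace-separated, and the fallback is decided by comparing `git --version` against 2.25, whereas the PR detects the missing `sparse-checkout` subcommand instead.

```sh
#!/bin/sh
# Hypothetical sketch of a blobless + cone-mode sparse checkout with fallback.

# Return 0 if dotted version $1 (e.g. "2.39.3") is >= $2 (e.g. "2.25").
version_ge() {
  _have=$1 _want=$2
  for _i in 1 2 3; do
    _h=${_have%%.*}; _w=${_want%%.*}
    _h=${_h%%[!0-9]*}; _w=${_w%%[!0-9]*}   # drop suffixes such as "-rc1"
    _h=${_h:-0}; _w=${_w:-0}
    [ "$_h" -gt "$_w" ] && return 0
    [ "$_h" -lt "$_w" ] && return 1
    # Drop the component just compared (empty once no dots remain).
    case $_have in *.*) _have=${_have#*.} ;; *) _have= ;; esac
    case $_want in *.*) _want=${_want#*.} ;; *) _want= ;; esac
  done
  return 0
}

# Clone $1 into $2 and check out $3 (branch or tag).
optimized_checkout() {
  _repo=$1 _dest=$2 _ref=$3
  _filter=${SEMAPHORE_GIT_PARTIAL_CLONE_FILTER:-}
  _paths=${SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS:-}
  _gitv=$(git --version | awk '{print $3}')

  if ! version_ge "$_gitv" 2.25; then
    # No cone-mode sparse-checkout: degrade to a correct, standard clone.
    echo "git $_gitv lacks cone-mode sparse-checkout; using a standard clone" >&2
    git clone "$_repo" "$_dest" && git -C "$_dest" checkout "$_ref"
    return
  fi

  # Fetch commits and trees; with blob:none, file contents come on demand.
  if [ -n "$_filter" ]; then
    git clone --no-checkout --filter="$_filter" "$_repo" "$_dest" || return 1
  else
    git clone --no-checkout "$_repo" "$_dest" || return 1
  fi

  if [ -n "$_paths" ]; then
    git -C "$_dest" sparse-checkout init --cone || return 1
    # Unquoted on purpose: split the path list (paths assumed glob-free).
    git -C "$_dest" sparse-checkout set $_paths || return 1
  fi

  # First checkout materializes only the files inside the sparse cone.
  git -C "$_dest" checkout "$_ref"
}
```

The ordering matters: sparse patterns are configured between the `--no-checkout` clone and the first checkout, so files outside the cone are never written, and with `--filter=blob:none` their blobs are never downloaded.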
dexyk requested changes Jun 25, 2026
dexyk approved these changes Jun 25, 2026
## What

The pipeline initialization (compilation) job only needs the pipeline YAML and the Git history (commits/trees, used by `change_in`) to compile a pipeline, not the full repository working tree. For large repositories the full checkout dominates the init-job runtime (cloning hundreds of MiB and materializing tens of thousands of files on every run).

This makes the init job perform a blobless partial clone + sparse checkout of the pipeline directory, gated by a per-organization feature flag and disabled when pre-flight checks are present.
## How it works

The init job emits `SEMAPHORE_GIT_PARTIAL_CLONE_FILTER=blob:none` and `SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS=<working_dir>` before `checkout`. The optimization is applied only when both hold:

- there are no pre-flight checks (their custom commands may rely on the full working tree), and
- the `sparse_checkout_init_job` feature is enabled for the organization.

The feature check fails closed: a missing org id or an unreachable Feature service keeps the standard full checkout.

`change_in` keeps working because it relies only on `git diff --name-only`/`--shortstat` and `git merge-base`. Merge bases and name-only diffs need only commit and tree objects, which a blobless clone contains; `--shortstat` also counts changed lines, so for it Git lazily fetches the blobs it needs from the remote.
## Changes

- proto: generate `InternalApi.Feature` stubs; add `INTERNAL_API_LOCAL_PATH` support to the proto Makefile so they can be regenerated from a local `internal_api` checkout.
- ppl: add the `feature_provider` dependency and a FeatureHub-backed provider (`Ppl.FeatureClient`, a gRPC client for the Feature service at `INTERNAL_API_URL_FEATURE`); `Ppl.Features` exposes the fail-closed flag check; init `FeatureProvider` and a `:feature_cache` in the application supervisor.
- ppl: gate the optimized checkout in the compilation init-job command generation; auto-disable it when pre-flight checks are configured.
- Move ppl's Docker build context to the repository root (matching the other top-level services) so `feature_provider`, which lives outside `plumber/`, is included in the image.
- A small fix to make an unrelated integration test's env restore nil-safe.
## Dependencies

This relies on the toolbox `checkout` partial/sparse support and the spc `commands_file` on-demand fetch being released (and the toolbox image rebuilt).

## Testing

`Ppl: QA` and `Ppl: Integration QA` green; dev and prod image builds validated.

## Related PRs

🤖 Generated with Claude Code