feat(checkout): optional blobless partial clone + sparse working tree #542

Merged
skipi merged 5 commits into master from db/checkout-blobless-sparse on Jun 23, 2026
@DamjanBecirovic DamjanBecirovic commented Jun 15, 2026


What

Adds an opt-in "optimized" checkout path to the checkout toolbox command, for
callers that only need the Git history (commits/trees) and a subset of the
working tree rather than the full repository. It is controlled by two env vars
and is fully backwards compatible — when neither is set, checkout behaves
exactly as before.

SEMAPHORE_GIT_PARTIAL_CLONE_FILTER   e.g. "blob:none"  -> git clone --filter=...
SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS  e.g. ".semaphore" -> cone-mode sparse checkout

When either is set, checkout performs a --no-checkout (optionally filtered)
clone and a cone-mode sparse checkout limited to the requested paths, handling
push/branch, pull-request, and tag refs. For large repositories this avoids
downloading blob content and materializing tens of thousands of files when only
a small subset is needed.
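
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in libcheckout: the function names `run` and `optimized_checkout_sketch` and the `DRY_RUN` switch are hypothetical, and the sketch assumes that multiple sparse paths are whitespace-separated. It uses `git sparse-checkout init --cone` followed by `set`, which is the form available from git 2.25, and it covers only a revision reachable from the cloned refs.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function names are hypothetical, not from libcheckout.

# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

optimized_checkout_sketch() {
  local url="$1" dir="$2" rev="$3"

  # Always skip the initial checkout; add --filter only when one was requested.
  if [ -n "${SEMAPHORE_GIT_PARTIAL_CLONE_FILTER:-}" ]; then
    run git clone --no-checkout "--filter=${SEMAPHORE_GIT_PARTIAL_CLONE_FILTER}" "$url" "$dir" || return 1
  else
    run git clone --no-checkout "$url" "$dir" || return 1
  fi

  if [ -n "${SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS:-}" ]; then
    run git -C "$dir" sparse-checkout init --cone || return 1
    # Assumption: multiple paths are whitespace-separated, so word splitting
    # is intentional here.
    # shellcheck disable=SC2086
    run git -C "$dir" sparse-checkout set ${SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS} || return 1
  fi

  # Materialize only the files selected by the sparse patterns.
  run git -C "$dir" checkout "$rev"
}
```

With `DRY_RUN=1` the function only prints the four git commands it would run, which makes it easy to inspect the effect of each env var before pointing it at a real repository.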

Changes

  • libcheckout: new checkout::optimized / checkout::configure_sparse paths,
    dispatched from checkout() only when the new env vars are present. Existing
    shallow / refbased / use-cache paths are untouched.
  • Graceful fallback: git sparse-checkout (cone mode) requires git >= 2.25.
    On older clients the optimized path detects the missing subcommand and falls
    back to a standard checkout. The request then degrades into a correct
    (non-optimized) clone instead of a full checkout from a blobless clone,
    which would fetch every blob on demand.
  • tests/libcheckout.bats: capability-aware tests for push/branch, PR and tag —
    asserting the sparse working tree where supported and the full-tree fallback
    where not.
  • CI: the macOS xcode26 arm block's prologue ran brew upgrade ruby-build,
    which newer Homebrew turns into an interactive [y/n] prompt. With no TTY the
    prompt blocked on stdin and the block was stopped (also broken on master).
    Pipe yes into brew upgrade so it proceeds non-interactively.
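
The graceful-fallback decision in the list above can be illustrated with a small sketch. The PR detects the missing subcommand directly; the version below swaps in a plain version comparison against the 2.25 threshold, because it is easy to reason about and needs no repository. Both function names are hypothetical.

```shell
# Hypothetical stand-in for the capability check: compare the reported git
# version against 2.25 instead of probing the subcommand itself.
# $1: output of `git --version`, e.g. "git version 2.20.1"
git_version_supports_sparse() {
  # Strip the "git version " prefix, then split off major and minor.
  local version="${1#git version }"
  local major="${version%%.*}"
  local rest="${version#*.}"
  local minor="${rest%%.*}"
  [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 25 ]; }
}

# Print which checkout path a client with the given version would take.
choose_checkout_path() {
  if git_version_supports_sparse "$1"; then
    echo "optimized"
  else
    echo "standard"
  fi
}
```

Called as `choose_checkout_path "$(git --version)"`, this reports `standard` for the git 2.20 found on Alpine 3.9. A malformed version string makes the numeric tests fail, so the sketch also lands on the standard path in that case.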

Testing

  • bats libcheckout suite green on Docker, Linux, Ubuntu 24.04, and Alpine 3.9
    (git 2.20 -> exercises the fallback), plus shellcheck -s bash libcheckout.

Pairs with a consumer change (a pipeline initialization job that sets these env
vars) and a related compiler change; the env-var interface is additive and safe
to merge independently.

Related PRs

🤖 Generated with Claude Code

DamjanBecirovic and others added 4 commits June 12, 2026 13:58
Add an opt-in optimized checkout path, gated on env vars, for callers that only
need the Git history (trees/commits) and a subset of the working tree - e.g. the
pipeline initialization job:

  SEMAPHORE_GIT_PARTIAL_CLONE_FILTER   e.g. "blob:none" -> git clone --filter=...
  SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS  e.g. ".semaphore" -> cone-mode sparse checkout

When either var is set, checkout() routes to checkout::optimized, which performs
a --no-checkout (optionally --filter) clone and a cone-mode sparse checkout
limited to the requested paths, for push/branch, PR, and tag refs. Existing
shallow/refbased/use-cache paths are unchanged when the vars are unset.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…heckout

The optimized path uses `git sparse-checkout` (cone mode), which requires
git >= 2.25. On older clients (e.g. the Alpine 3.9 CI image, git 2.20) the
subcommand is missing, so the sparse step silently no-opped and the full tree
was checked out from a blobless clone - fetching every blob on demand.

Detect sparse-checkout support and, when absent, fall back to the standard
shallow/ref-based checkout so the optimization degrades gracefully into a
correct (non-optimized) clone. Make the bats tests capability-aware: assert the
sparse working tree where supported, and the full-tree fallback where not.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
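
A capability-aware assertion of the kind described in this commit might look like the sketch below. The helper name and its arguments are hypothetical; the real assertions live in tests/libcheckout.bats. Cone mode keeps files in the repository root, so the sketch checks for a directory outside the cone rather than for root-level files.

```shell
# Hypothetical helper; not taken from tests/libcheckout.bats.
# $1: checked-out repository directory
# $2: "yes" when the local git supports sparse-checkout, anything else otherwise
# $3: a directory that exists in the repository but lies outside the sparse cone
assert_expected_tree() {
  local dir="$1" sparse_supported="$2" outside_dir="$3"

  # The requested path must be present in both modes.
  [ -d "$dir/.semaphore" ] || return 1

  if [ "$sparse_supported" = "yes" ]; then
    # Sparse tree: directories outside the cone are not materialized.
    [ ! -e "$dir/$outside_dir" ]
  else
    # Fallback: the full tree is checked out.
    [ -d "$dir/$outside_dir" ]
  fi
}
```
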
The macOS xcode26 arm block's prologue ran `brew upgrade ruby-build`, which on
newer Homebrew prompts for confirmation and aborts with no TTY, stopping the
whole block (also failing on master). Run brew non-interactively (NONINTERACTIVE=1)
and tolerate transient brew failures so the block can run its tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
NONINTERACTIVE did not suppress Homebrew's "Do you want to proceed? [y/n]"
upgrade prompt, and with no TTY brew blocked on stdin until the job was killed
(block "stopped"), so `|| true` never ran. Pipe `yes` into `brew upgrade` so it
proceeds without hanging.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
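
The effect of piping `yes` can be reproduced without Homebrew. In the sketch below, `confirm_then_upgrade` is a hypothetical stand-in that prompts and reads stdin the way the upgrade prompt does; it is not how brew is implemented.

```shell
# Stand-in for a command that prompts "[y/n]" and waits on stdin,
# playing the role of `brew upgrade ruby-build` in this sketch.
confirm_then_upgrade() {
  local answer
  printf 'Do you want to proceed? [y/n] ' >&2
  read -r answer || return 1
  [ "$answer" = "y" ] || return 1
  echo "upgraded"
}

# `yes` writes "y" lines until its reader exits, so the prompt is answered
# even when no TTY is attached.
upgrade_noninteractive() {
  yes | confirm_then_upgrade
}
```

In the CI prologue the equivalent line would be `yes | brew upgrade ruby-build`. One caveat: `yes` is normally terminated by SIGPIPE once the reader exits, so under `set -o pipefail` the pipeline reports a non-zero status even when the upgrade succeeds.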
…obless-sparse

# Conflicts:
#	.semaphore/semaphore.yml
@skipi skipi marked this pull request as ready for review June 22, 2026 08:17
@skipi skipi requested a review from dexyk June 22, 2026 08:18
@skipi skipi merged commit 170acc3 into master Jun 23, 2026
23 checks passed
@skipi skipi deleted the db/checkout-blobless-sparse branch June 23, 2026 10:21
skipi added a commit to semaphoreio/semaphore that referenced this pull request Jun 25, 2026
…ature flag (#1063)

## What

The pipeline initialization (compilation) job only needs the pipeline YAML and
the Git history (commits/trees, used by `change_in`) to compile a pipeline, not
the full repository working tree. For large repositories the full checkout
dominates the init-job runtime (cloning hundreds of MiB and materializing tens
of thousands of files on every run).

This makes the init job perform a **blobless partial clone + sparse checkout** of
the pipeline directory, gated by a per-organization feature flag and disabled
when pre-flight checks are present.

## How it works

The init job emits `SEMAPHORE_GIT_PARTIAL_CLONE_FILTER=blob:none` and
`SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS=<working_dir>` before `checkout`. The
optimization is applied only when **both** hold:

- there are **no pre-flight checks** (their custom commands may rely on the full
  working tree), and
- the `sparse_checkout_init_job` feature is enabled for the organization.

The feature check **fails closed**: a missing org id or an unreachable Feature
service keeps the standard full checkout. `change_in` keeps working because it
relies only on `git diff --name-only` / `--shortstat` / `merge-base`, which need
tree/commit objects (present in a blobless clone), not blob contents.
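
The gate described in this section can be summarized as a small decision function. The real logic lives in `ppl` (Elixir); the shell sketch below only mirrors the decision table. The function name, its arguments, and the use of `export` lines to emit the variables are all assumptions made for illustration.

```shell
# Hypothetical sketch of the gate; not the ppl implementation.
# $1: number of configured pre-flight checks
# $2: feature state: "enabled", "disabled", or empty when the org id is
#     missing or the Feature service is unreachable
# $3: pipeline working directory, e.g. ".semaphore"
init_job_checkout_commands() {
  local preflight_count="$1" feature_state="$2" working_dir="$3"

  # Fail closed: only an explicit "enabled" with zero pre-flight checks opts in.
  if [ "$preflight_count" = "0" ] && [ "$feature_state" = "enabled" ]; then
    echo "export SEMAPHORE_GIT_PARTIAL_CLONE_FILTER=blob:none"
    echo "export SEMAPHORE_GIT_SPARSE_CHECKOUT_PATHS=$working_dir"
  fi
  echo "checkout"
}
```

Every input other than zero pre-flight checks plus an explicit `enabled` yields a bare `checkout`, which is the fail-closed behavior: an empty or unknown feature state never turns the optimization on.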

## Changes

- `proto`: generate `InternalApi.Feature` stubs; add `INTERNAL_API_LOCAL_PATH`
  support to the proto Makefile so they can be regenerated from a local
  `internal_api` checkout.
- `ppl`: add the `feature_provider` dependency and a FeatureHub-backed
  provider + gRPC client (`INTERNAL_API_URL_FEATURE`); `Ppl.Features` exposes
  the fail-closed flag check; init `FeatureProvider` and a `:feature_cache` in
  the application supervisor.
- `ppl`: gate the optimized checkout in the compilation init-job command
  generation; auto-disable when pre-flight checks are configured.
- Build: move ppl's Docker build context to the repository root (matching the
  top-level services) so `feature_provider`, which lives outside `plumber/`, is
  included in the image.
- Tests: feature-flag gate + provider coverage, with a Feature gRPC mock; plus a
  small fix to make an unrelated integration test's env restore nil-safe.

> Note: `yaml_elixir` is pinned to `~> 1.3` via override because the umbrella's
> YAML validators rely on 1.x return semantics; this means feature_provider's
> YAML provider is not used here (the FeatureHub gRPC provider is). A follow-up
> is tracked to align on yaml_elixir 2.x for YAML-defined features.

## Dependencies

This relies on the toolbox `checkout` partial/sparse support and the spc
`commands_file` on-demand fetch being released (and the toolbox image rebuilt).

## Testing

- `Ppl: QA` and `Ppl: Integration QA` green; dev and prod image builds validated.

## Related PRs

- semaphoreci/toolbox#542
- semaphoreci/spc#59

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Mikołaj Kutryj <mikolaj.kutryj@gmail.com>