ci: add compressed image size cap to GitHub Actions and GitLab CI #289
wdconinc wants to merge 16 commits into
Pull request overview
This PR adds CI guardrails to fail container builds when compressed image sizes exceed configured caps, helping catch size regressions in EIC container images.
Changes:
- Adds `.ci/check_image_size` to inspect pushed image manifests and compare the compressed layer size to a GiB limit.
- Adds shared size-limit variables to GitHub Actions and GitLab CI.
- Runs size checks after base and EIC image builds.
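As a rough illustration of the comparison step, the sketch below assumes the layer sizes have already been summed into a byte count (the script does this with jq over `layers[].size`). The function names `over_limit` and `to_gib` are hypothetical, and awk is swapped in here in place of bc, since awk handles fractional GiB limits and is available on any POSIX system.

```sh
#!/bin/sh
# Hypothetical helpers, not the actual contents of .ci/check_image_size.

# Succeed (exit status 0) when BYTES is strictly greater than LIMIT_GIB GiB.
# awk compares in floating point, so fractional limits such as 0.5 also work.
over_limit() {
  awk -v bytes="$1" -v gib="$2" 'BEGIN { exit !(bytes > gib * 1024 * 1024 * 1024) }'
}

# Print a byte count as GiB with two decimals, for log messages.
to_gib() {
  awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / (1024 * 1024 * 1024) }'
}

# Example: an illustrative byte count checked against an 8 GiB cap.
total=9000000000
if over_limit "$total" 8; then
  echo "FAIL: $(to_gib "$total") GiB exceeds 8 GiB"
else
  echo "OK: $(to_gib "$total") GiB is within 8 GiB"
fi
```

Using a strict `>` means an image exactly at the cap still passes; whether the real script uses `>` or `>=` is not stated in this PR.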
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| .ci/check_image_size | New helper for compressed image size validation. |
| .github/workflows/build-push.yml | Adds size limit env vars and GitHub Actions size-check steps. |
| .gitlab-ci.yml | Adds size limit variables and GitLab CI size-check script entries. |
Comments suppressed due to low confidence (1)
.gitlab-ci.yml:528
- The eic GitLab job also invokes `.ci/check_image_size` without installing the script's required `jq` and `bc` dependencies; the existing `apk add envsubst git` earlier in this job does not provide them. These size-check branches will fail before evaluating the cap unless the dependencies are installed.
```
  .ci/check_image_size
    "${CI_REGISTRY}/${CI_PROJECT_PATH}/${BUILD_IMAGE}${ENV}:${INTERNAL_TAG}-${BUILD_TYPE}"
    "${SIZE_LIMIT_EIC_XL_GIB}" ;;
*)
  .ci/check_image_size
    "${CI_REGISTRY}/${CI_PROJECT_PATH}/${BUILD_IMAGE}${ENV}:${INTERNAL_TAG}-${BUILD_TYPE}"
    "${SIZE_LIMIT_EIC_CI_GIB}" ;;
```
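One way to make this failure mode explicit is a preflight check at the top of the script, so a job that lacks the tools stops with a clear message instead of a shell "not found" error partway through. The helper name `require_tools` is hypothetical; on an Alpine-based job the tools themselves could be installed with `apk add --no-cache jq bc` alongside the existing `apk add envsubst git`.

```sh
#!/bin/sh
# Hypothetical preflight helper: report every missing command at once and
# return non-zero if any of them is absent from PATH.
require_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "check_image_size: missing required tools:$missing" >&2
    return 1
  fi
}

# Usage at the top of the size check:
#   require_tools jq bc || exit 1
```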
Force-pushed from b780c8b to 89b50a9.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (3)
.gitlab-ci.yml:531
- The eic build job only installs `envsubst` and `git` before this new check, but `.ci/check_image_size` also requires `jq` and `bc`. The configured eic_ci/eic_xl limits make this path execute, so GitLab CI will fail after those image builds unless the missing tools are installed in this job or in a shared build setup.
```
.ci/check_image_size
  "${CI_REGISTRY}/${CI_PROJECT_PATH}/${FULLNAME}:${INTERNAL_TAG}-${BUILD_TYPE}"
  "${LIMIT}" ;
```
.ci/check_image_size:35
- This index resolver accepts any descriptor with a `platform`, which includes BuildKit provenance attestation manifests (`unknown/unknown`) as well as the real image manifest. Because it then takes `head -1`, an attestation descriptor can be selected and the check would sum the tiny attestation layers instead of the image layers, letting an oversized image pass; filter out attestation/unknown-platform descriptors or select the expected platform explicitly.
```sh
DIGEST=$(printf '%s' "${MANIFEST}" | jq -r '
  .manifests[]
  | select(.artifactType == null and .platform != null)
  | .digest' | head -1)
```
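A sketch of the explicit-platform approach suggested in the comment: keep only descriptors whose platform matches the expected OS and architecture, and also drop anything BuildKit annotates as an attestation manifest. The function name `pick_image_digest` and the `linux`/`amd64` defaults are assumptions for illustration; the jq filter uses only standard jq built-ins.

```sh
#!/bin/sh
# Hypothetical helper: print the digest of the first index entry matching
# the wanted platform, skipping BuildKit attestation manifests.
#   $1: raw JSON of an OCI image index / Docker manifest list
#   $2: OS (default "linux"), $3: architecture (default "amd64"); assumptions
pick_image_digest() {
  os="${2:-linux}"
  arch="${3:-amd64}"
  printf '%s' "$1" | jq -r --arg os "$os" --arg arch "$arch" '
    [ .manifests[]?
      | select(.annotations["vnd.docker.reference.type"] != "attestation-manifest")
      | select(.platform.os == $os and .platform.architecture == $arch)
    ]
    | first
    | .digest // empty'
}
```

For an index that lists the attestation entry first, the original filter plus `head -1` would pick the attestation digest, whereas this filter skips it; if no entry matches, it prints nothing, which the caller can treat as an error.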
.ci/check_image_size:66
- The remediation command shown here does not work for the OCI index case handled earlier in the script: `${IMAGE_REF}` may have `.manifests` instead of `.layers`, so this `jq` expression errors or returns no size. The error message should either show a command that performs the same index-to-manifest resolution or print the resolved manifest reference used for the check.
```sh
echo "To update the high-water mark after an intentional size increase:"
echo " 1. Inspect the published image size with:"
```
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
.ci/check_image_size:67
- This troubleshooting command is wrong for the OCI index case handled above (the default GitHub build-push provenance path), because the top-level index has no `.layers`. If the check fails on an index reference, maintainers following this hint will get a jq error or no size instead of the manifest size.
```sh
echo " 1. Inspect the published image size with:"
echo " docker buildx imagetools inspect --raw \"${IMAGE_REF}\" | jq '[.layers[].size] | add // 0'"
```
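A sketch of the second option the reviewer suggests: build the hint from the digest the script already resolved, so the printed command fetches the platform image manifest (which has `.layers`) rather than the index. `digest_ref` and `print_hint` are hypothetical names, and the tag stripping assumes references of the form `registry/path/name:tag`, optionally with a registry port.

```sh
#!/bin/sh
# Hypothetical helper: combine "registry/path/name:tag" with a resolved
# manifest digest into "registry/path/name@digest".
digest_ref() {
  ref="${1%@*}"              # drop any existing @digest suffix
  last="${ref##*/}"          # final path component, e.g. "eic_ci:nightly"
  case "$last" in
    *:*) ref="${ref%:*}" ;;  # strip the tag only if the last component has one,
  esac                       # so a registry port like "host:5000/..." survives
  printf '%s@%s\n' "$ref" "$2"
}

# Hypothetical hint printer using the resolved reference.
print_hint() {
  resolved=$(digest_ref "$1" "$2")
  echo " 1. Inspect the resolved image manifest size with:"
  echo " docker buildx imagetools inspect --raw \"${resolved}\" | jq '[.layers[].size] | add // 0'"
}
```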
This works as intended, e.g. https://github.com/eic/containers/actions/runs/25965960211/job/76331007283#step:17:107
The strategy allows for additional size definitions as needed, but the ones we have right now are the most critical to track: debian base, eic_ci, and eic_xl. We could add a cuda base or eic image later if we want to guard against pulling in a second CUDA version.
Fail the build when the compressed download size of a built image exceeds a configurable high-water mark, to catch regressions like an accidental Spack-built LLVM being pulled in as a compiler.
New script:
- .ci/check_image_size IMAGE_REF LIMIT_GIB: inspects a pushed single-arch manifest, sums layers[].size, and fails if the sum is over the limit.
CI variables added (both GitHub Actions and GitLab CI):
  SIZE_LIMIT_DEBIAN_STABLE_BASE_GIB = 1
  SIZE_LIMIT_EIC_CI_GIB = 8
  SIZE_LIMIT_EIC_XL_GIB = 10
Limits calibrated at 2025-05 master sizes with ~15% headroom:
  debian_stable_base: 0.76 GiB → 1 GiB
  eic_ci: 6.93 GiB → 8 GiB
  eic_xl: 8.68 GiB → 10 GiB
To update the high-water mark after an intentional size increase: run .ci/query_image_sizes locally, add 15%, round up to the next GiB, then update SIZE_LIMIT_*_GIB in both CI files.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
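The update rule in this commit message (add 15%, round up to the next whole GiB) can be written as a short awk computation. `suggest_limit` is a hypothetical helper, not part of the PR; it is a sketch of the stated rule, applied to a size in GiB as reported by `.ci/query_image_sizes`.

```sh
#!/bin/sh
# Hypothetical helper: new cap = ceil(current_size_gib * 1.15).
suggest_limit() {
  awk -v gib="$1" 'BEGIN {
    x = gib * 1.15         # add 15% headroom
    n = int(x)             # int() truncates toward zero
    if (n < x) n = n + 1   # round up to the next whole GiB
    print n
  }'
}
```

Feeding it the measured sizes 0.76, 6.93 and 8.68 yields 1, 8 and 10, matching the three calibrated caps listed above.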
docker/build-push-action v4+ enables provenance attestations by default, which causes even single-platform builds to be pushed as an OCI Image Index (mediaType *index*/*manifest.list*) wrapping the real Image Manifest.
The previous script called .layers[] directly on the index, which has no layers field, causing:
  jq: error: Cannot iterate over null (null)
Fix: detect an OCI Image Index, pick the first platform manifest entry (excluding attestation entries that have artifactType set), re-fetch that manifest by digest, then proceed with the existing layer-size check.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
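Detecting the index case can key off the top-level `mediaType`. The sketch below is a hypothetical `manifest_kind` helper that classifies the four common values (OCI and Docker, index and single manifest); it takes the mediaType string as an argument so it stays independent of how the JSON is fetched.

```sh
#!/bin/sh
# Hypothetical helper: classify a manifest mediaType string.
# Prints "index", "manifest" or "unknown".
manifest_kind() {
  case "$1" in
    application/vnd.oci.image.index.v1+json | application/vnd.docker.distribution.manifest.list.v2+json)
      echo index ;;
    application/vnd.oci.image.manifest.v1+json | application/vnd.docker.distribution.manifest.v2+json)
      echo manifest ;;
    *)
      echo unknown ;;
  esac
}
```

In the script it could be fed with `jq -r '.mediaType // empty'`. Since some tools may omit `mediaType` on an index, an `unknown` result could fall back to `jq -e 'has("manifests")'`.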
Replace hardcoded limit variable references with a dynamic lookup
derived from the image name:
LIMIT_VAR="SIZE_LIMIT_$(echo "${FULLNAME}" | tr '[:lower:]' '[:upper:]')_GIB"
If no variable is defined for an image the check is skipped with a
message. This fixes the current incorrect behaviour where:
- cuda_devel and cuda_runtime were checked against the
debian_stable_base 1 GiB limit
- eic_cvmfs, eic_dbg, eic_jl, eic_prod, eic_ci_without_acts,
eic_dev_cuda were checked against the eic_ci 8 GiB limit
The three calibrated limits are unchanged:
SIZE_LIMIT_DEBIAN_STABLE_BASE_GIB = 1
SIZE_LIMIT_EIC_CI_GIB = 8
SIZE_LIMIT_EIC_XL_GIB = 10
To add a limit for a new image, add SIZE_LIMIT_<NAME>_GIB to the
env: section of both CI files. The GitLab CI cuda|tf special-case is
removed as those images are now skipped naturally (no variable set).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
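The name derivation can be sketched as a small function that also rejects image names which would not produce a valid shell variable name (for example, names containing hyphens), so a later `eval` lookup never sees them. `limit_var_for` is a hypothetical name; the upper-casing uses `tr '[:lower:]' '[:upper:]'`.

```sh
#!/bin/sh
# Hypothetical helper: map an image name such as "eic_ci" to the name of
# its limit variable, e.g. "SIZE_LIMIT_EIC_CI_GIB". Prints nothing and
# returns 1 if the result would not be a valid shell variable name.
limit_var_for() {
  name="SIZE_LIMIT_$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')_GIB"
  case "$name" in
    *[![:upper:][:digit:]_]*) return 1 ;;  # e.g. "-" or "." in the image name
  esac
  printf '%s\n' "$name"
}
```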
${!VAR} is bash-specific and fails under /bin/sh (ash/busybox on Alpine).
Replace with POSIX-compatible eval:
eval "LIMIT=\${${LIMIT_VAR}:-}"
Fixes 'bad substitution' error in base and eic check_image_size steps.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
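The POSIX indirection can be wrapped in a helper that validates the variable name before handing it to `eval`, since anything passed to `eval` is executed as shell code. `get_var` is a hypothetical name. The escaping is the subtle part: `$1` must expand before `eval` runs, while the `\${...}` wrapper must survive until `eval` evaluates it.

```sh
#!/bin/sh
# Hypothetical helper: POSIX replacement for bash's ${!NAME}.
# Prints the value of the variable whose name is in $1 (empty if unset).
get_var() {
  case "$1" in
    '' | [0-9]* | *[![:alnum:]_]*) return 1 ;;  # refuse anything but a plain identifier
  esac
  # After quote removal eval sees: printf '%s\n' "${NAME:-}"
  eval "printf '%s\n' \"\${$1:-}\""
}

# Usage:
#   LIMIT=$(get_var "${LIMIT_VAR}")
#   [ -n "${LIMIT}" ] || { echo "No size limit for ${FULLNAME}, skipping"; exit 0; }
```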
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Added necessary packages for build process.
Clarified the description of the image manifest check.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Wouter Deconinck <wdconinc@gmail.com>
Force-pushed from 352c54c to f7a27d2.
Briefly, what does this PR introduce? Please link to any relevant presentations or discussions.
This PR fails the build when the compressed download size of a built image exceeds a configurable high-water mark, to catch regressions.
New script:
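- .ci/check_image_size IMAGE_REF LIMIT_GIB: inspects a pushed single-arch manifest, sums layers[].size, and fails if the sum is over the limit.
CI variables added (both GitHub Actions and GitLab CI):
SIZE_LIMIT_DEBIAN_STABLE_BASE_GIB = 1
SIZE_LIMIT_EIC_CI_GIB = 8
SIZE_LIMIT_EIC_XL_GIB = 10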
Also adds the script to shellcheck and skips the failing hadolint check now that pre-commit.ci is activated.
What is the urgency of this PR?
What kind of change does this PR introduce?
Please check if any of the following apply
This is a high priority since we want to avoid regressions like the one fixed in #287.
Strategy defined by human; script written by AI, reviewed/tested/validated by human.