Harden upstream checksum fetching and validation#1235
Open
ideaship wants to merge 5 commits into
Open
Conversation
The two checksum fetch paths used for latest-image change detection (get_checksum_from_checksums_url and get_checksum_from_checksum_url) called requests.get() without a timeout, status check, or exception handling. Since these calls run outside any try/except on the process_image path, a connection failure, DNS error, or invalid URL aborted the entire run, and an unresponsive server could block it indefinitely. An HTTP error response was silently parsed as a checksums file, producing a misleading "could not find checksum" message instead of a clear HTTP error. Both fetch sites now use a bounded timeout, call raise_for_status(), and catch requests.RequestException by logging an error and returning an empty checksum, so a single unreachable checksum URL only skips that image instead of aborting the run. Add unit tests for both functions covering the bare-digest and manifest-matching success paths, a junk response body, an HTTP error status, a connection failure, and the timeout argument. These weaknesses predate the checksum_url feature (#1202); that change merely duplicated the existing pattern onto the new path. Assisted-by: Claude:claude-fable-5 Signed-off-by: Roger Luethi <luethi@osism.tech>
is_checksum() accepted any 32/40/64/128-character string without a period as a valid digest, e.g. "x" * 64 passed as a SHA-256 checksum. Because the extracted value is stored as the upstream_checksum image property and drives latest-image change detection, a fixed-length junk response from a checksum URL could suppress a needed update or force a spurious re-import. Validate the digest with re.fullmatch against hexadecimal content of the four supported lengths (MD5, SHA-1, SHA-256, SHA-512). The previous "." exclusion is subsumed by the hex match. This logic was extracted, not introduced, by #1202; the same gap existed in the old inline get_checksum code on the checksums_url path, so this hardens both paths at once. Assisted-by: Claude:claude-fable-5 Signed-off-by: Roger Luethi <luethi@osism.tech>
The schema accepted ftp:// and ftps:// URLs for checksum_url, checksums_url, and latest_checksum_url, but requests.get() has no FTP adapter and raises InvalidSchema for such URLs. A schema-valid configuration could therefore abort the manager at runtime. Restrict the regex on these three fields to HTTP(S), so the schema only accepts URLs the implementation can actually fetch. No image definition in this repository uses FTP URLs in these fields. The latest_url, url, mirror_url, and source fields are not fetched through this code path and are left unchanged. Add a schema unit test asserting that the three fields accept http and https URLs and reject ftp and ftps URLs. Assisted-by: Claude:claude-fable-5 Signed-off-by: Roger Luethi <luethi@osism.tech>
The distinction between checksums_url and checksum_url was not documented anywhere: checksums_url expects a checksums file that contains the image filename (e.g. the SHA256SUMS manifests published by Ubuntu), while checksum_url, added in #1202, expects a file containing a single bare digest and nothing else (e.g. the .sha512 sidecar files published by Alpine). The schema regex can only assert that the value is a URL, not which content format the file must have, so without documentation users are likely to pick the wrong field and have their latest images silently skipped. Spell out the contract in the README and in the docstrings of the two checksum extraction functions. Assisted-by: Claude:claude-fable-5 Signed-off-by: Roger Luethi <luethi@osism.tech>
When process_image() fails to determine the upstream checksum of a latest image, it returns early without adding the image's name to the managed images set. main() then ran manage_outdated_images() regardless, which treats every cloud image missing from that set as a removal candidate. A transient failure fetching a checksum file could therefore get a perfectly healthy image hidden, deactivated, or deleted when the corresponding cleanup flags are enabled. The hardening of the checksum fetching widened this window: connection errors and timeouts that previously crashed the run before cleanup could start now continue into cleanup. The same gap existed for the other skip paths in process_image() (local file missing, download URL check failure): they set exit_with_error, but main() only evaluated the flag after the cleanup had already run. Fix this in two places. The checksum lookup failure now sets exit_with_error, consistent with the other skip paths. main() no longer calls manage_outdated_images() when exit_with_error is set, because the managed images set is incomplete at that point; cleanup resumes on the next error-free run. Note this changes behavior for runs with errors: cleanup is deferred instead of operating on an incomplete managed images set, and a checksum lookup failure now causes a non-zero exit status. Assisted-by: Claude:claude-fable-5 Signed-off-by: Roger Luethi <luethi@osism.tech>
d5d7e73 to
70764ad
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to the review of the
checksum_urlfeature:checksum_urloption #1202These robustness gaps predate that PR — it merely duplicated existing patterns onto the new
checksum_urlpath — so they are fixed here uniformly across both checksum paths instead of being requested as changes on the contribution.Changes
Harden both checksum fetch sites.
get_checksum_from_checksums_urlandget_checksum_from_checksum_urlcalledrequests.get()with no timeout, status check, or exception handling, outside anytry/excepton theprocess_imagepath. A connection failure or invalid URL aborted the entire run, a hung server blocked it indefinitely, and an HTTP error page was parsed as a checksums file, producing a misleading "could not find checksum" message. Both sites now use a bounded timeout, callraise_for_status(), and catchrequests.RequestExceptionby logging and returning an empty checksum, so a failing checksum URL only skips that image.Skip cleanup of outdated images when processing errors occur. An image that fails to process (e.g. its upstream checksum could not be determined) is missing from the managed images set, so
manage_outdated_images()would treat its healthy cloud images as removal candidates and could hide, deactivate, or delete them when the corresponding flags are enabled. A checksum lookup failure now setsexit_with_error(consistent with the other skip paths inprocess_image), andmain()no longer runs the cleanup when that flag is set — cleanup resumes on the next error-free run. Note the behavior change: runs with processing errors now defer cleanup, and a checksum lookup failure now causes a non-zero exit status.Validate hexadecimal content in
is_checksum(). It previously accepted any 32/40/64/128-character string without a period (e.g."x" * 64). Since this value is stored as theupstream_checksumproperty and drives latest-image change detection, fixed-length junk could suppress a needed update or force a spurious re-import.Restrict checksum URL schemes to HTTP(S). The schema accepted
ftp:///ftps://forchecksum_url,checksums_url, andlatest_checksum_url, butrequests.get()has no FTP adapter and raisesInvalidSchema, so a schema-valid configuration could abort the manager. No existing image definition uses FTP URLs.Add unit tests for
is_checksum, both fetch functions (bare digest, manifest matching, junk body, HTTP error, connection failure, timeout usage), the schema URL scheme restriction, and the error-handling paths (checksum lookup failure sets the error flag; cleanup is skipped on errors).Document the field contract:
checksums_urlexpects a manifest containing the image filename;checksum_urlexpects a single bare digest (e.g. Alpine sidecar files).🤖 Generated with Claude Code