Fix #112: tolerate empty compressed files on the system read path #113
Draft
jdidion wants to merge 1 commit into
Reading an empty (zero-byte) compressed file routed through SystemReader,
which spawns gzip/pigz to decompress. System decompressors exit non-zero on
empty input ("unexpected end of file"), unlike Python's gzip module, which
treats an empty file as a valid empty stream. SystemReader._raise_if_error
turned that exit code into an EOFError, but only when process.poll() observed
the child exiting first, so the failure was a race that surfaced far more
often on single-CPU machines.
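
The difference between the two decompression paths can be reproduced with a short sketch. This uses only the standard library; the helper names `read_with_python` and `system_gzip_exit_code` are made up for illustration and are not part of the project. The system `gzip` call is skipped when no `gzip` binary is on the PATH, and its exact exit status may vary by implementation.

```python
import gzip
import shutil
import subprocess
import tempfile
from pathlib import Path


def read_with_python(path: Path) -> bytes:
    """Decompress with the standard-library gzip module."""
    with gzip.open(path, "rb") as f:
        return f.read()


def system_gzip_exit_code(path: Path):
    """Return the exit status of `gzip -dc`, or None if gzip is not installed."""
    exe = shutil.which("gzip")
    if exe is None:
        return None
    result = subprocess.run([exe, "-dc", str(path)], capture_output=True)
    return result.returncode


with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "empty.gz"
    empty.touch()  # zero bytes: not even a gzip header
    python_result = read_with_python(empty)
    gzip_code = system_gzip_exit_code(empty)

# The stdlib module yields an empty stream for a zero-byte file.
print(repr(python_result))
# Expected to be non-zero per the description above; None if gzip is missing.
print(gzip_code)
```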
Suppress the non-zero exit code when the source file is empty, aligning the
system-decompressor path with the Python path. Genuine truncation or
corruption of a non-empty file still raises.
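
The suppression rule might look roughly like the sketch below. It is not the code from this change: `check_exit_code` and its parameters are hypothetical stand-ins for whatever `SystemReader._raise_if_error` does internally, and the size check via `os.path.getsize` is an assumption about how "the source file is empty" could be tested.

```python
import os
import tempfile


def check_exit_code(returncode, source_path, stderr_message=""):
    """Hypothetical helper: raise on decompressor failure unless the source is empty."""
    if returncode is None or returncode == 0:
        return  # child still running, or it exited cleanly
    if os.path.getsize(source_path) == 0:
        # Nothing to decompress: treat as a valid empty stream,
        # matching what Python's gzip module does.
        return
    # A non-empty file that fails to decompress is truncated or corrupt.
    raise EOFError(stderr_message or f"decompressor exited with {returncode}")


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty.gz")
    truncated = os.path.join(tmp, "truncated.gz")
    open(empty, "wb").close()
    with open(truncated, "wb") as f:
        f.write(b"\x1f\x8b\x08")  # gzip magic bytes only, no payload

    check_exit_code(1, empty)  # tolerated: returns silently
    try:
        check_exit_code(1, truncated, "unexpected end of file")
        truncated_raised = False
    except EOFError:
        truncated_raised = True

print(truncated_raised)
```

The size check keeps the tolerance narrow: only a zero-byte source is exempt, so a file that has even a partial header still reports the failure.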
Also make test_xopen_file write a real gzip stream before reading (it was
relying on the empty-file race), and add test_xopen_empty_compressed_file as
a deterministic regression test that forces the system read path on an empty
.gz and reads to completion.
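
One way to see why reading to completion removes the race: `poll()` right after spawning may or may not see the exit status, but once the child's stdout has reached EOF and `wait()` has returned, the status is always available. The sketch below uses a Python child process as a stand-in for a decompressor that fails on empty input; it assumes, without confirmation from the change itself, that the reader checks the exit status after EOF.

```python
import subprocess
import sys

# Stand-in for a decompressor failing on empty input: no output, exit status 1.
child = [sys.executable, "-c", "import sys; sys.exit(1)"]

proc = subprocess.Popen(child, stdout=subprocess.PIPE)
early = proc.poll()        # None or 1, depending on scheduling (the race)
data = proc.stdout.read()  # blocks until the child closes its stdout
proc.stdout.close()
final = proc.wait()        # always the real exit status

print(early, data, final)
```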
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Note: This MR was largely generated by Claude and has not been completely reviewed by me (the human). You should feel free to defer your review until this warning has been removed.
Fixes #112.
Root cause
Reading an empty (zero-byte) compressed file routes through SystemReader, which spawns gzip/pigz to decompress. System decompressors exit non-zero on empty input (gzip: unexpected end of file, pigz: skipping: ... empty), unlike Python's gzip module, which treats an empty file as a valid empty stream. SystemReader._raise_if_error() turned that exit code into an EOFError. Whether it fired depended on a race between the parent's process.poll() and the child process exiting, so the failure showed up nondeterministically and far more often on single-CPU machines, which is exactly what @sanvila observed (reproducible with nr_cpus=1).
Fix
SystemReader._raise_if_error() suppresses a non-zero exit code when the source file is empty (0 bytes), aligning the system-decompressor read path with the Python read path. Genuine truncation or corruption of a non-empty file still raises.
test_xopen_file now writes a real gzip stream before reading, so it no longer depends on the empty-file race.
test_xopen_empty_compressed_file forces the system read path on an empty .gz and reads to completion. It is deterministic (no race): it fails on the pre-fix code with the same EOFError from the issue and passes with the fix.
Verification
pytest -m "not perf" (the selector the Debian build uses): 143 passed, 3 skipped, 29 subtests passed. No regressions (+1 vs. the prior 142, the new regression test). Reproduced locally with GNU gzip, which also exits 1 on empty input.
@sanvila could you confirm this resolves the build failure on your single-CPU setup? Thanks for the detailed report.