
Fix #112: tolerate empty compressed files on the system read path (#113)

Draft: jdidion wants to merge 1 commit into main from fix/112-empty-gz-system-reader



jdidion (Owner) commented Jun 12, 2026

Note: This MR was largely generated by Claude and has not been completely reviewed by me (the human). You should feel free to defer your review until this warning has been removed.

Fixes #112.

Root cause

Reading an empty (zero-byte) compressed file routes through SystemReader, which spawns gzip/pigz to decompress. System decompressors exit non-zero on empty input (gzip: unexpected end of file, pigz: skipping: ... empty), unlike Python's gzip module, which treats an empty file as a valid empty stream.
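
The difference between the two read paths can be checked with a short script. This is only a sketch: the helper names (make_empty_gz, read_empty_with_python, system_gzip_exit_code) are hypothetical and not part of this project, and the system half assumes a gzip executable on PATH (it returns None otherwise).

```python
import gzip
import os
import shutil
import subprocess
import tempfile


def make_empty_gz(directory: str) -> str:
    """Create a zero-byte file with a .gz suffix and return its path."""
    path = os.path.join(directory, "empty.gz")
    open(path, "wb").close()
    return path


def read_empty_with_python(path: str) -> bytes:
    """Read the file through Python's gzip module."""
    with gzip.open(path, "rb") as handle:
        return handle.read()


def system_gzip_exit_code(path: str):
    """Return the exit code of `gzip -dc path`, or None if gzip is not installed."""
    exe = shutil.which("gzip")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-dc", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty = make_empty_gz(tmp)
        # Python's gzip module treats the zero-byte file as an empty stream.
        print("python gzip read:", read_empty_with_python(empty))
        # The system decompressor's exit code is printed rather than assumed.
        print("system gzip exit code:", system_gzip_exit_code(empty))
```

The script prints the system exit code instead of asserting a value, since the exact code may differ between decompressor implementations.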

SystemReader._raise_if_error() turned that exit code into an EOFError. Whether it fired depended on a race between the parent's process.poll() and the child process exiting, so the failure was nondeterministic and showed up far more often on single-CPU machines. That matches what @sanvila observed (reproducible with nr_cpus=1).
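
The race can be illustrated in isolation with the standard subprocess module. This is a generic sketch, not the SystemReader code: the function name poll_then_wait is hypothetical, and the child here is a tiny Python process standing in for gzip/pigz.

```python
import subprocess
import sys


def poll_then_wait():
    """Start a child that exits with status 1, then poll once and wait.

    Returns (early, final). `early` is whatever poll() saw immediately
    after spawning: None if the child had not exited yet, or 1 if it had.
    `final` is the exit code after wait(), which is always 1 here.
    """
    proc = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(1)"])
    early = proc.poll()  # outcome depends on scheduling
    final = proc.wait()  # blocks until the child has exited
    return early, final


if __name__ == "__main__":
    early, final = poll_then_wait()
    print("poll() right after spawn:", early)
    print("exit code after wait():", final)
```

An error check that relies on poll() alone only sees the non-zero status when the child happens to have exited already, which is why the outcome varies from run to run.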

Fix

  • SystemReader._raise_if_error() suppresses a non-zero exit code when the source file is empty (0 bytes), aligning the system-decompressor read path with the Python read path. Genuine truncation or corruption of a non-empty file still raises.
  • test_xopen_file now writes a real gzip stream before reading, so it no longer depends on the empty-file race.
  • New test_xopen_empty_compressed_file forces the system read path on an empty .gz and reads to completion. It is deterministic (no race): it fails on the pre-fix code with the same EOFError from the issue and passes with the fix.

Verification

pytest -m "not perf" (the selector the Debian build uses): 143 passed, 3 skipped, 29 subtests passed. No regressions: the count is one higher than the prior 142 because of the new regression test. Reproduced locally with GNU gzip, which also exits 1 on empty input.

@sanvila could you confirm this resolves the build failure on your single-CPU setup? Thanks for the detailed report.

Reading an empty (zero-byte) compressed file routed through SystemReader,
which spawns gzip/pigz to decompress. System decompressors exit non-zero on
empty input ("unexpected end of file"), unlike Python's gzip module, which
treats an empty file as a valid empty stream. SystemReader._raise_if_error
turned that exit code into an EOFError, but only when process.poll() observed
the child exiting first, so the failure was a race that surfaced far more
often on single-CPU machines.

Suppress the non-zero exit code when the source file is empty, aligning the
system-decompressor path with the Python path. Genuine truncation or
corruption of a non-empty file still raises.

Also make test_xopen_file write a real gzip stream before reading (it was
relying on the empty-file race), and add test_xopen_empty_compressed_file as
a deterministic regression test that forces the system read path on an empty
.gz and reads to completion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

test_xopen_file is flaky
