feat: unified CCITT decoder with G3 2D, byte alignment, and resource limits#26
Draft
lilith wants to merge 2 commits into
f26fb35 to d0b17bc
…limits

New `unified` module provides a single `Decoder` struct configurable via `DecodeOptions`, handling G4, G3 1D, and G3 2D in one code path. Maps directly to PDF CCITTFaxDecode parameters and TIFF Group3/4Options.

New capabilities: G3 2D (mixed 1D/2D with K parameter), byte-aligned mode, EndOfBlock=false, EndOfLine=false (Modified Huffman), LSB-first bit order, resource limits (max_pixels, max_input_bytes), native u32 transitions (default Limits caps at u16 range).

API: #[non_exhaustive] on DecodeOptions/EncodingMode/Error. Own Error type; legacy DecodeError unchanged. decode() convenience function, pels32() for u32 transitions.

Changes to existing code: 3 helpers made pub(crate) in decoder.rs, new private decode_2d_line extracted. ByteReader gains new_lsb(), bytes_consumed(), align_to_byte(). Zero changes to public API.
tests/unified.rs (9 tests): inline Go corpus data (BSD-3, ~1.2KB), legacy parity, G3/G4/aligned/G3-2D/limits.

tests/external_corpus.rs (8 tests): Pillow TIFFs (MIT-CMU, 5KB) for LSB bit order, G3 no-EOL, crash regression. libtiff TIFF inlined for minimal G3.

SHA-256 hash verification of all 43 committed test files against libtiff ground truth (reference-hashes.tsv).

test-files/pillow/: 5 small TIFFs (5KB total).
test-files/reference-hashes.tsv: libtiff-verified decode hashes.
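For readers unfamiliar with the LSB-first bit order these tests exercise, the byte-level transformation can be sketched as follows. This is only an illustration of reversing the bits within each byte; `reverse_fill_order` is a hypothetical helper name and is not taken from the PR's `ByteReader::new_lsb()`, whose internals are not shown here.

```rust
/// Hypothetical helper (not from the PR): converts data stored with
/// LSB-first fill order into MSB-first order by reversing the bits
/// inside every byte. Byte order itself is left untouched.
fn reverse_fill_order(input: &[u8]) -> Vec<u8> {
    // `u8::reverse_bits` is a standard-library method (stable since Rust 1.37).
    input.iter().map(|b| b.reverse_bits()).collect()
}
```

For instance, `reverse_fill_order(&[0x01, 0x1A])` yields `[0x80, 0x58]`, since `0000_0001` becomes `1000_0000` and `0001_1010` becomes `0101_1000`.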
9f251ec to 4edfe27
Draft PR addressing several open issues and feature gaps.
New: unified `Decoder` struct

A single `Decoder` configurable via `DecodeOptions`, replacing the need to choose between `Group3Decoder` and `Group4Decoder` at the type level. Maps directly to PDF `CCITTFaxDecode` parameters and TIFF `Group3Options`/`Group4Options` tags.

The existing `decode_g3`, `decode_g4`, `Group3Decoder`, and `Group4Decoder` are unchanged: fully backwards compatible.

New features

u32 transitions (partially addresses #13)

The unified decoder uses `u32` internally for transitions and width, enabling images wider than 65535 pixels. Default `Limits` caps at `u16::MAX` range for safety; callers opt in by raising limits. Legacy API remains `u16`.

Group 3 2D decoding (addresses #5 G3 thread)

Mixed 1D/2D coding per T.4. Reads the tag bit after each EOL to select 1D or 2D decoding per line. The 2D path reuses the existing G4 mode-code logic. Tested with a real G3 2D image generated by libtiff's `tiffcp -c g3:2d`.

Byte-aligned mode (`rows_are_byte_aligned`)

Consumes padding bits between lines to reach the next byte boundary. Needed for TIFF `Group3Options` bit 2 and PDF `EncodedByteAlign`. Tested against Go's `x/image/ccitt` aligned test files; both G3 and G4 aligned variants now decode correctly.

`EndOfBlock=false` support

When `end_of_block` is false, the decoder uses `rows` to determine when to stop instead of requiring EOFB/RTC markers. Handles TIFF strips that end without termination markers.

`EndOfLine=false` support

G3 data without EOL markers between lines. Run-lengths terminate at `columns` width instead of requiring an EOL marker. Needed for some PDF streams.

LSB-first bit order (`msb_first=false`)

`ByteReader::new_lsb()` reverses bits within each input byte. For TIFF `FillOrder=2`.

Resource limits

`Limits { max_pixels, max_input_bytes }` rejects oversized images early and bounds input consumption. Returns `DecodeError::LimitExceeded` on violation. Default limits cap at u16 range.

Test coverage

9 new tests using Go's `x/image/ccitt` corpus (153×55 bw-gopher, BSD-3-Clause, 56KB):
- `decode_g4`/`decode_g3`
- `tiffcp -c g3:2d`
- `max_pixels` rejects oversized dimensions

26 total tests pass (23 unit + 2 check + 1 errors).
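The kind of early rejection the `max_pixels` test exercises can be pictured with the sketch below. It is not the PR's implementation: the field types, the `LimitError` enum, and the `check_limits` function are all assumptions made for illustration; only the names `Limits`, `max_pixels`, and `max_input_bytes` come from the description above.

```rust
/// Stand-in for the PR's `Limits`; the field types here are assumed.
#[derive(Debug, Clone, Copy)]
struct Limits {
    max_pixels: u64,
    max_input_bytes: u64,
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum LimitError {
    TooManyPixels,
    InputTooLarge,
}

/// Hypothetical pre-decode check: reject before allocating or decoding anything.
fn check_limits(
    limits: Limits,
    columns: u32,
    rows: u32,
    input_len: usize,
) -> Result<(), LimitError> {
    // The product of two u32 values always fits in a u64, so this cannot overflow.
    let pixels = u64::from(columns) * u64::from(rows);
    if pixels > limits.max_pixels {
        return Err(LimitError::TooManyPixels);
    }
    if input_len as u64 > limits.max_input_bytes {
        return Err(LimitError::InputTooLarge);
    }
    Ok(())
}
```

Widening to `u64` before multiplying is the design point worth noting: checking the dimensions after a `u32` multiplication could wrap and let an oversized image through.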
What this doesn't address (yet)

`u16`. Full migration (encoder, `pels()`, `Transitions`) is a separate breaking change.

Stacks on #24 (cargo fmt).
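To see why the remaining `u16` surface cannot simply carry the new `u32` transitions, consider the narrowing sketch below. It assumes transitions are pixel positions stored as `u32`; `narrow_transitions` is a hypothetical helper for illustration and does not appear in the PR.

```rust
/// Hypothetical helper (not from the PR): narrows u32 transition positions
/// to a u16 representation. Returns `None` as soon as any position
/// does not fit, rather than silently truncating it.
fn narrow_transitions(transitions: &[u32]) -> Option<Vec<u16>> {
    transitions
        .iter()
        .map(|&t| {
            if t <= u32::from(u16::MAX) {
                Some(t as u16) // lossless: t is within 0..=65535
            } else {
                None
            }
        })
        .collect() // collecting into Option<Vec<_>> stops at the first None
}
```

Any line with a transition beyond column 65535 has no lossless `u16` form, so `narrow_transitions(&[0, 70_000])` returns `None` while `narrow_transitions(&[0, 100, 65_535])` succeeds.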