Skip to content

Fix fill() corrupting data when the fill pattern is longer than one word#58

Open
gaoflow wants to merge 1 commit into
eerimoq:masterfrom
gaoflow:fix-fill-overrun
Open

Fix fill() corrupting data when the fill pattern is longer than one word#58
gaoflow wants to merge 1 commit into
eerimoq:masterfrom
gaoflow:fix-fill-overrun

Conversation

@gaoflow

@gaoflow gaoflow commented Jun 30, 2026

Copy link
Copy Markdown

Fixes #51.

The bug

fill() silently corrupts (or destroys) existing segment data when the fill pattern is longer than one word:

import bincopy
bf = bincopy.BinFile(word_size_bits=8)
bf.add_binary(b"AAAA", 0)
bf.add_binary(b"BBBB", 6)          # 2-byte gap at addresses 4..5
bf.as_binary()                     # b'AAAA\xff\xffBBBB'
bf.fill(b"xyz")
bf.as_binary()                     # b'AAAAxyzxyz'   <-- BBBB destroyed

A smaller case mangles rather than deletes: filling a 1-byte gap between cafe and babe with b" -" yields b'cafe -bab' (and the segment view, b'cafe -babe', is even a different length than as_binary()). The docstring says fill "Fill empty space between segments", so altering bytes that already hold data is wrong under any fill semantics.

Root cause

bincopy.py, in fill():

fill_segments.append(Segment(
    previous_segment_maximum_address,
    previous_segment_maximum_address + fill_size,   # gap is fill_size bytes
    value * fill_size_words,                         # but this is fill_size_words * len(value) bytes
    self.word_size_bytes))

The Segment is declared to span fill_size bytes, but its data value * fill_size_words has fill_size_words * len(value) bytes. These are equal only when len(value) == word_size_bytes. For a longer value the oversized data overruns the gap, and _segments.add() overwrites the following segment.

The fix

Repeat value and clamp it to exactly the gap size:

fill_value = (value * (fill_size // len(value) + 1))[:fill_size]

This is byte-identical to the current behaviour for the normal len(value) == word_size_bytes case (including the default b'\xff' * word_size_bytes), so existing fills are unchanged — it only stops the overrun. This matches the repeat-the-pattern semantics suggested in #51; I went with clamp-to-gap rather than raising, since it is the backward-compatible choice and the default fill keeps working. Happy to switch to raising on a non-dividing pattern if you'd prefer that.

Tests

test_fill_pattern_longer_than_word covers: a multi-byte pattern over a 2-byte gap (trailing segment preserved), a 2-byte pattern over a 1-byte gap, the default-value path (unchanged), word_size_bits=16 with a two-word pattern, and an as_binary()/segment length-consistency assert. The first case fails on master (b'AAAAxyzxyz' != b'AAAAxyBBBB') and passes with the fix; the full suite stays green (103 passed).

fill() built each gap-filling Segment's data as `value * fill_size_words`,
i.e. `fill_size_words * len(value)` bytes. That equals the gap only when
`len(value) == word_size_bytes`; a longer fill pattern produced too many
bytes, overran the gap and silently overwrote the following segment,
destroying existing data. Repeat the pattern and clamp it to exactly the
gap size, which is byte-identical to the previous behaviour for the normal
one-word pattern. Fixes eerimoq#51.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fill() silently corrupts data if fill pattern is larger than any gap

1 participant