Skip to content

[low] deserialize ignores unconsumed trailing bytes, silently returning a partial value on corrupt/skewed buffers #49

@toloco

Description

@toloco

Severity: ⚪ low • Category: correctness
Location: src/serde.rs : 39-43

What's wrong

deserialize calls deserialize_one and discards the _consumed count without checking it equals data.len(). If a stored value buffer is internally inconsistent (e.g. a tuple header claiming fewer elements than the bytes encode, or any trailing garbage from a format-skewed/corrupt cross-process buffer), deserialize_one returns the first decoded object as Ok(Some(...)) and the extra bytes are silently dropped — producing a wrong-but-plausible result with no error. For self-consistent buffers written by this same code path this cannot trigger, so it is only reachable under format skew or external corruption of the mmap file (which is same-trust-domain), hence low severity.

Trigger

Only via a corrupt or different-format value buffer in the mmap (e.g. two builds with different serde formats sharing a derived shm file whose version field still matches). Self-produced buffers are always fully consumed.

Suggested fix

After deserialize_one, verify consumed == data.len(); if not, return Ok(None) to route to the pickle fallback (or raise a clear error). Also consider adding a serde format-version byte so skewed buffers are rejected rather than misinterpreted.

Adversarial verification note

The code at src/serde.rs:39-43 does exactly what the finding describes: match deserialize_one(py, data)? { Some((obj, _consumed)) => Ok(Some(obj)), ... } binds _consumed and discards it without comparing to data.len(). So if a buffer decodes a valid first element with trailing bytes, the extra bytes are silently dropped and a partial value is returned without error. The mechanism is confirmed.

However, the finding is also correct that this is NOT reachable from self-produced buffers. Buffers are written via serde::serialize / serde::wrap_pickle (src/shared_store.rs:230-236, 259-264) and read via deserialize_value (shared_store.rs:279-285), which calls serde::deserialize and falls back to pickle on the TAG_PICKLE/None path. A single top-level element written by serialize always fully consumes its buffer, so there is no trailing garbage in normal operation. Triggering requires either external corruption of the mmap file (located under $TMPDIR/warp_cache/, same trust domain) or two builds with incompatible serde formats sharing a file — and there is no format-version byte to reject that. This is a defensive-hardening gap rather than a bug reachable from the safe Python API.

Note the finding's example of a tuple claiming "fewer elements than the bytes encode" is slightly imprecise: tuple deserialization (lines 200-218) recursively consumes per-element and returns its own offset, and the top-level deserialize still ignores the returned consumed count regardless — so the real (and only) path to a partial read is genuine trailing bytes after a fully-decoded top-level element, which only arises under corruption/skew. That nuance doesn't change the conclusion.

Verdict: real but low/defensive. Severity 'low' as claimed is appropriate (borderline nit, since it needs out-of-trust-model file corruption). Suggested fix (verify consumed == data.len(), add a format-version byte) is reasonable hardening.

Verifier correction: The finding's corruption example (a tuple header claiming fewer elements than the bytes encode) would not by itself cause the trailing-byte misread, because tuple deserialization recursively consumes and reports its own offset; the actual partial-read path is any fully-decoded top-level element followed by genuine trailing bytes, which only occurs under external mmap corruption or cross-build format skew.


Filed from a multi-agent code review (finder → adversarial verification → synthesis). Confirmed real after a skeptic re-read the code.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingfrom-reviewFiled from the multi-agent code reviewrustPull requests that update rust codeseverity:lowMinor issue or nit

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions