[low] deserialize ignores unconsumed trailing bytes, silently returning a partial value on corrupt/skewed buffers

**Severity:** ⚪ low  •  **Category:** correctness
**Location:** `src/serde.rs` : 39-43

## What's wrong
`deserialize` calls `deserialize_one` and discards the `_consumed` count without checking it equals `data.len()`. If a stored value buffer is internally inconsistent (e.g. a tuple header claiming fewer elements than the bytes encode, or any trailing garbage from a format-skewed/corrupt cross-process buffer), `deserialize_one` returns the first decoded object as `Ok(Some(...))` and the extra bytes are silently dropped — producing a wrong-but-plausible result with no error. For self-consistent buffers written by this same code path this cannot trigger, so it is only reachable under format skew or external corruption of the mmap file (which is same-trust-domain), hence low severity.

## Trigger
Only via a corrupt or different-format value buffer in the mmap (e.g. two builds with different serde formats sharing a derived shm file whose version field still matches). Self-produced buffers are always fully consumed.

## Suggested fix
After `deserialize_one`, verify `consumed == data.len()`; if not, return `Ok(None)` to route to the pickle fallback (or raise a clear error). Also consider adding a serde format-version byte so skewed buffers are rejected rather than misinterpreted.

<details><summary>Adversarial verification note</summary>

The code at src/serde.rs:39-43 does exactly what the finding describes: `match deserialize_one(py, data)? { Some((obj, _consumed)) => Ok(Some(obj)), ... }` binds `_consumed` and discards it without comparing to `data.len()`. So if a buffer decodes a valid first element with trailing bytes, the extra bytes are silently dropped and a partial value is returned without error. The mechanism is confirmed.

However, the finding is also correct that this is NOT reachable from self-produced buffers. Buffers are written via `serde::serialize` / `serde::wrap_pickle` (src/shared_store.rs:230-236, 259-264) and read via `deserialize_value` (shared_store.rs:279-285), which calls `serde::deserialize` and falls back to pickle on the TAG_PICKLE/None path. A single top-level element written by `serialize` always fully consumes its buffer, so there is no trailing garbage in normal operation. Triggering requires either external corruption of the mmap file (located under `$TMPDIR/warp_cache/`, same trust domain) or two builds with incompatible serde formats sharing a file — and there is no format-version byte to reject that. This is a defensive-hardening gap rather than a bug reachable from the safe Python API.

Note the finding's example of a tuple claiming "fewer elements than the bytes encode" is slightly imprecise: tuple deserialization (lines 200-218) recursively consumes per-element and returns its own offset, and the top-level `deserialize` still ignores the returned consumed count regardless — so the real (and only) path to a partial read is genuine trailing bytes after a fully-decoded top-level element, which only arises under corruption/skew. That nuance doesn't change the conclusion.

Verdict: real but low/defensive. Severity 'low' as claimed is appropriate (borderline nit, since it needs out-of-trust-model file corruption). Suggested fix (verify consumed == data.len(), add a format-version byte) is reasonable hardening.

</details>

> **Verifier correction:** The finding's corruption example (a tuple header claiming fewer elements than the bytes encode) would not by itself cause the trailing-byte misread, because tuple deserialization recursively consumes and reports its own offset; the actual partial-read path is any fully-decoded top-level element followed by genuine trailing bytes, which only occurs under external mmap corruption or cross-build format skew.

---
_Filed from a multi-agent code review (finder → adversarial verification → synthesis). Confirmed real after a skeptic re-read the code._

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[low] deserialize ignores unconsumed trailing bytes, silently returning a partial value on corrupt/skewed buffers #49

What's wrong

Trigger

Suggested fix

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[low] deserialize ignores unconsumed trailing bytes, silently returning a partial value on corrupt/skewed buffers #49

Description

What's wrong

Trigger

Suggested fix

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions