Severity: ⚪ low • Category: correctness
Location: src/serde.rs : 39-43
What's wrong
deserialize calls deserialize_one and discards the _consumed count without checking it equals data.len(). If a stored value buffer is internally inconsistent (e.g. a tuple header claiming fewer elements than the bytes encode, or any trailing garbage from a format-skewed/corrupt cross-process buffer), deserialize_one returns the first decoded object as Ok(Some(...)) and the extra bytes are silently dropped — producing a wrong-but-plausible result with no error. For self-consistent buffers written by this same code path this cannot trigger, so it is only reachable under format skew or external corruption of the mmap file (which is same-trust-domain), hence low severity.
Trigger
Only via a corrupt or different-format value buffer in the mmap (e.g. two builds with different serde formats sharing a derived shm file whose version field still matches). Self-produced buffers are always fully consumed.
Suggested fix
After deserialize_one, verify consumed == data.len(); if not, return Ok(None) to route to the pickle fallback (or raise a clear error). Also consider adding a serde format-version byte so skewed buffers are rejected rather than misinterpreted.
Adversarial verification note
The code at src/serde.rs:39-43 does exactly what the finding describes: match deserialize_one(py, data)? { Some((obj, _consumed)) => Ok(Some(obj)), ... } binds _consumed and discards it without comparing to data.len(). So if a buffer decodes a valid first element with trailing bytes, the extra bytes are silently dropped and a partial value is returned without error. The mechanism is confirmed.
However, the finding is also correct that this is NOT reachable from self-produced buffers. Buffers are written via serde::serialize / serde::wrap_pickle (src/shared_store.rs:230-236, 259-264) and read via deserialize_value (shared_store.rs:279-285), which calls serde::deserialize and falls back to pickle on the TAG_PICKLE/None path. A single top-level element written by serialize always fully consumes its buffer, so there is no trailing garbage in normal operation. Triggering requires either external corruption of the mmap file (located under $TMPDIR/warp_cache/, same trust domain) or two builds with incompatible serde formats sharing a file — and there is no format-version byte to reject that. This is a defensive-hardening gap rather than a bug reachable from the safe Python API.
Note the finding's example of a tuple claiming "fewer elements than the bytes encode" is slightly imprecise: tuple deserialization (lines 200-218) recursively consumes per-element and returns its own offset, and the top-level deserialize still ignores the returned consumed count regardless — so the real (and only) path to a partial read is genuine trailing bytes after a fully-decoded top-level element, which only arises under corruption/skew. That nuance doesn't change the conclusion.
Verdict: real but low/defensive. Severity 'low' as claimed is appropriate (borderline nit, since it needs out-of-trust-model file corruption). Suggested fix (verify consumed == data.len(), add a format-version byte) is reasonable hardening.
Verifier correction: The finding's corruption example (a tuple header claiming fewer elements than the bytes encode) would not by itself cause the trailing-byte misread, because tuple deserialization recursively consumes and reports its own offset; the actual partial-read path is any fully-decoded top-level element followed by genuine trailing bytes, which only occurs under external mmap corruption or cross-build format skew.
Filed from a multi-agent code review (finder → adversarial verification → synthesis). Confirmed real after a skeptic re-read the code.
Severity: ⚪ low • Category: correctness
Location:
src/serde.rs: 39-43What's wrong
deserializecallsdeserialize_oneand discards the_consumedcount without checking it equalsdata.len(). If a stored value buffer is internally inconsistent (e.g. a tuple header claiming fewer elements than the bytes encode, or any trailing garbage from a format-skewed/corrupt cross-process buffer),deserialize_onereturns the first decoded object asOk(Some(...))and the extra bytes are silently dropped — producing a wrong-but-plausible result with no error. For self-consistent buffers written by this same code path this cannot trigger, so it is only reachable under format skew or external corruption of the mmap file (which is same-trust-domain), hence low severity.Trigger
Only via a corrupt or different-format value buffer in the mmap (e.g. two builds with different serde formats sharing a derived shm file whose version field still matches). Self-produced buffers are always fully consumed.
Suggested fix
After
deserialize_one, verifyconsumed == data.len(); if not, returnOk(None)to route to the pickle fallback (or raise a clear error). Also consider adding a serde format-version byte so skewed buffers are rejected rather than misinterpreted.Adversarial verification note
The code at src/serde.rs:39-43 does exactly what the finding describes:
match deserialize_one(py, data)? { Some((obj, _consumed)) => Ok(Some(obj)), ... }binds_consumedand discards it without comparing todata.len(). So if a buffer decodes a valid first element with trailing bytes, the extra bytes are silently dropped and a partial value is returned without error. The mechanism is confirmed.However, the finding is also correct that this is NOT reachable from self-produced buffers. Buffers are written via
serde::serialize/serde::wrap_pickle(src/shared_store.rs:230-236, 259-264) and read viadeserialize_value(shared_store.rs:279-285), which callsserde::deserializeand falls back to pickle on the TAG_PICKLE/None path. A single top-level element written byserializealways fully consumes its buffer, so there is no trailing garbage in normal operation. Triggering requires either external corruption of the mmap file (located under$TMPDIR/warp_cache/, same trust domain) or two builds with incompatible serde formats sharing a file — and there is no format-version byte to reject that. This is a defensive-hardening gap rather than a bug reachable from the safe Python API.Note the finding's example of a tuple claiming "fewer elements than the bytes encode" is slightly imprecise: tuple deserialization (lines 200-218) recursively consumes per-element and returns its own offset, and the top-level
deserializestill ignores the returned consumed count regardless — so the real (and only) path to a partial read is genuine trailing bytes after a fully-decoded top-level element, which only arises under corruption/skew. That nuance doesn't change the conclusion.Verdict: real but low/defensive. Severity 'low' as claimed is appropriate (borderline nit, since it needs out-of-trust-model file corruption). Suggested fix (verify consumed == data.len(), add a format-version byte) is reasonable hardening.
Filed from a multi-agent code review (finder → adversarial verification → synthesis). Confirmed real after a skeptic re-read the code.