fix(storage): propagate fallible reads instead of panicking in WASM path#3006
Conversation
Three fallible storage operations panicked on transient errors where a
Result could be returned instead:
- WriterSetCell::load_value (runs on every `get`/`get_mut`) used
`.expect("read WriterSetCell value")` on the underlying storage read.
Return `Result<&mut T, StoreError>` and propagate through `get`/`get_mut`
(both already return `Result`), so a transient store error no longer
aborts WASM execution. A missing entry still falls back to `T::default()`.
- blobstore `validate_path_component` panicked on invalid input and used the
Unicode-aware `is_alphanumeric`. Restrict to `is_ascii_alphanumeric`
(rejecting homoglyphs / full-width digits) and return `eyre::Result<()>`;
propagate through `FileSystem`/`BlobManager` `*_path` helpers.
- delta_store `apply_lock_slot` mutex used `.expect("apply_lock_slot
poisoned")` at four sites. Recover the guard via `PoisonError::into_inner`
in a shared `lock_apply_slot` helper. The slot only holds an
`Option<ContextAtomicKey>` taken/replaced wholesale, so a poisoned guard
never exposes a torn value — recovering keeps a transient panic in one
apply from crashing every later apply on the context.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01DWZsAV51GqqGS4TXiZDGpg
There was a problem hiding this comment.
🤖 MeroReviewer
Reviewed by 1 agents | Quality score: 31% | Review time: 52.1s
🟡 Warning (1)
1. version_path validates version but not package
File: crates/store/blobs/src/lib.rs (line 494-496) | Consensus: 1/1 agents ✓
In FileSystem::version_path, only version is validated with validate_path_component, but package is passed directly to self.package_path(package)? which does validate it. This is actually correct via delegation, but the doc comment says 'Returns an error if package or version is not a safe path component' — the validation of package is implicit via the delegated call. This is fine functionally, but the asymmetry is a latent maintenance hazard: if package_path is ever refactored to skip validation, version_path silently loses the package check. Consider adding an explicit validate_path_component(package, Some("package"))? call at the top of version_path for defence-in-depth, matching the pattern used in application_blob_path.
Suggested fix:
Add `utils::validate_path_component(package, Some("package"))?;` at the start of `FileSystem::version_path`, before the `version` check, so the function is self-contained and doesn't rely on the delegated call for correctness.
Found by: security-reviewer
💡 Suggestion (1)
1. Redundant get_or_insert_with after manual Some insertion
File: crates/storage/src/collections/shared.rs (line 298-313) | Consensus: 1/1 agents ✓
After the if slot.is_none() block sets *slot = Some(loaded), the slot is guaranteed to be Some. The subsequent slot.get_or_insert_with(T::default) is therefore dead code — the closure T::default can never run. While harmless, the comment 'this closure never runs' acknowledges this, but the code is still misleading. A direct slot.as_mut().unwrap() (or an expect with a safety comment) would be clearer and more honest about the invariant.
Suggested fix:
Replace `let value = slot.get_or_insert_with(T::default);` with `// SAFETY: slot was populated in the if-block above
let value = slot.as_mut().expect("slot populated above");`
Found by: security-reviewer
🤖 Generated by MeroReviewer | Review ID: review-9b5d74d4
Documentation ReviewThe following documentation may need updates based on the changes in this PR:
|
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 49% | Review time: 556.1s
🟡 1 warnings, 💡 2 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-693ac62c
| // Ensure the component is not empty. | ||
| if component.is_empty() { | ||
| panic!("{component_label} cannot be empty"); | ||
| eyre::bail!("{component_label} cannot be empty"); |
There was a problem hiding this comment.
🟡 Path traversal check runs before character allowlist, allowing bypass via encoded separators
The .. check at line 33 runs before the character allowlist at line 39. However, the real concern is ordering: a component like .. passes the .. check only if it literally contains .., but the character allowlist (. is allowed) would then pass .. through if the boundary check at line 47 didn't catch it. More critically, the .. check uses contains("..") which is correct for the literal string, but the allowlist permits . freely between alphanumeric chars. The boundary check at line 47 only catches leading/trailing dots. A component like a..b is caught by the .. check, but a.b.c is fine. The ordering is actually safe for the current rules, but the comment says 'Prevent path traversal' while the real traversal prevention is the character allowlist blocking / and \. Consider documenting that the allowlist is the primary traversal defense, and the .. check is a belt-and-suspenders guard, to avoid future maintainers reordering these checks thinking they are independent.
Suggested fix:
Add a comment clarifying that the character allowlist (blocking `/`, `\`, etc.) is the primary path-traversal defense, and the `..` check is an additional guard. This prevents future reordering that could weaken security.
| if slot.is_none() { | ||
| let loaded = self.inner.get(self.value_id())?.unwrap_or_default(); | ||
| *slot = Some(loaded); | ||
| } |
There was a problem hiding this comment.
💡 Redundant get_or_insert_with(T::default) after explicit Some assignment
After the if slot.is_none() block unconditionally sets *slot = Some(loaded), the slot is guaranteed to be Some. The subsequent slot.get_or_insert_with(T::default) will therefore never invoke T::default — the comment even says so. Using get_or_insert_with here is misleading because it implies a fallback path exists; a reader must trace the logic to confirm it never fires. A direct unwrap (with a // SAFETY: comment) or expect would be clearer and more honest about the invariant.
Suggested fix:
Replace `let value = slot.get_or_insert_with(T::default);` with `let value = slot.as_mut().expect("slot was just populated above");` and remove the misleading comment about the closure never running.
There was a problem hiding this comment.
I kept get_or_insert_with(T::default) rather than switching to as_mut().expect(...). The whole point of this PR is to remove panics from this hot read path (load_value runs on every get), and a bare expect/unwrap reintroduces one. It would also need a #[expect(clippy::expect_used)] waiver, which is the pattern this change is removing here. get_or_insert_with on a slot that's already Some is the borrow-checker-clean, panic-free way to hand back &mut T (the match slot.as_mut() { Some(v) => v, None => slot.insert(..) } form doesn't compile on stable without Polonius). I've reworded the comment to explain this rather than just say "the closure never runs".
Generated by Claude Code
| /// takes or replaces wholesale, so a poisoned guard never exposes a torn | ||
| /// value. Recovering it (rather than `.expect()`-ing) keeps a transient | ||
| /// panic in one apply from poisoning the mutex and turning every later | ||
| /// apply on this context into a crash. |
There was a problem hiding this comment.
💡 lock_apply_slot recovery of poisoned mutex silently continues with potentially stale state
The lock_apply_slot helper recovers from a poisoned mutex via unwrap_or_else(PoisonError::into_inner). The doc comment correctly explains that the slot holds only an Option<ContextAtomicKey> taken/replaced wholesale, so a torn value is not possible. However, if the prior holder panicked after taking the key but before replacing it, the recovered slot will be None — causing the next caller to issue a fresh ContextAtomic::Lock rather than reusing ContextAtomic::Held. This is safe (the lock is re-acquired), but the prior holder's key is leaked (never returned to the executor). This is an inherent limitation of the recovery approach and is acceptable given the panic scenario, but it is not documented. The comment should note this edge case so future maintainers understand the tradeoff.
Suggested fix:
Add a note to the `lock_apply_slot` doc comment: 'If the prior holder panicked after taking the key but before replacing it, the recovered slot is `None` and the leaked key is abandoned — the executor will issue a fresh lock on the next call. This is safe but means one lock acquisition is lost per panic.'
There was a problem hiding this comment.
Documented this edge case, with one correction: the lock isn't actually leaked. ContextAtomicKey wraps a ContextGuard, which is an owned RwLock guard (OwnedRwLockWriteGuard/OwnedRwLockReadGuard) — so if a holder panics after take()-ing the key, the key is dropped during unwind and its lock is released. The recovered slot is then observed empty and the next apply acquires a fresh ContextAtomic::Lock, which is correct and safe. Added a note to the lock_apply_slot doc comment.
Generated by Claude Code
Address review feedback with comment-only clarifications: - utils.rs: note that the ASCII allowlist (blocking `/`, `\`) is the primary path-traversal defense and the `..` check is a non-independent guard, so a future reorder doesn't weaken it. - shared.rs: explain why `get_or_insert_with(T::default)` is used instead of a bare `expect`/`unwrap` (panic-free hot path, no clippy::expect_used waiver). - delta_store.rs: document the post-`take()` panic case — the slot is observed empty and the next apply takes a fresh lock; nothing leaks because the abandoned ContextAtomicKey is an owned RwLock guard released on unwind. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01DWZsAV51GqqGS4TXiZDGpg
Bugbot is paused — on-demand spend limit reachedBugbot uses usage-based billing for this team and has hit its on-demand spend limit. A team admin can raise the spend limit in the Cursor dashboard, or wait for the next billing cycle to continue. |
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 53% | Review time: 220.4s
🟡 1 warnings. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-decdb2b9
| utils::validate_path_component(version, Some("version"))?; | ||
|
|
||
| self.package_path(package).join(version) | ||
| Ok(self.package_path(package)?.join(version)) |
There was a problem hiding this comment.
🟡 version_path validates version but not package before delegating
In FileSystem::version_path, only version is validated with validate_path_component before calling self.package_path(package)?. While package_path does validate package, the ordering means a malformed version is caught first and the error message may be confusing. More importantly, if the validation order were ever swapped or package_path inlined, package could slip through unvalidated. The original code validated both package and version explicitly before constructing the path; the refactored version relies on the delegation chain for package. This is a latent correctness risk rather than a current bug, but it diverges from the documented contract ("Returns an error if package or version is not a safe path component") in that package is only validated indirectly.
Suggested fix:
Add an explicit `utils::validate_path_component(package, Some("package"))?;` call at the top of `version_path`, before the `validate_path_component(version, ...)` call, to match the original behavior and make the validation self-contained.
There was a problem hiding this comment.
Done in 9e00303. version_path now validates both package and version explicitly and builds the path directly, so the traversal contract is self-contained (matching application_blob_path). One note: the original code also validated only version and delegated package to package_path, so behavior was already equivalent — but making it explicit is clearer and removes the "if package_path were inlined, package could slip through" concern.
Generated by Claude Code
Make `FileSystem::version_path` self-contained by validating both `package` and `version` up front and constructing the path directly, instead of delegating `package` validation to `package_path`. Matches the explicit two-component validation already used by `application_blob_path` and keeps the documented path-traversal contract local to the function. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01DWZsAV51GqqGS4TXiZDGpg
Fixes the `cargo fmt --check` CI failure: collapse the single-line function signature and wrap the long `assert_rejected` call in the test. Formatting only — no behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01DWZsAV51GqqGS4TXiZDGpg
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 53% | Review time: 551.5s
🟡 1 warnings. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-f5d766d2
| // Allow only alphanumeric characters, hyphen, and dot. | ||
| // Allow only ASCII alphanumeric characters, hyphen, and dot. The check is | ||
| // deliberately ASCII-restricted: a Unicode-aware `is_alphanumeric` would | ||
| // admit homoglyphs and full-width digits that can confuse path handling. |
There was a problem hiding this comment.
🟡 Path traversal check order allows .. bypass via dot-boundary rule
The .. check (line 44) fires before the dot-boundary check (line 57), but the ASCII allowlist check (line 51) fires between them. A component like .. passes the .. check only if it contains .., which it does — so it is correctly rejected. However, the ordering creates a subtle maintenance hazard: if the .. check were ever removed (the comment says it is a 'belt-and-suspenders' guard), a component of exactly .. would pass the ASCII allowlist (only dots) and then be caught by the dot-boundary check (starts_with('.')) — so the fallback works. But a component like a..b would pass both the ASCII allowlist AND the dot-boundary check, relying solely on the .. substring check. The comment correctly documents this dependency ('they are not independent'), but the code does not enforce it structurally. This is a latent path-traversal risk if the .. check is ever accidentally removed during refactoring. Consider adding a debug_assert or a compile-time note that the .. check is load-bearing for the a..b case.
Suggested fix:
Add a comment directly above the ASCII allowlist block: `// NOTE: The '..' check above is load-bearing for components like 'a..b'; do not remove it.` Alternatively, reorder so the allowlist runs first (it already blocks `/` and `\`), then add the `..` check as a secondary guard with a clear note that it is the only guard for consecutive-dot sequences within otherwise-valid characters.
There was a problem hiding this comment.
Good catch — fixed in 8883461. You're right that the .. check is load-bearing for the a..b case (the allowlist permits . and the dot-boundary check only rejects leading/trailing dots), so my earlier "belt-and-suspenders" framing undersold it. Reworded the comment to state plainly that the interior-.. check is the only guard for sequences like a..b and must not be dropped when refactoring. No behavior change — a..b was already rejected; this just makes the invariant explicit for future maintainers.
Generated by Claude Code
The interior-`..` check is the only guard against traversal sequences like `a..b` (the ASCII allowlist permits `.` and the dot-boundary check only catches leading/trailing dots). Reword the comment so a future refactor doesn't drop it as merely belt-and-suspenders. Comment-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01DWZsAV51GqqGS4TXiZDGpg
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 51% | Review time: 117.7s
🟡 1 warnings, 💡 1 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-7af1a46a
|
|
||
| // Allow only alphanumeric characters, hyphen, and dot. | ||
| // Allow only ASCII alphanumeric characters, hyphen, and dot. The check is | ||
| // deliberately ASCII-restricted: a Unicode-aware `is_alphanumeric` would |
There was a problem hiding this comment.
🟡 Path traversal: .. check runs before character allowlist, allowing bypass via encoded or mixed sequences
The .. containment check (line 44) fires before the ASCII allowlist check (line 49). A component like a%2e%2eb passes the .. check (no literal ..) and then fails the allowlist (contains %). That ordering is correct for the current allowlist. However, the comment on line 37 says "The allowlist independently blocks path separators" — this is only true as long as the allowlist is ASCII-only. If the allowlist were ever relaxed (e.g. to allow % for URL-encoded names), the .. check would need to run on the decoded form. The current code is safe, but the load-bearing ordering is not enforced structurally. More concretely: the .. check uses str::contains which is byte-level, so a component a\..\. (backslash-dot-dot) passes the .. check and is then caught by the allowlist — but only because \ is not in the allowlist. The security relies on the conjunction of both checks; neither alone is sufficient, and the code comment correctly notes this. No immediate vulnerability, but the ordering dependency is fragile.
Suggested fix:
Add a brief inline comment above the `..` check noting that it must run before any future allowlist relaxation, and that the allowlist must never permit `%`, `\`, or other encoding-prefix characters without re-auditing the traversal guard.
There was a problem hiding this comment.
No change here — this duplicates the already-addressed thread above. As you note, there's no current vulnerability: .. is rejected and the allowlist is ASCII-only, so %/\/%2e all fail the allowlist today. The existing comment already states both guards are necessary and neither subsumes the other. Pre-documenting a hypothetical future where the allowlist is relaxed to permit % (URL-encoded names) is speculative scope — if that change is ever made, decoding-before-validation is the thing to add then, and it would be its own review. Keeping the comment focused on the current invariant.
Generated by Claude Code
| let loaded = self.inner.get(self.value_id())?.unwrap_or_default(); | ||
| *slot = Some(loaded); | ||
| } | ||
| // The slot is `Some` by the line above, so `get_or_insert_with` never |
There was a problem hiding this comment.
💡 Redundant get_or_insert_with(T::default) after explicit Some assignment
After the if slot.is_none() block, slot is guaranteed to be Some. Using slot.get_or_insert_with(T::default) is semantically correct but misleading: the comment explains it avoids unwrap, but as_mut().unwrap() would be equally panic-free here since the Some invariant is established two lines above. The comment is longer than the code it justifies. A cleaner alternative is slot.as_mut().expect("slot is Some; set above") or simply relying on the established invariant with a brief comment, which makes the control flow clearer to future readers without the clippy::expect_used concern (the expect message documents the invariant rather than hiding a real panic risk).
Suggested fix:
Replace `slot.get_or_insert_with(T::default)` with `slot.as_mut().unwrap_or_else(|| unreachable!("slot was set above"))` or add a `// SAFETY: slot is Some; set in the if-block above` comment before `slot.as_mut().unwrap()` to make the invariant explicit without the misleading implication that `T::default` might be called.
There was a problem hiding this comment.
Duplicate of the thread above — keeping get_or_insert_with(T::default). Both suggested alternatives are panic paths: unreachable!() panics, and as_mut().unwrap()/expect() panic and trip clippy::expect_used/unwrap_used (denied here, and removing such panics from this hot read path is the whole point of the PR). get_or_insert_with on an already-Some slot is the borrow-checker-clean, panic-free way to hand back &mut T; the comment explains exactly that. No change.
Generated by Claude Code
frdomovic
left a comment
There was a problem hiding this comment.
Approving — converts the hot read path (WriterSetCell::load_value via get/get_mut), the blob path-validation helpers, and the delta_store.rs mutex handling from panic-based to Result/poison-recovery. I verified no panic was merely relocated: the only .unwrap()s in the diff are in tests, and the prod paths use ? / unwrap_or_else(PoisonError::into_inner) / unwrap_or_default. WriterSetCell::get/get_mut already returned Result before this PR (only the body changed), so no new signature break, and the blob helpers have no callers outside this repo. load_value's get_or_insert_with(T::default) after populating the slot is a sound panic-free &mut T accessor (never actually invokes default). No silent error-swallowing.
Notes (non-blocking): the title says "fallible reads" but sibling .expect() calls remain on other read/write/relocate paths in shared.rs (intentional scoping); and PermissionedStorage::get/get_mut doc-comments still claim "currently infallible," now stale.
Description
This PR converts the
validate_path_componentfunction and related path methods from panic-based error handling to returningResulttypes, improving error handling throughout the blob store and delta store modules.Key Changes
Path validation refactoring (
crates/store/blobs/src/utils.rs):validate_path_componentto returneyre::Result<()>instead of panickingpanic!calls witheyre::bail!for all validation errorsassert_rejectedhelper that checks error messages instead of#[should_panic]Blob manager API updates (
crates/store/blobs/src/lib.rs):BlobManager::package_path()now returnsEyreResult<Utf8PathBuf>BlobManager::version_path()now returnsEyreResult<Utf8PathBuf>BlobManager::application_blob_path()now returnsEyreResult<Utf8PathBuf>FileSystemimplementations updated to propagate validation errors with?operatorDelta store improvements (
crates/node/src/delta_store.rs):lock_apply_slot()helper method to safely recover from poisoned mutexes instead of panicking with.expect().expect("apply_lock_slot poisoned")with calls to the new helperStorage collection error handling (
crates/storage/src/collections/shared.rs):WriterSetCell::load_value()to returnResult<&mut T, StoreError>instead of panicking.expect()with proper error propagationget_mut()andget()to propagate storage errorsTest Plan
crates/store/blobs/src/utils.rshave been updated and now verify error messages instead of panic behaviorWire contract (SDK gate)
N/A — No HTTP wire DTOs or routes changed.
Documentation update
N/A — Changes are internal error handling improvements with no public API documentation impact.
https://claude.ai/code/session_01DWZsAV51GqqGS4TXiZDGpg