starknet_transaction_prover: retry storage-proof RPC fetches with backoff#14396
starknet_transaction_prover: retry storage-proof RPC fetches with backoff#14396l-henri wants to merge 1 commit into
Conversation
…koff Proving a virtual block fetches a storage proof per chunk of touched state (fetch_single_proof). Against a public node these bursts get rate-limited: a single HTTP 429/5xx arrives as a non-JSON body that starknet-rs fails to parse, surfacing as `ProviderError::Other(expected value at line 1 column 1)`. With no retry, one such transient failure anywhere in input collection aborted a prove that had already run for minutes. Retry transient get_storage_proof failures with exponential backoff (5 attempts, 0.5s..8s; configurable via STARKNET_PROVER_RPC_MAX_RETRIES). Structured upstream JSON-RPC errors (block-not-found, etc.) are deterministic and explicitly NOT retried, mirroring the downcast in `impl From<ProviderError> for ProofProviderError`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR SummaryLow Risk Overview Retry policy defaults to 5 attempts with delays 0.5s → 8s (doubling, capped), overridable via Reviewed by Cursor Bugbot for commit 16bb5d2. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
Henri writing, not an agent. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 16bb5d2. Configure here.
| .is_some_and(|e| matches!(e, JsonRpcClientError::JsonRpcError(_))); | ||
| if is_structured_rpc_error { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
JSON-RPC rate limits not retried
Medium Severity
is_retryable_rpc_error treats every ProviderError::Other downcast to JsonRpcClientError::JsonRpcError as non-retryable. Nodes that answer throttling with a valid JSON-RPC error (e.g. “too many requests”) therefore fail on the first attempt, even though the change is meant to retry rate limits.
Reviewed by Cursor Bugbot for commit 16bb5d2. Configure here.
| match result { | ||
| Ok(storage_proof) => return Ok(storage_proof), | ||
| Err(err) if attempt < max_retries && is_retryable_rpc_error(&err) => { | ||
| attempt += 1; |
There was a problem hiding this comment.
Retry cap exceeds documented attempts
Low Severity
Docs and the PR describe STARKNET_PROVER_RPC_MAX_RETRIES as the number of attempts (default five), but fetch_single_proof retries while attempt < max_retries after each failure, yielding one initial call plus up to max_retries retries—six RPC calls and five backoff sleeps by default, not the documented five attempts and ~7.5s wait.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 16bb5d2. Configure here.


Problem
Proving a virtual block fetches a storage proof per 100-key chunk of touched state via
RpcStorageProofsProvider::fetch_single_proof(crates/starknet_transaction_prover/src/running/storage_proofs.rs). Against a public RPC node, a burst of these gets rate-limited. The call is a bareget_storage_proof().await?with no retry, so a single HTTP 429/5xx aborts a prove that has already run for minutes.The failure is also opaque: a throttling node returns a non-JSON body, which
starknet-rsfails to deserialize and surfaces asProviderError::Other(..)carryingexpected value at line 1 column 1— not a structured JSON-RPC error.Field data: a controlled replay of 150 small requests against a public node returned 137× HTTP 429; one anywhere in input collection killed the whole prove.
Fix
Retry transient
get_storage_prooffailures with exponential backoff:STARKNET_PROVER_RPC_MAX_RETRIES.is_retryable_rpc_error()retries rate-limit / 5xx / transport / non-JSON-parse failures.JsonRpcClientError::JsonRpcErrordowncast already used inimpl From<ProviderError> for ProofProviderError, so real errors still fail fast.tracing::warn!is emitted per retry with attempt / backoff / error context.No new dependencies (uses
tokio::time+tracing, both already in the crate).Testing
cargo check -p starknet_transaction_proverpasses. (Compiled on thePRIVACY-0.14.2-RC.6line, where this was found; main's error model — theFrom<ProviderError>downcast — and deps are identical, and the patch applies tomainunchanged.)