fix(sdk-provider-solana): prevent false TransactionExpired for confirmed txs #380
effie-ms wants to merge 7 commits
```typescript
  }
  return null
})()
sendingPromise.catch(() => {})
```
Why are we swallowing errors here? Won't this cause problems?
Good catch. The sending loop is fire-and-forget — confirmation is handled by pollingPromise, so errors here don't affect the result. Added a comment explaining this.
Also wrapped the isBlockhashValid/getBlockHeight calls in try-catch inside the loop. Previously, if those RPC calls threw, the loop would die silently while blockhashValid stayed true. Now transient failures are handled and the loop continues.
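A minimal sketch of the loop shape described in this reply, assuming a validity check and a send function are passed in. `runSendingLoop`, `LoopState`, and the injected `sleep` are hypothetical names for illustration, not the SDK's actual code.

```typescript
// Hypothetical names throughout; this is not the SDK's actual sending loop.
interface LoopState {
  blockhashValid: boolean
  aborted: boolean
}

async function runSendingLoop(
  send: () => Promise<void>,
  checkBlockhashValid: () => Promise<boolean>,
  state: LoopState,
  sleep: () => Promise<void>
): Promise<void> {
  while (state.blockhashValid && !state.aborted) {
    // Re-broadcast the signed transaction; confirmation is observed elsewhere.
    await send()
    try {
      state.blockhashValid = await checkBlockhashValid()
    } catch {
      // Transient RPC failure: keep the previous value and retry on the next iteration,
      // so the loop does not die while blockhashValid is still true.
    }
    if (state.blockhashValid && !state.aborted) {
      await sleep()
    }
  }
}
```

A caller would still attach `.catch(() => {})` to the returned promise if a failed `send` should not surface as an unhandled rejection, with a comment saying that confirmation is decided by the polling side.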
```typescript
  txSignature: string
}

function getConfirmedStatus(
```
Can we move this into its own util function and reuse it in sendAndConfirmBundle?
Done. Moved getConfirmedStatus and SignatureStatus into utils/getConfirmedStatus.ts. Also added isConfirmedCommitment() to replace the inline checks in the bundle path.
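The extracted file is not shown in this thread, so the following is only a sketch of what such a shared util could look like. The `SignatureStatus` shape here is an assumption (a `confirmationStatus` field plus `err`), not the PR's actual type.

```typescript
// Assumed shapes: the PR's real SignatureStatus type is not shown in this thread.
type Commitment = 'processed' | 'confirmed' | 'finalized'

interface SignatureStatus {
  confirmationStatus: Commitment | null
  err: unknown
}

// True only for commitment levels that count as "landed" for confirmation purposes.
function isConfirmedCommitment(
  commitment: Commitment | null | undefined
): commitment is 'confirmed' | 'finalized' {
  return commitment === 'confirmed' || commitment === 'finalized'
}

// Returns the status when it is confirmed or finalized, otherwise null.
function getConfirmedStatus(
  status: SignatureStatus | null | undefined
): SignatureStatus | null {
  return status && isConfirmedCommitment(status.confirmationStatus) ? status : null
}
```

Both the single-transaction path and the bundle path could then call the same predicate instead of repeating the confirmed/finalized comparison inline.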
tomiiide left a comment
Overall great fix, solid job done.
I really like the extractBlockhash util, nice abstraction.
Just two issues flagged:
- The getConfirmedStatus helper at the top of sendAndConfirmTransaction.ts: could we pull it into its own util? The bundle path is doing the same confirmed/finalized check inline (twice), so it'd be good to share it.
- The sendingPromise.catch(() => {}): what's the reasoning for swallowing all errors there? If isBlockhashValid or getBlockHeight keeps failing, the loop just spins quietly until it aborts and we'd never know. If intentional, can we add a comment explaining why? Otherwise, can we let it propagate?
…n RPC error handling
Code Review — PR #380
Overview
PR fixes false TransactionExpired errors on Solana swaps that confirm on-chain (EMB-355). Three changes:
- Correct polling target: Replaces isBlockhashValid against a freshly-fetched blockhash with a check against the signed transaction's actual blockhash, extracted from the compiled message bytes.
- Final status check: Adds a getSignatureStatuses / getBundleStatuses call after the polling loop exits, so a tx that confirmed during the last sleep window isn't reported as expired.
- Strict simulation: Drops replaceRecentBlockhash: true from simulation so stale blockhashes fail simulation rather than being silently rewritten and passing.
The first two correctly target the bug. The third is a reasonable defensive change.
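To make the second change concrete, here is a sketch of a polling loop with one extra status lookup after it exits. `pollWithFinalCheck` and its injected callbacks are hypothetical; the real code calls getSignatureStatuses or getBundleStatuses over RPC.

```typescript
// Hypothetical helper; the status shape is a simplified stand-in for the RPC response.
type ConfirmationStatus = 'processed' | 'confirmed' | 'finalized'
type Status = { confirmationStatus: ConfirmationStatus | null } | null

function isConfirmed(status: Status): boolean {
  return (
    status !== null &&
    (status.confirmationStatus === 'confirmed' || status.confirmationStatus === 'finalized')
  )
}

async function pollWithFinalCheck(
  getStatus: () => Promise<Status>,
  isStillValid: () => boolean,
  sleep: () => Promise<void>
): Promise<Status> {
  while (isStillValid()) {
    const status = await getStatus()
    if (isConfirmed(status)) return status
    await sleep()
  }
  // The tx may have confirmed during the last sleep, after validity was lost.
  // One more lookup avoids reporting it as expired.
  const last = await getStatus()
  return isConfirmed(last) ? last : null
}
```

Only a null result from this function should be treated as expiry by the caller.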
Type check passes; biome clean.
🔴 Blocker — durable-nonce regression
The PR adds durable-nonce handling: extractBlockhash is meant to return null for nonce-lifetime txs so confirmation falls back to a block-height timeout. The detection mechanism doesn't work, and for nonce-based txs the runtime behavior is strictly worse than main.
Verified against @solana/transactions 6.9.0 in three places:
(1) Codec decode produces a stripped Transaction. transactionCodec.decode() (used in SolanaSignAndExecuteTask.ts:91 before signedTransactions flows downstream) returns literally:

```typescript
return {
  messageBytes,
  signatures: Object.freeze(signaturesMap)
};
```

No lifetimeConstraint field on the resulting object.

(2) isTransactionWithDurableNonceLifetime checks for that exact field.

```typescript
function isTransactionWithDurableNonceLifetime(transaction) {
  return "lifetimeConstraint" in transaction && ...
}
```

→ Always returns false for codec-decoded txs, regardless of whether the wire bytes actually encode a durable nonce.

(3) lifetimeToken is byte-indistinguishable between modes. From getCompiledLifetimeToken:

```typescript
if ("nonce" in lifetimeConstraint) return lifetimeConstraint.nonce;
return lifetimeConstraint.blockhash;
```

Same 32-byte slot; the Blockhash brand is a compile-time tag with no runtime guard.
Runtime trace for a durable-nonce tx on this PR:
1. extractBlockhash(tx) → isTransactionWithDurableNonceLifetime is false → returns compiledMessage.lifetimeToken (the nonce value) typed as Blockhash.
2. Polling loop enters with blockhashValid = true.
3. Within ~1s, the sending loop calls rpc.isBlockhashValid(nonceValue). The nonce account holds whatever blockhash was current at last advance, typically minutes to days old, well outside the ~150-slot recent-blockhash window. Returns false.
4. blockhashValid flips to false; the polling loop exits on its next iteration.
5. The final getSignatureStatuses check runs once; if the tx hasn't propagated yet, returns null.
6. SolanaStandardWaitForTransactionTask throws TransactionExpired.
Effective polling window for nonce txs: ~1–2 seconds.
Why this matters: durable-nonce txs are designed not to expire by blockhash. They're valid as long as the nonce account still holds the embedded lifetimeToken — they can legitimately land seconds, minutes, or hours after signing. On main, the old loop happened to poll for ~150 blocks (~90s) against an unrelated fresh blockhash, which functioned as a workable arbitrary timeout. This PR cuts that window to ~1–2s on every nonce-based swap.
| Tx type | main | this PR |
|---|---|---|
| Blockhash-lifetime | Loose ~150-block polling against an unrelated fresh blockhash | Correct: polling against the tx's actual blockhash, with final status check |
| Durable-nonce | Loose ~150-block polling (~90s, accidental but functional) | ~1–2s before false TransactionExpired |
Required fix
Detect nonce-lifetime from the compiled message (wire format), not from the runtime Transaction object. Two equivalent shapes:
Option A — official helper:
```typescript
import { getTransactionLifetimeConstraintFromCompiledTransactionMessage } from '@solana/kit'

export async function extractBlockhash(tx: Transaction): Promise<Blockhash | null> {
  const compiled = decoder.decode(tx.messageBytes)
  const c = await getTransactionLifetimeConstraintFromCompiledTransactionMessage(compiled)
  return 'blockhash' in c ? c.blockhash : null
}
```

Async; both call sites need await.
Option B — inline check (sync, mirrors what the helper does internally):
```typescript
import { SYSTEM_PROGRAM_ADDRESS } from '@solana-program/system'

function isAdvanceNonceInstruction(ix, staticAccounts) {
  return (
    staticAccounts[ix.programAddressIndex] === SYSTEM_PROGRAM_ADDRESS &&
    ix.data?.byteLength === 4 &&
    ix.data[0] === 4 && ix.data[1] === 0 && ix.data[2] === 0 && ix.data[3] === 0 &&
    ix.accountIndices?.length === 3
  )
}

export function extractBlockhash(tx: Transaction): Blockhash | null {
  const compiled = decoder.decode(tx.messageBytes)
  const first = compiled.instructions[0]
  if (first && isAdvanceNonceInstruction(first, compiled.staticAccounts)) {
    return null
  }
  return compiled.lifetimeToken as Blockhash
}
```

A more rigorous variant would replace the arbitrary timeout entirely with an on-chain nonce check: getAccountInfo(nonceAccountAddress) → decode the stored nonce → compare to lifetimeToken. If they still match, the tx can still land; if not, it never will. But that's a larger change. The minimum viable fix is: detect nonce txs properly and keep a sensible block-height fallback.
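As a rough illustration of the "decode the stored nonce → compare to lifetimeToken" step, the sketch below compares raw bytes only. It assumes the usual 80-byte layout of an initialized System-program nonce account and that the caller has already fetched the account data and base58-decoded lifetimeToken to 32 bytes; `storedNonce` and `canStillLand` are hypothetical names.

```typescript
// Assumed layout of an initialized System-program nonce account (80 bytes):
//   u32 version | u32 state | 32-byte authority | 32-byte nonce | u64 lamports per signature
const NONCE_OFFSET = 4 + 4 + 32
const NONCE_LENGTH = 32

// Extracts the stored nonce bytes, or null if the data is too short to contain one.
function storedNonce(accountData: Uint8Array): Uint8Array | null {
  if (accountData.byteLength < NONCE_OFFSET + NONCE_LENGTH) return null
  return accountData.subarray(NONCE_OFFSET, NONCE_OFFSET + NONCE_LENGTH)
}

// True while the nonce account still holds the value embedded in the signed tx.
function canStillLand(accountData: Uint8Array, lifetimeTokenBytes: Uint8Array): boolean {
  const nonce = storedNonce(accountData)
  if (nonce === null || lifetimeTokenBytes.byteLength !== NONCE_LENGTH) return false
  return nonce.every((byte, i) => byte === lifetimeTokenBytes[i])
}
```

A production version would more likely use the nonce account decoder shipped with the System program client than hand-written offsets, and would also check the account's owner and state.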
Required test
A unit test that round-trips a real durable-nonce signed tx through extractBlockhash and asserts null. Without one, this regression can return silently on future refactors.
🟡 No unit tests for the new utilities
extractBlockhash and getConfirmedStatus are pure functions on a critical confirmation path. The package has unit specs elsewhere (KeypairWallet.unit.spec.ts, SolanaProvider.unit.spec.ts) but nothing covers:
- extractBlockhash decoding a known wire-format tx and returning the expected blockhash
- extractBlockhash returning null for a durable-nonce tx (this would have caught the issue above)
- isConfirmedCommitment for each commitment value
- getConfirmedStatus returning the status for confirmed/finalized and null for processed/null
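A sketch of the durable-nonce case, using hand-built compiled-message fixtures rather than the real round-tripped signed tx the review asks for. The interfaces are local stand-ins for the @solana/kit types, the account names and lifetime tokens are placeholders, and `blockhashFromCompiled` is a hypothetical wrapper around the inline check from Option B.

```typescript
// Local stand-ins for the compiled-message shapes; the real types come from @solana/kit.
interface CompiledInstruction {
  programAddressIndex: number
  accountIndices?: number[]
  data?: Uint8Array
}

interface CompiledMessage {
  staticAccounts: string[]
  instructions: CompiledInstruction[]
  lifetimeToken: string
}

const SYSTEM_PROGRAM_ADDRESS = '11111111111111111111111111111111'

// AdvanceNonceAccount: System program, u32 little-endian discriminator 4, three accounts.
function isAdvanceNonceInstruction(ix: CompiledInstruction, staticAccounts: string[]): boolean {
  const d = ix.data
  return (
    staticAccounts[ix.programAddressIndex] === SYSTEM_PROGRAM_ADDRESS &&
    d !== undefined &&
    d.byteLength === 4 &&
    d[0] === 4 && d[1] === 0 && d[2] === 0 && d[3] === 0 &&
    ix.accountIndices?.length === 3
  )
}

// Returns null for nonce-lifetime messages, otherwise the lifetime token as the blockhash.
function blockhashFromCompiled(message: CompiledMessage): string | null {
  const first = message.instructions[0]
  return first && isAdvanceNonceInstruction(first, message.staticAccounts)
    ? null
    : message.lifetimeToken
}

// Fixture 1: first instruction is AdvanceNonceAccount (placeholder account names).
const nonceMessage: CompiledMessage = {
  staticAccounts: ['Payer', 'NonceAccount', 'RecentBlockhashesSysvar', SYSTEM_PROGRAM_ADDRESS],
  instructions: [
    { programAddressIndex: 3, accountIndices: [1, 2, 0], data: new Uint8Array([4, 0, 0, 0]) },
  ],
  lifetimeToken: 'nonce-value-placeholder',
}

// Fixture 2: a plain System transfer (discriminator 2 plus a u64 amount).
const blockhashMessage: CompiledMessage = {
  staticAccounts: ['Payer', 'Recipient', SYSTEM_PROGRAM_ADDRESS],
  instructions: [
    {
      programAddressIndex: 2,
      accountIndices: [0, 1],
      data: new Uint8Array([2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]),
    },
  ],
  lifetimeToken: 'blockhash-placeholder',
}
```

In the package's spec files these two fixtures would become two test cases: one expecting null for `nonceMessage`, one expecting the lifetime token for `blockhashMessage`.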
🟡 replaceRecentBlockhash: true removal — trade-off worth confirming
Simulation in SolanaStandardWaitForTransactionTask.ts:54 runs immediately after signing, so blockhash freshness should be fine in the happy path. On slow networks or slow user signing, the signed blockhash may already be aging by the time simulation runs, producing a BlockhashNotFound simulation error — now surfaced as TransactionSimulationFailed rather than TransactionExpired. The error mapping at :68 covers it; just worth knowing the user-visible error bucket has changed.
🟢 Minor / nits
- expiryBlockHeight! non-null assertions at sendAndConfirmTransaction.ts:137 and sendAndConfirmBundle.ts:119. Control flow makes them safe but fragile to refactors. Hoisting the timeout check into a local helper that closes over the variable would encode the relationship in scope rather than assertions.
- Array.isArray(sigResponse.value) guards at sendAndConfirmBundle.ts:90,145 are leftover JS defensive coding; the typed RPC response already declares value as an array.
- Coupled loop cadence in sendAndConfirmTransaction. pollingPromise reads blockhashValid, sendingPromise writes it (every ~1s). The poll loop's effective timeout is therefore implicitly bounded by the send loop's cadence + RPC latency. Same shape as the prior coupling on blockHeight, but worth a comment, or moving validity checks into the polling loop so each loop is self-contained.
- File name getConfirmedStatus.ts vs. its exports. It exports two functions and a type; signatureStatus.ts or splitting the type into its own file would better reflect responsibilities.
- No retry on isBlockhashValid failure. If a single RPC consistently fails the validity check, that RPC's promise loops forever until abortController is tripped by another. Same failure mode as the old getBlockHeight path, so not a regression, but a max-attempts counter would harden it.
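For the first nit, one possible shape for the hoisted helper is sketched below. `makeTimeoutCheck` is a hypothetical name, and block heights are modeled as `number` for brevity even though Solana RPC clients commonly return `bigint`.

```typescript
// Hypothetical helper; block heights are `number` here only to keep the sketch short.
type TimeoutCheck = (currentBlockHeight: number) => boolean

function makeTimeoutCheck(expiryBlockHeight: number | undefined): TimeoutCheck {
  if (expiryBlockHeight === undefined) {
    // No block-height expiry configured for this path: the timeout never fires.
    return () => false
  }
  // Copy into a const so the narrowed type survives inside the closure
  // without a non-null assertion.
  const limit = expiryBlockHeight
  return (currentBlockHeight) => currentBlockHeight > limit
}
```

The polling loop would then call the returned function with the latest block height, and the "is an expiry height set?" decision lives in one place.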
Verdict
Direction is right. Changes required for the nonce regression and accompanying unit-test coverage before merge. The rest are smaller follow-ups.
Which Linear task is linked to this PR?
EMB-355
Why was it implemented this way?
Three issues caused the SDK to report confirmed Solana transactions as expired:
- Wrong polling horizon. After sending, sendAndConfirmTransaction and sendAndConfirmBundle fetched a fresh getLatestBlockhash and polled against that blockhash's lastValidBlockHeight. That blockhash is unrelated to the one inside the signed transaction. Replaced with isBlockhashValid(txBlockhash), which tracks the actual signed transaction's blockhash directly.
- No final status check. When the polling loop exited (blockhash expired), the code returned signatureResult = null without a final getSignatureStatuses call. Transactions that confirmed during the last 400ms sleep window were missed. Added a final check before returning null.
- replaceRecentBlockhash: true in simulation. Simulation swapped the transaction's (potentially stale) blockhash with a fresh one, so simulation always passed even when the actual blockhash was already expired. Removed the flag so stale blockhashes are caught early.

Blockhash extraction from signed transactions
The blockhash is extracted by decoding messageBytes from the signed transaction via getCompiledTransactionMessageDecoder. The compiled message always stores a lifetimeToken (32 bytes at a fixed offset, base58-encoded). For regular (blockhash-lifetime) transactions this is the blockhash; for durable nonce transactions it is the nonce value.

Durable nonce transaction support
extractBlockhash returns null for durable nonce transactions (detected via isTransactionWithDurableNonceLifetime, which checks for nonce/nonceAccountAddress on the lifetimeConstraint property of the Transaction object). When null, both sendAndConfirmTransaction and sendAndConfirmBundle fall back to the old getLatestBlockhash + getBlockHeight polling pattern as a timeout mechanism, because isBlockhashValid doesn't work with nonce values. This is a reasonable polling timeout (~150 blocks / ~90s) rather than a true expiry check, which is fine since durable nonce transactions don't expire based on blockhash.

Acceptance criteria
TransactionExpired
Test plan
Checklist before requesting a review