Skip to content

fix(sdk-provider-solana): prevent false TransactionExpired for confirmed txs#380

Open
effie-ms wants to merge 7 commits into
mainfrom
fix/solana-false-transaction-expired
Open

fix(sdk-provider-solana): prevent false TransactionExpired for confirmed txs#380
effie-ms wants to merge 7 commits into
mainfrom
fix/solana-false-transaction-expired

Conversation

@effie-ms
Copy link
Copy Markdown
Contributor

@effie-ms effie-ms commented May 8, 2026

Which Linear task is linked to this PR?

EMB-355

Why was it implemented this way?

Three issues caused the SDK to report confirmed Solana transactions as expired:

  1. Wrong polling horizon — after sending, sendAndConfirmTransaction and sendAndConfirmBundle fetched a fresh getLatestBlockhash and polled against that blockhash's lastValidBlockHeight. That blockhash is unrelated to the one inside the signed transaction. Replaced with isBlockhashValid(txBlockhash) which tracks the actual signed transaction's blockhash directly.

  2. No final status check — when the polling loop exited (blockhash expired), the code returned signatureResult = null without a final getSignatureStatuses call. Transactions that confirmed during the last 400ms sleep window were missed. Added a final check before returning null.

  3. replaceRecentBlockhash: true in simulation — simulation swapped the transaction's (potentially stale) blockhash with a fresh one, so simulation always passed even when the actual blockhash was already expired. Removed the flag so stale blockhashes are caught early.

Blockhash extraction from signed transactions

The blockhash is extracted by decoding messageBytes from the signed transaction via getCompiledTransactionMessageDecoder. The compiled message always stores a lifetimeToken — 32 bytes at a fixed offset, base58-encoded. For regular (blockhash-lifetime) transactions this is the blockhash; for durable nonce transactions it is the nonce value.

Durable nonce transaction support

extractBlockhash returns null for durable nonce transactions (detected via isTransactionWithDurableNonceLifetime, which checks for nonce/nonceAccountAddress on the lifetimeConstraint property of the Transaction object). When null, both sendAndConfirmTransaction and sendAndConfirmBundle fall back to the old getLatestBlockhash + getBlockHeight polling pattern as a timeout mechanism — isBlockhashValid doesn't work with nonce values. This is a reasonable polling timeout (~150 blocks / ~90s) rather than a true expiry check, which is fine since durable nonce transactions don't expire based on blockhash.

Acceptance criteria

  • Solana swaps that confirm on-chain are no longer reported as TransactionExpired
  • Polling tracks the signed transaction's actual blockhash, not a freshly fetched one
  • Durable nonce transactions don't throw — they fall back to block-height-based timeout

Test plan

  • Execute a Solana swap (standard path) — confirm it completes without false expiry
  • Execute a Solana swap (Jito bundle path) — confirm bundle confirmation works
  • Reproduce the original bug scenario: submit a tx, wait for blockhash to age, verify it still confirms correctly instead of reporting expired

Checklist before requesting a review

  • I have performed a self-review and testing of my code.
  • This pull request is focused and addresses a single problem.

@effie-ms effie-ms marked this pull request as draft May 8, 2026 12:08
@effie-ms effie-ms self-assigned this May 14, 2026
@effie-ms effie-ms marked this pull request as ready for review May 14, 2026 14:27
@effie-ms effie-ms requested a review from tomiiide May 14, 2026 14:30
Comment thread packages/sdk-provider-solana/src/utils/extractBlockhash.ts
}
return null
})()
sendingPromise.catch(() => {})
Copy link
Copy Markdown
Contributor

@tomiiide tomiiide May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we swallowing errors here. Won't this cause problems?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The sending loop is fire-and-forget — confirmation is handled by pollingPromise, so errors here don't affect the result. Added a comment explaining this.

Also wrapped the isBlockhashValid/getBlockHeight calls in try-catch inside the loop. Previously, if those RPC calls threw, the loop would die silently while blockhashValid stayed true. Now transient failures are handled and the loop continues.

txSignature: string
}

function getConfirmedStatus(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this into it's own util function and reuse in sendAndConfirmBundle?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Moved getConfirmedStatus and SignatureStatus into utils/getConfirmedStatus.ts. Also added isConfirmedCommitment() to replace the inline checks in the bundle path.

Copy link
Copy Markdown
Contributor

@tomiiide tomiiide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall great fix, solid job done.

I really like the extractBlockhash util, nice abstraction.

Just two issues flagged:

  1. The getConfirmedStatus helper at the top of sendAndConfirmTransaction.ts — could we pull it into its own util? The bundle path is doing the same confirmed/finalized check inline (twice), so it'd be good to share it.

  2. the sendingPromise.catch(() => {}) , what's the reasoning for swallowing all errors there? If isBlockhashValid or getBlockHeight keeps failing, the loop just spins quietly until it aborts and we'd never know.
    If intentional, can we add a comment for why, else can we let it propagate?

@effie-ms effie-ms requested a review from tomiiide May 18, 2026 08:31
Copy link
Copy Markdown
Contributor

@tomiiide tomiiide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, thanks @effie-ms

chybisov

This comment was marked as outdated.

Copy link
Copy Markdown
Member

@chybisov chybisov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #380

Overview

PR fixes false TransactionExpired errors on Solana swaps that confirm on-chain (EMB-355). Three changes:

  1. Correct polling target — Replaces isBlockhashValid against a freshly-fetched blockhash with a check against the signed transaction's actual blockhash, extracted from the compiled message bytes.
  2. Final status check — Adds a getSignatureStatuses / getBundleStatuses call after the polling loop exits, so a tx that confirmed during the last sleep window isn't reported as expired.
  3. Strict simulation — Drops replaceRecentBlockhash: true from simulation so stale blockhashes fail simulation rather than being silently rewritten and passing.

The first two correctly target the bug. The third is a defensible defensive change.

Type check passes; biome clean.


🔴 Blocker — durable-nonce regression

The PR adds durable-nonce handling: extractBlockhash is meant to return null for nonce-lifetime txs so confirmation falls back to a block-height timeout. The detection mechanism doesn't work, and for nonce-based txs the runtime behavior is strictly worse than main.

Verified against @solana/transactions 6.9.0 in three places:

(1) Codec decode produces a stripped Transaction. transactionCodec.decode() (used in SolanaSignAndExecuteTask.ts:91 before signedTransactions flows downstream) returns literally:

return {
  messageBytes,
  signatures: Object.freeze(signaturesMap)
};

No lifetimeConstraint field on the resulting object.

(2) isTransactionWithDurableNonceLifetime checks for that exact field.

function isTransactionWithDurableNonceLifetime(transaction) {
  return "lifetimeConstraint" in transaction && ...
}

→ Always returns false for codec-decoded txs, regardless of whether the wire bytes actually encode a durable nonce.

(3) lifetimeToken is byte-indistinguishable between modes. From getCompiledLifetimeToken:

if ("nonce" in lifetimeConstraint) return lifetimeConstraint.nonce;
return lifetimeConstraint.blockhash;

Same 32-byte slot; the Blockhash brand is a compile-time tag with no runtime guard.

Runtime trace for a durable-nonce tx on this PR:

  1. extractBlockhash(tx)isTransactionWithDurableNonceLifetime is false → returns compiledMessage.lifetimeToken (the nonce value) typed as Blockhash.
  2. Polling loop enters with blockhashValid = true.
  3. Within ~1s, the sending loop calls rpc.isBlockhashValid(nonceValue). The nonce account holds whatever blockhash was current at last advance — typically minutes to days old, well outside the ~150-slot recent-blockhash window. Returns false.
  4. blockhashValid flips to false; the polling loop exits on its next iteration.
  5. The final getSignatureStatuses check runs once; if the tx hasn't propagated yet, returns null.
  6. SolanaStandardWaitForTransactionTask throws TransactionExpired.

Effective polling window for nonce txs: ~1–2 seconds.

Why this matters: durable-nonce txs are designed not to expire by blockhash. They're valid as long as the nonce account still holds the embedded lifetimeToken — they can legitimately land seconds, minutes, or hours after signing. On main, the old loop happened to poll for ~150 blocks (~90s) against an unrelated fresh blockhash, which functioned as a workable arbitrary timeout. This PR cuts that window to ~1–2s on every nonce-based swap.

Tx type main this PR
Blockhash-lifetime Loose ~150-block polling against an unrelated fresh blockhash Correct: polling against the tx's actual blockhash, with final status check
Durable-nonce Loose ~150-block polling (~90s, accidental but functional) ~1–2s before false TransactionExpired

Required fix

Detect nonce-lifetime from the compiled message (wire format), not from the runtime Transaction object. Two equivalent shapes:

Option A — official helper:

import { getTransactionLifetimeConstraintFromCompiledTransactionMessage } from '@solana/kit'

export async function extractBlockhash(tx: Transaction): Promise<Blockhash | null> {
  const compiled = decoder.decode(tx.messageBytes)
  const c = await getTransactionLifetimeConstraintFromCompiledTransactionMessage(compiled)
  return 'blockhash' in c ? c.blockhash : null
}

Async; both call sites need await.

Option B — inline check (sync, mirrors what the helper does internally):

import { SYSTEM_PROGRAM_ADDRESS } from '@solana-program/system'

function isAdvanceNonceInstruction(ix, staticAccounts) {
  return (
    staticAccounts[ix.programAddressIndex] === SYSTEM_PROGRAM_ADDRESS &&
    ix.data?.byteLength === 4 &&
    ix.data[0] === 4 && ix.data[1] === 0 && ix.data[2] === 0 && ix.data[3] === 0 &&
    ix.accountIndices?.length === 3
  )
}

export function extractBlockhash(tx: Transaction): Blockhash | null {
  const compiled = decoder.decode(tx.messageBytes)
  const first = compiled.instructions[0]
  if (first && isAdvanceNonceInstruction(first, compiled.staticAccounts)) {
    return null
  }
  return compiled.lifetimeToken as Blockhash
}

A more rigorous variant would replace the arbitrary timeout entirely with an on-chain nonce check — getAccountInfo(nonceAccountAddress) → decode the stored nonce → compare to lifetimeToken. If they still match, the tx can still land; if not, it never will. But that's a larger change. The minimum viable fix is: detect nonce txs properly and keep a sensible block-height fallback.

Required test

A unit test that round-trips a real durable-nonce signed tx through extractBlockhash and asserts null. Without one, this regression can return silently on future refactors.


🟡 No unit tests for the new utilities

extractBlockhash and getConfirmedStatus are pure functions on a critical confirmation path. The package has unit specs elsewhere (KeypairWallet.unit.spec.ts, SolanaProvider.unit.spec.ts) but nothing covers:

  • extractBlockhash decoding a known wire-format tx and returning the expected blockhash
  • extractBlockhash returning null for a durable-nonce tx (this would have caught the issue above)
  • isConfirmedCommitment for each commitment value
  • getConfirmedStatus returning the status for confirmed/finalized and null for processed/null

🟡 replaceRecentBlockhash: true removal — trade-off worth confirming

Simulation in SolanaStandardWaitForTransactionTask.ts:54 runs immediately after signing, so blockhash freshness should be fine in the happy path. On slow networks or slow user signing, the signed blockhash may already be aging by the time simulation runs, producing a BlockhashNotFound simulation error — now surfaced as TransactionSimulationFailed rather than TransactionExpired. The error mapping at :68 covers it; just worth knowing the user-visible error bucket has changed.


🟢 Minor / nits

  1. expiryBlockHeight! non-null assertions at sendAndConfirmTransaction.ts:137 and sendAndConfirmBundle.ts:119. Control flow makes them safe but fragile to refactors. Hoisting the timeout check into a local helper that closes over the variable would encode the relationship in scope rather than assertions.

  2. Array.isArray(sigResponse.value) guards at sendAndConfirmBundle.ts:90,145 are leftover JS defensive coding — the typed RPC response already declares value as an array.

  3. Coupled loop cadence in sendAndConfirmTransaction. pollingPromise reads blockhashValid, sendingPromise writes it (every ~1s). The poll loop's effective timeout is therefore implicitly bounded by the send loop's cadence + RPC latency. Same shape as the prior coupling on blockHeight, but worth a comment, or moving validity checks into the polling loop so each loop is self-contained.

  4. File name getConfirmedStatus.ts vs. its exports. It exports two functions and a type; signatureStatus.ts or splitting the type into its own file would better reflect responsibilities.

  5. No retry on isBlockhashValid failure. If a single RPC consistently fails the validity check, that RPC's promise loops forever until abortController is tripped by another. Same failure mode as the old getBlockHeight path, so not a regression — but a max-attempts counter would harden it.


Verdict

Direction is right. Changes required for the nonce regression and accompanying unit-test coverage before merge. The rest are smaller follow-ups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants