Skip to content

fix(keepkey-webusb): correct stalled-read recovery + time-bound USB transfers#44

Closed
BitHighlander wants to merge 3 commits into
masterfrom
fix/webusb-stall-and-transfer-timeout
Closed

fix(keepkey-webusb): correct stalled-read recovery + time-bound USB transfers#44
BitHighlander wants to merge 3 commits into
masterfrom
fix/webusb-stall-and-transfer-timeout

Conversation

@BitHighlander

Copy link
Copy Markdown
Collaborator

Summary

Two robustness fixes to the KeepKey WebUSB transports, surfaced by a Windows USB audit of keepkey-vault (companion PR #227). These address the "device froze / had to restart the app" class of Windows reports (distinct from the "device not detected" symptom, which is fixed app-side in the vault).

Commit 1 — clearHalt on the correct endpoint when a read stalls (both transports)

readChunk() reads the IN endpoint (transferIn), but on a "stall" status it called clearHalt("out", …) — the wrong endpoint and direction. On WinUSB a halted IN pipe is only resumed by resetting that same pipe, so the IN pipe stayed halted and every subsequent read re-stalled until a physical replug. It also fell through and returned the stalled transfer's empty/short buffer, which the framing parser then rejected as "message not valid".

Fix: clearHalt("in", …) on the pipe that actually stalled, and throw a retryable "bad read" instead of returning a non-packet. Applied to both hdwallet-keepkey-nodewebusb (Node/libusb) and hdwallet-keepkey-webusb (browser).

Commit 2 — time-bound USB transfers (hdwallet-keepkey-nodewebusb)

transferIn/transferOut were submitted with no timeout (libusb defaults to infinite). A device that stops responding — mid-reboot, suspended by Windows USB selective-suspend, or unplugged mid-write — left the promise pending forever, holding the claimed interface and producing a false ConflictingApp (LIBUSB code 19) on the next pair.

Fix: wrap each transfer in a timeout —

  • writes: DEFAULT_TIMEOUT (5 s) — a write never legitimately blocks;
  • reads: LONG_TIMEOUT (5 min) — a read may legitimately wait on a device button press (PIN, passphrase, tx/seed confirmation), so it must not be cut short.

On timeout we throw; the caller's existing disconnect path closes the device, which aborts the orphaned native transfer and releases the interface.

Scope / notes

  • Read ceiling is LONG_TIMEOUT (5 min) because the transport can't see the per-call timeout — msgTimeout in hdwallet-keepkey is set but never plumbed down to readChunk. A follow-up could thread the call timeout through readChunk/writeChunk for tighter bounds on normal (non-button) operations. The current bound is strictly better than infinite and safe for all button flows.
  • FIX-3 (timeouts) is applied to the Node transport (the one the desktop vault uses, where the Windows hangs were reported); the browser webusb transport gets the clearHalt fix only. The same timeout pattern could be mirrored there.
  • Not built locally (no deps installed in the worktree), but the timeout helper mirrors the existing withTimeout pattern used in keepkey-vault, and core.DEFAULT_TIMEOUT/core.LONG_TIMEOUT are already used by hdwallet-keepkey/src/transport.ts.

Testing requested before merge

Exercise on real hardware on Windows: PIN entry, passphrase, a tx signing left to sit >60 s, and seed verification — confirm none time out — plus an unplug-mid-operation to confirm clean recovery (no false "device in use" on reconnect).

After merge, bump the modules/hdwallet submodule pointer in keepkey-vault.

🤖 Generated with Claude Code

BitHighlander and others added 2 commits June 7, 2026 14:55
readChunk() reads the IN endpoint (transferIn), but on a "stall" status it
called clearHalt("out", ...) -- the wrong endpoint and direction. On WinUSB
the halted IN pipe then stays halted and every subsequent read re-stalls
until a physical replug. It also fell through and returned the stalled
transfer's (empty/short) buffer, which the framing parser rejected as
"message not valid".

Clear the halt on the IN pipe that actually stalled and throw a retryable
"bad read" instead of returning a non-packet. Applied to both the Node
(nodewebusb) and browser (webusb) transports.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…can't hang forever

transferIn/transferOut were submitted with no timeout (libusb defaults to
infinite), so a device that stops responding -- mid-reboot, suspended by
Windows USB selective-suspend, or unplugged mid-write -- left the read/write
promise pending forever, holding the claimed interface and producing a false
ConflictingApp on the next pair.

Wrap each transfer in a timeout: DEFAULT_TIMEOUT (5s) for writes (which never
legitimately block) and LONG_TIMEOUT (5min) for reads (which may wait on a
device button press for PIN/passphrase/tx confirmation, so they must not be
cut short). On timeout we throw; the caller's existing disconnect path closes
the device, aborting the orphaned native transfer and releasing the interface.

Note: the read ceiling is LONG_TIMEOUT because the transport cannot see the
per-call timeout (msgTimeout in hdwallet-keepkey is set but never plumbed
down). A follow-up could thread the call timeout through readChunk for tighter
bounds on normal (non-button) operations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 7, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hdwallet-sandbox Ready Ready Preview, Comment Jun 7, 2026 8:45pm

Request Review

…sfer timeout

The prior comment overstated that the caller's disconnect path releases the
interface on timeout. It does on the syncState/getFeatures recovery path but NOT
on the sign/address RPC paths. Document the real behavior + the remaining options
(harden caller, or release in-transport).
@BitHighlander

Copy link
Copy Markdown
Collaborator Author

Superseded by the focused, lower-risk split: #45 contains just the clearHalt IN/OUT correctness fix. The transfer-timeout half of this PR is deferred (it touches the shared transport more invasively and needs its own cross-platform validation); it can come back as a separate PR if/when Windows USB diagnostics confirm a real wedged-transfer problem. Closing to keep the correctness fix clean and independently reviewable.

BitHighlander added a commit that referenced this pull request Jun 7, 2026
…#45)

readChunk() reads the IN endpoint (transferIn) but on a "stall" it cleared
clearHalt("out", ...) -- the wrong endpoint and direction. A halted IN pipe is
only resumed by resetting that same pipe, so it stayed halted. The browser
transport also fell through and returned the stalled (empty/short) buffer, which
the framing parser then rejected as "message not valid".

Reset the IN pipe and throw a retryable "bad read" instead of surfacing a
non-packet. Applied to both hdwallet-keepkey-nodewebusb (Node/libusb) and
hdwallet-keepkey-webusb (browser).

Split out from #44 (which also added transfer timeouts); this is the isolated,
low-risk correctness fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant