Surface FFI panic events instead of silently dropping them#343
Open
MaxHeimbrock wants to merge 1 commit into
Open
Surface FFI panic events instead of silently dropping them#343MaxHeimbrock wants to merge 1 commit into
MaxHeimbrock wants to merge 1 commit into
Conversation
FfiEvent.Panic is the FFI layer's unrecoverable-error notification: a background task died (watch_panic), a request handler panicked, or the FFI hit a state error with no callback left to carry it (e.g. the ready-handshake timeout force-closing a room after ConnectCallback). Unity's DispatchEvent had `case Panic: break` — a stub from an early protocol sync — so all of these were fully silent: rooms kept looking connected while their event pipeline was dead, and in-flight requests hung forever. DispatchEvent now logs the panic, cancels all pending callbacks (so awaiting instructions resolve with IsError instead of hanging; done before raising PanicReceived so requests issued by reacting user handlers are not swept up), and raises a new internal FfiClient.PanicReceived event. Each connected Room subscribes and tears itself down through the disconnect path apps already handle: Disconnected / DisconnectedWithReason(UnknownReason) + cleanup. Note release FFI binaries build with panic=abort, so genuine Rust panics crash the process before an event can be sent; this wiring covers debug builds and the deliberate send_panic error paths, which include the ready-timeout zombie-room case. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
FfiEvent.Panicis the FFI layer's global unrecoverable-error notification, and Unity dropped it on the floor (DispatchEventhadcase Panic: break;— a stub from an early protocol sync that was never wired). This PR surfaces panics through the disconnect path apps already handle.What panic events are
The FFI sends
Panic { message }(no room handle — it's global) from three kinds of places:watch_panicwraps essentially every FFI task (room event forwarding, stream tasks, the connect task, ...) and forwards theJoinError. The Rust comment: "Recommended behaviour is to exit the process."catch_unwindincabi.rs).send_panicfor state errors with no callback left to carry them — currently the ready-handshake timeout, which force-closes a room afterConnectCallbackalready went out.After a panic the FFI's state is unreliable: a room whose event task died silently stops receiving events (including
Disconnecteditself — a zombie room), and in-flight async requests never complete (hungYieldInstructions).Caveat worth knowing: release FFI binaries build with
panic = "abort"(workspaceCargo.toml), so genuine Rust panics crash the process before any event is sent. In shipped builds this wiring covers the deliberatesend_panicpaths — including the ready-timeout zombie-room case — and it covers everything in local debug builds.What this does
Since "exit the process" is not acceptable in Unity (Editor / shipped games), the panic becomes an observable disconnect:
Utils.Error(works in Editor console and player logs; previously the only trace was a Rust-sidelog::error!that is invisible unlessLK_VERBOSE).ConnectInstruction) resolve withIsErrorinstead of hanging forever. Done before raising the event so requests issued by user handlers reacting to the panic (reconnect attempts) are not swept up.FfiClient.PanicReceivedevent; each connectedRoomsubscribes inOnConnectand tears itself down through the existing sequence:Disconnected,DisconnectedWithReason(UnknownReason), cleanup.Deliberate trade-off:
Paniccarries no room identity and the FFI declares its whole state unreliable, so all live rooms are torn down — conservative, but it converts silent zombie state into the reconnect flow apps already implement.Tests
PanicEventTests: drivesRouteFfiEventwith a synthetic panic from a non-main thread; asserts main-thread marshalling,PanicReceivedpayload, pending-callback cancellation (not completion), and entry removal.PanicRoomTeardownTests(E2E): connects a real room, injects a synthetic panic throughRouteFfiEvent, assertsDisconnected/DisconnectedWithReason(UnknownReason)fire andIsConnectedgoes false.LateJoinTrackSubscriptionTestsre-run as connect-path regression guard.All green locally against
livekit-server --dev.Follow-ups (not in this PR)
Disconnectedroom event on the ready-timeout path, and/or reconsider the 15sROOM_EVENT_READY_TIMEOUT.panic = "unwind"so real panics become observable events instead of process crashes (binary-size trade-off).🤖 Generated with Claude Code