From 31e9a42b541979db89869e302b161837d0c20589 Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Sat, 13 Jun 2026 07:27:26 -0400 Subject: [PATCH 01/16] Add MatrixRTC call troubleshooting guide Document how affected users can capture Activity Log JSON exports filtered to the Call category, what each diagnostic signal in the export means (credential discovery, token exchange, key distribution, frame routing), and how to file useful reports. Also covers the unified-log fallback via `log stream` for harder cases and notes which data is and isn't safe to share publicly. Assisted-By: Claude Sonnet 4.7 --- docs/troubleshooting-calls.md | 108 ++++++++++++++++++++++++++++++++++ 1 file changed, 108 insertions(+) create mode 100644 docs/troubleshooting-calls.md diff --git a/docs/troubleshooting-calls.md b/docs/troubleshooting-calls.md new file mode 100644 index 0000000..794ab64 --- /dev/null +++ b/docs/troubleshooting-calls.md @@ -0,0 +1,108 @@ +# Troubleshooting MatrixRTC calls + +If your calls fail to connect, or connect but show no audio or video, this page walks you through capturing the data we need to diagnose the issue. + +## Quick capture (3 minutes) + +1. Open **Window → Activity Log** (or press `⌥⌘A`). +2. In the search bar, click the **filter chip** and limit to the **Call** category. +3. Leave the Activity Log window open and reproduce the failing call. +4. Once the call has failed (or you have media issues), press `⌘S` in the Activity Log window to export the filtered events. +5. Save the file (default name `relay-activity-log.json`) and attach it to your bug report. + +That export is everything the developers need to triage a calling problem. 
+ +## What's in the export + +The file is pretty-printed JSON: an array of events, each with `timestamp` (ISO 8601), `category` (will be `"call"` after filtering), `severity` (`debug` / `info` / `warning` / `error`), `source` (which subsystem logged it), `summary`, optional `detail`, optional `roomId`, and a `metadata` key-value map. + +Sample event: + +```json +{ + "timestamp": "2026-06-12T14:30:05.123Z", + "category": "call", + "severity": "info", + "source": "LiveKitCredentialService", + "summary": "SFU URL discovered", + "roomId": "!abc:example.org", + "metadata": {} +} +``` + +## What's safe to share + +The export contains: + +- Your Matrix room ID (`!…:server`) and device IDs +- Per-call membership UUIDs and key indices +- Your homeserver hostname +- SHA-256 fingerprints (first 8 hex chars) of encryption keys, **never the keys themselves** + +It does **not** contain: + +- Raw E2EE keys +- OpenID tokens or LiveKit JWTs +- Message contents, names, or avatars +- The OpenID access token used for SFU auth + +If you don't want your room IDs or device IDs in a public bug report, ask the maintainers for a DM in [#relayapp:matrix.org](https://matrix.to/#/#relayapp:matrix.org) and share the file there. + +## Reading the export yourself + +A few specific log lines act as signposts. If your file contains any of these, you can pre-diagnose your own issue: + +### Connection-time signals + +| Look for | What it means | +| --- | --- | +| `Fetching call credentials` | The call attempt started; subsequent events should show whether discovery and token exchange succeeded. | +| `SFU URL discovered` | Your homeserver advertises a LiveKit SFU. Good. | +| `Failed to fetch call credentials` with `detail: "This homeserver has no LiveKit call server configured…"` | Your homeserver doesn't expose `org.matrix.msc4143.rtc_foci` in `.well-known/matrix/client`, and the unstable transports endpoint isn't supported. Ask your homeserver admin to configure MatrixRTC. 
| +| `Call credentials obtained` | Token exchange succeeded. If the call still fails after this, the problem is downstream of credential acquisition. | + +### Connected-but-no-media signals + +If the call reaches the **Connected to call** event but you can't see or hear anyone, the failure is in the encryption-key exchange or frame routing. + +| Look for | What it means | +| --- | --- | +| `Distributed E2EE key to N user(s)` followed by `Received E2EE key from …` for each peer | Key exchange is happening. If you still have no media, the problem is in the frame-decoder routing — note the `Participant:` field in the `detail`, this is the identity LiveKit assigned to the peer. | +| No `Received E2EE key from …` events at all | Peers aren't sending you their keys, or the widget bridge isn't running. Check whether the room is configured as encrypted (E2EE is enabled only for encrypted Matrix rooms). | +| `Widget bridge started` but no later events | The widget driver is waiting for capability negotiation that never completes. Likely an SDK or homeserver-side issue. | + +### Patterns worth flagging in a bug report + +These specific event sequences point to a known class of failure: + +1. **No `Call credentials obtained` event after `Fetching call credentials`.** Credential exchange is failing. Almost always a homeserver-side or SFU-side configuration problem; we'll need to know which homeserver you're on. + +2. **`Connected to call` but no `Distributed E2EE key` event.** The Matrix call-member state event went out, but no peers existed at the time you connected, or our cache of call members is stale. If others were already in the call, this is a Relay bug worth reporting. + +3. **`Received E2EE key from …` events present, but you still see no media from those peers.** Frame-cryptor routing is misaligned with the LiveKit participant identity. 
This is currently a known issue we're working on; please attach the export and note the LiveKit `Participant:` identity you see in those events' `detail` field. + +## When the Activity Log isn't enough + +For really hard cases (the SFU is rejecting our JWT with no useful error, or the LiveKit room itself never finishes initialising) we sometimes need a unified-log capture, which records the low-level RTC trace from inside the LiveKit SDK. + +While the call is reproducing the issue, run in a terminal: + +```sh +log stream --predicate 'subsystem == "RelayKit" AND category BEGINSWITH "Call"' \ + --level info > relay-call-trace.log +``` + +Stop with `^C` once the call has failed, then share `relay-call-trace.log` alongside the Activity Log JSON. + +The unified-log capture contains more verbose internal trace including LiveKit SDK output. It's safe in the same way the Activity Log is (no key material, no tokens), but it does contain more verbose timing and routing data. Share it through the same channel you'd share the JSON. + +## Reporting + +File an issue at [github.com/subpop/Relay/issues](https://github.com/subpop/Relay/issues) or message [#relayapp:matrix.org](https://matrix.to/#/#relayapp:matrix.org). Please include: + +- The `relay-activity-log.json` export (filtered to the Call category) +- Your homeserver hostname (e.g. `matrix.example.org`) +- Whether other clients (Element X, Element Web) succeed at calling on the same account +- A one-line description of what you saw: "fails to connect", "connects but no audio", "connects but no video", etc. + +If you'd rather not put logs in a public issue, send them privately to maintainers in the Matrix room first. 
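The three patterns flagged in the troubleshooting guide above can also be checked mechanically before filing a report. What follows is a minimal sketch, not part of Relay: the type and function names (`ActivityEvent`, `CallDiagnosis`, `diagnose`, `diagnoseExport`) are hypothetical. It assumes the signpost strings from the guide appear at the start of each event's `summary` field, as they do in the guide's sample event.

```swift
import Foundation

// Hypothetical triage helper; not part of Relay. Field names follow the
// export format described in the guide. Only the fields used here are
// decoded; JSONDecoder ignores the remaining keys in each event.
struct ActivityEvent: Decodable {
    let severity: String
    let summary: String
    let detail: String?
}

// Assumed labels for the three flagged patterns, plus a catch-all.
enum CallDiagnosis: Equatable {
    case credentialExchangeFailed         // pattern 1
    case connectedWithoutKeyDistribution  // pattern 2
    case keysReceivedCheckMedia           // pattern 3: media must be checked by eye
    case inconclusive
}

func diagnose(_ events: [ActivityEvent]) -> CallDiagnosis {
    // True when any event summary starts with the given signpost text.
    func has(_ prefix: String) -> Bool {
        events.contains { $0.summary.hasPrefix(prefix) }
    }
    if has("Fetching call credentials") && !has("Call credentials obtained") {
        return .credentialExchangeFailed
    }
    if has("Connected to call") && !has("Distributed E2EE key") {
        return .connectedWithoutKeyDistribution
    }
    if has("Received E2EE key from") {
        return .keysReceivedCheckMedia
    }
    return .inconclusive
}

// Decodes an exported Activity Log (a JSON array of events) and classifies it.
func diagnoseExport(_ json: Data) throws -> CallDiagnosis {
    let events = try JSONDecoder().decode([ActivityEvent].self, from: json)
    return diagnose(events)
}
```

To try it, load `relay-activity-log.json` with `Data(contentsOf:)` and pass the result to `diagnoseExport`. The sketch only reports which pattern the event sequence resembles; it cannot tell whether media actually arrived, so the third case still needs a manual check.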
From f6ad4bc87cc377fb822af052a26fb3baae2885fc Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Sat, 13 Jun 2026 13:50:56 -0400 Subject: [PATCH 02/16] Add Relay vs Element Call MatrixRTC deviation notes Engineering reference under docs/internal/ that maps every Relay call-path function to its Element Call / matrix-js-sdk counterpart. Each entry cites the exact source line in the upstream reference and the matching range in Relay, and flags deviations confirmed against MSCs, source code, or real-world user traces. Acts as the worklog for the rtc-element-call-alignment branch. Assisted-By: Claude Sonnet 4.7 --- docs/internal/rtc-element-call-diff.md | 152 +++++++++++++++++++++++++ 1 file changed, 152 insertions(+) create mode 100644 docs/internal/rtc-element-call-diff.md diff --git a/docs/internal/rtc-element-call-diff.md b/docs/internal/rtc-element-call-diff.md new file mode 100644 index 0000000..314163d --- /dev/null +++ b/docs/internal/rtc-element-call-diff.md @@ -0,0 +1,152 @@ +# MatrixRTC: Relay vs Element Call deviations + +Engineering reference for the MatrixRTC implementation. Maps every Relay call-path function to its Element Call / matrix-js-sdk counterpart and flags deviations that have either been confirmed against real-world traces or against the published MSCs. + +## Sources used + +- **MSC4143** ([toger5/matrixRTC](https://github.com/matrix-org/matrix-spec-proposals/blob/toger5/matrixRTC/proposals/4143-matrix-rtc.md)) — the not-yet-deployed `m.rtc.slot` / `m.rtc.member` sticky-event protocol. Production uses the legacy `m.call.member` shape. +- **MSC4195** ([hughns/matrixrtc-livekit](https://github.com/hughns/matrix-spec-proposals/blob/hughns/matrixrtc-livekit/proposals/4195-matrixrtc-livekit.md)) — the LiveKit `/get_token` endpoint and pseudonymous-identity scheme. 
+- **Element Call**, `src/livekit/openIDSFU.ts` ([livekit branch](https://github.com/element-hq/element-call/blob/livekit/src/livekit/openIDSFU.ts)) — the production credential-exchange path. +- **matrix-js-sdk**, `src/matrixrtc/` — the membership / encryption manager / LiveKit transport types. +- **lk-jwt-service**, `requests.go` + `handler.go` + `helper.go` ([element-hq/lk-jwt-service](https://github.com/element-hq/lk-jwt-service)) — the reference SFU auth service; what production homeservers actually run. + +## File map + +| Relay file | Responsibility | Element Call / js-sdk counterpart | +| --- | --- | --- | +| `RelayKit/Call/LiveKitCredentialService.swift` | Discover SFU URL, request OpenID token, exchange for LiveKit JWT | `element-call/src/livekit/openIDSFU.ts` | +| `RelayKit/Call/CallEncryptionService.swift` | Send `m.call.member` state event, derive HKDF keys, parse other peers from room state | `matrix-js-sdk/src/matrixrtc/MembershipManager.ts` + `RTCEncryptionManager.ts` | +| `RelayKit/Call/CallWidgetBridge.swift` | Speak the Widget API directly to the SDK's `WidgetDriver` to deliver Olm-encrypted to-device key payloads | `matrix-js-sdk/src/matrixrtc/ToDeviceKeyTransport.ts` (with SDK's `WidgetDriver` underneath) | +| `RelayKit/Call/CallViewModel.swift` | Orchestrate connect/disconnect sequencing, key install ordering, heartbeat | `matrix-js-sdk/src/matrixrtc/MatrixRTCSession.ts` | +| `RelayKit/Call/LiveKitLogBridge.swift` | Bridge LiveKit SDK logs into OSLog | (none — Element Call uses pino) | + +## Per-function deviations + +### `LiveKitCredentialService.fetchLiveKitTokenV2` + +Lines 178–205 in `LiveKitCredentialService.swift`. Reference: `getLiveKitJWTWithDelayDelegation` in `openIDSFU.ts`. + +| Field | Relay sends | Reference sends | Confirmed required by `lk-jwt-service`? 
| +| --- | --- | --- | --- | +| `room_id` | ✓ | ✓ | yes (`SFURequest.Validate()` in `requests.go`) | +| `slot_id` | **missing** | `"m.call#ROOM"` | **yes** (returns 400 `M_BAD_JSON` if missing) | +| `openid_token` | ✓ | ✓ | n/a (validated server-side) | +| `member.id` | `":"` | `memberId` (a UUID generated at membership creation) | n/a (passed through to identity hash) | +| `member.claimed_user_id` | ✓ | ✓ | n/a | +| `member.claimed_device_id` | ✓ | ✓ | n/a | +| `delay_id` / `delay_timeout` / `delay_cs_api_url` | not sent | optionally sent if configured | optional | + +**Impact**: Missing `slot_id` causes the v2 endpoint to return HTTP 400 every time. Relay's `try?` swallows the failure and silently falls through to legacy `/sfu/get`. **Tracked as Item 1.** + +**Secondary**: `member.id = ":"` differs from Element Call's UUID. The lk-jwt-service hashes `[matrixID, claimedDeviceID, memberID]` into the SFU identity. Different `member.id` → different pseudonymous identity → peers can't agree on routing. Only matters once v2 is reachable. **Tracked as Item 2.** + +### `LiveKitCredentialService.fetchLiveKitTokenLegacy` + +Lines 207–230. Reference: `getLiveKitJWT` in `openIDSFU.ts`. + +| Field | Relay sends | Reference sends | +| --- | --- | --- | +| `room` | ✓ | ✓ | +| `openid_token` | ✓ | ✓ | +| `device_id` | ✓ | ✓ | +| delay parts | not sent | optional | + +Matches. ✓ + +### `LiveKitCredentialService.discoverSFUURL` + +Lines 93–141. + +| Source | Relay tries | Reference tries | +| --- | --- | --- | +| Transports endpoint | `/_matrix/client/unstable/org.matrix.msc4143/rtc/transports` | MSC4195 says stable `/v1/rtc/transports`. Most servers implement neither yet. 
| +| `.well-known` | `org.matrix.msc4143.rtc_foci` key | Same | +| Existing peers' `m.call.member` `foci_preferred[0]` | **not consulted** | matrix-js-sdk uses this as the third fallback | + +**Impact**: On a homeserver with no `.well-known` configured, if there's already an active call with a SFU negotiated, Relay throws `sfuURLNotFound` instead of using the active SFU. **Tracked as Item 3.** + +### `LiveKitCredentialService.fetchLiveKitToken` (fallback logic) + +Lines 166–176. + +Relay: try v2 inside `try?`, fall back to legacy on *any* error. +Reference: try v2, fall back to legacy on HTTP 404 specifically; bubble up other errors. + +**Impact**: A v2 endpoint returning 5xx, 401, or our 400-due-to-missing-`slot_id` all silently route to legacy. The user sees `tokenExchangeFailed` with no detail. **Tracked as Item 4.** + +### `CallEncryptionService.sendCallMemberEvent` + +Lines 80–135. Reference: `SessionMembershipData` in `matrix-js-sdk/src/matrixrtc/membershipData/session.ts`. + +| Field | Relay value | Reference shape | +| --- | --- | --- | +| `application` | `"m.call"` | string | +| `call_id` | `""` | string (may be empty) | +| `created_ts` | `Int64(Date.now * 1000)` | optional number | +| `device_id` | ✓ | string | +| `expires` | `14400000` (4h) | optional, default 4h | +| `focus_active.type` | `"livekit"` | `"livekit"` | +| `focus_active.focus_selection` | `"oldest_membership"` | `"oldest_membership"` \| `"multi_sfu"` | +| `foci_preferred[].type` | `"livekit"` | `"livekit"` | +| `foci_preferred[].livekit_service_url` | ✓ | string | +| `foci_preferred[].livekit_alias` | `roomID` | string | +| `m.call.intent` | `"video"` | optional | +| `membershipID` | UUID | optional | +| `scope` | `"m.room"` | optional `"m.room"` \| `"m.user"` | + +Matches. ✓ + +State key: `___m.call`. Matches Element X's per-device convention. ✓ + +### `CallEncryptionService.fetchCallTargets` + +Lines 197–235. 
+ +| Concern | Detail | +| --- | --- | +| Uses raw REST `/rooms/{id}/state` rather than SDK room state | Stale-cache hazard, and `homeserver` field may differ from the delegated client URL for some accounts. | +| State-key parser expects `___m.call` exactly | Filters out any peer using just `` or a different per-device key shape. | + +**Tracked as Item 5.** + +### `CallWidgetBridge.handleIncomingToDevice` (key routing) + +Lines 554–634. + +Routes inbound keys to `participantId` (the LiveKit-side identity our cryptor uses) by trying: + +1. `":"` +2. `":"` +3. `member.id` (the membership UUID) +4. `sender` alone + +The comment in code asserts Element Call connects to LiveKit with identity `@user:server:device`. **This is only true on the legacy path.** On v2 the identity is `unpadded_base64(sha256(canonical_json([matrixID, claimedDeviceID, memberID])))`. The legacy assumption is hardcoded into all four entries above. + +**Tracked as Item 2.** Fix requires capturing the JWT-side identity (from the JWT `sub` claim, or from `room.localParticipant.identity` after connect) and using it as the routing key when on v2. + +### `CallViewModel.connect` (lines 282–303) + +Local key install uses: +```swift +let localIdentity = "\(encryptionService.userID):\(encryptionService.deviceID)" +``` + +Comment cites `matrix-js-sdk CallMembership.ts line 101` — accurate for **legacy**. Same v2 mismatch. + +There's already a runtime warning on line 293: `"LiveKit identity X != matrix identity Y — frame encryption may misroute"`. This currently *only logs* the mismatch without acting on it. Item 2 should make us key the cryptor under whichever identity LiveKit actually assigned. + +### `CallViewModel.redistributeKey` (lines 590–617) + +Splits the LiveKit participant identity by `:` to reconstruct `(userId, deviceId)`. Hard-fails on v2 hashes (no colons → `components.count < 3` → log + return). 
**Tracked as Item 2.** + +### `CallViewModel.connect` — runtime instrumentation gap + +After `state = .connected` (line 391), the Activity Log has **no further events** until `disconnect()`. The LiveKit `RoomDelegate` (`Delegate` inner class in this file) handles `participantDidJoin`, `participantDidLeave`, `didPublishTrack`, etc., but nothing flows to the Activity Log. Real-world failure reports for "connected but no media" show traces ending at `Connected to call` with nothing actionable after. + +**Tracked as Item 0** (new — added after reviewing user `97853C31` activity log on 2026-06-13). + +## What this file is NOT + +- Not user-facing — see `docs/troubleshooting-calls.md` for that. +- Not exhaustive — only documents deviations we've confirmed against real source code, real specs, or real user traces. If you find a new deviation that matches a user report, add it here with a citation. +- Not a roadmap — the task list on the `rtc-element-call-alignment` branch tracks priority and ordering. From 9a1dd95002447e2c47798a55bc86e3dd8ca3ef83 Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Sat, 13 Jun 2026 19:18:24 -0400 Subject: [PATCH 03/16] Instrument post-connect call lifecycle in Activity Log User trace 97853C31 showed an Activity Log that ended at "Connected to call" with no further events: the trace stops the moment Relay finished its setup checklist, even though the user's call kept running (and failing) for minutes afterward. Without post-connect events, every "connected but no media" or "call dropped silently" report is undiagnosable from the JSON export alone. 
Wire the LiveKit RoomDelegate callbacks that produce diagnostic signals through to the Activity Log: - didUpdateConnectionState .disconnected (with previous state) - didFailToConnectWithError (SFU rejected the initial connect) - didDisconnectWithError (mid-call drop with cause) - didSubscribeTrack (now lands in the JSON, not just os_log) - didFailToSubscribeTrack (firewall/NAT/codec failures) - local didPublishTrack (proves our own media went out) - didUpdateE2EEState (per-track cryptor failures, encrypted rooms) The describe(_:) helper provides stable labels for LiveKit ConnectionState values that won't drift if the enum gains cases. Assisted-By: Claude Sonnet 4.7 --- RelayKit/Call/CallViewModel.swift | 149 +++++++++++++++++++++++++++++- 1 file changed, 144 insertions(+), 5 deletions(-) diff --git a/RelayKit/Call/CallViewModel.swift b/RelayKit/Call/CallViewModel.swift index a01e0d0..632590c 100644 --- a/RelayKit/Call/CallViewModel.swift +++ b/RelayKit/Call/CallViewModel.swift @@ -699,6 +699,13 @@ public final class CallViewModel: CallViewModelProtocol { if viewModel.state == .connected { viewModel.state = .disconnected } + logger.info("[RTC]LiveKit connection state: disconnected") + viewModel.activityLog?.log( + category: .call, severity: .warning, source: "CallViewModel", + summary: "LiveKit connection disconnected", + detail: "Previous state: \(Self.describe(oldValue))", + roomId: viewModel.roomID + ) case .reconnecting: logger.info("[RTC]Reconnecting…") viewModel.activityLog?.log( @@ -712,6 +719,63 @@ public final class CallViewModel: CallViewModelProtocol { } } + /// Fires when the SFU rejects the initial connection (auth, transport, + /// codec negotiation). Distinct from `didDisconnectWithError`, which + /// fires after a successful connect terminates. + func room(_ room: LiveKit.Room, didFailToConnectWithError error: LiveKitError?) { + let description = error?.localizedDescription ?? 
"no error reported" + Task { @MainActor [weak viewModel] in + guard let viewModel else { return } + logger.error("[RTC]LiveKit didFailToConnect: \(description, privacy: .public)") + viewModel.activityLog?.log( + category: .call, severity: .error, source: "CallViewModel", + summary: "LiveKit connection rejected", + detail: description, + roomId: viewModel.roomID + ) + } + } + + /// Fires when an already-connected room disconnects, with an optional + /// error explaining why. A `nil` error indicates a clean local + /// disconnect; a non-nil error is the most useful signal we get when + /// a call drops mid-session. + func room(_ room: LiveKit.Room, didDisconnectWithError error: LiveKitError?) { + let description = error?.localizedDescription + Task { @MainActor [weak viewModel] in + guard let viewModel else { return } + if let description { + logger.error("[RTC]LiveKit didDisconnect: \(description, privacy: .public)") + viewModel.activityLog?.log( + category: .call, severity: .error, source: "CallViewModel", + summary: "LiveKit connection lost", + detail: description, + roomId: viewModel.roomID + ) + } else { + logger.info("[RTC]LiveKit didDisconnect (clean)") + viewModel.activityLog?.log( + category: .call, severity: .debug, source: "CallViewModel", + summary: "LiveKit disconnected cleanly", + roomId: viewModel.roomID + ) + } + } + } + + /// Human-readable label for a `LiveKit.ConnectionState` enum value. + /// Lives on the delegate so the activity-log detail strings stay + /// stable across LiveKit SDK updates. 
+ nonisolated private static func describe(_ state: LiveKit.ConnectionState) -> String { + switch state { + case .connected: "connected" + case .disconnected: "disconnected" + case .reconnecting: "reconnecting" + case .connecting: "connecting" + case .disconnecting: "disconnecting" + } + } + func room(_ room: LiveKit.Room, participantDidConnect participant: RemoteParticipant) { Task { @MainActor [weak viewModel] in guard let viewModel else { return } @@ -733,11 +797,38 @@ public final class CallViewModel: CallViewModelProtocol { func room(_ room: LiveKit.Room, participant: RemoteParticipant, didSubscribeTrack publication: RemoteTrackPublication) { observeDimensions(of: publication) + let identityStr = participant.identity?.stringValue ?? "(none)" + let kind = publication.kind.rawValue + let sid = publication.sid Task { @MainActor [weak viewModel] in - let identityStr = participant.identity?.stringValue ?? "(none)" - let kind = publication.kind.rawValue - logger.info("[RTC]Subscribed to \(kind, privacy: .public) track from identity=\(identityStr, privacy: .public) trackSid=\(publication.sid, privacy: .public)") - viewModel?.syncParticipants(trackChanged: true) + guard let viewModel else { return } + logger.info("[RTC]Subscribed to \(kind, privacy: .public) track from identity=\(identityStr, privacy: .public) trackSid=\(sid, privacy: .public)") + viewModel.activityLog?.log( + category: .call, severity: .debug, source: "CallViewModel", + summary: "Subscribed to remote \(kind) track", + detail: "Identity: \(identityStr), trackSid: \(sid)", + roomId: viewModel.roomID + ) + viewModel.syncParticipants(trackChanged: true) + } + } + + /// Fires when LiveKit can't subscribe to a remote track — the most + /// common cause is firewall / NAT blocking the media path while + /// signalling completes. This is the strongest signal for the + /// "connected, no media" failure shape. 
+ func room(_ room: LiveKit.Room, participant: RemoteParticipant, didFailToSubscribeTrackWithSid trackSid: Track.Sid, error: LiveKitError) { + let identityStr = participant.identity?.stringValue ?? "(none)" + let description = error.localizedDescription + Task { @MainActor [weak viewModel] in + guard let viewModel else { return } + logger.error("[RTC]Failed to subscribe to track \(trackSid, privacy: .public) from \(identityStr, privacy: .public): \(description, privacy: .public)") + viewModel.activityLog?.log( + category: .call, severity: .error, source: "CallViewModel", + summary: "Failed to subscribe to remote track", + detail: "Identity: \(identityStr), trackSid: \(trackSid), error: \(description)", + roomId: viewModel.roomID + ) } } @@ -765,8 +856,18 @@ public final class CallViewModel: CallViewModelProtocol { func room(_ room: LiveKit.Room, localParticipant: LocalParticipant, didPublishTrack publication: LocalTrackPublication) { observeDimensions(of: publication) + let kind = publication.kind.rawValue + let sid = publication.sid Task { @MainActor [weak viewModel] in - viewModel?.videoTrackRevision += 1 + guard let viewModel else { return } + logger.info("[RTC]Published local \(kind, privacy: .public) track sid=\(sid, privacy: .public)") + viewModel.activityLog?.log( + category: .call, severity: .debug, source: "CallViewModel", + summary: "Published local \(kind) track", + detail: "trackSid: \(sid)", + roomId: viewModel.roomID + ) + viewModel.videoTrackRevision += 1 } } @@ -776,6 +877,44 @@ public final class CallViewModel: CallViewModelProtocol { } } + /// Per-track LiveKit E2EE state transitions. Only fires when E2EE is + /// enabled on the room. Normal lifecycle is `.new` → `.ok`. 
Any other + /// terminal state (`.missing_key`, `.encryption_failed`, + /// `.decryption_failed`, `.internal_error`) is the canonical signal + /// for "connected but no media" on encrypted rooms — surface them + /// loudly so users on Element-Call interop calls can see the + /// cryptor failing without having to read os_log. + func room(_ room: LiveKit.Room, trackPublication: TrackPublication, didUpdateE2EEState state: E2EEState) { + let stateLabel = state.toString() + let trackSid = trackPublication.sid + let trackKind = trackPublication.kind.rawValue + Task { @MainActor [weak viewModel] in + guard let viewModel else { return } + switch state { + case .ok, .new, .key_ratcheted: + return + case .missing_key: + logger.warning("[RTC]E2EE state=missing_key on \(trackKind, privacy: .public) sid=\(trackSid, privacy: .public)") + viewModel.activityLog?.log( + category: .call, severity: .warning, source: "CallViewModel", + summary: "E2EE missing key for \(trackKind) track", + detail: "trackSid: \(trackSid). Remote peer's encryption key hasn't been received yet or was rejected.", + roomId: viewModel.roomID + ) + case .encryption_failed, .decryption_failed, .internal_error: + logger.error("[RTC]E2EE state=\(stateLabel, privacy: .public) on \(trackKind, privacy: .public) sid=\(trackSid, privacy: .public)") + viewModel.activityLog?.log( + category: .call, severity: .error, source: "CallViewModel", + summary: "E2EE failure on \(trackKind) track", + detail: "State: \(stateLabel), trackSid: \(trackSid)", + roomId: viewModel.roomID + ) + @unknown default: + return + } + } + } + // First-frame indicator: dimensions become valid here, so bump // videoTrackRevision so aspect-ratio observers re-read. 
func room(_ room: LiveKit.Room, participant: RemoteParticipant, trackPublication: RemoteTrackPublication, didUpdateStreamState streamState: StreamState) { From fea3b1f641c5c4b1d16b39d4aefc4d32f102bc43 Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Sat, 13 Jun 2026 19:21:05 -0400 Subject: [PATCH 04/16] Surface v2 /get_token error responses on credential fallback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous fall-back path used `try?` to swallow every v2 error and silently retry against legacy `/sfu/get`. That hid two diagnostic signals: - The actual reason v2 rejected us (Matrix `errcode` + `error`). - That the fall-back happened at all. Replace `try?` with a `do/catch` that: - Parses Matrix-style `{errcode, error}` envelopes from the response body so users see `M_BAD_JSON: The request body is missing 'room_id' or 'slot_id'` instead of generic `tokenExchangeFailed`. - Logs the v2 failure to both os_log and the Activity Log before trying legacy, so the silent fall-back is visible after the fact. - Carries the structured detail through a new `LiveKitCredentialError.tokenExchangeRejected` case that surfaces status, errcode, message, and which endpoint failed. This unblocks self-diagnosis for the upcoming `slot_id` and v2 identity fixes — once those land, this logging will confirm v2 is healthy without requiring users to hand-curate os_log traces. 
Assisted-By: Claude Sonnet 4.7 --- RelayKit/Call/LiveKitCredentialService.swift | 91 +++++++++++++++++--- 1 file changed, 81 insertions(+), 10 deletions(-) diff --git a/RelayKit/Call/LiveKitCredentialService.swift b/RelayKit/Call/LiveKitCredentialService.swift index 0e8b1cd..f33e18d 100644 --- a/RelayKit/Call/LiveKitCredentialService.swift +++ b/RelayKit/Call/LiveKitCredentialService.swift @@ -168,13 +168,44 @@ struct LiveKitCredentialService { roomID: String, openIDToken: OpenIDTokenPayload ) async throws -> (url: String, token: String) { - // Try the v2 endpoint first, fall back to legacy - if let result = try? await fetchLiveKitTokenV2(sfuURL: sfuURL, roomID: roomID, openIDToken: openIDToken) { - return result + // Try v2 first. If it fails, log the actual server response (status, + // Matrix errcode, message) before falling back to legacy — so users + // on v2-only deployments see actionable detail rather than the + // legacy endpoint's generic "tokenExchangeFailed". + do { + return try await fetchLiveKitTokenV2( + sfuURL: sfuURL, + roomID: roomID, + openIDToken: openIDToken + ) + } catch let v2Error { + logV2Failure(v2Error, sfuURL: sfuURL) } return try await fetchLiveKitTokenLegacy(sfuURL: sfuURL, roomID: roomID, openIDToken: openIDToken) } + /// Logs a v2 `/get_token` failure to os_log and the activity log so that + /// the silent fall-back to legacy is at least visible after the fact. + /// Format-aware: a `LiveKitCredentialError.tokenExchangeRejected` carries + /// structured detail; anything else falls through to its + /// `localizedDescription`. + private func logV2Failure(_ error: Error, sfuURL: String) { + let detail: String + if case let LiveKitCredentialError.tokenExchangeRejected(status, errcode, message, _) = error { + let errcodePart = errcode.map { " \($0)" } ?? "" + let messagePart = message.map { ": \($0)" } ?? 
"" + detail = "HTTP \(status)\(errcodePart)\(messagePart)" + } else { + detail = error.localizedDescription + } + logger.warning("[RTC]/get_token failed, falling back to /sfu/get — \(detail, privacy: .public)") + activityLog?.log( + category: .call, severity: .warning, source: "LiveKitCredentialService", + summary: "v2 /get_token rejected; falling back to legacy", + detail: detail + ) + } + private func fetchLiveKitTokenV2( sfuURL: String, roomID: String, @@ -196,8 +227,17 @@ struct LiveKitCredentialService { request.httpBody = try JSONEncoder().encode(body) let (data, response) = try await URLSession.shared.data(for: request) - guard let http = response as? HTTPURLResponse, http.statusCode == 200 else { - throw LiveKitCredentialError.tokenExchangeFailed + guard let http = response as? HTTPURLResponse else { + throw LiveKitCredentialError.serverError + } + guard http.statusCode == 200 else { + let (errcode, message) = Self.parseMatrixError(data) + throw LiveKitCredentialError.tokenExchangeRejected( + status: http.statusCode, + errcode: errcode, + message: message, + endpoint: "/get_token" + ) } let decoded = try JSONDecoder().decode(LiveKitTokenResponse.self, from: data) logger.info("[RTC]LiveKit credentials obtained via /get_token") @@ -221,13 +261,38 @@ struct LiveKitCredentialService { request.httpBody = try JSONEncoder().encode(body) let (data, response) = try await URLSession.shared.data(for: request) - guard let http = response as? HTTPURLResponse, http.statusCode == 200 else { - throw LiveKitCredentialError.tokenExchangeFailed + guard let http = response as? 
HTTPURLResponse else { + throw LiveKitCredentialError.serverError + } + guard http.statusCode == 200 else { + let (errcode, message) = Self.parseMatrixError(data) + throw LiveKitCredentialError.tokenExchangeRejected( + status: http.statusCode, + errcode: errcode, + message: message, + endpoint: "/sfu/get" + ) } let decoded = try JSONDecoder().decode(LiveKitTokenResponse.self, from: data) logger.info("[RTC]LiveKit credentials obtained via legacy /sfu/get") return (decoded.url, decoded.jwt) } + + /// Extracts `(errcode, error)` from a Matrix-style error response body. + /// Used to turn lk-jwt-service responses like + /// `{"errcode":"M_BAD_JSON","error":"The request body is missing..."}` + /// into a single human-readable line. Returns `(nil, nil)` if the body + /// isn't a Matrix error envelope. + private static func parseMatrixError(_ data: Data) -> (errcode: String?, message: String?) { + struct MatrixError: Decodable { + let errcode: String? + let error: String? + } + guard let parsed = try? JSONDecoder().decode(MatrixError.self, from: data) else { + return (nil, nil) + } + return (parsed.errcode, parsed.error) + } } // MARK: - Errors @@ -237,7 +302,11 @@ enum LiveKitCredentialError: LocalizedError { case invalidURL case serverError case openIDTokenFailed - case tokenExchangeFailed + /// The LiveKit JWT service rejected our request. Carries the HTTP + /// status, Matrix `errcode`/`error` if present, and which endpoint + /// produced the failure (`/get_token` or `/sfu/get`) so a user + /// support trace can identify both the path taken and the reason. + case tokenExchangeRejected(status: Int, errcode: String?, message: String?, endpoint: String) var errorDescription: String? { switch self { @@ -250,8 +319,10 @@ enum LiveKitCredentialError: LocalizedError { return "The homeserver returned an error while fetching call credentials." case .openIDTokenFailed: return "Failed to obtain an OpenID token from the homeserver." 
- case .tokenExchangeFailed: - return "The call server rejected the credential exchange." + case .tokenExchangeRejected(let status, let errcode, let message, let endpoint): + let errcodePart = errcode.map { " \($0)" } ?? "" + let messagePart = message.map { ": \($0)" } ?? "" + return "Call server rejected \(endpoint) with HTTP \(status)\(errcodePart)\(messagePart)" } } } From 7d3f7f8fc71f5ff844180a0512e985443ae5e3b9 Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Sat, 13 Jun 2026 19:37:36 -0400 Subject: [PATCH 05/16] Add slot_id and v2 LiveKit identity routing for /get_token MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two coupled changes that must ship together. Shipping only the first moves users from "fails to connect" to "connects but no media." Item 1 — slot_id ================ The v2 endpoint in lk-jwt-service requires `slot_id` in the request body; `SFURequest.Validate()` returns HTTP 400 M_BAD_JSON ("missing 'room_id' or 'slot_id'") if absent. Relay never sent it, so the v2 attempt failed and we silently fell back to /sfu/get. Hardcode "m.call#ROOM" to match Element Call's `getLiveunitJWTWithDelayDelegation`. Item 2 — v2 LiveKit identity routing ==================================== On v2, lk-jwt-service issues a JWT whose `sub` claim is `unpaddedBase64(sha256(json_marshal([matrixID, claimedDeviceID, memberID])))` (per `helper.go::LiveKitIdentityFor`). LiveKit uses that as the participant identity. On legacy, it's `::`. The frame cryptor routes frames to remote peers' decoders by exact string match on identity, so keying the cryptor under the legacy shape silently breaks every frame on v2. Three sites needed updating: - `CallEncryptionService.liveKitIdentity(matrixID:claimedDeviceID: memberID:)` ports the lk-jwt-service algorithm. Inputs are all ASCII (Matrix IDs, device IDs, UUIDs), so JSONSerialization is byte- identical to Go's `json.Marshal` and the resulting hash matches. 
- `CallViewModel.connect` now keys the local cryptor under `room.localParticipant.identity?.stringValue` instead of `:`. That's the JWT sub claim regardless of v1/v2. If LiveKit hasn't assigned an identity (shouldn't happen), the connect now fails fast with `CallViewModelError. missingLocalParticipantIdentity` rather than silently misrouting. - `CallWidgetBridge.handleIncomingToDevice` registers each inbound encryption key under every plausible LiveKit identity for that peer — both the legacy `:` and the v2 hash. The cryptor ring stores per-(participantId, index) entries and matches by identity, so registering both is safe and works against peers regardless of which endpoint they took. - `CallViewModel.redistributeKey` previously parsed the LiveKit identity by `:` to recover (userId, deviceId). On v2 the identity has no colons and the parse fails, so new peers never got our key. Drop the parse entirely; re-fetch `m.call.member` state and broadcast to all current targets. Matches Element Call's `RTCEncryptionManager` behaviour on membership change. Assisted-By: Claude Sonnet 4.7 --- RelayKit/Call/CallEncryptionService.swift | 36 ++++++++ RelayKit/Call/CallViewModel.swift | 93 +++++++++++++------- RelayKit/Call/CallWidgetBridge.swift | 68 ++++++++------ RelayKit/Call/LiveKitCredentialService.swift | 8 ++ 4 files changed, 149 insertions(+), 56 deletions(-) diff --git a/RelayKit/Call/CallEncryptionService.swift b/RelayKit/Call/CallEncryptionService.swift index 0201e61..de7dced 100644 --- a/RelayKit/Call/CallEncryptionService.swift +++ b/RelayKit/Call/CallEncryptionService.swift @@ -244,6 +244,42 @@ struct CallEncryptionService { return Data(bytes) } + // MARK: - LiveKit Identity (MSC4195) + + /// Reproduces lk-jwt-service's `LiveKitIdentityFor` in Swift so we can + /// route frame-cryptor keys to the same participant identity that the + /// JWT service assigned when it issued the access token. 
+ /// + /// On the v2 (`/get_token`) path the LiveKit participant identity is + /// the unpadded-base64 SHA-256 hash of the JSON serialization of + /// `[matrixID, claimedDeviceID, memberID]`; keying our cryptor under + /// `:` (the legacy shape) silently misroutes every + /// frame on v2-only deployments. + /// + /// Inputs are all ASCII (Matrix IDs, device IDs, UUIDs), so Swift's + /// `JSONSerialization` produces byte-identical output to Go's + /// `json.Marshal` for the same array. Reference: + /// `lk-jwt-service/helper.go::LiveKitIdentityFor`. + static func liveKitIdentity( + matrixID: String, + claimedDeviceID: String, + memberID: String + ) -> String { + let parts: [String] = [matrixID, claimedDeviceID, memberID] + guard let jsonData = try? JSONSerialization.data( + withJSONObject: parts, + options: [] + ) else { + return "" + } + let digest = SHA256.hash(data: jsonData) + // SHA-256 outputs 32 bytes; standard base64 = 44 chars with exactly + // one '=' of padding. Strip it to match Go's `unpaddedBase64`. + return Data(digest) + .base64EncodedString() + .replacing("=", with: "") + } + // MARK: - Key Provider Setup /// Builds a `BaseKeyProvider` whose internal `LKRTCFrameCryptorKeyProvider` diff --git a/RelayKit/Call/CallViewModel.swift b/RelayKit/Call/CallViewModel.swift index 632590c..0fbe8f1 100644 --- a/RelayKit/Call/CallViewModel.swift +++ b/RelayKit/Call/CallViewModel.swift @@ -279,28 +279,36 @@ public final class CallViewModel: CallViewModelProtocol { // frames is encrypted with nothing the remote peer can decrypt — // and Element-X's video decoder stalls on that first undecodable // frame, resulting in perpetual black video. - if self.isE2eeEnabled, let keyProvider = self.keyProvider, let encryptionService { + // + // Key under the identity LiveKit assigned us. This was the JWT + // `sub` claim: `::` on the legacy + // `/sfu/get` path, or the unpadded-base64 SHA-256 hash of + // `[user, device, member_id]` on v2 `/get_token`. 
The cryptor + // routes frames to remote peers' decoders using the *same* + // identity string LiveKit hands the SFU, so registering under + // the matrix-shaped `:` silently misroutes + // outbound frames on v2. + if self.isE2eeEnabled, let keyProvider = self.keyProvider { let key = CallEncryptionService.generateKey() self.localEncryptionKey = key - // Legacy `m.call.member` rtcBackendIdentity is always - // `${sender}:${device_id}` (matrix-js-sdk CallMembership.ts - // line 101). This is what remote peers route our frames under, - // so our local sender cryptor MUST be keyed under the same - // byte sequence — do not trust `localParticipantID` (the - // identity LiveKit assigns from the SFU JWT), since a - // mismatched JWT identity would silently break decrypt. - let localIdentity = "\(encryptionService.userID):\(encryptionService.deviceID)" - if let livekitIdentity = self.localParticipantID, livekitIdentity != localIdentity { - logger.warning("[RTC]LiveKit identity \(livekitIdentity, privacy: .public) != matrix identity \(localIdentity, privacy: .public) — frame encryption may misroute") - } let keyIndex = self.localKeyIndex + guard let livekitIdentity = self.localParticipantID, !livekitIdentity.isEmpty else { + logger.error("[RTC]LiveKit local participant has no identity — cannot register cryptor key") + activityLog?.log( + category: .call, severity: .error, source: "CallViewModel", + summary: "LiveKit assigned no local identity", + detail: "Cannot install local E2EE key; outbound frames will be undecodable.", + roomId: roomID + ) + throw CallViewModelError.missingLocalParticipantIdentity + } CallEncryptionService.setRawKey( key, on: keyProvider, - participantId: localIdentity, + participantId: livekitIdentity, index: Int32(keyIndex) ) - logger.info("[RTC]Local E2EE key set (index \(keyIndex)) under participantId=\(localIdentity, privacy: .public) before camera publish") + logger.info("[RTC]Local E2EE key set (index \(keyIndex)) under 
participantId=\(livekitIdentity, privacy: .public) before camera publish") } // Set up MatrixRTC signaling and distribute the key **before** @@ -584,34 +592,42 @@ public final class CallViewModel: CallViewModelProtocol { // MARK: - E2EE Key Redistribution - /// Re-sends the local encryption key to a newly joined participant so they - /// can decrypt our media. Routes through the widget bridge so the SDK - /// Olm-encrypts the to-device payload. + /// Re-sends the local encryption key to all current call members so a + /// peer that just joined LiveKit can decrypt our media. + /// + /// Previously this method parsed the LiveKit participant identity + /// (`@user:server:device`) to recover a single user/device target. On + /// v2 the identity is an opaque base64 hash, so the parse fails and the + /// new peer never receives our key. Re-fetching `m.call.member` state + /// and broadcasting to everyone matches Element Call's + /// `RTCEncryptionManager` behaviour on membership changes — slightly + /// inefficient (existing peers receive our key twice) but correct on + /// both legacy and v2 paths. + /// + /// The `participantIdentity` parameter is now only used for logging. fileprivate func redistributeKey(to participantIdentity: String) { - guard let key = localEncryptionKey, let bridge = widgetBridge else { return } - - // Parse "user:device" from the LiveKit identity - // (format: `@userId:server:deviceId`). Element Call uses identities - // like `@user:server:DEVICEID`. 
- let components = participantIdentity.components(separatedBy: ":") - guard components.count >= 3 else { - logger.warning("[RTC]Cannot parse participant identity for key redistribution: \(participantIdentity, privacy: .private)") + guard let key = localEncryptionKey, + let bridge = widgetBridge, + let encryptionService else { return } - let userId = components[0] + ":" + components[1] - let deviceId = components.dropFirst(2).joined(separator: ":") let index = localKeyIndex Task { + let targets = await encryptionService.fetchCallTargets() + guard !targets.isEmpty else { + logger.info("[RTC]No call targets to redistribute key to (new participant \(participantIdentity, privacy: .public))") + return + } do { try await bridge.sendEncryptionKey( key, keyIndex: index, - toMembers: [userId: [deviceId]] + toMembers: targets ) - logger.info("[RTC]Redistributed key to \(participantIdentity, privacy: .private)") + logger.info("[RTC]Redistributed key to \(targets.count) user(s) (triggered by new participant \(participantIdentity, privacy: .public))") } catch { - logger.warning("[RTC]Key redistribution failed for \(participantIdentity, privacy: .private): \(error.localizedDescription, privacy: .private)") + logger.warning("[RTC]Key redistribution failed for \(participantIdentity, privacy: .public): \(error.localizedDescription, privacy: .public)") } } } @@ -956,3 +972,20 @@ public final class CallViewModel: CallViewModelProtocol { } } } + +// MARK: - Errors + +/// Errors raised by `CallViewModel.connect`. Only the cases that surface to +/// the user via the error reporter or the call sheet need a +/// `LocalizedError`; internal-only failures can be plain `Swift.Error`. +enum CallViewModelError: LocalizedError { + case missingLocalParticipantIdentity + + var errorDescription: String? { + switch self { + case .missingLocalParticipantIdentity: + return "LiveKit didn't assign an identity to the local participant; " + + "the call can't be encrypted. Try reconnecting." 
+ } + } +} diff --git a/RelayKit/Call/CallWidgetBridge.swift b/RelayKit/Call/CallWidgetBridge.swift index 5a36d58..382fe41 100644 --- a/RelayKit/Call/CallWidgetBridge.swift +++ b/RelayKit/Call/CallWidgetBridge.swift @@ -584,23 +584,36 @@ public final class CallWidgetBridge: @unchecked Sendable { let topDeviceId = (content["device_id"] as? String) ?? "" let deviceId = !claimedDeviceId.isEmpty ? claimedDeviceId : topDeviceId - // LiveKit participant identity lookup order. Element Call connects to - // the SFU with identity `@user:server:deviceId` (confirmed in the - // MatrixRTC JWT grant), so that's what we need to key on for the - // LKRTCFrameCryptorKeyProvider to route the key to the right - // participant's decoder. + // Register the inbound key under every plausible LiveKit + // participant identity for this peer. Which shape LiveKit assigned + // depends on which credential path (legacy or v2) the peer took + // when they joined the SFU — we don't necessarily know that from + // the to-device payload alone, so register under every candidate + // and let the cryptor pick the one whose participantId matches the + // SFU-assigned identity. // - // `member.id` is the MSC4143 per-membership UUID — an *event*-level - // identifier, not a LiveKit participant identity. It only enters the - // fallback chain so older peers that somehow omit the device id still - // get routed. - let participantIdentity: String + // - Legacy (`/sfu/get`): identity = `:`. + // - v2 (`/get_token`): identity = unpadded-base64 SHA-256 of + // `[sender, claimed_device_id, member.id]` per + // `lk-jwt-service/helper.go::LiveKitIdentityFor`. 
+ var participantIdentities: [String] = [] if !deviceId.isEmpty { - participantIdentity = "\(sender):\(deviceId)" - } else if !memberId.isEmpty { - participantIdentity = memberId - } else { - participantIdentity = sender + participantIdentities.append("\(sender):\(deviceId)") + } + if !claimedDeviceId.isEmpty && !memberId.isEmpty { + let v2Identity = CallEncryptionService.liveKitIdentity( + matrixID: sender, + claimedDeviceID: claimedDeviceId, + memberID: memberId + ) + if !v2Identity.isEmpty { + participantIdentities.append(v2Identity) + } + } + if participantIdentities.isEmpty { + // Last-resort fallback — older peers that omit both device_id + // and member.id. Element Call's parser does the same. + participantIdentities.append(memberId.isEmpty ? sender : memberId) } for entry in keyEntries { @@ -609,24 +622,27 @@ public final class CallWidgetBridge: @unchecked Sendable { let keyData = Data(base64Encoded: base64Key) else { continue } - CallEncryptionService.setRawKey( - keyData, - on: keyProvider, - participantId: participantIdentity, - index: Int32(index) - ) + for participantIdentity in participantIdentities { + CallEncryptionService.setRawKey( + keyData, + on: keyProvider, + participantId: participantIdentity, + index: Int32(index) + ) + } + let identitiesJoined = participantIdentities.joined(separator: ", ") // Log with `.public` so we can correlate the key routing - // identity (what we register the frame-decryption key under) + // identities (what we register the frame-decryption key under) // with the actual LiveKit participant identity (logged on - // connect) — if these do not match byte-for-byte, LiveKit will - // silently fail to decrypt this peer's frames. 
- logger.info("[RTC]Applied inbound key -> routed to LiveKit participantId=\(participantIdentity, privacy: .public) sender=\(sender, privacy: .public) device=\(deviceId, privacy: .public) member=\(memberId, privacy: .public) index=\(index)") + // connect) — if NONE of these matches byte-for-byte, LiveKit + // will silently fail to decrypt this peer's frames. + logger.info("[RTC]Applied inbound key -> routed to LiveKit participantId=[\(identitiesJoined, privacy: .public)] sender=\(sender, privacy: .public) device=\(deviceId, privacy: .public) member=\(memberId, privacy: .public) index=\(index)") Task { @MainActor [weak self] in guard let self else { return } self.activityLog?.log( category: .call, severity: .debug, source: "CallWidgetBridge", summary: "Received E2EE key from \(sender)", - detail: "Participant: \(participantIdentity), key index: \(index)", + detail: "Participants: [\(identitiesJoined)], key index: \(index)", roomId: self.roomId ) } diff --git a/RelayKit/Call/LiveKitCredentialService.swift b/RelayKit/Call/LiveKitCredentialService.swift index f33e18d..a2e00ae 100644 --- a/RelayKit/Call/LiveKitCredentialService.swift +++ b/RelayKit/Call/LiveKitCredentialService.swift @@ -221,6 +221,12 @@ struct LiveKitCredentialService { let body = GetTokenRequest( roomId: roomID, + // Element Call hardcodes "m.call#ROOM" for the application slot + // on the v2 endpoint. lk-jwt-service `SFURequest.Validate()` + // rejects requests where `slot_id` is empty with HTTP 400 + // M_BAD_JSON, which is what forced every previous Relay call + // to silently fall back to legacy `/sfu/get`. 
+ slotId: "m.call#ROOM", openidToken: openIDToken, member: .init(id: "\(userID):\(deviceID)", claimedUserId: userID, claimedDeviceId: deviceID) ) @@ -372,6 +378,7 @@ struct OpenIDTokenPayload: Codable { private struct GetTokenRequest: Encodable { let roomId: String + let slotId: String let openidToken: OpenIDTokenPayload let member: Member struct Member: Encodable { @@ -386,6 +393,7 @@ private struct GetTokenRequest: Encodable { } enum CodingKeys: String, CodingKey { case roomId = "room_id" + case slotId = "slot_id" case openidToken = "openid_token" case member } From 8e7cd66f41ce34dc3790e907ce3a35a4b40054f8 Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Sun, 14 Jun 2026 07:07:49 -0400 Subject: [PATCH 06/16] Fall back to peer foci_preferred for SFU discovery When the homeserver advertises no MatrixRTC SFU (neither the unstable transports endpoint nor `.well-known org.matrix.msc4143.rtc_foci` is configured) but a call is already in progress in the room, walk the existing `m.call.member` state events and pick a peer's `foci_preferred[0].livekit_service_url`. Matches Element Call / matrix-js-sdk's third-fallback discovery behaviour. Previously Relay would throw `sfuURLNotFound` and refuse to join in this scenario, even though the SFU the existing participants were using was right there in room state. `discoverSFUURL` now takes the roomID so it can issue the `/rooms/{id}/state` request; the previous parameterless form is gone. 
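For reviewers, a minimal sketch of the resulting lookup order. It is synchronous and uses closures in place of the real async fetchers. `firstSFUURL`, `DiscoveryError` and the example URL are illustrative assumptions, not names from this patch.

```swift
import Foundation

// Hypothetical stand-in for `LiveKitCredentialError.sfuURLNotFound`.
enum DiscoveryError: Error {
    case sfuURLNotFound
}

// Tries each discovery source in order and returns the first non-empty URL.
// A source that throws is treated like one that found nothing, mirroring
// the `try?` chain in `discoverSFUURL(roomID:)`.
func firstSFUURL(from sources: [() throws -> String?]) throws -> String {
    for source in sources {
        if let url = try? source(), !url.isEmpty {
            return url
        }
    }
    throw DiscoveryError.sfuURLNotFound
}

// Assumed scenario: transports endpoint unsupported, no .well-known entry,
// but a peer already in the call advertises an SFU in room state.
let sources: [() throws -> String?] = [
    { throw DiscoveryError.sfuURLNotFound },
    { nil },
    { "https://sfu.example.org" },
]
let discovered = try? firstSFUURL(from: sources)
```

In this sketch the peer fallback is only reached when both earlier sources come up empty, so a homeserver that does advertise an SFU should still take precedence over whatever peers wrote into room state.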
Assisted-By: Claude Sonnet 4.7 --- RelayKit/Call/LiveKitCredentialService.swift | 57 +++++++++++++++++++- 1 file changed, 55 insertions(+), 2 deletions(-) diff --git a/RelayKit/Call/LiveKitCredentialService.swift b/RelayKit/Call/LiveKitCredentialService.swift index a2e00ae..e275f4e 100644 --- a/RelayKit/Call/LiveKitCredentialService.swift +++ b/RelayKit/Call/LiveKitCredentialService.swift @@ -61,7 +61,7 @@ struct LiveKitCredentialService { roomId: roomID ) do { - let sfuURL = try await discoverSFUURL() + let sfuURL = try await discoverSFUURL(roomID: roomID) logger.info("[RTC]SFU URL discovered: \(sfuURL)") activityLog?.log( category: .call, severity: .debug, source: "LiveKitCredentialService", @@ -90,7 +90,7 @@ struct LiveKitCredentialService { // MARK: - Step 1: Discover SFU URL - private func discoverSFUURL() async throws -> String { + private func discoverSFUURL(roomID: String) async throws -> String { // Prefer the MSC4143 transports endpoint if let url = try? await fetchRTCTransportsURL() { return url @@ -99,6 +99,15 @@ struct LiveKitCredentialService { if let url = try? await fetchWellKnownSFUURL() { return url } + // Last resort: read another active call participant's + // `foci_preferred[0]` from `m.call.member` state. Joining an + // in-progress call on a homeserver without `.well-known` configured + // would otherwise fail with `sfuURLNotFound` even though the SFU is + // visible in room state. Matches Element Call / matrix-js-sdk + // discovery behaviour. + if let url = try? await fetchSFUFromCallMembers(roomID: roomID) { + return url + } throw LiveKitCredentialError.sfuURLNotFound } @@ -140,6 +149,50 @@ struct LiveKitCredentialService { return first.livekitServiceUrl } + /// Walks `m.call.member` state events on the room and returns the first + /// `foci_preferred[].livekit_service_url` advertised by a peer with + /// non-empty content. 
Lets a user join an in-progress call when their + /// homeserver doesn't expose `.well-known org.matrix.msc4143.rtc_foci` + /// — the SFU the existing participants are already using is right there + /// in room state. + private func fetchSFUFromCallMembers(roomID: String) async throws -> String { + let base = homeserver.trimmingCharacters(in: .init(charactersIn: "/")) + let encodedRoomID = roomID.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) ?? roomID + guard let url = URL(string: "\(base)/_matrix/client/v3/rooms/\(encodedRoomID)/state") else { + throw LiveKitCredentialError.invalidURL + } + var request = URLRequest(url: url) + request.setValue("Bearer \(accessToken)", forHTTPHeaderField: "Authorization") + + let (data, response) = try await URLSession.shared.data(for: request) + guard let http = response as? HTTPURLResponse, http.statusCode == 200 else { + throw LiveKitCredentialError.serverError + } + + guard let events = try JSONSerialization.jsonObject(with: data) as? [[String: Any]] else { + throw LiveKitCredentialError.sfuURLNotFound + } + + for event in events { + guard let type = event["type"] as? String, + type == "org.matrix.msc3401.call.member", + let content = event["content"] as? [String: Any], + !content.isEmpty, + let fociPreferred = content["foci_preferred"] as? [[String: Any]] + else { continue } + for focus in fociPreferred { + guard let focusType = focus["type"] as? String, + focusType == "livekit", + let serviceURL = focus["livekit_service_url"] as? 
String, + !serviceURL.isEmpty + else { continue } + logger.info("[RTC]Recovered SFU URL from existing call member state") + return serviceURL + } + } + throw LiveKitCredentialError.sfuURLNotFound + } + // MARK: - Step 2: Request OpenID Token private func requestOpenIDToken() async throws -> OpenIDTokenPayload { From 4bcb129154b554c3ff369bf9cb33764b6a9b569e Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Sun, 14 Jun 2026 07:18:54 -0400 Subject: [PATCH 07/16] Source fetchCallTargets from SDK room state MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the raw-REST `/rooms/{id}/state` walk with `RoomInfo.activeRoomCallParticipants` from the Rust SDK. The SDK list is user-level only (no device IDs), so each user's device list becomes `["*"]` (the to-device wildcard) and the SDK fans the Olm-encrypted key payload out to all of that user's devices. Matches `matrix-js-sdk/src/matrixrtc/ToDeviceKeyTransport.ts`. Some warmed-up Olm sessions go to devices that aren't in the call, but the AES key is per-call and only consumed by a LiveKit cryptor that expects it, so the extra sessions are wasted, not unsafe. Element Call accepts the same trade-off. Removes the delegated-homeserver URL risk and the bespoke state-key parser (`_<userId>_<deviceId>_m.call`) that filtered out any peer using a non-Element-X key shape. Assisted-By: Claude Sonnet 4.7 --- RelayKit/Call/CallEncryptionService.swift | 56 +++++++---------------- docs/internal/rtc-element-call-diff.md | 19 ++++---- 2 files changed, 27 insertions(+), 48 deletions(-) diff --git a/RelayKit/Call/CallEncryptionService.swift b/RelayKit/Call/CallEncryptionService.swift index de7dced..a900de0 100644 --- a/RelayKit/Call/CallEncryptionService.swift +++ b/RelayKit/Call/CallEncryptionService.swift @@ -187,51 +187,27 @@ struct CallEncryptionService { } /// Returns a `userId -> [deviceId]` map of *other* users currently in the - /// call, parsed from `org.matrix.msc3401.call.member` state events.
+ /// call, sourced from the SDK's `RoomInfo.activeRoomCallParticipants`. + /// + /// The SDK's call-membership view is user-level only — no device IDs — + /// so each user's device list is `["*"]` (the to-device wildcard) and + /// the SDK fans out the Olm-encrypted to-device payload to all of that + /// user's devices. Matches `matrix-js-sdk/src/matrixrtc/ + /// ToDeviceKeyTransport.ts`. Some of those devices won't be in the + /// call, but the AES key we're broadcasting is per-call and the receiver + /// only consumes it if their LiveKit cryptor expects it — so the extra + /// Olm sessions are wasted, not unsafe. /// - /// Element-X writes per-device call-member events with state key - /// `___m.call`. We walk the full room state, filter for - /// non-empty call-member content (empty content means the participant - /// has left), and extract `(userId, deviceId)` from the state key. /// Our own `userID` is excluded. func fetchCallTargets() async -> [String: [String]] { - let base = homeserver.trimmingCharacters(in: .init(charactersIn: "/")) - let encodedRoomID = roomID.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) ?? roomID - - guard let url = URL(string: "\(base)/_matrix/client/v3/rooms/\(encodedRoomID)/state") else { return [:] } + guard let sdkRoom else { return [:] } + guard let info = try? await sdkRoom.roomInfo() else { return [:] } - var request = URLRequest(url: url) - request.setValue("Bearer \(accessToken)", forHTTPHeaderField: "Authorization") - - guard let (data, response) = try? await URLSession.shared.data(for: request), - let http = response as? HTTPURLResponse, http.statusCode == 200, - let events = try? JSONSerialization.jsonObject(with: data) as? 
[[String: Any]] else { - return [:] + var targets: [String: [String]] = [:] + for participantUserID in info.activeRoomCallParticipants where participantUserID != self.userID { + targets[participantUserID] = ["*"] } - - var targets: [String: Set] = [:] - for event in events { - guard let type = event["type"] as? String, - type == Self.callMemberEventType, - let stateKey = event["state_key"] as? String, - let content = event["content"] as? [String: Any], - !content.isEmpty else { continue } - - // State key format: `___m.call` where userId is - // itself `@localpart:server.tld`. Strip the leading underscore - // and the trailing `_m.call` marker, then split on the *last* - // underscore to separate deviceId from userId. - guard stateKey.hasPrefix("_"), stateKey.hasSuffix("_m.call") else { continue } - let trimmed = String(stateKey.dropFirst().dropLast("_m.call".count)) - guard let lastUnderscore = trimmed.lastIndex(of: "_") else { continue } - let userId = String(trimmed[..__m.call`. Matches Element X's per-device convent ### `CallEncryptionService.fetchCallTargets` -Lines 197–235. - -| Concern | Detail | -| --- | --- | -| Uses raw REST `/rooms/{id}/state` rather than SDK room state | Stale-cache hazard, and `homeserver` field may differ from the delegated client URL for some accounts. | -| State-key parser expects `___m.call` exactly | Filters out any peer using just `` or a different per-device key shape. | - -**Tracked as Item 5.** +Sources call participants from `RoomInfo.activeRoomCallParticipants` and +broadcasts our AES key to all of each user's devices via the to-device +`"*"` wildcard. Matches Element Call's +`matrix-js-sdk/src/matrixrtc/ToDeviceKeyTransport.ts` behaviour. The +SDK accessor is user-level only — no device IDs — so a few Olm +sessions to non-call devices get warmed up unnecessarily, but the key +itself is per-call and only consumed by a LiveKit cryptor that +expects it. 
+ +(History: previously walked raw `/rooms/{id}/state` REST to parse +per-device state keys. Switched in Item 5.) ### `CallWidgetBridge.handleIncomingToDevice` (key routing) From 7e440501893f553cb5f8aae1966e03c6713293af Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Sun, 14 Jun 2026 12:46:00 -0400 Subject: [PATCH 08/16] Restore legacy /sfu/get-first credential path for peer interop MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adding `slot_id` flipped Relay onto lk-jwt-service's v2 /get_token, which assigns LiveKit identities as `unpadded_base64(sha256(...))`. But matrix-js-sdk's `CallMembership.parseFromEvent` reads our legacy `org.matrix.msc3401.call.member` event under `MembershipKind.Session`, where `rtcBackendIdentity` is hardcoded to `${sender}:${device_id}` — the plain-concat form, not hashed. Peers running Element Call / Element X / Element Web looked for that colon identity, never found us on LiveKit, and dropped our video. Invert the order in `fetchLiveKitToken`: try legacy first, fall forward to v2 only when legacy fails. v2 path stays plumbed for when Relay also publishes MSC4143 sticky `m.rtc.member` events (tracked separately). Also surface `m.call.member` send failures in the Activity Log with a power-level-aware hint. The most common failure shape — M_FORBIDDEN from rooms whose `power_levels.events.org.matrix.msc3401.call.member` defaults to `state_default` (50) instead of the override Relay sets at room creation — silently locks non-admin participants out of E2EE call media (no membership event in room state → peers don't send keys → black tiles). And rename track-kind logs from `publication.kind.rawValue` (integer) to a named form (`audio`/`video`/`none`), and start writing remote `didPublishTrack` events to the Activity Log. 
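For the M_FORBIDDEN case above, a rough sketch of how the required level for a state event resolves from `m.room.power_levels` content, assuming the usual lookup order: a per-event entry under `events` wins, otherwise `state_default` applies, and that is taken as 50 when the key is absent. `PowerLevels` and `requiredLevel(forStateEvent:in:)` are hypothetical names for illustration, not Relay or SDK API.

```swift
import Foundation

// Hypothetical, trimmed-down view of `m.room.power_levels` content.
struct PowerLevels: Decodable {
    var events: [String: Int]?
    var stateDefault: Int?

    enum CodingKeys: String, CodingKey {
        case events
        case stateDefault = "state_default"
    }
}

// Level a sender needs in order to send `eventType` as a state event.
func requiredLevel(forStateEvent eventType: String, in levels: PowerLevels) -> Int {
    if let override = levels.events?[eventType] {
        return override
    }
    return levels.stateDefault ?? 50
}

let callMember = "org.matrix.msc3401.call.member"

// Room with the override lowered to 0, as described for Relay-created rooms.
let withOverride = try! JSONDecoder().decode(
    PowerLevels.self,
    from: Data(#"{"events": {"org.matrix.msc3401.call.member": 0}, "state_default": 50}"#.utf8)
)

// Room without the override: the event falls back to `state_default`.
let withoutOverride = try! JSONDecoder().decode(
    PowerLevels.self,
    from: Data(#"{"events": {"m.room.name": 50}}"#.utf8)
)
```

Under these assumptions a default-power-level participant (level 0) can send the membership event only in the first room, which is the difference the new Activity Log hint is meant to point at.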
Co-Authored-By: Claude Opus 4.7 --- RelayKit/Call/CallViewModel.swift | 63 ++++++++++++++++++-- RelayKit/Call/LiveKitCredentialService.swift | 37 +++++++----- docs/internal/rtc-element-call-diff.md | 38 +++++++++++- 3 files changed, 115 insertions(+), 23 deletions(-) diff --git a/RelayKit/Call/CallViewModel.swift b/RelayKit/Call/CallViewModel.swift index 0fbe8f1..7a60874 100644 --- a/RelayKit/Call/CallViewModel.swift +++ b/RelayKit/Call/CallViewModel.swift @@ -348,7 +348,9 @@ public final class CallViewModel: CallViewModelProtocol { membershipId: membershipId ) } catch { - logger.warning("[RTC]Call membership event failed: \(error.localizedDescription, privacy: .private)") + let description = String(reflecting: error) + logger.warning("[RTC]Call membership event failed: \(description, privacy: .private)") + self.logCallMembershipFailure(error, description: description) } // 2. Start the membership heartbeat. matrix-js-sdk's @@ -605,6 +607,34 @@ public final class CallViewModel: CallViewModelProtocol { /// both legacy and v2 paths. /// /// The `participantIdentity` parameter is now only used for logging. + /// Surfaces a `sendCallMemberEvent` failure to the Activity Log. The most + /// common failure shape in the wild is M_FORBIDDEN because the room's + /// `power_levels.events.org.matrix.msc3401.call.member` defaults to + /// `state_default` (50) instead of being explicitly lowered to 0 — when + /// hit, peers running Element Call / Element X have no Matrix-level + /// record of us joining the call, so they never send us their E2EE key + /// and our tiles stay black. Relay-created rooms set the override at + /// creation (see `MatrixService.callPowerLevels`); rooms created + /// elsewhere may not. 
+ fileprivate func logCallMembershipFailure(_ error: Error, description: String) { + let isPowerLevelDenial = description.contains("M_FORBIDDEN") + && description.contains("org.matrix.msc3401.call.member") + && description.contains("power") + let summary = "Call membership state event rejected" + let detail: String + if isPowerLevelDenial { + detail = "Homeserver returned M_FORBIDDEN: this room requires a higher power level to send `org.matrix.msc3401.call.member`. Ask a room admin to set its required power level to 0 (Relay-created rooms do this automatically). Without this event in room state, other participants can't send you E2EE keys and your tiles will stay black on encrypted calls. Raw error: \(description)" + } else { + detail = "Without a successful call membership state event, peers can't see you as a call participant and won't send you E2EE keys. Raw error: \(description)" + } + activityLog?.log( + category: .call, severity: .error, source: "CallViewModel", + summary: summary, + detail: detail, + roomId: roomID + ) + } + fileprivate func redistributeKey(to participantIdentity: String) { guard let key = localEncryptionKey, let bridge = widgetBridge, @@ -792,6 +822,18 @@ public final class CallViewModel: CallViewModelProtocol { } } + /// Human-readable label for a `LiveKit.Track.Kind`. The raw value is + /// `Int`-backed (`audio=0`, `video=1`, `none=2`) which is useless in + /// logs. 
+ nonisolated fileprivate static func describe(_ kind: Track.Kind) -> String { + switch kind { + case .audio: "audio" + case .video: "video" + case .none: "none" + default: "unknown(\(kind.rawValue))" + } + } + func room(_ room: LiveKit.Room, participantDidConnect participant: RemoteParticipant) { Task { @MainActor [weak viewModel] in guard let viewModel else { return } @@ -814,7 +856,7 @@ public final class CallViewModel: CallViewModelProtocol { func room(_ room: LiveKit.Room, participant: RemoteParticipant, didSubscribeTrack publication: RemoteTrackPublication) { observeDimensions(of: publication) let identityStr = participant.identity?.stringValue ?? "(none)" - let kind = publication.kind.rawValue + let kind = Self.describe(publication.kind) let sid = publication.sid Task { @MainActor [weak viewModel] in guard let viewModel else { return } @@ -872,7 +914,7 @@ public final class CallViewModel: CallViewModelProtocol { func room(_ room: LiveKit.Room, localParticipant: LocalParticipant, didPublishTrack publication: LocalTrackPublication) { observeDimensions(of: publication) - let kind = publication.kind.rawValue + let kind = Self.describe(publication.kind) let sid = publication.sid Task { @MainActor [weak viewModel] in guard let viewModel else { return } @@ -888,8 +930,19 @@ public final class CallViewModel: CallViewModelProtocol { } func room(_ room: LiveKit.Room, participant: RemoteParticipant, didPublishTrack publication: RemoteTrackPublication) { + let identityStr = participant.identity?.stringValue ?? 
"(none)" + let kind = Self.describe(publication.kind) + let sid = publication.sid Task { @MainActor [weak viewModel] in - viewModel?.syncParticipants(trackChanged: true) + guard let viewModel else { return } + logger.info("[RTC]Remote published \(kind, privacy: .public) track from identity=\(identityStr, privacy: .public) trackSid=\(sid, privacy: .public)") + viewModel.activityLog?.log( + category: .call, severity: .debug, source: "CallViewModel", + summary: "Remote published \(kind) track", + detail: "Identity: \(identityStr), trackSid: \(sid)", + roomId: viewModel.roomID + ) + viewModel.syncParticipants(trackChanged: true) } } @@ -903,7 +956,7 @@ public final class CallViewModel: CallViewModelProtocol { func room(_ room: LiveKit.Room, trackPublication: TrackPublication, didUpdateE2EEState state: E2EEState) { let stateLabel = state.toString() let trackSid = trackPublication.sid - let trackKind = trackPublication.kind.rawValue + let trackKind = Self.describe(trackPublication.kind) Task { @MainActor [weak viewModel] in guard let viewModel else { return } switch state { diff --git a/RelayKit/Call/LiveKitCredentialService.swift b/RelayKit/Call/LiveKitCredentialService.swift index e275f4e..31d3ece 100644 --- a/RelayKit/Call/LiveKitCredentialService.swift +++ b/RelayKit/Call/LiveKitCredentialService.swift @@ -221,28 +221,33 @@ struct LiveKitCredentialService { roomID: String, openIDToken: OpenIDTokenPayload ) async throws -> (url: String, token: String) { - // Try v2 first. If it fails, log the actual server response (status, - // Matrix errcode, message) before falling back to legacy — so users - // on v2-only deployments see actionable detail rather than the - // legacy endpoint's generic "tokenExchangeFailed". + // Try legacy `/sfu/get` first. 
It assigns LiveKit identity + // `${user}:${device}` — which matches what matrix-js-sdk peers + // (Element Call / Element X / Element Web) compute as + // `rtcBackendIdentity` from our `org.matrix.msc3401.call.member` + // event (see `CallMembership.parseFromEvent` — + // `MembershipKind.Session` branch is the plain-concat form, not the + // hashed v2 form). If we use v2 `/get_token` we land on a hashed + // identity that peers reading our legacy session event cannot + // reconcile, breaking video routing. v2 only becomes viable once we + // also publish MSC4143 sticky `m.rtc.member` events. do { - return try await fetchLiveKitTokenV2( + return try await fetchLiveKitTokenLegacy( sfuURL: sfuURL, roomID: roomID, openIDToken: openIDToken ) - } catch let v2Error { - logV2Failure(v2Error, sfuURL: sfuURL) + } catch let legacyError { + logLegacyFailure(legacyError, sfuURL: sfuURL) } - return try await fetchLiveKitTokenLegacy(sfuURL: sfuURL, roomID: roomID, openIDToken: openIDToken) + return try await fetchLiveKitTokenV2(sfuURL: sfuURL, roomID: roomID, openIDToken: openIDToken) } - /// Logs a v2 `/get_token` failure to os_log and the activity log so that - /// the silent fall-back to legacy is at least visible after the fact. - /// Format-aware: a `LiveKitCredentialError.tokenExchangeRejected` carries - /// structured detail; anything else falls through to its - /// `localizedDescription`. - private func logV2Failure(_ error: Error, sfuURL: String) { + /// Logs a `/sfu/get` failure to os_log and the activity log so that the + /// fall-forward to v2 is at least visible after the fact. Format-aware: + /// a `LiveKitCredentialError.tokenExchangeRejected` carries structured + /// detail; anything else falls through to its `localizedDescription`. + private func logLegacyFailure(_ error: Error, sfuURL: String) { let detail: String if case let LiveKitCredentialError.tokenExchangeRejected(status, errcode, message, _) = error { let errcodePart = errcode.map { " \($0)" } ?? 
"" @@ -251,10 +256,10 @@ struct LiveKitCredentialService { } else { detail = error.localizedDescription } - logger.warning("[RTC]/get_token failed, falling back to /sfu/get — \(detail, privacy: .public)") + logger.warning("[RTC]/sfu/get failed, trying /get_token — \(detail, privacy: .public)") activityLog?.log( category: .call, severity: .warning, source: "LiveKitCredentialService", - summary: "v2 /get_token rejected; falling back to legacy", + summary: "Legacy /sfu/get rejected; trying v2", detail: detail ) } diff --git a/docs/internal/rtc-element-call-diff.md b/docs/internal/rtc-element-call-diff.md index 0ad704a..d0675bc 100644 --- a/docs/internal/rtc-element-call-diff.md +++ b/docs/internal/rtc-element-call-diff.md @@ -36,9 +36,9 @@ Lines 178–205 in `LiveKitCredentialService.swift`. Reference: `getLiveunitJWTW | `member.claimed_device_id` | ✓ | ✓ | n/a | | `delay_id` / `delay_timeout` / `delay_cs_api_url` | not sent | optionally sent if configured | optional | -**Impact**: Missing `slot_id` causes v2 to 400 every time. Relay's `try?` swallows the failure and silently falls through to legacy `/sfu/get`. **Tracked as Item 1.** +**Resolution (2026-06-14)**: Item 1 was implemented (we sent `slot_id: "m.call#ROOM"`) but caused a regression: matrix-js-sdk's `CallMembership.parseFromEvent` computes `rtcBackendIdentity` differently depending on membership kind. For `MembershipKind.Session` (the legacy `org.matrix.msc3401.call.member` event we publish) it returns the plain concatenation `${sender}:${device_id}`, **not** the v2 hash. So peers running Element Call / Element X / Element Web read our membership event, expect us on LiveKit as `@user:server:device`, but lk-jwt-service had placed us under the v2 hash → peers couldn't reconcile our LiveKit participant with our Matrix call membership → no video routing. -**Secondary**: `member.id = ":"` differs from Element Call's UUID. The lk-jwt-service hashes `[matrixID, claimedDeviceID, memberID]` into the SFU identity. 
Different `member.id` → different pseudonymous identity → peers can't agree on routing. Only matters once v2 is reachable. **Tracked as Item 2.** +**Current state**: `fetchLiveKitToken` tries legacy `/sfu/get` first and falls forward to v2 only if legacy fails. v2 only becomes viable once Relay also publishes MSC4143 sticky `m.rtc.member` events (a separate, larger change). `slot_id` and v2 identity remapping (Item 2) remain plumbed; they're inert on the legacy-first path but ready for the day we adopt sticky events. ### `LiveKitCredentialService.fetchLiveKitTokenLegacy` @@ -148,6 +148,40 @@ After `state = .connected` (line 391), the Activity Log has **no further events* **Tracked as Item 0** (new — added after reviewing user `97853C31` activity log on 2026-06-13). +## Design note: MSC4143 sticky-event dual-publish (future work) + +The legacy-first credential path landed on 2026-06-14 restored interop with the current ecosystem, but pins Relay to the legacy `org.matrix.msc3401.call.member` shape. matrix-js-sdk's `StickyEventMembershipManager` already publishes the MSC4143 `m.rtc.member` sticky shape in parallel with the legacy event on stacks that opt in. To unblock v2 `/get_token` (hashed identity) and remove the legacy-stack dependency, Relay will need to dual-publish too. + +### Scope + +1. **Membership UUID threading.** `CallEncryptionService` already generates a `membershipID` UUID for the m.call.member event. Extend it to be the canonical `member.id` and thread it into `LiveKitCredentialService.fetchLiveKitTokenV2` (currently we send `":"` there). The same UUID must appear in the sticky `m.rtc.member` event so peers computing `computeRtcIdentityRaw(user_id, device_id, member.id)` get the same hash lk-jwt-service assigns. + +2. 
**Sticky-event publish.** New code path in `CallEncryptionService` (or a peer file) that publishes the MSC4143 `m.rtc.member` shape via the SDK's future-event API (depends on Synapse MSC4140 — gate on a homeserver capability probe, skip silently if unsupported). This is *additive*: keep publishing the legacy event for peers that only read it. + +3. **Reader merge.** `CallEncryptionService.fetchCallTargets` currently sources participants from `RoomInfo.activeRoomCallParticipants` (user-level). When sticky events ship, the SDK's accessor should already merge both event types — verify before assuming. If it doesn't, parse both event types and dedupe by `(user_id, device_id)`. + +4. **Credential path selection.** With sticky publish in place, we can revert to `fetchLiveKitToken` trying v2 first again. Peers reading our sticky event compute hashed identity; peers reading only our legacy event compute colon identity. Both must work simultaneously — meaning lk-jwt-service must place us under *one* identity but peers reading the "wrong" event lose us. Mitigation: matrix-js-sdk peers prefer the sticky event when both are present (verify in `parseFromEvent` / its caller). So as long as we sticky-publish, hashed-identity peers find us; legacy-only peers (vicr123-style older Relay builds) won't, but that's the same tradeoff matrix-js-sdk made. + +5. **Encryption key routing.** `CallWidgetBridge.handleIncomingToDevice` already registers under both identity shapes (Item 2 belt-and-braces). Keep. `CallViewModel.connect` and `redistributeKey` currently key off `room.localParticipant.identity` post-connect — that continues to work, since LiveKit tells us our actual identity directly regardless of which credential path won. + +### Out of scope for this work + +- Switching the to-device key transport from custom-payload Olm to the SDK's `ToDeviceKeyTransport`. Orthogonal. +- Replacing `MembershipManager` plumbing wholesale with the SDK's `StickyEventMembershipManager`. 
The Widget API doesn't expose that manager directly; we'd be reimplementing it on top of the same primitives. Out of scope until a clear interop bug forces the switch. + +### Open questions + +- **Does Synapse's MSC4140 implementation handle sticky events for E2EE rooms reliably?** matrix-js-sdk has had bugs here. Test against production homeservers before assuming. +- **Does Element X currently *publish* sticky events, or only read them?** If only reads, our dual-publish is purely forward-looking and we won't see immediate interop benefit until Element X also publishes. +- **What's the right capability probe?** No standard exists yet. Likely: try the publish, treat 400/404 with specific errcodes as "feature unavailable", cache the result per-homeserver. + +### Files this will touch + +- `RelayKit/Call/CallEncryptionService.swift` — membership UUID threading, sticky-event publish, optional reader merge +- `RelayKit/Call/LiveKitCredentialService.swift` — revert `fetchLiveKitToken` to v2-first, thread UUID into request body +- `RelayKit/Call/CallWidgetBridge.swift` — no changes expected (key routing already dual-registers) +- `RelayKit/Call/CallViewModel.swift` — no changes expected + ## What this file is NOT - Not user-facing — see `docs/troubleshooting-calls.md` for that. From 791ac0ef39105856d613f3fdaa128c946e221cb8 Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Sun, 14 Jun 2026 13:20:25 -0400 Subject: [PATCH 09/16] Honor focus_selection: oldest_membership when picking SFU MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `discoverSFUURL` previously preferred local SFU discovery (MSC4143 transports endpoint / `.well-known`) and only consulted peers' `foci_preferred` as a last-resort fallback. 
That breaks federated calls: when another homeserver's user starts the call first, they advertise their SFU under `focus_active.focus_selection == "oldest_membership"`, but a later-joining Relay client would ignore that and connect to its own homeserver's SFU instead — splitting the call across two SFUs and stranding both sides on "waiting for media". Reorder discovery so the existing-call SFU wins. Refactor `fetchSFUFromCallMembers` to pick the *oldest* surviving membership rather than the first one found, and to drop expired (`created_ts + expires < now`) and tombstoned (empty content) entries so a stale leftover can't outvote the live participants. Local discovery stays as the bootstrap path for the first joiner. Co-Authored-By: Claude Opus 4.7 --- RelayKit/Call/LiveKitCredentialService.swift | 76 +++++++++++++------- 1 file changed, 49 insertions(+), 27 deletions(-) diff --git a/RelayKit/Call/LiveKitCredentialService.swift b/RelayKit/Call/LiveKitCredentialService.swift index 31d3ece..021333f 100644 --- a/RelayKit/Call/LiveKitCredentialService.swift +++ b/RelayKit/Call/LiveKitCredentialService.swift @@ -91,21 +91,21 @@ struct LiveKitCredentialService { // MARK: - Step 1: Discover SFU URL private func discoverSFUURL(roomID: String) async throws -> String { - // Prefer the MSC4143 transports endpoint - if let url = try? await fetchRTCTransportsURL() { + // Existing-call SFU wins. Per `focus_active.focus_selection == + // "oldest_membership"` (matrix-js-sdk's `MatrixRTCSession`), every + // joiner must converge on the SFU advertised by the *oldest* + // active call membership. If we picked our own homeserver's SFU + // here instead, a federated call would split across two SFUs and + // media never reaches the other side. + if let url = try? await fetchSFUFromCallMembers(roomID: roomID) { return url } - // Fall back to .well-known - if let url = try? await fetchWellKnownSFUURL() { + // Bootstrap path — we're the first to join. 
Prefer MSC4143 + // transports endpoint, fall back to `.well-known`. + if let url = try? await fetchRTCTransportsURL() { return url } - // Last resort: read another active call participant's - // `foci_preferred[0]` from `m.call.member` state. Joining an - // in-progress call on a homeserver without `.well-known` configured - // would otherwise fail with `sfuURLNotFound` even though the SFU is - // visible in room state. Matches Element Call / matrix-js-sdk - // discovery behaviour. - if let url = try? await fetchSFUFromCallMembers(roomID: roomID) { + if let url = try? await fetchWellKnownSFUURL() { return url } throw LiveKitCredentialError.sfuURLNotFound @@ -149,12 +149,15 @@ struct LiveKitCredentialService { return first.livekitServiceUrl } - /// Walks `m.call.member` state events on the room and returns the first - /// `foci_preferred[].livekit_service_url` advertised by a peer with - /// non-empty content. Lets a user join an in-progress call when their - /// homeserver doesn't expose `.well-known org.matrix.msc4143.rtc_foci` - /// — the SFU the existing participants are already using is right there - /// in room state. + /// Reads `m.call.member` state events on the room and returns the + /// `foci_preferred[].livekit_service_url` advertised by the **oldest** + /// active call membership — the SFU every joiner is supposed to + /// converge on per `focus_active.focus_selection == "oldest_membership"`. + /// + /// Skips tombstoned (empty-content) and expired memberships so a stale + /// leftover doesn't outvote the live participants. Returns + /// `sfuURLNotFound` when nobody is in the call yet, signalling the + /// caller to bootstrap via local discovery. private func fetchSFUFromCallMembers(roomID: String) async throws -> String { let base = homeserver.trimmingCharacters(in: .init(charactersIn: "/")) let encodedRoomID = roomID.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) ?? 
roomID @@ -173,24 +176,43 @@ struct LiveKitCredentialService { throw LiveKitCredentialError.sfuURLNotFound } + struct Candidate { + let createdTs: Int64 + let sfuURL: String + } + var candidates: [Candidate] = [] + let nowMs = Int64(Date().timeIntervalSince1970 * 1000) + for event in events { guard let type = event["type"] as? String, type == "org.matrix.msc3401.call.member", let content = event["content"] as? [String: Any], !content.isEmpty, - let fociPreferred = content["foci_preferred"] as? [[String: Any]] + let fociPreferred = content["foci_preferred"] as? [[String: Any]], + let focus = fociPreferred.first(where: { ($0["type"] as? String) == "livekit" }), + let serviceURL = focus["livekit_service_url"] as? String, + !serviceURL.isEmpty else { continue } - for focus in fociPreferred { - guard let focusType = focus["type"] as? String, - focusType == "livekit", - let serviceURL = focus["livekit_service_url"] as? String, - !serviceURL.isEmpty - else { continue } - logger.info("[RTC]Recovered SFU URL from existing call member state") - return serviceURL + + // Drop expired memberships. Default `expires` is 4h + // (14400000ms). `created_ts` falls back to event-level + // `origin_server_ts` so very old non-tombstoned events still + // get pruned. + let createdTs = (content["created_ts"] as? NSNumber)?.int64Value + ?? (event["origin_server_ts"] as? NSNumber)?.int64Value + ?? 0 + let expires = (content["expires"] as? NSNumber)?.int64Value ?? 
14400000 + if createdTs > 0 && createdTs + expires < nowMs { + continue } + candidates.append(Candidate(createdTs: createdTs, sfuURL: serviceURL)) } - throw LiveKitCredentialError.sfuURLNotFound + + guard let oldest = candidates.min(by: { $0.createdTs < $1.createdTs }) else { + throw LiveKitCredentialError.sfuURLNotFound + } + logger.info("[RTC]Joining existing call SFU per oldest_membership: \(oldest.sfuURL, privacy: .public)") + return oldest.sfuURL } // MARK: - Step 2: Request OpenID Token From e53d0c412325229a2f43a9c5c7f0137dba6c5f02 Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Sun, 14 Jun 2026 13:35:42 -0400 Subject: [PATCH 10/16] Surface ongoing-call state in the room toolbar button Plumb `RoomInfo.hasRoomCall` from the SDK through to the room toolbar. When a call is in progress, the call button flips from icon-only to icon + "Join Call" label, rendered in the app's accent color so the state change is visible at a glance, and the confirmation dialog re-words to "Join" instead of "Start". - Add `hasRoomCall: Bool` to `RelayInterface.RoomSummary` - Mirror `info.hasRoomCall` into the observable summary inside `RoomListManager.applyRoomInfo` - Branch the toolbar button label/foreground style and confirmation copy on `currentRoom?.hasRoomCall` Co-Authored-By: Claude Opus 4.7 --- .../RelayInterface/Models/RoomSummary.swift | 10 ++++++- Relay/Views/MainView.swift | 28 ++++++++++++++----- RelayKit/Services/RoomListManager.swift | 1 + 3 files changed, 31 insertions(+), 8 deletions(-) diff --git a/Packages/RelayInterface/Sources/RelayInterface/Models/RoomSummary.swift b/Packages/RelayInterface/Sources/RelayInterface/Models/RoomSummary.swift index 7065ab4..c5f3aa5 100644 --- a/Packages/RelayInterface/Sources/RelayInterface/Models/RoomSummary.swift +++ b/Packages/RelayInterface/Sources/RelayInterface/Models/RoomSummary.swift @@ -152,6 +152,12 @@ public final class RoomSummary: Identifiable { /// space filter bar. 
public var parentSpaceIds: Set + /// Whether this room currently has an ongoing MatrixRTC call. + /// + /// Mirrors the SDK's `RoomInfo.hasRoomCall`, which is true whenever any + /// non-expired `m.call.member` state event exists in the room. + public var hasRoomCall: Bool + /// Creates a new ``RoomSummary`` instance. /// /// - Parameters: @@ -192,7 +198,8 @@ public final class RoomSummary: Identifiable { inviterName: String? = nil, inviterAvatarURL: String? = nil, isSpace: Bool = false, - parentSpaceIds: Set = [] + parentSpaceIds: Set = [], + hasRoomCall: Bool = false ) { self.id = id self.name = name @@ -213,5 +220,6 @@ public final class RoomSummary: Identifiable { self.inviterAvatarURL = inviterAvatarURL self.isSpace = isSpace self.parentSpaceIds = parentSpaceIds + self.hasRoomCall = hasRoomCall } } diff --git a/Relay/Views/MainView.swift b/Relay/Views/MainView.swift index 4cbf024..0d5c2be 100644 --- a/Relay/Views/MainView.swift +++ b/Relay/Views/MainView.swift @@ -373,25 +373,39 @@ struct MainView: View { // swiftlint:disable:this type_body_length } private func startCallButton(roomId: String) -> some View { - Button { + let hasOngoingCall = currentRoom?.hasRoomCall ?? false + let label = hasOngoingCall ? "Join Call" : "Start Call" + let confirmTitle = hasOngoingCall ? "Join Call" : "Start Call" + let confirmAction = hasOngoingCall ? "Join" : "Call" + return Button { showCallConfirmation = true } label: { - Label("Start Call", systemImage: "phone.fill") + // Force the title to render alongside the icon on + // ongoing-call state so the toolbar pill visibly changes + // — default macOS toolbar style would hide the title and + // leave the pill indistinguishable from the idle state. 
+ if hasOngoingCall { + Label(label, systemImage: "phone.fill") + .labelStyle(.titleAndIcon) + .foregroundStyle(Color.accentColor) + } else { + Label(label, systemImage: "phone.fill") + } } - .help("Start Call") + .help(label) .disabled(callManager.hasActiveCall) .confirmationDialog( - "Start Call", + confirmTitle, isPresented: $showCallConfirmation ) { - Button("Call") { + Button(confirmAction) { startCall(roomId: roomId) } } message: { if let name = currentRoom?.name { - Text("Start a call in \(name)?") + Text(hasOngoingCall ? "Join the call in \(name)?" : "Start a call in \(name)?") } else { - Text("Start a call in this room?") + Text(hasOngoingCall ? "Join the call in this room?" : "Start a call in this room?") } } } diff --git a/RelayKit/Services/RoomListManager.swift b/RelayKit/Services/RoomListManager.swift index 8c682d5..7ecfdc1 100644 --- a/RelayKit/Services/RoomListManager.swift +++ b/RelayKit/Services/RoomListManager.swift @@ -535,6 +535,7 @@ private final class RoomEntry: Identifiable { summary.pinnedEventIds = info.pinnedEventIds summary.isFavourite = info.isFavourite summary.successorRoomId = info.successorRoom?.roomId + summary.hasRoomCall = info.hasRoomCall // Map SDK membership to RelayInterface type switch info.membership { From 0c39e2076eb4669dbd2ee81015ae235b4586485a Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Mon, 15 Jun 2026 09:15:09 -0400 Subject: [PATCH 11/16] Unify call-path logging on ActivityLog and fix federation key race MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two related threads: 1. Unify logging on the Activity Log. Drop every `logger.X("[RTC]…")` call in the four call-path files — `LiveKitCredentialService`, `CallEncryptionService`, `CallWidgetBridge`, and `CallViewModel`. Where an Activity Log entry already covered the same information, the logger line was redundant; where only a logger line existed, port the content into a new Activity Log entry first. 
The Activity Log auto-routes events to Console, so anything previously only visible there now lands in both Console and the exported Activity Log JSON. `CallEncryptionService.makeHKDFKeyProvider` now returns `(provider, hkdfInstalled, fallbackReason)`, and `setRawKey` now returns an optional failure reason string instead of os_log'ing internally — both move responsibility for surfacing the outcome to the caller, which has access to `activityLog` and the surrounding call context. `CallViewModel.startHeartbeat` now takes `activityLog` and `roomID` so heartbeat refresh failures still land in the Activity Log without a local Logger. 2. Fix federation E2EE key-distribution race. When Relay is the first joiner and a federated peer arrives later, LiveKit's `participantDidConnect` fires before the peer's `m.call.member` event reaches the SDK. Our existing `redistributeKey(to:)` then calls `fetchCallTargets`, which reads `RoomInfo.activeRoomCallParticipants` — empty at that moment — and exits without sending. The peer never receives our key, and Element Call / Element X show our tile as black. Add a second trigger that fires after the m.call.member event actually arrives. `CallWidgetBridge` exposes a new `onCallMemberStateChanged` callback invoked when the widget driver delivers any inbound `org.matrix.msc3401.call.member` event; `CallViewModel` wires it to `redistributeKeyOnMembershipChange()`, which re-fetches targets and re-sends the key — but only when the *user set* differs from the previous snapshot, so periodic heartbeats from existing peers don't cause us to spam Olm-encrypted to-device payloads. 
Co-Authored-By: Claude Opus 4.7 --- RelayKit/Call/CallEncryptionService.swift | 103 ++++++---- RelayKit/Call/CallViewModel.swift | 195 +++++++++++++++---- RelayKit/Call/CallWidgetBridge.swift | 186 ++++++++++++++---- RelayKit/Call/LiveKitCredentialService.swift | 38 ++-- 4 files changed, 392 insertions(+), 130 deletions(-) diff --git a/RelayKit/Call/CallEncryptionService.swift b/RelayKit/Call/CallEncryptionService.swift index a900de0..c6bde0f 100644 --- a/RelayKit/Call/CallEncryptionService.swift +++ b/RelayKit/Call/CallEncryptionService.swift @@ -16,11 +16,8 @@ import CryptoKit import Foundation import LiveKit import MatrixRustSDK -import OSLog import RelayInterface -private let logger = Logger(subsystem: "RelayKit", category: "CallEncryption") - /// Helpers for MatrixRTC call-member state signaling, power-level bootstrap, /// and LiveKit key provider plumbing. /// @@ -118,18 +115,15 @@ struct CallEncryptionService { let jsonString = String(data: jsonData, encoding: .utf8) ?? "{}" // Body + state key contain device IDs and per-call membership UUIDs; // not raw secrets but routing data we don't need leaking to Console. 
- logger.debug("[RTC]Call member event body: \(jsonString, privacy: .private)") - logger.debug("[RTC]Call member state key: \(stateKey, privacy: .private)") - _ = try await sdkRoom.sendStateEventRaw( eventType: Self.callMemberEventType, stateKey: stateKey, content: jsonString ) - logger.info("[RTC]Sent call membership state event") activityLog?.log( category: .call, severity: .debug, source: "CallEncryptionService", summary: "Sent call membership state event", + detail: "state_key: \(stateKey), membershipID: \(membership), foci_preferred SFU: \(serviceURL).", roomId: roomID ) } @@ -146,7 +140,6 @@ struct CallEncryptionService { stateKey: stateKey, content: "{}" ) - logger.info("[RTC]Removed call membership state event") activityLog?.log( category: .call, severity: .debug, source: "CallEncryptionService", summary: "Removed call membership state event", @@ -173,16 +166,52 @@ struct CallEncryptionService { return } + struct MemberSummary { + let stateKey: String + let isActive: Bool + let sfuURL: String? + let membershipID: String? + } + var summaries: [MemberSummary] = [] + for event in events { guard let type = event["type"] as? String, type == Self.callMemberEventType else { continue } let stateKey = event["state_key"] as? String ?? "(none)" - if let content = event["content"], - let contentData = try? JSONSerialization.data(withJSONObject: content, options: [.sortedKeys]), - let contentStr = String(data: contentData, encoding: .utf8) { - // .private — call routing data + device IDs, not for Console. - logger.debug("[RTC]Existing call member [key=\(stateKey, privacy: .private)]: \(contentStr, privacy: .private)") + let contentDict = event["content"] as? [String: Any] ?? [:] + let isActive = !contentDict.isEmpty + let sfu = (contentDict["foci_preferred"] as? [[String: Any]])? + .first(where: { ($0["type"] as? String) == "livekit" })?["livekit_service_url"] as? String + let membership = contentDict["membershipID"] as? 
String + summaries.append(MemberSummary( + stateKey: stateKey, + isActive: isActive, + sfuURL: sfu, + membershipID: membership + )) + } + + let active = summaries.filter { $0.isActive } + let tombstoned = summaries.count - active.count + if active.isEmpty { + activityLog?.log( + category: .call, severity: .debug, source: "CallEncryptionService", + summary: "No active call members in room", + detail: "Total `m.call.member` events scanned: \(summaries.count) (\(tombstoned) tombstoned).", + roomId: roomID + ) + } else { + let lines = active.map { summary -> String in + let sfu = summary.sfuURL ?? "(no SFU advertised)" + let mid = summary.membershipID ?? "(no membershipID)" + return " \(summary.stateKey) — SFU: \(sfu), membershipID: \(mid)" } + activityLog?.log( + category: .call, severity: .debug, source: "CallEncryptionService", + summary: "Active call members in room: \(active.count)", + detail: "Scanned \(summaries.count) `m.call.member` events (\(tombstoned) tombstoned).\n\(lines.joined(separator: "\n"))", + roomId: roomID + ) } } @@ -282,7 +311,7 @@ struct CallEncryptionService { static func makeHKDFKeyProvider( ratchetWindowSize: Int32 = 10, keyRingSize: Int32 = 256 - ) -> BaseKeyProvider { + ) -> (provider: BaseKeyProvider, hkdfInstalled: Bool, fallbackReason: String?) { let options = KeyProviderOptions( sharedKey: false, ratchetWindowSize: ratchetWindowSize, @@ -291,8 +320,7 @@ struct CallEncryptionService { let provider = BaseKeyProvider(options: options) guard let cls = NSClassFromString("LKRTCFrameCryptorKeyProvider") as? 
NSObject.Type else { - logger.error("[RTC]LKRTCFrameCryptorKeyProvider class not found at runtime; HKDF swap skipped — E2EE interop with Element Call will fail (PBKDF2 vs HKDF mismatch)") - return provider + return (provider, false, "LKRTCFrameCryptorKeyProvider class not found at runtime") } let initSel = NSSelectorFromString( @@ -307,8 +335,7 @@ struct CallEncryptionService { ) let allocated = allocImp(cls, allocSel) guard (allocated as AnyObject).responds(to: initSel) else { - logger.error("[RTC]LKRTCFrameCryptorKeyProvider does not expose keyDerivationAlgorithm: init; webrtc-xcframework may be < 144.x — falling back to PBKDF2 (Element Call interop will fail)") - return provider + return (provider, false, "LKRTCFrameCryptorKeyProvider does not expose keyDerivationAlgorithm: init (webrtc-xcframework may be < 144.x)") } typealias InitFunc = @convention(c) ( @@ -334,12 +361,10 @@ struct CallEncryptionService { ) guard let ivar = class_getInstanceVariable(BaseKeyProvider.self, "rtcKeyProvider") else { - logger.error("[RTC]rtcKeyProvider ivar not found on BaseKeyProvider; HKDF swap skipped") - return provider + return (provider, false, "rtcKeyProvider ivar not found on BaseKeyProvider") } object_setIvar(provider, ivar, hkdfRtc) - logger.info("[RTC]Installed HKDF-backed LKRTCFrameCryptorKeyProvider (Element Call interop path)") - return provider + return (provider, true, nil) } /// Sets a raw key on a `BaseKeyProvider` for the given participant, bypassing @@ -349,15 +374,22 @@ struct CallEncryptionService { /// `BaseKeyProvider` is decorated with `@objcMembers`, so its internal /// `rtcKeyProvider` (an `LKRTCFrameCryptorKeyProvider`) is accessible via KVC. /// The ObjC provider accepts `NSData` directly. + /// Sets a raw AES key on the provider for `participantId`. Returns + /// `nil` on success, or a short failure reason string the caller can + /// surface in the Activity Log. 
The fingerprint of the raw IKM is + /// computed by the caller (via the SHA-256 it already keeps for its + /// own bookkeeping) — diverging fingerprints across local/peer + /// records are the #1 root cause of "maximum ratchet attempts + /// exceeded" on an otherwise-correct key-exchange handshake. + @discardableResult static func setRawKey( _ keyData: Data, on keyProvider: BaseKeyProvider, participantId: String, index: Int32 = 0 - ) { + ) -> String? { guard let rtcProvider = keyProvider.value(forKey: "rtcKeyProvider") as AnyObject? else { - logger.error("[RTC]Could not access rtcKeyProvider via KVC") - return + return "Could not access rtcKeyProvider via KVC" } // LKRTCFrameCryptorKeyProvider is an ObjC class with: @@ -367,8 +399,7 @@ struct CallEncryptionService { typealias SetKeyFunc = @convention(c) (AnyObject, Selector, NSData, Int32, NSString) -> Void let selector = NSSelectorFromString("setKey:withIndex:forParticipant:") guard (rtcProvider as? NSObject)?.responds(to: selector) == true else { - logger.error("[RTC]rtcKeyProvider does not respond to setKey:withIndex:forParticipant:") - return + return "rtcKeyProvider does not respond to setKey:withIndex:forParticipant:" } let imp = unsafeBitCast( @@ -376,28 +407,22 @@ struct CallEncryptionService { to: SetKeyFunc.self ) imp(rtcProvider, selector, keyData as NSData, index, participantId as NSString) - // SHA-256 fingerprint of the raw IKM so we can confirm the exact same - // 16 bytes end up on the wire. Matches the fingerprint logged in - // CallWidgetBridge.sendEncryptionKey. Diverging fingerprints mean - // our local frame cryptor and the peer are using different keys — - // the #1 root cause of "maximum ratchet attempts exceeded" on an - // otherwise-correct key-exchange handshake. 
- let fp = SHA256.hash(data: keyData).prefix(8).map { String(format: "%02x", $0) }.joined() - logger.info("[RTC]Set raw encryption key for participant \(participantId, privacy: .public) at index \(index) bytes=\(keyData.count) sha256[0..8]=\(fp, privacy: .public)") + return nil } /// Convenience: sets a raw key using base64-encoded key data. + /// Returns `nil` on success or a short failure reason. + @discardableResult static func setRawKey( base64Key: String, on keyProvider: BaseKeyProvider, participantId: String, index: Int32 = 0 - ) { + ) -> String? { guard let keyData = Data(base64Encoded: base64Key) else { - logger.error("[RTC]Invalid base64 key for participant \(participantId, privacy: .private)") - return + return "Invalid base64 key for participant \(participantId)" } - setRawKey(keyData, on: keyProvider, participantId: participantId, index: index) + return setRawKey(keyData, on: keyProvider, participantId: participantId, index: index) } } diff --git a/RelayKit/Call/CallViewModel.swift b/RelayKit/Call/CallViewModel.swift index 7a60874..2bbf038 100644 --- a/RelayKit/Call/CallViewModel.swift +++ b/RelayKit/Call/CallViewModel.swift @@ -15,11 +15,8 @@ import Foundation import LiveKit import RelayInterface -import OSLog import SwiftUI -private let logger = Logger(subsystem: "RelayKit", category: "Call") - /// A concrete ``CallViewModelProtocol`` implementation backed by the LiveKit Swift SDK. /// /// ``CallViewModel`` owns a `LiveKit.Room` instance and bridges its delegate callbacks @@ -72,6 +69,11 @@ public final class CallViewModel: CallViewModelProtocol { /// The LiveKit key provider used for per-participant AES-GCM frame encryption. @ObservationIgnored private var keyProvider: BaseKeyProvider? + /// `true` when the HKDF-SHA256 LKRTCFrameCryptorKeyProvider was + /// successfully swapped in. `false` means we fell back to the + /// default PBKDF2 provider and interop with Element Call will fail. 
+ @ObservationIgnored + private var hkdfKeyProviderInstalled: Bool = false /// The local participant's current encryption key (raw 16 bytes). @ObservationIgnored private var localEncryptionKey: Data? @@ -188,10 +190,12 @@ public final class CallViewModel: CallViewModelProtocol { // so the two sides produce different AES keys from matching // fingerprints, and every frame's auth tag fails on the peer. // See CallEncryptionService.makeHKDFKeyProvider for details. - self.keyProvider = CallEncryptionService.makeHKDFKeyProvider( + let result = CallEncryptionService.makeHKDFKeyProvider( ratchetWindowSize: 10, keyRingSize: 256 ) + self.keyProvider = result.provider + self.hkdfKeyProviderInstalled = result.hkdfInstalled } self.matrixRoom = encryptionContext.matrixRoom } @@ -225,9 +229,22 @@ public final class CallViewModel: CallViewModelProtocol { EncryptionOptions(keyProvider: $0, encryptionType: .gcm) } if isE2eeEnabled { - logger.info("[RTC]E2EE enabled (encrypted Matrix room)") + let kdfDetail = hkdfKeyProviderInstalled + ? "HKDF-SHA256 key derivation active (Element Call interop path)." + : "WARNING: HKDF swap failed — using default PBKDF2. Element Call peers will produce different AES keys from the same IKM and frames will fail to decrypt." + activityLog?.log( + category: .call, severity: hkdfKeyProviderInstalled ? .debug : .warning, source: "CallViewModel", + summary: "LiveKit E2EE enabled", + detail: "GCM frame encryption active. 
\(kdfDetail)", + roomId: roomID + ) } else { - logger.info("[RTC]E2EE disabled (unencrypted Matrix room)") + activityLog?.log( + category: .call, severity: .debug, source: "CallViewModel", + summary: "LiveKit E2EE disabled", + detail: "Unencrypted Matrix room — frames sent in the clear to the SFU.", + roomId: roomID + ) } let roomOpts = RoomOptions( defaultVideoPublishOptions: VideoPublishOptions( @@ -248,7 +265,12 @@ public final class CallViewModel: CallViewModelProtocol { roomOptions: roomOpts ) localParticipantID = room.localParticipant.identity?.stringValue - logger.info("[RTC]Connected with LiveKit identity: \(self.localParticipantID ?? "unknown", privacy: .public)") + activityLog?.log( + category: .call, severity: .debug, source: "CallViewModel", + summary: "Connected to LiveKit", + detail: "Local identity: \(localParticipantID ?? "unknown"). Peers reading our `m.call.member` event expect this to match `${sender}:${device_id}` for legacy session events.", + roomId: roomID + ) // Spin up the headless widget bridge *only* for encrypted rooms. // For unencrypted rooms the bridge adds no value (no keys to @@ -265,10 +287,18 @@ public final class CallViewModel: CallViewModelProtocol { keyProvider: self.keyProvider ) bridge.activityLog = self.activityLog + bridge.onCallMemberStateChanged = { [weak self] in + self?.redistributeKeyOnMembershipChange() + } bridge.start() self.widgetBridge = bridge } catch { - logger.error("[RTC]Failed to create CallWidgetBridge: \(error.localizedDescription, privacy: .private)") + activityLog?.log( + category: .call, severity: .error, source: "CallViewModel", + summary: "Failed to create CallWidgetBridge", + detail: "E2EE key exchange will not work; remote tiles will stay black. 
Error: \(error.localizedDescription)", + roomId: roomID + ) } } @@ -293,7 +323,6 @@ public final class CallViewModel: CallViewModelProtocol { self.localEncryptionKey = key let keyIndex = self.localKeyIndex guard let livekitIdentity = self.localParticipantID, !livekitIdentity.isEmpty else { - logger.error("[RTC]LiveKit local participant has no identity — cannot register cryptor key") activityLog?.log( category: .call, severity: .error, source: "CallViewModel", summary: "LiveKit assigned no local identity", @@ -302,13 +331,19 @@ public final class CallViewModel: CallViewModelProtocol { ) throw CallViewModelError.missingLocalParticipantIdentity } - CallEncryptionService.setRawKey( + let setKeyFailure = CallEncryptionService.setRawKey( key, on: keyProvider, participantId: livekitIdentity, index: Int32(keyIndex) ) - logger.info("[RTC]Local E2EE key set (index \(keyIndex)) under participantId=\(livekitIdentity, privacy: .public) before camera publish") + let failureNote = setKeyFailure.map { " setRawKey failure: \($0)." } ?? "" + activityLog?.log( + category: .call, severity: setKeyFailure == nil ? .debug : .error, source: "CallViewModel", + summary: "Local E2EE key installed", + detail: "Index: \(keyIndex), participantId: \(livekitIdentity). 
Frame cryptor will use this key for outbound frames before camera/mic publish.\(failureNote)", + roomId: roomID + ) } // Set up MatrixRTC signaling and distribute the key **before** @@ -349,7 +384,6 @@ public final class CallViewModel: CallViewModelProtocol { ) } catch { let description = String(reflecting: error) - logger.warning("[RTC]Call membership event failed: \(description, privacy: .private)") self.logCallMembershipFailure(error, description: description) } @@ -360,7 +394,9 @@ public final class CallViewModel: CallViewModelProtocol { self.heartbeatTask = Self.startHeartbeat( encryptionService: encryptionService, sfuServiceURL: sfuServiceURL, - membershipId: membershipId + membershipId: membershipId, + activityLog: self.activityLog, + roomID: self.roomID ) // 3. Distribute the already-generated local key via the @@ -373,20 +409,28 @@ public final class CallViewModel: CallViewModelProtocol { if self.isE2eeEnabled, let bridge, let localKey { let targets = await encryptionService.fetchCallTargets() self.callMembers = targets - logger.info("[RTC]Distributing key to \(targets.count) remote user(s) BEFORE media publish") + let targetList = targets.keys.sorted().joined(separator: ", ") + activityLog?.log( + category: .call, severity: .debug, source: "CallViewModel", + summary: "Distributing E2EE key to \(targets.count) user(s) before media publish", + detail: "Recipients: \(targetList.isEmpty ? "(none)" : targetList).", + roomId: roomID + ) do { try await bridge.sendEncryptionKey( localKey, keyIndex: keyIndex, toMembers: targets ) + // Success entry — including fp — already written by + // CallWidgetBridge.sendEncryptionKey. + } catch { activityLog?.log( - category: .call, severity: .debug, source: "CallViewModel", - summary: "Distributed E2EE key to \(targets.count) user(s)", + category: .call, severity: .warning, source: "CallViewModel", + summary: "E2EE key distribution failed", + detail: "Tried sending to \(targets.count) user(s): \(targetList). 
Peers will see `missing_key` and our media will appear as black tiles to them. Error: \(error.localizedDescription)", roomId: roomID ) - } catch { - logger.warning("[RTC]Widget-bridge key distribution failed: \(error.localizedDescription, privacy: .private)") } } } @@ -406,8 +450,6 @@ public final class CallViewModel: CallViewModelProtocol { roomId: roomID ) } catch { - logger.error("[RTC]Connect failed: \(error.localizedDescription, privacy: .private)") - // The native WebRTC audio engine returns -9000 // (kAudioEngineErrorInsufficientDevicePermission) when // microphone access is denied. The LiveKit SDK wraps this in a @@ -476,12 +518,11 @@ public final class CallViewModel: CallViewModelProtocol { nonisolated private static func startHeartbeat( encryptionService: CallEncryptionService, sfuServiceURL: String, - membershipId: String? + membershipId: String?, + activityLog: ActivityLog?, + roomID: String? ) -> Task { Task.detached(priority: .background) { - // Local logger — the file-scope `logger` is inferred as - // MainActor-isolated and isn't reachable from a detached task. - let log = Logger(subsystem: "RelayKit", category: "Call") while !Task.isCancelled { do { try await Task.sleep(for: heartbeatInterval) @@ -494,9 +535,18 @@ public final class CallViewModel: CallViewModelProtocol { sfuServiceURL: sfuServiceURL, membershipId: membershipId ) - log.debug("[RTC]Heartbeat refreshed call.member state event") + // Success entry already written by + // `CallEncryptionService.sendCallMemberEvent`. } catch { - log.warning("[RTC]Heartbeat refresh failed: \(error.localizedDescription, privacy: .private)") + let description = error.localizedDescription + await MainActor.run { + activityLog?.log( + category: .call, severity: .warning, source: "CallViewModel", + summary: "Call membership heartbeat refresh failed", + detail: "Other participants may treat us as having left when our event expires. 
Error: \(description)", + roomId: roomID + ) + } } } } @@ -635,6 +685,62 @@ public final class CallViewModel: CallViewModelProtocol { ) } + /// Re-distributes our local E2EE key in response to an inbound + /// `m.call.member` state change. The widget bridge fires the + /// callback whenever it sees one of these events; we use that as a + /// signal to refresh our recipient set, because the SDK's + /// `RoomInfo.activeRoomCallParticipants` accessor lags behind + /// LiveKit's `participantDidConnect` (which is what + /// ``redistributeKey(to:)`` keys off). + /// + /// Guarded against heartbeat refreshes: skips when the *user* set + /// of targets hasn't changed since the last send. + fileprivate func redistributeKeyOnMembershipChange() { + guard let key = localEncryptionKey, + let bridge = widgetBridge, + let encryptionService else { + return + } + let index = localKeyIndex + + Task { [weak self] in + guard let self else { return } + let targets = await encryptionService.fetchCallTargets() + let targetUserIDs = Set(targets.keys) + let previousUserIDs = await MainActor.run { Set(self.callMembers.keys) } + // Heartbeat / unchanged-member case: no new peer, nothing to do. + if targetUserIDs.isEmpty || targetUserIDs == previousUserIDs { return } + + let targetList = targets.keys.sorted().joined(separator: ", ") + do { + try await bridge.sendEncryptionKey( + key, + keyIndex: index, + toMembers: targets + ) + await MainActor.run { + self.callMembers = targets + self.activityLog?.log( + category: .call, severity: .debug, source: "CallViewModel", + summary: "Redistributed E2EE key on m.call.member change", + detail: "Recipients: \(targetList). Index: \(index).", + roomId: self.roomID + ) + } + } catch { + let description = error.localizedDescription + await MainActor.run { + self.activityLog?.log( + category: .call, severity: .warning, source: "CallViewModel", + summary: "E2EE key redistribution failed (m.call.member trigger)", + detail: "Targets: \(targetList). 
Error: \(description)", + roomId: self.roomID + ) + } + } + } + } + fileprivate func redistributeKey(to participantIdentity: String) { guard let key = localEncryptionKey, let bridge = widgetBridge, @@ -646,18 +752,34 @@ public final class CallViewModel: CallViewModelProtocol { Task { let targets = await encryptionService.fetchCallTargets() guard !targets.isEmpty else { - logger.info("[RTC]No call targets to redistribute key to (new participant \(participantIdentity, privacy: .public))") + await MainActor.run { + activityLog?.log( + category: .call, severity: .debug, source: "CallViewModel", + summary: "No call targets to redistribute key to", + detail: "Trigger: new participant \(participantIdentity). `fetchCallTargets` returned an empty map.", + roomId: roomID + ) + } return } + let targetList = targets.keys.sorted().joined(separator: ", ") do { try await bridge.sendEncryptionKey( key, keyIndex: index, toMembers: targets ) - logger.info("[RTC]Redistributed key to \(targets.count) user(s) (triggered by new participant \(participantIdentity, privacy: .public))") + // Success entry — including fp — already written by + // CallWidgetBridge.sendEncryptionKey. } catch { - logger.warning("[RTC]Key redistribution failed for \(participantIdentity, privacy: .public): \(error.localizedDescription, privacy: .public)") + await MainActor.run { + activityLog?.log( + category: .call, severity: .warning, source: "CallViewModel", + summary: "E2EE key redistribution failed", + detail: "Trigger: new participant \(participantIdentity). Targets: \(targetList). 
Error: \(error.localizedDescription)", + roomId: roomID + ) + } } } } @@ -745,7 +867,6 @@ public final class CallViewModel: CallViewModelProtocol { if viewModel.state == .connected { viewModel.state = .disconnected } - logger.info("[RTC]LiveKit connection state: disconnected") viewModel.activityLog?.log( category: .call, severity: .warning, source: "CallViewModel", summary: "LiveKit connection disconnected", @@ -753,7 +874,6 @@ public final class CallViewModel: CallViewModelProtocol { roomId: viewModel.roomID ) case .reconnecting: - logger.info("[RTC]Reconnecting…") viewModel.activityLog?.log( category: .call, severity: .warning, source: "CallViewModel", summary: "Call reconnecting", @@ -772,7 +892,6 @@ public final class CallViewModel: CallViewModelProtocol { let description = error?.localizedDescription ?? "no error reported" Task { @MainActor [weak viewModel] in guard let viewModel else { return } - logger.error("[RTC]LiveKit didFailToConnect: \(description, privacy: .public)") viewModel.activityLog?.log( category: .call, severity: .error, source: "CallViewModel", summary: "LiveKit connection rejected", @@ -791,7 +910,6 @@ public final class CallViewModel: CallViewModelProtocol { Task { @MainActor [weak viewModel] in guard let viewModel else { return } if let description { - logger.error("[RTC]LiveKit didDisconnect: \(description, privacy: .public)") viewModel.activityLog?.log( category: .call, severity: .error, source: "CallViewModel", summary: "LiveKit connection lost", @@ -799,7 +917,6 @@ public final class CallViewModel: CallViewModelProtocol { roomId: viewModel.roomID ) } else { - logger.info("[RTC]LiveKit didDisconnect (clean)") viewModel.activityLog?.log( category: .call, severity: .debug, source: "CallViewModel", summary: "LiveKit disconnected cleanly", @@ -839,11 +956,11 @@ public final class CallViewModel: CallViewModelProtocol { guard let viewModel else { return } let identityStr = participant.identity?.stringValue ?? 
"(none)" let sidStr = participant.sid?.stringValue ?? "(none)" - logger.info("[RTC]Remote participant connected: identity=\(identityStr, privacy: .public) sid=\(sidStr, privacy: .public) name=\(participant.name ?? "(none)", privacy: .public)") + let displayName = participant.name ?? "(none)" viewModel.activityLog?.log( category: .call, severity: .debug, source: "CallViewModel", summary: "Remote participant connected", - detail: "Identity: \(identityStr)", + detail: "Identity: \(identityStr), sid: \(sidStr), name: \(displayName)", roomId: viewModel.roomID ) viewModel.syncParticipants(trackChanged: true) @@ -860,7 +977,6 @@ public final class CallViewModel: CallViewModelProtocol { let sid = publication.sid Task { @MainActor [weak viewModel] in guard let viewModel else { return } - logger.info("[RTC]Subscribed to \(kind, privacy: .public) track from identity=\(identityStr, privacy: .public) trackSid=\(sid, privacy: .public)") viewModel.activityLog?.log( category: .call, severity: .debug, source: "CallViewModel", summary: "Subscribed to remote \(kind) track", @@ -880,7 +996,6 @@ public final class CallViewModel: CallViewModelProtocol { let description = error.localizedDescription Task { @MainActor [weak viewModel] in guard let viewModel else { return } - logger.error("[RTC]Failed to subscribe to track \(trackSid, privacy: .public) from \(identityStr, privacy: .public): \(description, privacy: .public)") viewModel.activityLog?.log( category: .call, severity: .error, source: "CallViewModel", summary: "Failed to subscribe to remote track", @@ -918,7 +1033,6 @@ public final class CallViewModel: CallViewModelProtocol { let sid = publication.sid Task { @MainActor [weak viewModel] in guard let viewModel else { return } - logger.info("[RTC]Published local \(kind, privacy: .public) track sid=\(sid, privacy: .public)") viewModel.activityLog?.log( category: .call, severity: .debug, source: "CallViewModel", summary: "Published local \(kind) track", @@ -935,7 +1049,6 @@ public 
final class CallViewModel: CallViewModelProtocol { let sid = publication.sid Task { @MainActor [weak viewModel] in guard let viewModel else { return } - logger.info("[RTC]Remote published \(kind, privacy: .public) track from identity=\(identityStr, privacy: .public) trackSid=\(sid, privacy: .public)") viewModel.activityLog?.log( category: .call, severity: .debug, source: "CallViewModel", summary: "Remote published \(kind) track", @@ -963,7 +1076,6 @@ public final class CallViewModel: CallViewModelProtocol { case .ok, .new, .key_ratcheted: return case .missing_key: - logger.warning("[RTC]E2EE state=missing_key on \(trackKind, privacy: .public) sid=\(trackSid, privacy: .public)") viewModel.activityLog?.log( category: .call, severity: .warning, source: "CallViewModel", summary: "E2EE missing key for \(trackKind) track", @@ -971,7 +1083,6 @@ public final class CallViewModel: CallViewModelProtocol { roomId: viewModel.roomID ) case .encryption_failed, .decryption_failed, .internal_error: - logger.error("[RTC]E2EE state=\(stateLabel, privacy: .public) on \(trackKind, privacy: .public) sid=\(trackSid, privacy: .public)") viewModel.activityLog?.log( category: .call, severity: .error, source: "CallViewModel", summary: "E2EE failure on \(trackKind) track", diff --git a/RelayKit/Call/CallWidgetBridge.swift b/RelayKit/Call/CallWidgetBridge.swift index 382fe41..d4fe3a7 100644 --- a/RelayKit/Call/CallWidgetBridge.swift +++ b/RelayKit/Call/CallWidgetBridge.swift @@ -24,8 +24,6 @@ import MatrixRustSDK import os import RelayInterface -private let logger = Logger(subsystem: "RelayKit", category: "CallWidgetBridge") - /// Headless widget-driver bridge for MatrixRTC E2EE. /// /// Relay embeds LiveKit natively for media but needs the Matrix Widget Driver @@ -84,6 +82,11 @@ public final class CallWidgetBridge: @unchecked Sendable { private let roomId: String /// Activity log for surfacing widget bridge events in the Activity Log window. weak var activityLog: ActivityLog? 
+ /// Fires whenever an `org.matrix.msc3401.call.member` state event is + /// observed via the widget driver — used by ``CallViewModel`` to retry + /// E2EE key distribution after a peer's membership lands in room + /// state. Closing on `[weak self]` is the caller's responsibility. + var onCallMemberStateChanged: (() -> Void)? /// Per-call MatrixRTC membership UUID. Must match the `membershipID` /// field in the `org.matrix.msc3401.call.member` state event and the /// `member.id` field in outbound `io.element.call.encryption_keys` @@ -191,8 +194,15 @@ public final class CallWidgetBridge: @unchecked Sendable { let capabilitiesProvider = self.capabilitiesProvider driverTask = Task { [weak self] in await driver.run(room: room, capabilitiesProvider: capabilitiesProvider) - logger.info("[RTC]WidgetDriver.run returned; driver exited") - self?.resolveReady() + guard let self else { return } + await MainActor.run { + self.activityLog?.log( + category: .call, severity: .debug, source: "CallWidgetBridge", + summary: "WidgetDriver.run returned (driver exited)", + roomId: self.roomId + ) + } + self.resolveReady() } recvTask = Task { [weak self] in @@ -202,20 +212,37 @@ public final class CallWidgetBridge: @unchecked Sendable { // Kick the state machine off the "Unset" state. Fire-and-forget — // the response just echoes back through recvLoop. 
Task { [weak self] in + guard let self else { return } + let widgetId = self.widgetId do { - try await self?.sendRequest(action: "content_loaded", data: [:]) - logger.info("[RTC]Widget content_loaded acknowledged by driver") + try await self.sendRequest(action: "content_loaded", data: [:]) + await MainActor.run { + self.activityLog?.log( + category: .call, severity: .debug, source: "CallWidgetBridge", + summary: "Widget content_loaded acknowledged by driver", + detail: "widgetId: \(widgetId)", + roomId: self.roomId + ) + } } catch { - logger.warning("[RTC]content_loaded failed: \(error.localizedDescription, privacy: .private)") + let description = error.localizedDescription + await MainActor.run { + self.activityLog?.log( + category: .call, severity: .warning, source: "CallWidgetBridge", + summary: "content_loaded failed", + detail: "widgetId: \(widgetId). Error: \(description)", + roomId: self.roomId + ) + } } } - logger.info("[RTC]CallWidgetBridge started (widgetId=\(self.widgetId, privacy: .public))") Task { @MainActor [weak self] in guard let self else { return } self.activityLog?.log( category: .call, severity: .debug, source: "CallWidgetBridge", summary: "Widget bridge started", + detail: "widgetId: \(self.widgetId)", roomId: self.roomId ) } @@ -239,7 +266,6 @@ public final class CallWidgetBridge: @unchecked Sendable { } resolveReady() - logger.info("[RTC]CallWidgetBridge shut down") Task { @MainActor [weak self] in guard let self else { return } self.activityLog?.log( @@ -353,13 +379,12 @@ public final class CallWidgetBridge: @unchecked Sendable { let fp = SHA256.hash(data: key).prefix(8).map { String(format: "%02x", $0) }.joined() _ = try await sendRequest(action: "send_to_device", data: data) - logger.info("[RTC]Sent encryption key (index \(keyIndex)) to \(toMembers.count) user(s) member.id=\(self.membershipId, privacy: .public) sha256[0..8]=\(fp, privacy: .public)") Task { @MainActor [weak self] in guard let self else { return } self.activityLog?.log( 
category: .call, severity: .debug, source: "CallWidgetBridge", summary: "Sent E2EE key to \(toMembers.count) user(s)", - detail: "Key index: \(keyIndex)", + detail: "Key index: \(keyIndex), member.id: \(self.membershipId), sha256[0..8]: \(fp).", roomId: self.roomId ) } @@ -382,7 +407,15 @@ public final class CallWidgetBridge: @unchecked Sendable { ] _ = try await sendRequest(action: "send_event", data: data) - logger.info("[RTC]Sent call member state event (state_key=\(stateKey, privacy: .public))") + Task { @MainActor [weak self] in + guard let self else { return } + self.activityLog?.log( + category: .call, severity: .debug, source: "CallWidgetBridge", + summary: "Sent call member state event via widget", + detail: "state_key: \(stateKey)", + roomId: self.roomId + ) + } } // MARK: - Request / Response plumbing @@ -425,20 +458,34 @@ public final class CallWidgetBridge: @unchecked Sendable { private func recvLoop(handle: WidgetDriverHandle) async { while !Task.isCancelled { guard let raw = await handle.recv() else { - logger.info("[RTC]WidgetDriverHandle.recv returned nil; loop exiting") + Task { @MainActor [weak self] in + guard let self else { return } + self.activityLog?.log( + category: .call, severity: .info, source: "CallWidgetBridge", + summary: "Widget driver recv loop exited", + detail: "WidgetDriverHandle.recv returned nil.", + roomId: self.roomId + ) + } break } - // SECURITY: never log the raw widget JSON. Outbound and inbound + // SECURITY: never surface raw widget JSON. Outbound and inbound // `send_to_device` payloads of type `io.element.call.encryption_keys` // carry raw AES keys in the `keys.key` field — those would land - // unredacted in the system log. Action / type only; full bodies - // are .private so they're stripped from non-debug Console output. - logger.debug("[RTC]widget recv (\(raw.count) bytes)") + // unredacted in any log sink. We track action / type only. guard let data = raw.data(using: .utf8), let msg = try? 
JSONSerialization.jsonObject(with: data) as? [String: Any] else { - logger.warning("[RTC]Non-JSON message from widget driver: \(raw, privacy: .private)") + Task { @MainActor [weak self] in + guard let self else { return } + self.activityLog?.log( + category: .call, severity: .warning, source: "CallWidgetBridge", + summary: "Non-JSON message from widget driver", + detail: "Length: \(raw.count) bytes.", + roomId: self.roomId + ) + } continue } @@ -462,7 +509,15 @@ public final class CallWidgetBridge: @unchecked Sendable { // Incoming SDK-initiated requests (toWidget). guard let action = msg["action"] as? String else { - logger.warning("[RTC]Widget message missing action: \(raw, privacy: .private)") + Task { @MainActor [weak self] in + guard let self else { return } + self.activityLog?.log( + category: .call, severity: .warning, source: "CallWidgetBridge", + summary: "Widget message missing action", + detail: "Message has neither `response` nor `action` keys; ignoring.", + roomId: self.roomId + ) + } continue } let requestId = (msg["requestId"] as? String) ?? "" @@ -503,9 +558,28 @@ public final class CallWidgetBridge: @unchecked Sendable { case "send_event", "update_state": // Incoming Matrix events observed by the widget driver. // MatrixRTC member state is handled by Element Call peers - // directly; we just need to ack these. Log and move on. + // directly; we just need to ack these. Log and — for + // `org.matrix.msc3401.call.member` — also poke the view model + // to retry E2EE key distribution, since the SDK's + // `RoomInfo.activeRoomCallParticipants` accessor lags behind + // LiveKit's `participantDidConnect` signal: a peer can join + // the SFU before their membership state event arrives, which + // leaves our `redistributeKey(to:)` path unable to find them. if let type = data["type"] as? 
String { - logger.info("[RTC]widget incoming \(action, privacy: .public) type=\(type, privacy: .public)") + let logAction = action + let logType = type + let isMemberEvent = (type == CallEncryptionService.callMemberEventType) + Task { @MainActor [weak self] in + guard let self else { return } + self.activityLog?.log( + category: .call, severity: .debug, source: "CallWidgetBridge", + summary: "Widget incoming \(logAction) (\(logType))", + roomId: self.roomId + ) + if isMemberEvent { + self.onCallMemberStateChanged?() + } + } } responseBody = [:] @@ -513,7 +587,16 @@ public final class CallWidgetBridge: @unchecked Sendable { responseBody = [:] default: - logger.info("[RTC]widget unhandled action=\(action, privacy: .public); acking with {}") + let logAction = action + Task { @MainActor [weak self] in + guard let self else { return } + self.activityLog?.log( + category: .call, severity: .debug, source: "CallWidgetBridge", + summary: "Widget unhandled action: \(logAction)", + detail: "Acking with empty response.", + roomId: self.roomId + ) + } responseBody = [:] } @@ -540,12 +623,28 @@ public final class CallWidgetBridge: @unchecked Sendable { if !requestId.isEmpty { reply["requestId"] = requestId } guard let json = try? Self.encode(reply) else { - logger.error("[RTC]Failed to encode widget reply") + Task { @MainActor [weak self] in + guard let self else { return } + self.activityLog?.log( + category: .call, severity: .error, source: "CallWidgetBridge", + summary: "Failed to encode widget reply", + roomId: self.roomId + ) + } return } let ok = await handle.send(msg: json) if !ok { - logger.warning("[RTC]handle.send returned false replying to action=\(original["action"] as? String ?? "?", privacy: .public)") + let originalAction = original["action"] as? String ?? "?" 
+ Task { @MainActor [weak self] in + guard let self else { return } + self.activityLog?.log( + category: .call, severity: .warning, source: "CallWidgetBridge", + summary: "Widget handle.send returned false", + detail: "Replying to action=\(originalAction).", + roomId: self.roomId + ) + } } } @@ -559,7 +658,15 @@ public final class CallWidgetBridge: @unchecked Sendable { } let content = (data["content"] as? [String: Any]) ?? [:] guard let keyProvider else { - logger.warning("[RTC]No keyProvider; dropping inbound key from \(sender, privacy: .private)") + Task { @MainActor [weak self] in + guard let self else { return } + self.activityLog?.log( + category: .call, severity: .warning, source: "CallWidgetBridge", + summary: "Dropping inbound encryption key — no keyProvider", + detail: "Sender: \(sender). The local frame cryptor isn't wired up yet.", + roomId: self.roomId + ) + } return } @@ -574,7 +681,15 @@ public final class CallWidgetBridge: @unchecked Sendable { } else if let single = content["keys"] as? 
[String: Any] { keyEntries = [single] } else { - logger.warning("[RTC]encryption_keys to-device missing keys from \(sender, privacy: .private)") + Task { @MainActor [weak self] in + guard let self else { return } + self.activityLog?.log( + category: .call, severity: .warning, source: "CallWidgetBridge", + summary: "encryption_keys to-device missing `keys` payload", + detail: "Sender: \(sender).", + roomId: self.roomId + ) + } return } @@ -622,27 +737,26 @@ public final class CallWidgetBridge: @unchecked Sendable { let keyData = Data(base64Encoded: base64Key) else { continue } + var setFailures: [String] = [] for participantIdentity in participantIdentities { - CallEncryptionService.setRawKey( + if let reason = CallEncryptionService.setRawKey( keyData, on: keyProvider, participantId: participantIdentity, index: Int32(index) - ) + ) { + setFailures.append("\(participantIdentity): \(reason)") + } } let identitiesJoined = participantIdentities.joined(separator: ", ") - // Log with `.public` so we can correlate the key routing - // identities (what we register the frame-decryption key under) - // with the actual LiveKit participant identity (logged on - // connect) — if NONE of these matches byte-for-byte, LiveKit - // will silently fail to decrypt this peer's frames. - logger.info("[RTC]Applied inbound key -> routed to LiveKit participantId=[\(identitiesJoined, privacy: .public)] sender=\(sender, privacy: .public) device=\(deviceId, privacy: .public) member=\(memberId, privacy: .public) index=\(index)") + let fp = SHA256.hash(data: keyData).prefix(8).map { String(format: "%02x", $0) }.joined() + let failureNote = setFailures.isEmpty ? "" : " setRawKey failures: \(setFailures.joined(separator: "; "))." Task { @MainActor [weak self] in guard let self else { return } self.activityLog?.log( - category: .call, severity: .debug, source: "CallWidgetBridge", + category: .call, severity: setFailures.isEmpty ? 
.debug : .warning, source: "CallWidgetBridge", summary: "Received E2EE key from \(sender)", - detail: "Participants: [\(identitiesJoined)], key index: \(index)", + detail: "Routed to LiveKit participantIds: [\(identitiesJoined)]. Sender: \(sender), device: \(deviceId), member: \(memberId), index: \(index), sha256[0..8]: \(fp).\(failureNote)", roomId: self.roomId ) } diff --git a/RelayKit/Call/LiveKitCredentialService.swift b/RelayKit/Call/LiveKitCredentialService.swift index 021333f..a656619 100644 --- a/RelayKit/Call/LiveKitCredentialService.swift +++ b/RelayKit/Call/LiveKitCredentialService.swift @@ -13,11 +13,8 @@ // limitations under the License. import Foundation -import os import RelayInterface -private let logger = Logger(subsystem: "RelayKit", category: "LiveKitCredentialService") - /// Fetches LiveKit credentials (WebSocket URL + JWT) for a Matrix room by /// implementing the MatrixRTC credential exchange flow (MSC4143). /// @@ -54,22 +51,26 @@ struct LiveKitCredentialService { /// Returns `(livekitWebSocketURL, livekitJWT, sfuServiceURL)` for the given Matrix room. /// The `sfuServiceURL` is the SFU service URL from discovery, used in call member events. 
func credentials(for roomID: String) async throws -> (url: String, token: String, sfuServiceURL: String) { - logger.info("[RTC]Fetching LiveKit credentials for room \(roomID, privacy: .private)") activityLog?.log( category: .call, severity: .info, source: "LiveKitCredentialService", summary: "Fetching call credentials", + detail: "Room: \(roomID)", roomId: roomID ) do { let sfuURL = try await discoverSFUURL(roomID: roomID) - logger.info("[RTC]SFU URL discovered: \(sfuURL)") activityLog?.log( category: .call, severity: .debug, source: "LiveKitCredentialService", summary: "SFU URL discovered", + detail: "SFU: \(sfuURL)", roomId: roomID ) let openIDToken = try await requestOpenIDToken() - logger.debug("[RTC]OpenID token obtained") + activityLog?.log( + category: .call, severity: .debug, source: "LiveKitCredentialService", + summary: "OpenID token obtained", + roomId: roomID + ) let (url, jwt) = try await fetchLiveKitToken(sfuURL: sfuURL, roomID: roomID, openIDToken: openIDToken) activityLog?.log( category: .call, severity: .info, source: "LiveKitCredentialService", @@ -211,7 +212,11 @@ struct LiveKitCredentialService { guard let oldest = candidates.min(by: { $0.createdTs < $1.createdTs }) else { throw LiveKitCredentialError.sfuURLNotFound } - logger.info("[RTC]Joining existing call SFU per oldest_membership: \(oldest.sfuURL, privacy: .public)") + activityLog?.log( + category: .call, severity: .debug, source: "LiveKitCredentialService", + summary: "Joining existing call SFU (oldest_membership)", + detail: "SFU: \(oldest.sfuURL). Picked from \(candidates.count) active member(s)." + ) return oldest.sfuURL } @@ -265,9 +270,9 @@ struct LiveKitCredentialService { return try await fetchLiveKitTokenV2(sfuURL: sfuURL, roomID: roomID, openIDToken: openIDToken) } - /// Logs a `/sfu/get` failure to os_log and the activity log so that the - /// fall-forward to v2 is at least visible after the fact. 
Format-aware: - /// a `LiveKitCredentialError.tokenExchangeRejected` carries structured + /// Surfaces a `/sfu/get` failure so the fall-forward to v2 is visible + /// after the fact. Format-aware: a + /// `LiveKitCredentialError.tokenExchangeRejected` carries structured /// detail; anything else falls through to its `localizedDescription`. private func logLegacyFailure(_ error: Error, sfuURL: String) { let detail: String @@ -278,7 +283,6 @@ struct LiveKitCredentialService { } else { detail = error.localizedDescription } - logger.warning("[RTC]/sfu/get failed, trying /get_token — \(detail, privacy: .public)") activityLog?.log( category: .call, severity: .warning, source: "LiveKitCredentialService", summary: "Legacy /sfu/get rejected; trying v2", @@ -326,7 +330,11 @@ struct LiveKitCredentialService { ) } let decoded = try JSONDecoder().decode(LiveKitTokenResponse.self, from: data) - logger.info("[RTC]LiveKit credentials obtained via /get_token") + activityLog?.log( + category: .call, severity: .debug, source: "LiveKitCredentialService", + summary: "Credentials obtained via v2 /get_token", + detail: "LiveKit URL: \(decoded.url)" + ) return (decoded.url, decoded.jwt) } @@ -360,7 +368,11 @@ struct LiveKitCredentialService { ) } let decoded = try JSONDecoder().decode(LiveKitTokenResponse.self, from: data) - logger.info("[RTC]LiveKit credentials obtained via legacy /sfu/get") + activityLog?.log( + category: .call, severity: .debug, source: "LiveKitCredentialService", + summary: "Credentials obtained via legacy /sfu/get", + detail: "LiveKit URL: \(decoded.url)" + ) return (decoded.url, decoded.jwt) } From 71d25eb0ccfac45452b186460fce926fa3ab648c Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Mon, 15 Jun 2026 09:28:06 -0400 Subject: [PATCH 12/16] Surface the active connecting step on slow networks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On poor connections the call window's "Joining call…" spinner can sit unchanged for 
20+ seconds while the homeserver round-trips a `m.call.member` state event or LiveKit attaches media. The user has no way to tell whether something is wrong or whether the network is just slow. Add a `connectingPhase: String?` to `CallViewModelProtocol` and update it inside `CallViewModel.connect` as the connect path moves through LiveKit attach, encryption prep, membership publish, key distribution, and media start. The view layer reveals the phase label after a 300ms delay (and only if the same phase is still current), so on a fast network nothing flashes on screen — but a stalled step on bad wifi gets a concrete description after a third of a second. Co-Authored-By: Claude Opus 4.7 --- .../Protocols/CallViewModelProtocol.swift | 6 +++ Relay/ViewModels/PreviewCallViewModel.swift | 1 + Relay/Views/Call/CallView.swift | 38 +++++++++++++++++++ RelayKit/Call/CallViewModel.swift | 13 +++++++ 4 files changed, 58 insertions(+) diff --git a/Packages/RelayInterface/Sources/RelayInterface/Protocols/CallViewModelProtocol.swift b/Packages/RelayInterface/Sources/RelayInterface/Protocols/CallViewModelProtocol.swift index 20c72b3..0b018e3 100644 --- a/Packages/RelayInterface/Sources/RelayInterface/Protocols/CallViewModelProtocol.swift +++ b/Packages/RelayInterface/Sources/RelayInterface/Protocols/CallViewModelProtocol.swift @@ -83,6 +83,12 @@ public protocol CallViewModelProtocol: AnyObject, Observable { /// The identity of the local participant, set after connection. var localParticipantID: String? { get } + /// Human-readable description of the current connection step. Only + /// non-nil while `state == .connecting`. The UI is expected to hide + /// transient phases (steps shorter than ~300ms) so the indicator + /// only surfaces during genuinely slow joins on poor networks. + var connectingPhase: String? { get } + /// A monotonically increasing counter that is bumped whenever video tracks change /// (publish, unpublish, camera toggle, etc.). 
SwiftUI views should read this value /// to ensure ``NSViewRepresentable`` bridges receive `updateNSView` calls when the diff --git a/Relay/ViewModels/PreviewCallViewModel.swift b/Relay/ViewModels/PreviewCallViewModel.swift index 5017083..2bbeb04 100644 --- a/Relay/ViewModels/PreviewCallViewModel.swift +++ b/Relay/ViewModels/PreviewCallViewModel.swift @@ -29,6 +29,7 @@ final class PreviewCallViewModel: CallViewModelProtocol { var isLocalMicrophoneEnabled: Bool = false var localParticipantID: String? = nil var videoTrackRevision: UInt = 0 + var connectingPhase: String? = nil func connect(url: String, token: String, sfuServiceURL: String) async throws { state = .connecting diff --git a/Relay/Views/Call/CallView.swift b/Relay/Views/Call/CallView.swift index f9cddc6..3e99184 100644 --- a/Relay/Views/Call/CallView.swift +++ b/Relay/Views/Call/CallView.swift @@ -35,6 +35,11 @@ struct CallView: View { @State private var serverURL: String = "" @State private var accessToken: String = "" @State private var isJoining: Bool = false + /// Connecting-phase label that has stuck around long enough to be + /// worth showing the user. Updated from + /// ``CallViewModelProtocol/connectingPhase`` with a ~300 ms reveal + /// delay so brief phases on a fast network never flash on screen. + @State private var visibleConnectingPhase: String? // NOTE: The earlier implementation auto-hid the control bar after a // timeout using a `controlsVisible` @State + `.animation(.easeInOut(..), // value: controlsVisible)` on the control bar's opacity, plus a @@ -379,6 +384,14 @@ struct CallView: View { Text("Joining call…") .font(.headline) .foregroundStyle(.white.opacity(0.7)) + // Step indicator — only appears once a phase has been the + // current phase for ~300 ms. Hidden on fast networks. 
+ if let visibleConnectingPhase { + Text(visibleConnectingPhase) + .font(.subheadline) + .foregroundStyle(.white.opacity(0.55)) + .transition(.opacity) + } Button("Cancel") { Task { await viewModel.disconnect() } } @@ -386,6 +399,31 @@ struct CallView: View { .foregroundStyle(.white) Spacer() } + .animation(.easeInOut(duration: 0.2), value: visibleConnectingPhase) + .onChange(of: viewModel.connectingPhase, initial: true) { _, newPhase in + scheduleConnectingPhaseReveal(newPhase) + } + } + + /// Reveals `phase` as the visible connecting-phase label after a + /// short delay (~300 ms). If `connectingPhase` changes again before + /// the delay fires, the previous phase is never shown — so brief + /// steps on a fast network don't flash on screen. + private func scheduleConnectingPhaseReveal(_ phase: String?) { + guard let phase else { + visibleConnectingPhase = nil + return + } + // Already showing it — nothing to do. + if visibleConnectingPhase == phase { return } + Task { @MainActor in + try? await Task.sleep(for: .milliseconds(300)) + // Confirm the view model is still on the same phase before + // committing it to the visible state. + if viewModel.connectingPhase == phase { + visibleConnectingPhase = phase + } + } } // MARK: - Failed Overlay diff --git a/RelayKit/Call/CallViewModel.swift b/RelayKit/Call/CallViewModel.swift index 2bbf038..cd4e5ee 100644 --- a/RelayKit/Call/CallViewModel.swift +++ b/RelayKit/Call/CallViewModel.swift @@ -34,6 +34,11 @@ public final class CallViewModel: CallViewModelProtocol { public private(set) var isLocalCameraEnabled: Bool = false public private(set) var isLocalMicrophoneEnabled: Bool = false public private(set) var localParticipantID: String? + /// Human-readable label for the current step inside `.connecting`. + /// Updated as the connect path moves through credential exchange, + /// LiveKit attach, membership publish, key distribution, and media + /// start. Cleared when the call reaches `.connected` or `.failed`. 
+ public private(set) var connectingPhase: String? /// Incremented whenever video tracks change, triggering SwiftUI to /// re-evaluate `videoContent(for:)` and pick up new or removed tracks. public private(set) var videoTrackRevision: UInt = 0 @@ -204,6 +209,7 @@ public final class CallViewModel: CallViewModelProtocol { public func connect(url: String, token: String, sfuServiceURL: String = "") async throws { state = .connecting + connectingPhase = "Joining call server…" activityLog?.log( category: .call, severity: .info, source: "CallViewModel", summary: "Connecting to call", @@ -264,6 +270,7 @@ public final class CallViewModel: CallViewModelProtocol { connectOptions: connectOpts, roomOptions: roomOpts ) + connectingPhase = "Preparing encryption…" localParticipantID = room.localParticipant.identity?.stringValue activityLog?.log( category: .call, severity: .debug, source: "CallViewModel", @@ -377,6 +384,7 @@ public final class CallViewModel: CallViewModelProtocol { // `MatrixService.callPowerLevels`); we no longer try to // mutate them at join time, matching Element Call. let membershipId = bridge?.membershipId + connectingPhase = "Announcing presence to the room…" do { try await encryptionService.sendCallMemberEvent( sfuServiceURL: sfuServiceURL, @@ -407,6 +415,7 @@ public final class CallViewModel: CallViewModelProtocol { // state events already present on the room. The SDK // then Olm-encrypts the payload per-device. if self.isE2eeEnabled, let bridge, let localKey { + connectingPhase = "Distributing encryption keys…" let targets = await encryptionService.fetchCallTargets() self.callMembers = targets let targetList = targets.keys.sorted().joined(separator: ", ") @@ -437,12 +446,14 @@ public final class CallViewModel: CallViewModelProtocol { // Key is now installed locally and (best-effort) distributed to // any existing call participants. Safe to publish media. 
+ connectingPhase = "Starting camera & microphone…" try await room.localParticipant.setMicrophone(enabled: true) try await room.localParticipant.setCamera(enabled: true) isLocalCameraEnabled = true isLocalMicrophoneEnabled = true state = .connected + connectingPhase = nil videoTrackRevision += 1 activityLog?.log( category: .call, severity: .info, source: "CallViewModel", @@ -462,6 +473,7 @@ public final class CallViewModel: CallViewModelProtocol { } state = .failed(message) + connectingPhase = nil activityLog?.log( category: .call, severity: .error, source: "CallViewModel", summary: "Call connection failed", @@ -481,6 +493,7 @@ public final class CallViewModel: CallViewModelProtocol { // Update UI state immediately — SwiftUI re-renders to the // disconnected state while the awaited cleanup runs. state = .disconnected + connectingPhase = nil participants = [] isLocalCameraEnabled = false isLocalMicrophoneEnabled = false From 46658e72cec82f6e976499e2440cd37947686fda Mon Sep 17 00:00:00 2001 From: Andrew Hunter Date: Mon, 15 Jun 2026 09:36:50 -0400 Subject: [PATCH 13/16] Convert MatrixRTC troubleshooting doc to a macOS Help Book The troubleshooting guide is now shipped as a Help Book bundle inside the app rather than as a markdown file in the repo. Replaces docs/troubleshooting-calls.md with Relay/Resources/Relay.help, registers the bundle via the new CFBundleHelpBookFolder and CFBundleHelpBookName Info.plist keys, and adds a light system-font stylesheet that matches macOS Help Viewer conventions in both light and dark mode. Also drops the in-repo docs/internal/rtc-element-call-diff.md engineering note (maintainer prefers agent-maintained notes stay out of the tree). Note: the Relay.help folder needs to be added to the Relay target in Xcode as a Folder Reference inside Copy Bundle Resources so the build copies it verbatim. The project.pbxproj edit that wires this up is not included here. 
Co-Authored-By: Claude Opus 4.7 --- Relay/Info.plist | 4 + .../Resources/Relay.help/Contents/Info.plist | 30 +++ .../Contents/Resources/Shared/relay-help.css | 160 +++++++++++++++ .../Contents/Resources/en.lproj/Relay.html | 31 +++ .../en.lproj/pages/troubleshooting-calls.html | 159 +++++++++++++++ docs/internal/rtc-element-call-diff.md | 189 ------------------ docs/troubleshooting-calls.md | 108 ---------- 7 files changed, 384 insertions(+), 297 deletions(-) create mode 100644 Relay/Resources/Relay.help/Contents/Info.plist create mode 100644 Relay/Resources/Relay.help/Contents/Resources/Shared/relay-help.css create mode 100644 Relay/Resources/Relay.help/Contents/Resources/en.lproj/Relay.html create mode 100644 Relay/Resources/Relay.help/Contents/Resources/en.lproj/pages/troubleshooting-calls.html delete mode 100644 docs/internal/rtc-element-call-diff.md delete mode 100644 docs/troubleshooting-calls.md diff --git a/Relay/Info.plist b/Relay/Info.plist index aeadf07..42b56ab 100644 --- a/Relay/Info.plist +++ b/Relay/Info.plist @@ -4,6 +4,10 @@ ITSAppUsesNonExemptEncryption + CFBundleHelpBookFolder + Relay.help + CFBundleHelpBookName + io.github.subpop.relay.help UTExportedTypeDeclarations diff --git a/Relay/Resources/Relay.help/Contents/Info.plist b/Relay/Resources/Relay.help/Contents/Info.plist new file mode 100644 index 0000000..231c8bb --- /dev/null +++ b/Relay/Resources/Relay.help/Contents/Info.plist @@ -0,0 +1,30 @@ + + + + + CFBundleDevelopmentRegion + en + CFBundleIdentifier + io.github.subpop.relay.help + CFBundleInfoDictionaryVersion + 6.0 + CFBundleName + Relay Help + CFBundlePackageType + BNDL + CFBundleShortVersionString + 1.0 + CFBundleSignature + hbwr + HPDBookAccessPath + Relay.html + HPDBookIconPath + Shared/icon.png + HPDBookIndexPath + Relay.helpindex + HPDBookTitle + Relay Help + HPDBookType + 3 + + diff --git a/Relay/Resources/Relay.help/Contents/Resources/Shared/relay-help.css b/Relay/Resources/Relay.help/Contents/Resources/Shared/relay-help.css 
new file mode 100644 index 0000000..44de1b6 --- /dev/null +++ b/Relay/Resources/Relay.help/Contents/Resources/Shared/relay-help.css @@ -0,0 +1,160 @@ +/* Apple Help Book styling for Relay. Apple Help Viewer renders + the page inside a fixed-width WebKit pane (~520pt); these styles + match the default macOS help look-and-feel. */ + +:root { + color-scheme: light dark; + --fg: #1d1d1f; + --fg-muted: #6e6e73; + --bg: #ffffff; + --rule: #d2d2d7; + --accent: #0066cc; + --code-bg: #f5f5f7; +} + +@media (prefers-color-scheme: dark) { + :root { + --fg: #f5f5f7; + --fg-muted: #a1a1a6; + --bg: #1d1d1f; + --rule: #3a3a3c; + --accent: #2997ff; + --code-bg: #2c2c2e; + } +} + +html, body { + margin: 0; + padding: 0; + background: var(--bg); + color: var(--fg); + font: 13px -apple-system, BlinkMacSystemFont, "Helvetica Neue", Helvetica, sans-serif; + line-height: 1.5; +} + +body { + padding: 24px 32px 48px 32px; + max-width: 640px; +} + +h1, h2, h3 { + color: var(--fg); + font-weight: 600; + line-height: 1.25; + margin-top: 1.6em; +} + +h1 { + font-size: 22px; + margin-top: 0; +} + +h2 { + font-size: 16px; + border-bottom: 1px solid var(--rule); + padding-bottom: 4px; +} + +h3 { + font-size: 14px; +} + +p, ul, ol, table { + margin: 0.6em 0; +} + +ul, ol { + padding-left: 24px; +} + +li { + margin: 0.2em 0; +} + +code, kbd { + font-family: ui-monospace, Menlo, monospace; + font-size: 0.95em; + background: var(--code-bg); + padding: 1px 5px; + border-radius: 4px; +} + +pre { + background: var(--code-bg); + border: 1px solid var(--rule); + border-radius: 6px; + padding: 10px 12px; + overflow-x: auto; + font-family: ui-monospace, Menlo, monospace; + font-size: 0.95em; + line-height: 1.45; +} + +pre code { + background: none; + padding: 0; +} + +a { + color: var(--accent); + text-decoration: none; +} + +a:hover { + text-decoration: underline; +} + +table { + border-collapse: collapse; + width: 100%; + margin: 0.8em 0; +} + +th, td { + border: 1px solid var(--rule); + padding: 6px 
10px; + text-align: left; + vertical-align: top; +} + +th { + background: var(--code-bg); + font-weight: 600; +} + +.callout { + background: var(--code-bg); + border-left: 3px solid var(--accent); + padding: 8px 12px; + margin: 1em 0; + border-radius: 4px; +} + +.muted { + color: var(--fg-muted); +} + +nav.topic-list { + list-style: none; + padding: 0; +} + +nav.topic-list li { + border-bottom: 1px solid var(--rule); + padding: 10px 0; +} + +nav.topic-list li:last-child { + border-bottom: none; +} + +nav.topic-list a { + display: block; + font-weight: 500; +} + +nav.topic-list .subtitle { + color: var(--fg-muted); + font-weight: 400; + margin-top: 2px; +} diff --git a/Relay/Resources/Relay.help/Contents/Resources/en.lproj/Relay.html b/Relay/Resources/Relay.help/Contents/Resources/en.lproj/Relay.html new file mode 100644 index 0000000..c23803a --- /dev/null +++ b/Relay/Resources/Relay.help/Contents/Resources/en.lproj/Relay.html @@ -0,0 +1,31 @@ + + + + + + + + + + + Relay Help + + +

Relay Help

+ +

Relay is a native macOS client for the Matrix chat network. This help book collects troubleshooting guides and notes that go beyond what fits inline in the app.

+ +

Topics

+ + +

Getting more help

+

If a topic in this help book doesn't cover your problem, file an issue at github.com/subpop/Relay/issues or join the conversation in #relayapp:matrix.org.

+ + diff --git a/Relay/Resources/Relay.help/Contents/Resources/en.lproj/pages/troubleshooting-calls.html b/Relay/Resources/Relay.help/Contents/Resources/en.lproj/pages/troubleshooting-calls.html new file mode 100644 index 0000000..ed152ea --- /dev/null +++ b/Relay/Resources/Relay.help/Contents/Resources/en.lproj/pages/troubleshooting-calls.html @@ -0,0 +1,159 @@ + + + + + + + + + + Troubleshooting MatrixRTC calls + + +


+ +

Troubleshooting MatrixRTC calls

+ +

If your calls fail to connect, or connect but show no audio or video, this page walks you through capturing the data Relay's maintainers need to diagnose the issue.

+ +

Quick capture (3 minutes)

+ +
  1. Open Window › Activity Log (or press ⌥⌘A).
  2. In the search bar, click the filter chip and limit to the Call category.
  3. Leave the Activity Log window open and reproduce the failing call.
  4. Once the call has failed (or you have media issues), press ⌘S in the Activity Log window to export the filtered events.
  5. Save the file (default name relay-activity-log.json) and attach it to your bug report.
+ +

In most cases, that export is all the maintainers need to triage a calling problem.

+ +

What's in the export

+ +

The file is pretty-printed JSON: an array of events, each with timestamp (ISO 8601), category (will be "call" after filtering), severity (debug / info / warning / error), source (which subsystem logged it), summary, optional detail, optional roomId, and a metadata key-value map.

+ +

Sample event:

+ +
{
  "timestamp": "2026-06-12T14:30:05.123Z",
  "category": "call",
  "severity": "debug",
  "source": "LiveKitCredentialService",
  "summary": "SFU URL discovered",
  "roomId": "!abc:example.org",
  "metadata": {}
}
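
If you'd like to inspect the export programmatically, the following is a minimal sketch, assuming only the field names listed above. The `ActivityEvent` type, the helper functions, and the inline sample events are illustrative and not part of Relay. `timestamp` is kept as a string because Foundation's built-in `.iso8601` date strategy does not accept fractional seconds, and `metadata` is left undecoded because its value types aren't specified here.

```swift
import Foundation

// Hypothetical model of one exported event. Field names follow the
// description above; `metadata` is intentionally omitted.
struct ActivityEvent: Decodable {
    let timestamp: String   // ISO 8601, kept as text
    let category: String
    let severity: String    // "debug", "info", "warning" or "error"
    let source: String
    let summary: String
    let detail: String?     // optional in the export
    let roomId: String?     // optional in the export
}

// Decodes an export, which is a JSON array of events.
func decodeExport(_ data: Data) throws -> [ActivityEvent] {
    try JSONDecoder().decode([ActivityEvent].self, from: data)
}

// Keeps only warnings and errors, usually the most relevant events.
func problems(in events: [ActivityEvent]) -> [ActivityEvent] {
    events.filter { $0.severity == "warning" || $0.severity == "error" }
}

// Made-up sample standing in for relay-activity-log.json.
let sampleExport = """
[
  {"timestamp": "2026-06-12T14:30:05.123Z", "category": "call", "severity": "info",
   "source": "LiveKitCredentialService", "summary": "Fetching call credentials",
   "roomId": "!abc:example.org", "metadata": {}},
  {"timestamp": "2026-06-12T14:30:06.456Z", "category": "call", "severity": "error",
   "source": "LiveKitCredentialService", "summary": "Failed to fetch call credentials",
   "roomId": "!abc:example.org", "metadata": {}}
]
"""

let events = try decodeExport(Data(sampleExport.utf8))
for event in problems(in: events) {
    print("\(event.timestamp) [\(event.severity)] \(event.source): \(event.summary)")
}
```

To read a real export instead of the inline sample, load the file with `Data(contentsOf: URL(fileURLWithPath: "relay-activity-log.json"))` and pass the result to `decodeExport`.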
+ +

What's safe to share

+ +

The export contains:

+
  • Your Matrix room ID (!…:server), plus the Matrix user IDs and device IDs of call participants
  • Per-call membership UUIDs and key indices
  • Your homeserver hostname and the call server (SFU) URL
  • SHA-256 fingerprints of encryption keys (the first 8 bytes of the hash, shown as 16 hex characters), never the keys themselves
+ +

It does not contain:

+
  • Raw E2EE keys
  • OpenID tokens (including the access token used for SFU auth) or LiveKit JWTs
  • Message contents, display names, or avatars
+ +
If you don't want your room IDs or device IDs in a public bug report, ask the maintainers in #relayapp:matrix.org for a direct message and share the file privately there.
+ +

Reading the export yourself

+ +

A few specific log lines act as signposts. If your file contains any of these, you can pre-diagnose your own issue.

+ +

Connection-time signals

+ + + + + + + + + + + + + + + + + + + + + + + +
| Look for | What it means |
| --- | --- |
| `Fetching call credentials` | The call attempt started; subsequent events should show whether discovery and token exchange succeeded. |
| `SFU URL discovered` | Your homeserver advertises a LiveKit SFU. Good. |
| `Failed to fetch call credentials` with `"This homeserver has no LiveKit call server configured…"` | Your homeserver doesn't expose `org.matrix.msc4143.rtc_foci` in `.well-known/matrix/client`, and the unstable transports endpoint isn't supported. Ask your homeserver admin to configure MatrixRTC. |
| `Call credentials obtained` | Token exchange succeeded. If the call still fails after this, the problem is downstream of credential acquisition. |
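
The table above can also be applied mechanically. The sketch below is illustrative only: `ConnectionSignals` and `meaning(of:)` are hypothetical names, the meanings are shortened paraphrases of the table, and matching by prefix is an assumption about how summaries are worded.

```swift
// Signposts from the table above, paired with shortened meanings.
enum ConnectionSignals {
    static let table: [(summary: String, meaning: String)] = [
        ("Fetching call credentials", "the call attempt started"),
        ("SFU URL discovered", "the homeserver advertises a LiveKit SFU"),
        ("Failed to fetch call credentials", "credential fetch failed; read the event's detail"),
        ("Call credentials obtained", "token exchange succeeded"),
    ]
}

// Returns the meaning for a summary, or nil when it is not a listed signpost.
func meaning(of summary: String) -> String? {
    ConnectionSignals.table.first(where: { summary.hasPrefix($0.summary) })?.meaning
}

// Summaries as they might appear, in order, in an export.
let observed = ["Fetching call credentials", "SFU URL discovered", "OpenID token obtained"]
for summary in observed {
    print("\(summary): \(meaning(of: summary) ?? "not a listed signpost")")
}
```

Events that return `nil` are not necessarily problems; they are simply not among the signposts listed here.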
+ +

Connected-but-no-media signals

+ +

If the call reaches the Connected to call event but you can't see or hear anyone, the failure is in the encryption-key exchange or frame routing.

+ + + + + + + + + + + + + + + + + + + +
| Look for | What it means |
| --- | --- |
| `Distributed E2EE key to N user(s)` followed by `Received E2EE key from …` for each peer | Key exchange is happening. If you still have no media, the problem is in the frame-decoder routing. Note the participant identity in the event's `detail`: this is the identity LiveKit assigned to the peer. |
| No `Received E2EE key from …` events at all | Peers aren't sending you their keys, or the widget bridge isn't running. Check whether the room is configured as encrypted (E2EE is enabled only for encrypted Matrix rooms). |
| `Widget bridge started` but no later events | The widget driver is waiting for capability negotiation that never completes. Likely an SDK or homeserver-side issue. |
+ +

Patterns worth flagging in a bug report

+ +

These specific event sequences point to a known class of failure:

+ +
  1. No Call credentials obtained event after Fetching call credentials. Credential exchange is failing. Almost always a homeserver-side or SFU-side configuration problem; the maintainers will need to know which homeserver you're on.
  2. Connected to call but no Distributed E2EE key event. The Matrix call-member state event went out, but no peers existed at the time you connected, or the local cache of call members is stale. If others were already in the call, this is a Relay bug worth reporting.
  3. Received E2EE key from … events present, but you still see no media from those peers. Frame-cryptor routing is misaligned with the LiveKit participant identity. Please attach the export and note the LiveKit participant identity you see in those events' detail field.
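
The three patterns above can be checked with a few lines of code before you file a report. This is a rough sketch, not something Relay ships: `flaggedPatterns` is a hypothetical helper, it sees only event summaries, and it matches them by prefix. Whether media is actually missing is something only you can observe, so the third check can only tell you that keys did arrive.

```swift
// Returns short descriptions of the patterns above that match the
// given event summaries (taken from an export, in any order).
func flaggedPatterns(summaries: [String]) -> [String] {
    // True when at least one summary starts with the given text.
    func has(_ prefix: String) -> Bool {
        summaries.contains { $0.hasPrefix(prefix) }
    }

    var flags: [String] = []
    // Pattern 1: the attempt started but credentials never arrived.
    if has("Fetching call credentials") && !has("Call credentials obtained") {
        flags.append("credential exchange did not complete")
    }
    // Pattern 2: connected, but our key never went out to peers.
    if has("Connected to call") && !has("Distributed E2EE key") {
        flags.append("connected, but no key was distributed")
    }
    // Pattern 3: keys arrived, so missing media points at frame routing.
    if has("Connected to call") && has("Received E2EE key from") {
        flags.append("keys received; if media is still missing, suspect frame routing")
    }
    return flags
}

let summaries = [
    "Fetching call credentials",
    "Call credentials obtained",
    "Connected to call",
]
print(flaggedPatterns(summaries: summaries))
```

An empty result means none of the listed patterns matched; it does not mean the export is free of problems.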
+ +

When the Activity Log isn't enough

+ +

For really hard cases (the SFU is rejecting our JWT with no useful error, or the LiveKit room itself never finishes initialising) the maintainers sometimes need a unified-log capture, which records the low-level RTC trace from inside the LiveKit SDK.

+ +

While the call is reproducing the issue, run in a terminal:

+ +
log stream --predicate 'subsystem == "RelayKit" AND category BEGINSWITH "Call"' \
           --level info > relay-call-trace.log
+ +

Stop with ⌃C once the call has failed, then share relay-call-trace.log alongside the Activity Log JSON.

+ +

The unified-log capture is more verbose than the Activity Log: it includes internal trace output from the LiveKit SDK, with finer-grained timing and routing data. It's safe to share in the same way the Activity Log is (no key material, no tokens). Share it through the same channel you'd use for the JSON.

+ +

Reporting

+ +

File an issue at github.com/subpop/Relay/issues or message #relayapp:matrix.org. Please include:

+ +
  • The relay-activity-log.json export (filtered to the Call category)
  • Your homeserver hostname (e.g. matrix.example.org)
  • Whether other clients (Element X, Element Web) succeed at calling on the same account
  • A one-line description of what you saw: "fails to connect", "connects but no audio", "connects but no video", etc.
+ +

If you'd rather not put logs in a public issue, ask in the Matrix room first and send them to a maintainer privately.

+ + diff --git a/docs/internal/rtc-element-call-diff.md b/docs/internal/rtc-element-call-diff.md deleted file mode 100644 index d0675bc..0000000 --- a/docs/internal/rtc-element-call-diff.md +++ /dev/null @@ -1,189 +0,0 @@ -# MatrixRTC: Relay vs Element Call deviations - -Engineering reference for the MatrixRTC implementation. Maps every Relay call-path function to its Element Call / matrix-js-sdk counterpart and flags deviations that have either been confirmed against real-world traces or against the published MSCs. - -## Sources used - -- **MSC4143** ([toger5/matrixRTC](https://github.com/matrix-org/matrix-spec-proposals/blob/toger5/matrixRTC/proposals/4143-matrix-rtc.md)) — the not-yet-deployed `m.rtc.slot` / `m.rtc.member` sticky-event protocol. Production uses the legacy `m.call.member` shape. -- **MSC4195** ([hughns/matrixrtc-livekit](https://github.com/hughns/matrix-spec-proposals/blob/hughns/matrixrtc-livekit/proposals/4195-matrixrtc-livekit.md)) — the LiveKit `/get_token` endpoint and pseudonymous-identity scheme. -- **Element Call**, `src/livekit/openIDSFU.ts` ([livekit branch](https://github.com/element-hq/element-call/blob/livekit/src/livekit/openIDSFU.ts)) — the production credential-exchange path. -- **matrix-js-sdk**, `src/matrixrtc/` — the membership / encryption manager / LiveKit transport types. -- **lk-jwt-service**, `requests.go` + `handler.go` + `helper.go` ([element-hq/lk-jwt-service](https://github.com/element-hq/lk-jwt-service)) — the reference SFU auth service; what production homeservers actually run. 
- -## File map - -| Relay file | Responsibility | Element Call / js-sdk counterpart | -| --- | --- | --- | -| `RelayKit/Call/LiveKitCredentialService.swift` | Discover SFU URL, request OpenID token, exchange for LiveKit JWT | `element-call/src/livekit/openIDSFU.ts` | -| `RelayKit/Call/CallEncryptionService.swift` | Send `m.call.member` state event, derive HKDF keys, parse other peers from room state | `matrix-js-sdk/src/matrixrtc/MembershipManager.ts` + `RTCEncryptionManager.ts` | -| `RelayKit/Call/CallWidgetBridge.swift` | Speak the Widget API directly to the SDK's `WidgetDriver` to deliver Olm-encrypted to-device key payloads | `matrix-js-sdk/src/matrixrtc/ToDeviceKeyTransport.ts` (with SDK's `WidgetDriver` underneath) | -| `RelayKit/Call/CallViewModel.swift` | Orchestrate connect/disconnect sequencing, key install ordering, heartbeat | `matrix-js-sdk/src/matrixrtc/MatrixRTCSession.ts` | -| `RelayKit/Call/LiveKitLogBridge.swift` | Bridge LiveKit SDK logs into OSLog | (none — Element Call uses pino) | - -## Per-function deviations - -### `LiveKitCredentialService.fetchLiveKitTokenV2` - -Lines 178–205 in `LiveKitCredentialService.swift`. Reference: `getLiveunitJWTWithDelayDelegation` in `openIDSFU.ts`. - -| Field | Relay sends | Reference sends | Confirmed required by `lk-jwt-service`? 
| -| --- | --- | --- | --- | -| `room_id` | ✓ | ✓ | yes (`SFURequest.Validate()` in `requests.go`) | -| `slot_id` | **missing** | `"m.call#ROOM"` | **yes** (returns 400 `M_BAD_JSON` if missing) | -| `openid_token` | ✓ | ✓ | n/a (validated server-side) | -| `member.id` | `":"` | `memberId` (a UUID generated at membership creation) | n/a (passed through to identity hash) | -| `member.claimed_user_id` | ✓ | ✓ | n/a | -| `member.claimed_device_id` | ✓ | ✓ | n/a | -| `delay_id` / `delay_timeout` / `delay_cs_api_url` | not sent | optionally sent if configured | optional | - -**Resolution (2026-06-14)**: Item 1 was implemented (we sent `slot_id: "m.call#ROOM"`) but caused a regression: matrix-js-sdk's `CallMembership.parseFromEvent` computes `rtcBackendIdentity` differently depending on membership kind. For `MembershipKind.Session` (the legacy `org.matrix.msc3401.call.member` event we publish) it returns the plain concatenation `${sender}:${device_id}`, **not** the v2 hash. So peers running Element Call / Element X / Element Web read our membership event, expect us on LiveKit as `@user:server:device`, but lk-jwt-service had placed us under the v2 hash → peers couldn't reconcile our LiveKit participant with our Matrix call membership → no video routing. - -**Current state**: `fetchLiveKitToken` tries legacy `/sfu/get` first and falls forward to v2 only if legacy fails. v2 only becomes viable once Relay also publishes MSC4143 sticky `m.rtc.member` events (a separate, larger change). `slot_id` and v2 identity remapping (Item 2) remain plumbed; they're inert on the legacy-first path but ready for the day we adopt sticky events. - -### `LiveKitCredentialService.fetchLiveKitTokenLegacy` - -Lines 207–230. Reference: `getLiveunitJWT` in `openIDSFU.ts`. - -| Field | Relay sends | Reference sends | -| --- | --- | --- | -| `room` | ✓ | ✓ | -| `openid_token` | ✓ | ✓ | -| `device_id` | ✓ | ✓ | -| delay parts | not sent | optional | - -Matches. 
✓ - -### `LiveKitCredentialService.discoverSFUURL` - -Lines 93–141. - -| Source | Relay tries | Reference tries | -| --- | --- | --- | -| Transports endpoint | `/_matrix/client/unstable/org.matrix.msc4143/rtc/transports` | MSC4195 says stable `/v1/rtc/transports`. Most servers implement neither yet. | -| `.well-known` | `org.matrix.msc4143.rtc_foci` key | Same | -| Existing peers' `m.call.member` `foci_preferred[0]` | **not consulted** | matrix-js-sdk uses this as the third fallback | - -**Impact**: On a homeserver with no `.well-known` configured, if there's already an active call with a SFU negotiated, Relay throws `sfuURLNotFound` instead of using the active SFU. **Tracked as Item 3.** - -### `LiveKitCredentialService.fetchLiveKitToken` (fallback logic) - -Lines 166–176. - -Relay: try v2 inside `try?`, fall back to legacy on *any* error. -Reference: try v2, fall back to legacy on HTTP 404 specifically; bubble up other errors. - -**Impact**: A v2 endpoint returning 5xx, 401, or our 400-due-to-missing-`slot_id` all silently route to legacy. The user sees `tokenExchangeFailed` with no detail. **Tracked as Item 4.** - -### `CallEncryptionService.sendCallMemberEvent` - -Lines 80–135. Reference: `SessionMembershipData` in `matrix-js-sdk/src/matrixrtc/membershipData/session.ts`. 
-
-| Field | Relay value | Reference shape |
-| --- | --- | --- |
-| `application` | `"m.call"` | string |
-| `call_id` | `""` | string (may be empty) |
-| `created_ts` | `Int64(Date.now * 1000)` | optional number |
-| `device_id` | ✓ | string |
-| `expires` | `14400000` (4h) | optional, default 4h |
-| `focus_active.type` | `"livekit"` | `"livekit"` |
-| `focus_active.focus_selection` | `"oldest_membership"` | `"oldest_membership"` \| `"multi_sfu"` |
-| `foci_preferred[].type` | `"livekit"` | `"livekit"` |
-| `foci_preferred[].livekit_service_url` | ✓ | string |
-| `foci_preferred[].livekit_alias` | `roomID` | string |
-| `m.call.intent` | `"video"` | optional |
-| `membershipID` | UUID | optional |
-| `scope` | `"m.room"` | optional `"m.room"` \| `"m.user"` |
-
-Matches. ✓
-
-State key: `___m.call`. Matches Element X's per-device convention. ✓
-
-### `CallEncryptionService.fetchCallTargets`
-
-Sources call participants from `RoomInfo.activeRoomCallParticipants` and
-broadcasts our AES key to all of each user's devices via the to-device
-`"*"` wildcard. Matches Element Call's
-`matrix-js-sdk/src/matrixrtc/ToDeviceKeyTransport.ts` behaviour. The
-SDK accessor is user-level only — no device IDs — so a few Olm
-sessions to non-call devices get warmed up unnecessarily, but the key
-itself is per-call and only consumed by a LiveKit cryptor that
-expects it.
-
-(History: previously walked raw `/rooms/{id}/state` REST to parse
-per-device state keys. Switched in Item 5.)
-
-### `CallWidgetBridge.handleIncomingToDevice` (key routing)
-
-Lines 554–634.
-
-Routes inbound keys to `participantId` (the LiveKit-side identity our cryptor uses) by trying:
-
-1. `":"`
-2. `":"`
-3. `member.id` (the membership UUID)
-4. `sender` alone
-
-The comment in code asserts Element Call connects to LiveKit with identity `@user:server:device`. **This is only true on the legacy path.** On v2 the identity is `unpadded_base64(sha256(canonical_json([matrixID, claimedDeviceID, memberID])))`. The legacy assumption is hardcoded into all four entries above.
-
-**Tracked as Item 2.** Fix requires capturing the JWT-side identity (from the JWT `sub` claim, or from `room.localParticipant.identity` after connect) and using it as the routing key when on v2.
-
-### `CallViewModel.connect` (lines 282–303)
-
-Local key install uses:
-```swift
-let localIdentity = "\(encryptionService.userID):\(encryptionService.deviceID)"
-```
-
-Comment cites `matrix-js-sdk CallMembership.ts line 101` — accurate for **legacy**. Same v2 mismatch.
-
-There's already a runtime warning on line 293: `"LiveKit identity X != matrix identity Y — frame encryption may misroute"`. This currently *only logs* the mismatch without acting on it. Item 2 should make us key the cryptor under whichever identity LiveKit actually assigned.
-
-### `CallViewModel.redistributeKey` (lines 590–617)
-
-Splits the LiveKit participant identity by `:` to reconstruct `(userId, deviceId)`. Hard-fails on v2 hashes (no colons → `components.count < 3` → log + return). **Tracked as Item 2.**
-
-### `CallViewModel.connect` — runtime instrumentation gap
-
-After `state = .connected` (line 391), the Activity Log has **no further events** until `disconnect()`. The LiveKit `RoomDelegate` (`Delegate` inner class in this file) handles `participantDidJoin`, `participantDidLeave`, `didPublishTrack`, etc., but nothing flows to the Activity Log. Real-world failure reports for "connected but no media" show traces ending at `Connected to call` with nothing actionable after.
-
-**Tracked as Item 0** (new — added after reviewing user `97853C31` activity log on 2026-06-13).
-
-## Design note: MSC4143 sticky-event dual-publish (future work)
-
-The legacy-first credential path landed on 2026-06-14 restored interop with the current ecosystem, but pins Relay to the legacy `org.matrix.msc3401.call.member` shape. matrix-js-sdk's `StickyEventMembershipManager` already publishes the MSC4143 `m.rtc.member` sticky shape in parallel with the legacy event on stacks that opt in. To unblock v2 `/get_token` (hashed identity) and remove the legacy-stack dependency, Relay will need to dual-publish too.
-
-### Scope
-
-1. **Membership UUID threading.** `CallEncryptionService` already generates a `membershipID` UUID for the m.call.member event. Extend it to be the canonical `member.id` and thread it into `LiveKitCredentialService.fetchLiveKitTokenV2` (currently we send `":"` there). The same UUID must appear in the sticky `m.rtc.member` event so peers computing `computeRtcIdentityRaw(user_id, device_id, member.id)` get the same hash lk-jwt-service assigns.
-
-2. **Sticky-event publish.** New code path in `CallEncryptionService` (or a peer file) that publishes the MSC4143 `m.rtc.member` shape via the SDK's future-event API (depends on Synapse MSC4140 — gate on a homeserver capability probe, skip silently if unsupported). This is *additive*: keep publishing the legacy event for peers that only read it.
-
-3. **Reader merge.** `CallEncryptionService.fetchCallTargets` currently sources participants from `RoomInfo.activeRoomCallParticipants` (user-level). When sticky events ship, the SDK's accessor should already merge both event types — verify before assuming. If it doesn't, parse both event types and dedupe by `(user_id, device_id)`.
-
-4. **Credential path selection.** With sticky publish in place, we can revert to `fetchLiveKitToken` trying v2 first again. Peers reading our sticky event compute hashed identity; peers reading only our legacy event compute colon identity. Both must work simultaneously — meaning lk-jwt-service must place us under *one* identity but peers reading the "wrong" event lose us. Mitigation: matrix-js-sdk peers prefer the sticky event when both are present (verify in `parseFromEvent` / its caller). So as long as we sticky-publish, hashed-identity peers find us; legacy-only peers (vicr123-style older Relay builds) won't, but that's the same tradeoff matrix-js-sdk made.
-
-5. **Encryption key routing.** `CallWidgetBridge.handleIncomingToDevice` already registers under both identity shapes (Item 2 belt-and-braces). Keep. `CallViewModel.connect` and `redistributeKey` currently key off `room.localParticipant.identity` post-connect — that continues to work, since LiveKit tells us our actual identity directly regardless of which credential path won.
-
-### Out of scope for this work
-
-- Switching the to-device key transport from custom-payload Olm to the SDK's `ToDeviceKeyTransport`. Orthogonal.
-- Replacing `MembershipManager` plumbing wholesale with the SDK's `StickyEventMembershipManager`. The Widget API doesn't expose that manager directly; we'd be reimplementing it on top of the same primitives. Out of scope until a clear interop bug forces the switch.
-
-### Open questions
-
-- **Does Synapse's MSC4140 implementation handle sticky events for E2EE rooms reliably?** matrix-js-sdk has had bugs here. Test against production homeservers before assuming.
-- **Does Element X currently *publish* sticky events, or only read them?** If only reads, our dual-publish is purely forward-looking and we won't see immediate interop benefit until Element X also publishes.
-- **What's the right capability probe?** No standard exists yet. Likely: try the publish, treat 400/404 with specific errcodes as "feature unavailable", cache the result per-homeserver.
-
-### Files this will touch
-
-- `RelayKit/Call/CallEncryptionService.swift` — membership UUID threading, sticky-event publish, optional reader merge
-- `RelayKit/Call/LiveKitCredentialService.swift` — revert `fetchLiveKitToken` to v2-first, thread UUID into request body
-- `RelayKit/Call/CallWidgetBridge.swift` — no changes expected (key routing already dual-registers)
-- `RelayKit/Call/CallViewModel.swift` — no changes expected
-
-## What this file is NOT
-
-- Not user-facing — see `docs/troubleshooting-calls.md` for that.
-- Not exhaustive — only documents deviations we've confirmed against real source code, real specs, or real user traces. If you find a new deviation that matches a user report, add it here with a citation.
-- Not a roadmap — the task list on the `rtc-element-call-alignment` branch tracks priority and ordering.
diff --git a/docs/troubleshooting-calls.md b/docs/troubleshooting-calls.md
deleted file mode 100644
index 794ab64..0000000
--- a/docs/troubleshooting-calls.md
+++ /dev/null
@@ -1,108 +0,0 @@
-# Troubleshooting MatrixRTC calls
-
-If your calls fail to connect, or connect but show no audio or video, this page walks you through capturing the data we need to diagnose the issue.
-
-## Quick capture (3 minutes)
-
-1. Open **Window → Activity Log** (or press `⌥⌘A`).
-2. In the search bar, click the **filter chip** and limit to the **Call** category.
-3. Leave the Activity Log window open and reproduce the failing call.
-4. Once the call has failed (or you have media issues), press `⌘S` in the Activity Log window to export the filtered events.
-5. Save the file (default name `relay-activity-log.json`) and attach it to your bug report.
-
-That export is everything the developers need to triage a calling problem.
-
-## What's in the export
-
-The file is pretty-printed JSON: an array of events, each with `timestamp` (ISO 8601), `category` (will be `"call"` after filtering), `severity` (`debug` / `info` / `warning` / `error`), `source` (which subsystem logged it), `summary`, optional `detail`, optional `roomId`, and a `metadata` key-value map.
-
-Sample event:
-
-```json
-{
-  "timestamp": "2026-06-12T14:30:05.123Z",
-  "category": "call",
-  "severity": "info",
-  "source": "LiveKitCredentialService",
-  "summary": "SFU URL discovered",
-  "roomId": "!abc:example.org",
-  "metadata": {}
-}
-```
-
-## What's safe to share
-
-The export contains:
-
-- Your Matrix room ID (`!…:server`) and device IDs
-- Per-call membership UUIDs and key indices
-- Your homeserver hostname
-- SHA-256 fingerprints (first 8 hex chars) of encryption keys, **never the keys themselves**
-
-It does **not** contain:
-
-- Raw E2EE keys
-- OpenID tokens or LiveKit JWTs
-- Message contents, names, or avatars
-- The OpenID access token used for SFU auth
-
-If you don't want your room IDs or device IDs in a public bug report, ask the maintainers for a DM in [#relayapp:matrix.org](https://matrix.to/#/#relayapp:matrix.org) and share the file there.
-
-## Reading the export yourself
-
-A few specific log lines act as signposts. If your file contains any of these, you can pre-diagnose your own issue:
-
-### Connection-time signals
-
-| Look for | What it means |
-| --- | --- |
-| `Fetching call credentials` | The call attempt started; subsequent events should show whether discovery and token exchange succeeded. |
-| `SFU URL discovered` | Your homeserver advertises a LiveKit SFU. Good. |
-| `Failed to fetch call credentials` with `detail: "This homeserver has no LiveKit call server configured…"` | Your homeserver doesn't expose `org.matrix.msc4143.rtc_foci` in `.well-known/matrix/client`, and the unstable transports endpoint isn't supported. Ask your homeserver admin to configure MatrixRTC. |
-| `Call credentials obtained` | Token exchange succeeded. If the call still fails after this, the problem is downstream of credential acquisition. |
-
-### Connected-but-no-media signals
-
-If the call reaches the **Connected to call** event but you can't see or hear anyone, the failure is in the encryption-key exchange or frame routing.
-
-| Look for | What it means |
-| --- | --- |
-| `Distributed E2EE key to N user(s)` followed by `Received E2EE key from …` for each peer | Key exchange is happening. If you still have no media, the problem is in the frame-decoder routing — note the `Participant:` field in the `detail`, this is the identity LiveKit assigned to the peer. |
-| No `Received E2EE key from …` events at all | Peers aren't sending you their keys, or the widget bridge isn't running. Check whether the room is configured as encrypted (E2EE is enabled only for encrypted Matrix rooms). |
-| `Widget bridge started` but no later events | The widget driver is waiting for capability negotiation that never completes. Likely an SDK or homeserver-side issue. |
-
-### Patterns worth flagging in a bug report
-
-These specific event sequences point to a known class of failure:
-
-1. **No `Call credentials obtained` event after `Fetching call credentials`.** Credential exchange is failing. Almost always a homeserver-side or SFU-side configuration problem; we'll need to know which homeserver you're on.
-
-2. **`Connected to call` but no `Distributed E2EE key` event.** The Matrix call-member state event went out, but no peers existed at the time you connected, or our cache of call members is stale. If others were already in the call, this is a Relay bug worth reporting.
-
-3. **`Received E2EE key from …` events present, but you still see no media from those peers.** Frame-cryptor routing is misaligned with the LiveKit participant identity. This is currently a known issue we're working on; please attach the export and note the LiveKit `Participant:` identity you see in those events' `detail` field.
-
-## When the Activity Log isn't enough
-
-For really hard cases (the SFU is rejecting our JWT with no useful error, or the LiveKit room itself never finishes initialising) we sometimes need a unified-log capture, which records the low-level RTC trace from inside the LiveKit SDK.
-
-While the call is reproducing the issue, run in a terminal:
-
-```sh
-log stream --predicate 'subsystem == "RelayKit" AND category BEGINSWITH "Call"' \
-  --level info > relay-call-trace.log
-```
-
-Stop with `^C` once the call has failed, then share `relay-call-trace.log` alongside the Activity Log JSON.
-
-The unified-log capture contains more verbose internal trace including LiveKit SDK output. It's safe in the same way the Activity Log is (no key material, no tokens), but it does contain more verbose timing and routing data. Share it through the same channel you'd share the JSON.
-
-## Reporting
-
-File an issue at [github.com/subpop/Relay/issues](https://github.com/subpop/Relay/issues) or message [#relayapp:matrix.org](https://matrix.to/#/#relayapp:matrix.org). Please include:
-
-- The `relay-activity-log.json` export (filtered to the Call category)
-- Your homeserver hostname (e.g. `matrix.example.org`)
-- Whether other clients (Element X, Element Web) succeed at calling on the same account
-- A one-line description of what you saw: "fails to connect", "connects but no audio", "connects but no video", etc.
-
-If you'd rather not put logs in a public issue, send them privately to maintainers in the Matrix room first.
From 8842ee58b7b0a7d11a86b680075190844eb64396 Mon Sep 17 00:00:00 2001
From: Andrew Hunter
Date: Mon, 15 Jun 2026 20:05:12 -0400
Subject: [PATCH 14/16] Fix optional unwrap on encryptionService in identity-mismatch check

Follow-up to the upstream/main merge: the "LiveKit identity mismatch"
diagnostic block referenced `encryptionService.userID` directly, but
`encryptionService` is an optional on this branch. Wrap the local
identity construction in `encryptionService.map { ... }` and gate the
warning behind both the LiveKit identity and the Matrix-side identity
being present.

Co-Authored-By: Claude Opus 4.7
---
 RelayKit/Call/CallViewModel.swift | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/RelayKit/Call/CallViewModel.swift b/RelayKit/Call/CallViewModel.swift
index c50a038..e58188a 100644
--- a/RelayKit/Call/CallViewModel.swift
+++ b/RelayKit/Call/CallViewModel.swift
@@ -337,8 +337,10 @@ public final class CallViewModel: CallViewModelProtocol {
             // is normally silent; if it fires we've landed on the
             // v2 hash identity and peer-side decryption will fail
             // until we also publish MSC4143 sticky events.
-            let matrixSidIdentity = "\(encryptionService.userID):\(encryptionService.deviceID)"
-            if let livekitIdentity = self.localParticipantID, livekitIdentity != matrixSidIdentity {
+            let matrixSidIdentity: String? = encryptionService.map { "\($0.userID):\($0.deviceID)" }
+            if let livekitIdentity = self.localParticipantID,
+               let matrixSidIdentity,
+               livekitIdentity != matrixSidIdentity {
                 activityLog?.log(
                     category: .call, severity: .warning, source: "CallViewModel",
                     summary: "LiveKit identity mismatch — frame encryption may misroute",
From 7edbf2e004979f37e3ba07714c31c35e7e7fdbd1 Mon Sep 17 00:00:00 2001
From: Andrew Hunter
Date: Tue, 16 Jun 2026 07:27:10 -0400
Subject: [PATCH 15/16] Update Relay/Resources/Relay.help/Contents/Info.plist

Co-authored-by: Link Dupont
---
 Relay/Resources/Relay.help/Contents/Info.plist | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Relay/Resources/Relay.help/Contents/Info.plist b/Relay/Resources/Relay.help/Contents/Info.plist
index 231c8bb..e0e911c 100644
--- a/Relay/Resources/Relay.help/Contents/Info.plist
+++ b/Relay/Resources/Relay.help/Contents/Info.plist
@@ -5,7 +5,7 @@
  CFBundleDevelopmentRegion
  en
  CFBundleIdentifier
- io.github.subpop.relay.help
+ app.subpop.Relay.help
  CFBundleInfoDictionaryVersion
  6.0
  CFBundleName
From 4a7cb35ec445ec4e7a8925536cbc52ac1bfabd3b Mon Sep 17 00:00:00 2001
From: Andrew Hunter
Date: Tue, 16 Jun 2026 07:27:25 -0400
Subject: [PATCH 16/16] Update Relay/Info.plist

Co-authored-by: Link Dupont
---
 Relay/Info.plist | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Relay/Info.plist b/Relay/Info.plist
index 42b56ab..cd860fa 100644
--- a/Relay/Info.plist
+++ b/Relay/Info.plist
@@ -7,7 +7,7 @@
  CFBundleHelpBookFolder
  Relay.help
  CFBundleHelpBookName
- io.github.subpop.relay.help
+ app.subpop.Relay.help
  UTExportedTypeDeclarations