Skip to content

fix(record): stop leaking same-origin iframe state on removal#3570

Open
gruessi wants to merge 8 commits into
PostHog:mainfrom
gruessi:fix/iframe-memory-leak
Open

fix(record): stop leaking same-origin iframe state on removal#3570
gruessi wants to merge 8 commits into
PostHog:mainfrom
gruessi:fix/iframe-memory-leak

Conversation

@gruessi
Copy link
Copy Markdown

@gruessi gruessi commented May 12, 2026

Problem

Recording a page that mounts and unmounts same-origin iframes leaks each iframe's Document, every node serialized into the mirror, one MutationObserver per mount, and its Window. End-to-end heap test against a host that cycles 5 blob-URL iframes every 2s for 110s: +118 MB / +390 leaked HTMLDocuments, ~300 KB per mount. Long-running tabs eventually hit the browser's memory limit.

Cross-origin and about:blank iframes go through a different code path and weren't affected.

Changes

Eight independent retention chains in packages/rrweb/. Each was masking the next — fixing any subset still showed leak in heap snapshots.

  • Listener leaks. onceIframeLoaded now returns a disposer (load listener + timer); IframeManager's pagehide handler is named and tracked. Both are removed in removeIframeById.
  • recordCrossOriginIframes cleanup gate dropped so the same-origin path actually runs cleanup. cleanupDetachedIframes() sweeps orphans when an iframe is removed via its parent.
  • iframe.src = 'about:blank' before unmount (typical React cleanup) swaps contentDocument/contentWindow so the mirror walks an empty doc and nestedIframeListeners: Map<Window> misses its entry. New attachedDocuments / attachedWindows WeakMaps capture every doc/window observed across the iframe's lifetime; detach walks all of them.
  • Global arrays accumulated forever. mutationBuffers[] and handlers[] had no iframe-removal path. findAndRemoveIframeBuffer now matches by original doc; new runAndDetachIframeCleanup splices closures out of handlers[].
Per-chain detail (file-by-file)
  1. rrweb-snapshot/src/snapshot.tsonceIframeLoaded. Returns a disposer that removes the listener and clears the pending timer. The readyState === 'complete' branch previously registered a second listener next to setTimeout(listener, 0) — that one is now named (onSubsequentLoad) and removed by the same disposer. serializeNodeWithId/snapshot gain an optional onIframeListenerRegistered?(iframe, disposer) callback so the disposer travels back to IframeManager.
  2. IframeManager.attachIframe. Anonymous pagehide handler became a named function tracked in pageHideHandlers: WeakMap<HTMLIFrameElement, Set<{win, handler}>>. Set-of-handlers because an iframe can run through several Windows during its lifetime.
  3. record/index.ts wrappedMutationEmit. Cleanup gate on recordCrossOriginIframes removed — same-origin leaks too.
  4. IframeManager.cleanupDetachedIframes(). Sweeps attachedIframes for ids no longer in the mirror to catch nested iframe removals (mutation buffer only reports the directly-removed ancestor).
  5. attachedDocuments: WeakMap<HTMLIFrameElement, Set<Document>>. Captures every contentDocument observed across an iframe's lifetime (initial blank → loaded src → cleanup blank). Detach walks the Set and calls mirror.removeNodeFromMap(doc) on each.
  6. mutationBuffers[] cleanup. MutationBuffer.bufferDoc() exposes the doc; findAndRemoveIframeBuffer takes optional knownDocs: Set<Document> to match by original contentDocument. reset() nulls this.doc as defence in depth.
  7. handlers[] cleanup. runAndDetachIframeCleanup(iframeId) runs the cleanup and splices the closure from both iframeObserverCleanups and handlers. iframeObserverCleanups is Map<number, Set<cleanup>> so navigations don't overwrite earlier cleanups.
  8. attachedWindows: WeakMap<HTMLIFrameElement, Set<Window>>. Mirrors attachedDocuments. removeIframeById walks the captured Set and removes each entry from nestedIframeListeners and crossOriginIframeMap. Fixes a stale code comment that incorrectly claimed Map<Window> keys don't prevent GC.

Validation

End-to-end heap snapshot, same browser, same recorder config, 61 cycles per run (≈305 mount/unmount events).

Unpatched After fix
Post-GC heap delta over 110s +118 MB −5 MB
native:HTMLDocument Δ +390 0
native:MutationObserver Δ +391 0
native:MutationObserverRegistration Δ +390 −1
closure:onIframeLoad Δ +195 +1

The negative post-GC delta is because the heap-snapshot operation forces a thorough GC that releases other accumulated state too — the real read is the constructor-level deltas. Top growth in the post-cycle snapshot is V8/JIT internals and transient PerformanceLongTaskTiming entries, not leaked iframe state.

Tests

packages/rrweb/rrweb/test/record/memory-leaks.test.ts30/30 pass (4 new regressions for this fix: same-origin cleanup, nested parent removal, captured-doc walk, multi-doc lifecycle). check-types clean for both @posthog/rrweb and @posthog/rrweb-snapshot.

Release info Sub-libraries affected

Libraries affected

  • All of them
  • posthog-js (web)
  • posthog-js-lite (web lite)
  • posthog-node
  • posthog-react-native
  • @posthog/react
  • @posthog/ai
  • @posthog/convex
  • @posthog/next
  • @posthog/nextjs-config
  • @posthog/nuxt
  • @posthog/rollup-plugin
  • @posthog/webpack-plugin
  • @posthog/types

The changeset bumps @posthog/rrweb and @posthog/rrweb-snapshot (patch). updateInternalDependencies: "patch" cascades the bump through rrweb-recordposthog-js.

Checklist

  • Tests for new code
  • Accounted for the impact of any changes across different platforms
  • Accounted for backwards compatibility of any changes (no breaking changes!)
  • Took care not to unnecessarily increase the bundle size

No public API changes. The new onIframeListenerRegistered? option is optional and additive; everything else is internal to IframeManager / MutationBuffer / record/index.ts. Bundle-size impact is small (~handful of WeakMap declarations + cleanup helpers) — worth confirming via bundle-size CI on this PR.

If releasing new changes

  • Ran pnpm changeset to generate a changeset file

gruessi added 2 commits May 11, 2026 16:53
Same-origin iframes mounted and unmounted while session recording is
active leaked the iframe Document, every node serialized into the
mirror, and one MutationObserver per mount. Per-iframe retained cost
≈300 KB; for hosts that mount/unmount many iframes (e.g. flow editors
that render previews via blob URLs) memory can grow into the hundreds
of MB over an extended session.

Nine retainer chains had to be closed:

- onceIframeLoaded (snapshot.ts) never removed its load listener; the
  closure pinned the iframe via captured `mirror` / `iframeManager`.
  It now returns a disposer the caller can invoke on detach. The
  branch that re-registered a load listener after the iframe was
  already loaded is collapsed into the disposer-tracked path. The
  blank-frame-during-navigation guard from the original code is
  preserved: when win.location.href is about:blank but iframeEl.src
  is a non-empty/non-about:blank URL we register only the load
  listener (no immediate serialization), so the recording does not
  capture the transient about:blank document Chrome assigns before
  the real navigation completes.

- IframeManager.attachIframe registered an anonymous `pagehide`
  handler with no way to remove it. Convert it to a named function
  and track per iframe in `pageHideHandlers`. attachIframe runs once
  per iframe load, so a single iframe can collect handlers across
  several Windows; the map stores a Set so every handler is detached
  on removeIframeById, not just the latest. Same shape applied to
  load-listener disposers via `loadListenerDisposers`.

- The cleanup branch in `record/index.ts`'s wrappedMutationEmit was
  gated on `recordCrossOriginIframes`; same-origin iframes never had
  their iframeObserverCleanups entry called or
  iframeManager.removeIframeById invoked. Drop the gate. Also add
  IframeManager.cleanupDetachedIframes so iframes removed inside a
  removed subtree (where only the ancestor's id appears in
  m.removes) get cleaned up too.

- Hosts can swap `iframe.src` to `about:blank` before removing the
  iframe — typical React unmount pattern. By detach time
  `iframe.contentDocument` and `iframe.contentWindow` no longer match
  what we attached. Capture every (Document, Window) we ever saw per
  iframe in `attachedDocuments` / `attachedWindows` (WeakMaps keyed
  by iframe element, holding a Set of values), and walk the captured
  sets on detach to release `mirror.idNodeMap` entries and to unwind
  `nestedIframeListeners` (a strong Map keyed by Window).

- Iframes can be removed before their first load fires. attachIframe
  has not run yet so attachedIframes has no entry, and by the time
  wrappedMutationEmit calls removeIframeById the mirror entry has
  typically already been cleared by MutationBuffer.emit, so the
  fallback mirror.getNode lookup also returns null. New
  `iframeElementsById` Map (populated in registerLoadListenerDisposer
  and cleared in removeIframeById) lets the cleanup path find the
  iframe and run its disposer in this pre-load case.

- Iframe navigations overwrite per-iframe observer cleanups.
  loadListener fires once per load, but iframeObserverCleanups stored
  only one cleanup per iframe id — repeated navigations pushed every
  cleanup into `handlers[]` while leaving only the most recent one
  reachable for runAndDetachIframeCleanup. Earlier MutationObservers
  and MutationBuffers stayed alive until stopRecording.
  iframeObserverCleanups is now `Map<number, Set<listenerHandler>>`;
  detach iterates the Set, splicing each entry out of `handlers[]`.

- Reparenting an iframe must not be confused with removal.
  MutationBuffer.emit clears the mirror entry before re-serializing
  the add, so a moved iframe can come back with a fresh id — the
  previous same-id check in wrappedMutationEmit missed that case
  and disposed the (still-needed) listener and observers, breaking
  recording for the moved iframe until the next full snapshot.
  Cross-check by element identity: walk m.adds, look up each new id
  in the mirror, and if the removed-id's element matches one of
  those, treat it as a move and only drop the stale id mapping
  (new IframeManager.forgetIframeId helper) without disposing
  anything.

- IframeManager.destroy() reassigned `loadListenerDisposers` and
  `pageHideHandlers` to fresh WeakMaps without iterating them, so
  the registered DOM listeners and pending onceIframeLoaded timers
  outlived stopRecording(). A late iframe load could then call back
  into iframeManager.attachIframe and emit mutations after the user
  thought recording had stopped. destroy() now enumerates tracked
  iframes via the iterable id-keyed Maps and runs disposers/removers
  before dropping the WeakMaps.

- The global `mutationBuffers[]` array (observer.ts) and the local
  `handlers[]` array (record/index.ts) accumulated per-iframe entries
  forever — `findAndRemoveIframeBuffer` was only invoked on
  pagehide, and `handlers[]` was never cleaned up at all. Splice
  both on iframe detach via a new `runAndDetachIframeCleanup` helper.
  `findAndRemoveIframeBuffer` now accepts an optional captured-docs
  set so it can match a buffer by its original document even after
  iframe.contentDocument has been swapped. `MutationBuffer.reset()`
  clears `this.doc` as defence in depth.

Validated end-to-end with a host page that mounts and unmounts five
blob-URL iframes every 2s for 110s. In a clean Chrome profile (no
extensions, no PostHog identity):

- Before: post-GC heap +118 MB, +390 leaked HTMLDocuments,
  +391 MutationObservers.
- After: post-GC heap ~0 MB, +1 leaked HTMLDocument,
  0 MutationObservers.

30/30 unit tests in test/record/memory-leaks.test.ts pass; new tests
cover the captured-doc walk, same-origin cleanup without
recordCrossOriginIframes, nested-removal sweep, pre-load removal,
iframe reparenting, and post-stop disposer execution.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 12, 2026

@gruessi is attempting to deploy a commit to the PostHog Team on Vercel.

A member of the Team first needs to authorize it.

# Conflicts:
#	packages/rrweb/rrweb/src/record/observer.ts
@gruessi gruessi marked this pull request as ready for review May 12, 2026 09:50
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 12, 2026

Comments Outside Diff (1)

  1. packages/rrweb/rrweb/test/record/memory-leaks.test.ts, line 811-815 (link)

    P2 Repeated IframeManager setup violates Once-and-Only-Once

    The four new unit tests all construct an identical IframeManager with the same mock mirror, mutationCb, wrappedEmit, and stylesheetManager stub. A small factory helper (e.g., makeIframeManager(overrides?)) would collapse five copies of the same 15-line block into one, making future option changes a single-line edit and keeping the tests focused on the behaviour under test.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: packages/rrweb/rrweb/test/record/memory-leaks.test.ts
    Line: 811-815
    
    Comment:
    **Repeated `IframeManager` setup violates Once-and-Only-Once**
    
    The four new unit tests all construct an identical `IframeManager` with the same mock `mirror`, `mutationCb`, `wrappedEmit`, and `stylesheetManager` stub. A small factory helper (e.g., `makeIframeManager(overrides?)`) would collapse five copies of the same 15-line block into one, making future option changes a single-line edit and keeping the tests focused on the behaviour under test.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
packages/rrweb/rrweb/src/record/mutation.ts:254-263
`bufferDoc()` and `reset()` both reach through `(this as unknown as ...)` to access the `private doc` field. This cast bypasses TypeScript's type system and will silently break if `doc` is renamed. Exposing `doc` via a real getter or using a single consistent cast would be cleaner and safer.

```suggestion
  public reset() {
    this.shadowDomManager.reset();
    this.canvasManager.reset();
    // Defence in depth: clear doc in case a stale closure still pins the buffer.
    this.doc = null as unknown as observerParam['doc'];
  }

  public bufferDoc(): Document | null {
    return this.doc as Document | null;
  }
```

### Issue 2 of 2
packages/rrweb/rrweb/test/record/memory-leaks.test.ts:811-815
**Repeated `IframeManager` setup violates Once-and-Only-Once**

The four new unit tests all construct an identical `IframeManager` with the same mock `mirror`, `mutationCb`, `wrappedEmit`, and `stylesheetManager` stub. A small factory helper (e.g., `makeIframeManager(overrides?)`) would collapse five copies of the same 15-line block into one, making future option changes a single-line edit and keeping the tests focused on the behaviour under test.

Reviews (1): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile

Comment thread packages/rrweb/rrweb/src/record/mutation.ts
- mutation.ts: replace receiver-cast on private `doc` with value-side
  cast using `observerParam['doc']`, so renames stay type-safe.
- memory-leaks.test.ts: extract a `makeIframeManager` factory in the
  `IframeManager.attachedDocuments` describe block; both tests use it.

Pure refactor — no runtime behavior change. 31/31 tests still pass.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 12, 2026

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
packages/rrweb/rrweb-snapshot/src/snapshot.ts:404-423
`onSubsequentLoad` is a one-liner wrapper that simply forwards to `listener` with no argument transforms. It's a superfluous wrapper per the "no superfluous parts" rule — using `listener` directly as the event-listener reference is equivalent and removes the extra allocation.

```suggestion
  if (
    winLocationHref !== blankUrl ||
    iframeEl.src === blankUrl ||
    iframeEl.src === ''
  ) {
    // Genuinely loaded — fire on the next tick so the host mutation that
    // surfaced this iframe has time to commit; also re-fire on later loads.
    const initialTimer = setTimeout(listener, 0);
    iframeEl.addEventListener('load', listener);
    return () => {
      clearTimeout(initialTimer);
      iframeEl.removeEventListener('load', listener);
    };
  }
  // Transient blank during navigation — wait for the real load.
  iframeEl.addEventListener('load', listener);
  return () => {
    iframeEl.removeEventListener('load', listener);
  };
```

### Issue 2 of 2
packages/rrweb/rrweb/src/record/iframe-manager.ts:496-503
**`findAndRemoveIframeBuffer` not called when `capturedDocs` is absent**

When an iframe is removed before `attachIframe` was ever called (so `capturedDocs` is `undefined`), `findAndRemoveIframeBuffer` is never invoked from `removeIframeById`. In that scenario no `MutationBuffer` exists either (the buffer is created inside `loadListener` which is only called from `attachIframe`), so the omission is currently safe. However, this silent invariant is easy to break: a future path that creates a buffer without going through `attachIframe` would leak silently. A brief comment noting the invariant — or an unconditional `findAndRemoveIframeBuffer(iframe)` call outside the `if (capturedDocs)` block — would make the intent explicit.

Reviews (2): Last reviewed commit: "chore(record): tidy up per Greptile revi..." | Re-trigger Greptile

@marandaneto marandaneto requested review from a team, TueHaulund, arnohillen, fasyy612, ksvat and nicowaltz and removed request for a team May 12, 2026 11:33
@ksvat ksvat requested a review from pauldambra May 14, 2026 12:06
Copy link
Copy Markdown
Member

@pauldambra pauldambra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks safe to me
i feel like we're accruing a very similar "disposables" pattern across the code
but that's not the fault of this PR, or blocking here

Copy link
Copy Markdown
Member

@pauldambra pauldambra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

QA Swarm review complete. See inline comments and the summary comment for the full verdict.

}
// Forward-declared; assigned inside the try{} block where `handlers` is in scope.
let runAndDetachIframeCleanup: (iframeId: number) => void = () => {};
let cleanupDetachedIframeObservers: () => void = () => {};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

[convergent: paul + xp + qa-team/frontend] 🟡 MEDIUM

Forward-declaring runAndDetachIframeCleanup / cleanupDetachedIframeObservers as () => {} and reassigning inside the try{} block makes construction order load-bearing in a quiet way. Anything that ends up invoking wrappedMutationEmit before the try-block executes gets a silent no-op cleanup — there's no warning, the leak just comes back.

Three refactors that would tighten this:

  • Extract handlers + iframeObserverCleanups into a small IframeObserverRegistry class with register(id, handler) / detach(id) / cleanupDetached() methods. The class owns its own state, no forward-declared no-ops needed.
  • Or: let runAndDetachIframeCleanup: ((id: number) => void) | undefined and call runAndDetachIframeCleanup?.(id) — silent misuse becomes a no-op only when intentionally unset, and a future caller that fires too early visibly does nothing rather than swallowing a leak.
  • Or: lift handlers to outer scope so the closures don't need to be reassigned at all.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 84f9802 via the smaller of your three suggestions. Both helpers are now typed ((id: number) => void) | undefined and called via ?.(). A premature call is visibly a no-op rather than silently swallowing a leak. Holding off on the IframeObserverRegistry extraction in this PR though.

const addedIframeElements = new Set<HTMLIFrameElement>();
if (m.adds.length > 0) {
for (const add of m.adds) {
const node = mirror.getNode(add.node.id);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

[paul] 🟠 HIGH

the reparent detection uses mirror.getNode(add.node.id) to map the new id back to the element. that only works because MutationBuffer.emit drains mapRemoves before pushAdd serialises the new id. that's a deep cross-file coupling — a change to emit's ordering would silently corrupt reparent detection and we'd start tearing down observers on every iframe move.

could we add a comment over in MutationBuffer.emit pointing at this expectation, or a test that fails when emit's drain order changes? a silent regression here would look like "iframes occasionally stop recording after a move" — exactly the sort of thing that's painful to track down later. ship as you see fit, but i'd want a tripwire.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 84f9802, added a comment over the mapRemoves drain in MutationBuffer.emit that points at the reparent detection in record/index.ts and warns future maintainers not to reorder. Held off on a regression test because asserting drain order from outside requires exposing more MutationBuffer internals than I'd like; happy to add one if you have a shape in mind.

});

// Safety net: cleanup any iframes that have become detached or inaccessible
// Catch iframes removed inside a removed subtree (only the ancestor's
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

[qa-team/reliability] 🟡 MEDIUM

The two cleanup orderings disagree. On the explicit m.removes path (around lines 282-284) runAndDetachIframeCleanup(id) runs before iframeManager.removeIframeById(id), so the MutationObserver is disconnected before findAndRemoveIframeBuffer resets the buffer and nulls this.doc. But the safety-net path here runs iframeManager.cleanupDetachedIframes() (which transitively calls buf.reset() -> this.doc = null) before cleanupDetachedIframeObservers() disconnects the observer.

Combined with the new reset() nulling this.doc, a mutation queued in the microtask window between those two calls would land on a buffer whose doc is null and throw inside pushAdd's serializeNodeWithId({ doc: this.doc, ... }). The race window is narrow (browsers deliver MutationObserver callbacks via microtasks) but real.

Suggested fix: swap the order to cleanupDetachedIframeObservers() first, then iframeManager.cleanupDetachedIframes(). That matches the explicit-remove path and closes the race.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 84f9802, swapped to cleanupDetachedIframeObservers() then iframeManager.cleanupDetachedIframes(). Both paths now agree.

this.shadowDomManager.reset();
this.canvasManager.reset();
// Defence in depth: clear doc in case a stale closure still pins the buffer.
this.doc = null as unknown as observerParam['doc'];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

[convergent: paul + xp + qa-team/reliability] 🟡 MEDIUM

this.doc = null as unknown as observerParam['doc'] — three reviewers independently snagged on this. Either:

  • doc really can be null after reset, in which case make the field's type honest (Document | null) and fix the few call sites that read it (pushAdd's serializeNodeWithId({ doc: this.doc, ... }), addShadowRoot(..., this.doc)), or
  • The buffer simply isn't used post-reset and the null-assignment isn't needed at all — let the GC collect the buffer naturally now that it's spliced from mutationBuffers[].

The "defence in depth" comment + the double-cast suggests neither path was fully committed to. Pick a story. Bonus: public bufferDoc(): Document | null on the next line claims nullability but the field is typed non-null — that inconsistency is the same smell.

If you keep the null-assignment, please also harden processMutation/emit with a if (!this.doc) return; guard so a late MutationObserver callback during the cleanup race (see the related comment on wrappedMutationEmit) is harmless instead of an exception.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 84f9802, went with option B. Dropped the null-assignment entirely; bufferDoc() returns Document (matches the field). The buffer's only strong refs are mutationBuffers[] + the cleanup closure in handlers[], and both are removed before reset() is reached, so GC handles the doc. The nulling was creating the same race you flagged on the other thread, worse than the hedge it was meant to be.

private attachedWindows: WeakMap<HTMLIFrameElement, Set<Window>> =
new WeakMap();
private attachedDocuments: WeakMap<HTMLIFrameElement, Set<Document>> =
new WeakMap();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

[convergent: xp + paul + qa-team/performance + qa-team/compatibility] 🟡 MEDIUM

Four-plus WeakMaps/Maps keyed by HTMLIFrameElement (attachedWindows, attachedDocuments, loadListenerDisposers, pageHideHandlers, plus the strong-keyed iframeElementsById) are a textbook Data Clump. They're populated together in attachIframe and cleared together in removeIframeById / destroy. Every retainer chain ends up with its own let bucket = …; if (!bucket) … boilerplate.

One WeakMap<HTMLIFrameElement, IframeAttachmentState> where IframeAttachmentState = { windows: Set<Window>; documents: Set<Document>; loadDisposers: Set<() => void>; pageHide: Set<{ win: Window; handler: () => void }> } would name the concept once. Cleanup becomes one delete; the "one bucket per element" pattern lives in a single helper.

Secondary observation: iframeElementsById is a strong Map, while the others are WeakMap — that's deliberate (you need to look up by id) but worth a comment that this map MUST be cleared on every detach pathway or it pins iframe elements forever.

Observability nudge while you're in here — five separate retainer chains is a lot to keep right forever. A debug-only counter that logs attachedDocuments.size / loadDisposers.size etc. crossing a threshold would let us see the next regression before users do.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed in shape, single WeakMap<HTMLIFrameElement, IframeAttachmentState> would be cleaner. Out of scope here though; the refactor would touch every leak-path I just landed and obscure what's new in this diff.

Adding the comment on iframeElementsById being strong-keyed.

fired = true;
let fired = false;
let timer: ReturnType<typeof setTimeout> | null = null;
const onLoad = () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

[xp] 🟡 MEDIUM

The "still loading" branch's onLoad calls iframeEl.removeEventListener('load', onLoad) after firing, so subsequent loads no longer re-fire the listener. Compare to the "complete" branch right below this, which carefully wires onSubsequentLoad precisely to keep firing on later loads.

This asymmetry is either:

  • Intentional — then please leave a comment explaining why a still-loading iframe doesn't deserve subsequent-load handling, or
  • An oversight — the PR description says "Accumulate every contentDocument across loads", which strongly suggests this branch should also keep the listener attached after the first load (just behind the fired guard) so an iframe.src swap while the iframe is still loading the initial document gets its second contentDocument tracked too.

The latter feels more consistent with the rest of the diff's intent.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 84f9802, was an oversight, not intentional. Still-loading branch now attaches onSubsequentLoad after the initial fire, matching the complete branch. An iframe.src swap while the initial document is still loading now gets its second contentDocument serialized too. Sharing the disposer wiring across both cases was the right pull.


callSafely(() =>
iframeEl.contentWindow?.addEventListener('pagehide', () => {
callSafely(() => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

[qa-team/generalist-a] 🟢 LOW

attachIframe adds a pagehide listener on every call without checking for an existing entry. A reparented iframe (which calls attachIframe again from MutationBuffer.emit's onIframeLoad after the move) will accumulate N pagehide listeners on the same Window. The handlers are idempotent (mirror.removeNodeFromMap is a no-op for an unknown id, etc.), so this isn't a correctness bug — but the pageHideHandlers Set grows on every reparent, and every listener fires once per event.

Dedupe before adding:

if (Array.from(bucket).some(({ win }) => win === pageHideWindow)) return;

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 84f9802 ,attachIframe now scans the per-iframe bucket for an existing entry with the same Window and skips registration if found.


this.iframeElementsById.delete(iframeId);

if (iframe) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

[xp] 🟢 LOW

removeIframeById is now ~60 lines doing seven distinct things: resolve element, clear nested-Window listeners, clear legacy-Window path, clear crossOrigin map, drop iframe, dispose load listeners, remove pagehide, walk captured docs. Each block has a clarifying comment — those comments are extract-method opportunities (detachNestedWindowListeners(iframe), releaseCapturedDocuments(iframe), disposeLoadListeners(iframe), etc.).

Kent Beck's ComposedMethod: keep all operations in a method at the same level of abstraction. Right now the method mixes the "detach observers" abstraction with the "release captured DOM refs" abstraction with the "forget the iframe" abstraction. The next maintainer has to read the whole thing to follow one retainer chain. Not blocking — works today — but the shape is worth pressing on before this file grows another chain.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, seven inline blocks at different abstraction levels asks to be extracted into detachNestedWindowListeners(iframe) / releaseCapturedDocuments(iframe) / etc. I'd rather land the leak fix as a clean diff handle follow up with a cleanup. Worth doing before the next chain lands in this file.

await new Promise((r) => setTimeout(r, 50));

// Move must not have spliced any per-iframe MutationBuffer entries.
expect(mutationBuffers.length).toBeGreaterThanOrEqual(buffersBeforeMove);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

🤖 Automated comment by QA Swarm — not written by a human

[paul] 🟢 LOW

the reparent test asserts mutationBuffers.length >= buffersBeforeMove. that's deliberately permissive and fine for catching the bug here — but a future regression that adds buffers per move would still pass.

worth tightening to === if the count is stable across the reparent, or splitting into a separate assertion on the iframe-specific buffer count (e.g. count buffers whose doc matches the moved iframe's contentDocument). otherwise we've shipped a test that's load-bearing for one direction of regression and silent on the other.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried === — fails because buffers actually grow 2 → 4 on reparent today: the add side re-serializes and attachIframe starts a fresh observer alongside the kept old one. Separate issue from this leak, out of scope. Kept >= and rewrote the test comment so the next reader reads it as a guarded floor, not a guarantee.

@pauldambra
Copy link
Copy Markdown
Member

Note

🤖 Automated comment by QA Swarm — not written by a human

Multi-perspective review: qa-team (specialists + generalists), paul-reviewer, xp-reviewer

Verdict: ⚠️ REQUEST CHANGES

The leak fix itself is sound and the +118 MB / +390 leaked HTMLDocuments -> ~0 validation is a real receipt. The requested changes are about robustness and maintainability: one narrow correctness concern (safety-net cleanup ordering can race against the new MutationBuffer.reset()), a load-bearing reparent invariant that needs a comment or test, and three convergent design smells in the new state plumbing.

Key findings

🟠 High

  • packages/rrweb/rrweb/src/record/index.ts:267 — reparent detection silently couples to MutationBuffer.emit's drain order (mapRemoves before pushAdd). A future change there would tear down observers on every iframe move with no test failure. Add a comment in emit or a regression test.

🟡 Medium

  • packages/rrweb/rrweb/src/record/index.ts:286 (qa-team/reliability) — the safety-net path runs iframeManager.cleanupDetachedIframes() (which nulls MutationBuffer.doc) before cleanupDetachedIframeObservers() disconnects the observer. Swap the order to match the explicit-remove path and close the narrow race window.
  • packages/rrweb/rrweb/src/record/mutation.ts:258 (convergent — paul + xp + qa-team)this.doc = null as unknown as observerParam['doc'] plus bufferDoc(): Document | null on a non-null field. Pick a story: either make the field properly nullable and guard the call sites, or drop the null-assignment entirely.
  • packages/rrweb/rrweb/src/record/index.ts:188 (convergent — paul + xp + qa-team/frontend) — forward-declared runAndDetachIframeCleanup / cleanupDetachedIframeObservers no-ops reassigned inside try{}. Construction order is now load-bearing in a way that fails silently. Extract an IframeObserverRegistry class or use let f: T | undefined; f?.(...).
  • packages/rrweb/rrweb/src/record/iframe-manager.ts:39 (convergent — xp + paul + qa-team/performance + qa-team/compatibility) — Data Clump: 4 WeakMaps + 1 strong Map all keyed by HTMLIFrameElement, set/cleared together. Collapse to one WeakMap<HTMLIFrameElement, IframeAttachmentState> so the concept is named once.
  • packages/rrweb/rrweb-snapshot/src/snapshot.ts:365 (xp) — the "still loading" branch removes its load listener after the first fire; the "complete" branch right below carefully keeps re-firing via onSubsequentLoad. Either intentional (needs a comment) or an oversight that misses contentDocument swaps during initial load.

🟢 Low

  • iframe-manager.ts:200attachIframe adds a pagehide listener every call; reparented iframes accumulate N listeners on the same Window. Dedupe.
  • iframe-manager.ts:459removeIframeById is now ~60 lines doing 7 distinct things. Extract detachNestedWindowListeners, releaseCapturedDocuments, disposeLoadListeners per ComposedMethod.
  • memory-leaks.test.ts:1224 — reparent test uses >= which catches the current regression but is silent on a future one that adds buffers per move. Tighten to === or split the assertion.

Convergence

Three findings were flagged independently by all three reviewers (highest confidence):

Convergent finding Reviewers Location
Forward-declared no-op cleanup pattern paul + xp + qa-team/frontend index.ts:186-188
this.doc = null cast / bufferDoc(): | null inconsistency paul + xp + qa-team/reliability mutation.ts:254-261
Data Clump across iframe state WeakMaps xp + paul + qa-team/performance + qa-team/compatibility iframe-manager.ts:36-49

Reviewer summaries

Reviewer Assessment
🔍 qa-team Substantial, well-motivated fix. Main concern is the cleanup-ordering inconsistency in wrappedMutationEmit between the explicit-remove path and the safety-net path now that MutationBuffer.reset() nulls this.doc. Other findings are LOW/NIT. New tests cover the right regressions. No security, data-integrity, copy, or compatibility concerns.
👤 paul Solid work, +118MB -> ~0 is a real receipt. Biggest concern is the new cross-file coupling between wrappedMutationEmit's reparent detection and MutationBuffer.emit's drain ordering — wants a comment or test. The this.doc = null as unknown as ... cast is a small smell. Everything else is preference. Ship after a brief chat about the reparent invariant and the cast.
📐 xp Closes a real, validated leak with thorough tests. Passes the four rules of simple design overall, but the main smell is the Data Clump in IframeManager — four parallel WeakMaps keyed by the same element want to be one state object. Secondary smells: subtle behaviour change in onceIframeLoaded's "still loading" branch, forward-declared mutable function bindings papering over a scoping issue, and a small encapsulation leak via MutationBuffer.bufferDoc().

Automated by QA Swarm — not a human review

@marandaneto
Copy link
Copy Markdown
Member

@gruessi would you address the open comments so we can merge and release this? thanks!

Targeted at the correctness / safety concerns in PostHog#3570:

- record/index.ts: swap order of the safety-net cleanup so observers
  are disconnected before iframeManager releases buffers, matching the
  direct-remove path. Closes a microtask race where a queued mutation
  could land on a buffer that had already been spliced/reset.
- record/index.ts: forward-declared cleanup helpers are now
  `((id) => void) | undefined`, called via `?.()`. A premature call is
  visibly a no-op rather than a silently-swallowed cleanup.
- record/mutation.ts: drop the `this.doc = null` defence-in-depth — the
  buffer is already spliced from `mutationBuffers[]` + the cleanup
  closure is removed from `handlers[]`, so GC handles the doc. Nulling
  it created a race where a queued processMutations could pass null to
  serializeNodeWithId. `bufferDoc()` now returns `Document` (no
  pretend-nullable shape).
- record/mutation.ts: add a comment in `emit()` documenting that the
  mapRemoves drain must run before pushAdd — reparent detection in
  index.ts depends on this ordering.
- record/iframe-manager.ts: pagehide-handler registration dedups by
  Window so reparented iframes don't accumulate handlers.
- record/observer.ts: `findAndRemoveIframeBuffer` simplified now that
  `bufferDoc()` is non-nullable.
- rrweb-snapshot/src/snapshot.ts: still-loading branch of
  `onceIframeLoaded` now also attaches an `onSubsequentLoad` handler
  after the initial fire — matches the "complete" branch and ensures
  navigations of an iframe that was still loading at attach time still
  get their fresh contentDocument serialized.

31/31 memory-leak tests still pass. `tsc --noEmit` clean for both
@posthog/rrweb and @posthog/rrweb-snapshot.
@gruessi
Copy link
Copy Markdown
Author

gruessi commented May 19, 2026

@gruessi would you address the open comments so we can merge and release this? thanks!

@marandaneto thanks for the ping, and @pauldambra thanks for the review. I replied to all QA Swarm comments and addressed hopefully everything in the verdict with 84f9802. Let me know if anything is not adding up 🙏

@marandaneto
Copy link
Copy Markdown
Member

@pauldambra are you happy with their changes? should i merge if CI is green?

@pauldambra
Copy link
Copy Markdown
Member

i'll defer to @TUE and @ksvat

@gruessi
Copy link
Copy Markdown
Author

gruessi commented May 27, 2026

Hey @TUE & @ksvat, gentle nudge on this one when you get a moment 🙏 Let me know if I can do anything to help the process!

@ksvat
Copy link
Copy Markdown
Contributor

ksvat commented May 28, 2026

Hey @gruessi i'm happy to merge this once CI is passing/green!

Comment thread .changeset/iframe-memory-leak.md Outdated
Comment on lines +2 to +3
'@posthog/rrweb': patch
'@posthog/rrweb-snapshot': patch
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PostHog/team-replay is that correct? iirc we dont release those packages alone anymore, should we just release posthog-js instead?

Copy link
Copy Markdown
Contributor

@ksvat ksvat May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right, We only release posthog-js now

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gruessi can you please add
'posthog-js': patch
to this changeset! thank you!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksvat Done! The changeset now only bumps posthog-js. Hope that works!

@posthog/rrweb-* are no longer published standalone, so the changeset
bumps posthog-js (the fix ships inside its bundle). Also corrects the
changelog body to "eight retainer chains" to match the PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants