waitForGpu fence-signal race can make completed values non-monotonic; resize swallows device-removed #81

Description

@duremovich

From the 2026-07-03 comprehensive review.

waitForGpu() signals both m_showFence (D3D12Renderer.cpp:1416) and m_editorFence (:1430) with GetCompletedValue()+1, and it is called from BOTH threads:

- editor thread: resize (:503), screenshot staging (:4138), uploadVideoFrameToSlotImmediate (:4461)
- show thread: resizeComposeTarget (:3646), destroyOutputWindow (:3985), resizeOutputWindow (:4014)

Meanwhile the peer thread signals the same fence with its per-slot high-water value (endShowFrame :686, moveToNextFrame :1466). Signals that two threads derive independently for one fence are serialized by the queue, but in nondeterministic order. If the lower drain value lands after a higher per-slot signal, GetCompletedValue() goes backwards, breaking the monotonicity that the allocator-reuse gates rely on (beginShowFrame :556-558, moveToNextFrame :1479).
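To make the bad ordering concrete, here is a minimal single-threaded model of one queue order. SimFence, replayRace and isMonotonic are hypothetical names, not renderer code, and the model assumes, as the issue describes, that a queue-side Signal simply stores its value even when it is lower than the current one. The values 5 through 8 are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a fence: each queue-side signal overwrites the
// completed value (assumption: nothing rejects a lower value) and is recorded.
struct SimFence {
    std::uint64_t completed = 0;
    std::vector<std::uint64_t> history;

    void signal(std::uint64_t value) {
        completed = value;
        history.push_back(value);
    }
    std::uint64_t getCompletedValue() const { return completed; }
};

// True if the observed completed value never decreased.
bool isMonotonic(const std::vector<std::uint64_t>& history) {
    for (std::size_t i = 1; i < history.size(); ++i) {
        if (history[i] < history[i - 1]) return false;
    }
    return true;
}

// Replays one possible queue order for a single fence.
std::vector<std::uint64_t> replayRace() {
    SimFence showFence;
    showFence.signal(5);  // GPU has finished everything up to 5

    // waitForGpu on the other thread reads 5 and plans to signal 5 + 1 = 6...
    const std::uint64_t drainValue = showFence.getCompletedValue() + 1;

    // ...but the owning thread already has frames signaling 6, 7, 8 queued ahead of it.
    for (std::uint64_t v = 6; v <= 8; ++v) showFence.signal(v);

    // The drain signal executes last and drags the completed value from 8 back to 6.
    showFence.signal(drainValue);
    return showFence.history;
}
```

replayRace() yields the sequence 5, 6, 7, 8, 6: the completed value peaks at 8 and then drops back to 6, and value 6 is signaled twice, once by each thread.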

Worst case: editor-window resize during playback → show gate misreads GPU as done → beginShowFrame resets an allocator whose commands are still executing → command-list corruption / device removed. (Precise interleaving UNCERTAIN to reproduce; the fence-value race itself is concrete.)

Fix: dedicated monotonic drain-fence owned by waitForGpu, or serialize begin/end-frame vs waitForGpu with a render mutex.
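A sketch of the first option, again as a hypothetical model rather than renderer code (SimQueue, SimFence and DrainFence are illustrative names, and it assumes one drain fence per queue being drained). The detail that matters is that reserving the drain value and enqueuing its Signal happen under one lock: a dedicated fence guarded only by an atomic counter could still have two callers' Signals reach the queue in the opposite order from their values. In the real renderer the enqueue would be ID3D12CommandQueue::Signal, and waitForGpu would wait on the drain fence (for example via ID3D12Fence::SetEventOnCompletion) instead of signaling m_showFence or m_editorFence.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical fence model: records every value the GPU would observe.
struct SimFence {
    std::vector<std::uint64_t> history;
};

// Hypothetical queue model: signals execute in the order they were enqueued.
struct SimQueue {
    std::mutex mutex;
    std::vector<std::pair<SimFence*, std::uint64_t>> pending;

    void signal(SimFence& fence, std::uint64_t value) {  // stand-in for a queue Signal
        std::lock_guard<std::mutex> lock(mutex);
        pending.emplace_back(&fence, value);
    }
    void execute() {  // the "GPU" drains the queue in order
        std::lock_guard<std::mutex> lock(mutex);
        for (auto& entry : pending) entry.first->history.push_back(entry.second);
        pending.clear();
    }
};

// Sketch of the fix: waitForGpu owns this fence and its counter; per-slot code never touches them.
class DrainFence {
public:
    // Reserve and enqueue under one lock so queue order always matches value order,
    // even when the editor and show threads call waitForGpu concurrently.
    std::uint64_t signalNext(SimQueue& queue) {
        std::lock_guard<std::mutex> lock(m_mutex);
        const std::uint64_t value = ++m_lastSignaled;
        queue.signal(m_fence, value);
        return value;  // the caller would then wait until the drain fence reaches this value
    }
    const std::vector<std::uint64_t>& history() const { return m_fence.history; }

private:
    std::mutex m_mutex;
    std::uint64_t m_lastSignaled = 0;
    SimFence m_fence;
};

// True if the observed completed value never decreased.
bool isMonotonic(const std::vector<std::uint64_t>& history) {
    for (std::size_t i = 1; i < history.size(); ++i) {
        if (history[i] < history[i - 1]) return false;
    }
    return true;
}
```

Because drains no longer write to the per-slot fences, m_showFence and m_editorFence only ever carry their owning thread's increasing per-slot values. The render-mutex alternative gets the same ordering by serializing begin/end-frame against waitForGpu, at the cost of blocking one thread while the other drains.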

Also in scope: resize() swallows device-removed errors. When ResizeBuffers fails (:519-522), it returns Result::Failure without checking for DXGI_ERROR_DEVICE_REMOVED/RESET/HUNG or latching handleDeviceLost, so the editor keeps running against a dead device until a fence timeout notices.
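A sketch of the missing check, written as a portable model. The three DXGI codes are redefined with their winerror.h values so it builds off Windows; RendererModel, onResizeBuffersResult and the body of handleDeviceLost are hypothetical, since the issue names handleDeviceLost but not its signature. Real code would typically also log ID3D12Device::GetDeviceRemovedReason() at this point.

```cpp
#include <cstdint>

// DXGI device-loss codes, with the values from winerror.h (redefined so the sketch builds anywhere).
constexpr std::uint32_t kDxgiErrorDeviceRemoved = 0x887A0005u;
constexpr std::uint32_t kDxgiErrorDeviceHung    = 0x887A0006u;
constexpr std::uint32_t kDxgiErrorDeviceReset   = 0x887A0007u;

// Hypothetical minimal result type; the renderer's real Result may have more cases.
enum class Result { Success, Failure };

// Mirrors the FAILED() macro: an HRESULT is a failure when its high bit is set.
constexpr bool failed(std::uint32_t hr) { return (hr & 0x80000000u) != 0; }

constexpr bool isDeviceLost(std::uint32_t hr) {
    return hr == kDxgiErrorDeviceRemoved || hr == kDxgiErrorDeviceHung ||
           hr == kDxgiErrorDeviceReset;
}

// Hypothetical fragment of the renderer's resize path.
struct RendererModel {
    bool deviceLost = false;

    // Latch only; the real handler would also start device-lost recovery.
    void handleDeviceLost() { deviceLost = true; }

    Result onResizeBuffersResult(std::uint32_t hr) {
        if (!failed(hr)) return Result::Success;
        // Latch device loss before reporting failure, so later frames take the
        // device-lost path instead of waiting for a fence timeout.
        if (isDeviceLost(hr)) handleDeviceLost();
        return Result::Failure;
    }
};
```

An ordinary failure such as E_INVALIDARG (0x80070057) still returns Result::Failure without setting the latch; only the three device-loss codes set it.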

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working
    tier:core: Lives in core (GPLv3, in-repo); always free, fully featured

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests