Fix per-iteration memory growth in training loop (missing @autoreleasepool) by imdangernoodle · Pull Request #6 · rayanht/msplat

imdangernoodle · 2026-05-20T05:48:26Z

Problem

Long training runs OOM-crash partway through. On a cloud of ~800k-1.5M gaussians, msplat -n 30000 dies around step 12k with no error message and no saved scene: the process is SIGKILL'd under macOS memory pressure.

The tell that it's a per-step leak rather than cloud growth: gaussian count stays stable (~800k) while RSS climbs linearly with step count until the kill. The death step scales with -d downscale (≈3k at full res, ≈9-12k at -d 2) — consistent with a fixed per-iteration allocation that's never freed.
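A rough way to sanity-check that scaling, as a sketch only: if the bytes leaked per step are assumed to be proportional to the rendered pixel count (an assumption, not something measured in this PR), then a downscale factor d leaks 1/d² as much per step, and the death step should grow by d². The function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Steps until a fixed memory budget is exhausted by a constant per-step leak.
std::uint64_t steps_until_budget(std::uint64_t budget_bytes,
                                 std::uint64_t leak_bytes_per_step) {
    assert(leak_bytes_per_step > 0);
    return budget_bytes / leak_bytes_per_step;
}

// ASSUMPTION (not stated in the PR): the per-step leak scales with pixel
// count, so downscaling by d divides the leak by d*d and multiplies the
// number of steps survived by d*d.
std::uint64_t predicted_death_step(std::uint64_t full_res_death_step,
                                   std::uint64_t downscale) {
    return full_res_death_step * downscale * downscale;
}
```

With the ≈3k full-resolution figure, predicted_death_step(3000, 2) gives 12000, which sits at the top of the ≈9-12k range seen at -d 2. Cloud growth would not produce that relationship, since the gaussian count was stable.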

Cause

The training loop in cli/msplat.cpp creates autoreleased Metal objects every iteration (command buffers, encoders, transient textures via the msplat_* entry points) but there's no per-iteration autorelease pool to drain them. They accumulate for the entire run.

Fix

Renamed cli/msplat.cpp → cli/msplat.mm (the project already declares LANGUAGES CXX OBJCXX, so no CMake language changes beyond the source filename) and wrapped the per-iteration loop body in @autoreleasepool { … }. Each step's autoreleased objects now drain at the end of that step.
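To illustrate why the placement of the pool matters, here is a minimal model in plain C++. It is not the Objective-C runtime and not the code in cli/msplat.mm; ToyPool and peak_live_objects are hypothetical names. The drain_each_step flag plays the role of the @autoreleasepool { … } block around the loop body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an autorelease pool: objects registered here stay alive
// until drain() runs. This models the behaviour only.
class ToyPool {
public:
    void add(std::size_t bytes) { pending_.push_back(bytes); }
    std::size_t live() const { return pending_.size(); }
    void drain() { pending_.clear(); }
private:
    std::vector<std::size_t> pending_;
};

// Runs `steps` iterations, each registering `objects_per_step` temporaries,
// and returns the largest number of undrained objects seen at any point.
// drain_each_step == true mirrors wrapping the loop body in a per-iteration pool.
std::size_t peak_live_objects(std::size_t steps,
                              std::size_t objects_per_step,
                              bool drain_each_step) {
    ToyPool pool;
    std::size_t peak = 0;
    for (std::size_t step = 0; step < steps; ++step) {
        for (std::size_t i = 0; i < objects_per_step; ++i) {
            pool.add(1);  // stands in for a command buffer, encoder, etc.
        }
        if (pool.live() > peak) peak = pool.live();
        if (drain_each_step) {
            pool.drain();
            assert(pool.live() == 0);  // nothing survives past the step
        }
    }
    return peak;
}
```

Without the per-step drain the peak grows as steps × objects_per_step; with it, the peak stays at objects_per_step regardless of run length, which matches the flat RSS reported under Verification.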

Verification

Same dataset + command that previously died at step ~12100 now runs to completion at 30000 and saves:

Opacity reset at step 3100
Opacity reset at step 6100
Opacity reset at step 9100
Opacity reset at step 12100      # <- prior runs died here
Saved splat.ply
  PSNR:  18.23  SSIM:  0.81  Gaussians: 801851

RSS holds flat across the run instead of climbing to the kill ceiling.
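One way to put a number on "flat" versus "climbing linearly" is a least-squares slope over (step, RSS) samples. The sketch below assumes the samples are collected externally (for example by polling the process while it trains); Sample and rss_slope are hypothetical names, and no measurements from this PR are encoded in it.

```cpp
#include <cassert>
#include <vector>

// One RSS reading taken at a given training step.
struct Sample {
    double step;
    double rss_mb;
};

// Ordinary least-squares slope of rss_mb against step, in MB per step.
// A per-step leak shows up as a clearly positive slope; a fixed run
// should give a slope near zero.
double rss_slope(const std::vector<Sample>& samples) {
    assert(samples.size() >= 2);
    const double n = static_cast<double>(samples.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (const Sample& s : samples) {
        sx += s.step;
        sy += s.rss_mb;
        sxx += s.step * s.step;
        sxy += s.step * s.rss_mb;
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // requires at least two distinct step values
    return (n * sxy - sx * sy) / denom;
}
```

Multiplying the slope by the planned step count gives a quick estimate of how much memory a run would accumulate before it finishes.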

Notes

One-line behavioural change (the @autoreleasepool wrapper) + a file rename so it compiles as Obj-C++.
No change to training math or output format.

@autoreleasepool

…rowth

The training loop in cli/msplat created autoreleased Metal objects (command buffers, encoders, transient textures) every step but never drained them. They accumulated for the entire run and OOM-killed the process around step ~12k on a ~800k-1.5M gaussian cloud (no error, no saved scene, just SIGKILL from memory pressure).

Renamed cli/msplat.cpp → cli/msplat.mm (project already enables OBJCXX) and wrapped the per-iteration loop body in @autoreleasepool so each step's autoreleased objects drain at the end of that step instead of living until process exit.

Repro before fix: `msplat -n 30000 -d 2 <407-frame-nerfstudio>` dies ~step 12100. Death step scaled with downscale (3100 at full res, 9100 at -d 2 earlier, 12100 with a smaller seed): classic signature of a fixed per-step leak rather than cloud growth (gaussian count was stable at ~800k while memory still climbed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

rayanht · 2026-05-21T21:27:29Z

Hey, thanks for the contribution!

I'll pull this locally real quick and re-run benchmarks to ensure that the GC cycles don't impact performance too much and this will likely get merged tonight.

@TaehoKim86, I expect this will help #5

imdangernoodle wants to merge 1 commit into rayanht:main from imdangernoodle:fix/training-loop-autoreleasepool
