
nodes: skip self.model.to(device_to) walk in unpatch_model on mmap'd weights #445

Open
Booyaka101 wants to merge 2 commits into city96:main from Booyaka101:fix/unpatch-model-mmap-access-violation

Conversation

@Booyaka101 Booyaka101 commented May 10, 2026

Resolves #444.

GGUFModelPatcher.unpatch_model forwards to the base ModelPatcher.unpatch_model, which ends its unpatch_weights=True block with self.model.to(device_to). That call goes through nn.Module.to, which iterates over every parameter and ignores comfy_cast_weights = True on GGMLLayer. The walk reaches the still-mmap'd quantized tensors and faults with EXCEPTION_ACCESS_VIOLATION on Windows. Repro: any workflow that triggers a cached-patcher flush during sampling (e.g. a UnetLoaderGGUF wired into a multi-output node).
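
To illustrate the failure mode described above, here is a minimal pure-Python sketch. It is not ComfyUI or PyTorch code: `StubTensor`, `StubLayer`, `walk_to`, and `MmapFault` are hypothetical stand-ins. `walk_to` mimics a blanket `nn.Module.to`-style traversal that touches every parameter without consulting a per-layer flag, and `MmapFault` stands in for the access violation.

```python
class MmapFault(RuntimeError):
    """Stand-in for the EXCEPTION_ACCESS_VIOLATION seen on Windows."""


class StubTensor:
    """Hypothetical tensor stub; `mmapped` marks weights still backed by a file mapping."""

    def __init__(self, name, mmapped=False):
        self.name = name
        self.mmapped = mmapped
        self.device = "cpu"

    def to(self, device):
        if self.mmapped:
            # Assumption for illustration: moving an mmap-backed tensor faults.
            raise MmapFault(f"cannot move mmap-backed tensor {self.name!r}")
        self.device = device
        return self


class StubLayer:
    """Hypothetical layer; the flag mirrors the idea of comfy_cast_weights = True."""

    def __init__(self, weight, comfy_cast_weights=False):
        self.weight = weight
        self.comfy_cast_weights = comfy_cast_weights


def walk_to(layers, device):
    """Blanket walk: visits every parameter and never checks the per-layer flag."""
    for layer in layers:
        layer.weight = layer.weight.to(device)


layers = [
    StubLayer(StubTensor("linear.weight")),
    StubLayer(StubTensor("ggml.weight", mmapped=True), comfy_cast_weights=True),
]

try:
    walk_to(layers, "cuda")
    crashed = False
except MmapFault:
    crashed = True

print("walk faulted:", crashed)
```

The flag on the second layer makes no difference to the outcome, because the walk never reads it. That is the conflict this PR works around.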

The cleanest fix is upstream, since the same conflict will hit any custom op that uses comfy_cast_weights with tensors that cannot survive nn.Module.to. Tracked at Comfy-Org/ComfyUI#14142.

In the meantime, this override inlines the base class's unpatch_weights=True block and skips just that one line. The block has to run in full because partially_load reads model_loaded_weight_memory immediately after unpatch_model and short-circuits self.load() if it is nonzero. A partial mirror (the previous attempt in this PR) therefore left .patches cleared without re-attaching anything on LoRA strength changes (caught by @romybaby on this thread). The lines this override now runs map one-for-one to the body in comfy's model_patcher.py:

  • eject_model
  • unpatch_hooks, unpin_all_weights
  • lowvram cleanup via move_weight_functions / wipe_lowvram_weight
  • non-quantized backup restoration
  • current_weight_patches_uuid = None, backup.clear()
  • model_loaded_weight_memory = 0, model_offload_buffer_memory = 0
  • comfy_patched_weights deletion per module
  • final super().unpatch_model(unpatch_weights=False) for eject_model (idempotent) and object_patches_backup restoration
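
The shape of that override can be sketched with stubs. This is a simplified illustration under stated assumptions, not the code in this PR or in ComfyUI: `StubBasePatcher` and `StubGGUFPatcher` are hypothetical, `device_walks` is a counter standing in for `self.model.to(device_to)`, only a few of the listed steps are modeled, and the ordering inside the block is assumed.

```python
class StubBasePatcher:
    """Hypothetical stand-in for the base patcher; not ComfyUI's real class."""

    def __init__(self):
        self.backup = {"plain.weight": "original"}
        self.weights = {"plain.weight": "patched"}
        self.current_weight_patches_uuid = "abc"
        self.model_loaded_weight_memory = 1024
        self.model_offload_buffer_memory = 256
        self.device_walks = 0  # counts simulated self.model.to(device_to) calls

    def restore_backups(self):
        # Put every backed-up (non-quantized) weight back, then drop the backups.
        for key, original in self.backup.items():
            self.weights[key] = original
        self.backup.clear()

    def unpatch_model(self, device_to=None, unpatch_weights=True):
        if unpatch_weights:
            self.restore_backups()
            self.current_weight_patches_uuid = None
            if device_to is not None:
                self.device_walks += 1  # the walk that faults on mmap'd tensors
            self.model_loaded_weight_memory = 0
            self.model_offload_buffer_memory = 0
        # Object-patch restoration would run here regardless of unpatch_weights.


class StubGGUFPatcher(StubBasePatcher):
    """Inlines the weight-unpatch block, omitting only the device walk."""

    def unpatch_model(self, device_to=None, unpatch_weights=True):
        if unpatch_weights:
            self.restore_backups()
            self.current_weight_patches_uuid = None
            # Device walk deliberately omitted here.
            self.model_loaded_weight_memory = 0
            self.model_offload_buffer_memory = 0
        # Let the base class handle everything outside the weight block.
        super().unpatch_model(device_to=device_to, unpatch_weights=False)


base = StubBasePatcher()
base.unpatch_model(device_to="cpu")

gguf = StubGGUFPatcher()
gguf.unpatch_model(device_to="cpu")

print("base walks:", base.device_walks, "| override walks:", gguf.device_walks)
print("override counters:", gguf.model_loaded_weight_memory, gguf.model_offload_buffer_memory)
```

In this sketch both patchers end with restored weights and zeroed counters. The only observable difference is that the override never performs the walk.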

Drift risk noted: if upstream adds something to that block, this override needs a sync. Easier to spot since the structure now matches line for line.

Credit @stonerabit for the original repro and diagnosis on #444, and @romybaby for catching the LoRA strength-change regression in the previous iteration.

…ap walk

The override forwards unpatch_weights=True to comfy core's unpatch_model,
which then calls self.model.to(device_to). Walking that .to() over the
GGUF-quantized tensors that are still mmap-backed crashes with a Windows
access violation (city96#444), surfacing whenever ComfyUI actually invokes
unpatch_model — easiest repro is wrapping UnetLoaderGGUF in a node that
returns multiple outputs, which flushes cache and triggers the unpatch
path.

The maintainer mitigated this in Sep 2024 (commit 6dbb4ba) by passing
device_to=None to super, then reverted in 717a0e1 because that broke
VRAM estimation. Reporter's proposed fix (passing unpatch_weights=False
to super) avoids the crash but skips comfy core's non-quantized backup
restoration — patch_weight_to_device's `if key not in self.backup`
guard then never re-saves the original weight, and the next run's
temp_weight = weight.to(...) reads the still-patched value, so LoRA
deltas compound across runs.
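
The compounding described in the paragraph above can be reproduced with a small stub. This is a hedged illustration only: `StubPatcher` and `run_twice` are hypothetical, a single float stands in for a weight tensor, and `delta` stands in for a LoRA contribution. The guard in `patch` mirrors the `if key not in self.backup` check named above.

```python
class StubPatcher:
    """Hypothetical patcher with a single scalar 'weight' under key 'w'."""

    def __init__(self, weight, delta):
        self.weight = weight
        self.delta = delta
        self.backup = {}

    def patch(self, key="w"):
        # Same guard as described above: only save the original once.
        if key not in self.backup:
            self.backup[key] = self.weight
        # The delta is applied on top of whatever the weight currently is.
        self.weight = self.weight + self.delta

    def unpatch(self, restore):
        if restore:
            self.weight = self.backup["w"]
            self.backup.clear()
        # With restore=False the backup stays and the weight remains patched.


def run_twice(restore):
    patcher = StubPatcher(weight=1.0, delta=0.5)
    seen = []
    for _ in range(2):
        patcher.patch()
        seen.append(patcher.weight)
        patcher.unpatch(restore)
    return seen


print(run_twice(restore=True))   # [1.5, 1.5]
print(run_twice(restore=False))  # [1.5, 2.0]
```

With restoration, each run sees the same patched value. Without it, the second run stacks the delta on an already-patched weight.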

Restore the non-quantized backups in the override mirroring comfy
core's loop (copy_to_param / set_attr_param + backup.clear() +
comfy_patched_weights cleanup), then forward unpatch_weights=False so
super skips its weight-restore + model.to(device_to) block. Keeps the
crash fix without regressing LoRA correctness or VRAM accounting.

Verified with a stub-based reproducer (verify_unpatch.py) covering
three scenarios:
  pre-fix       crashes on simulated mmap .to() — matches city96#444
  reporter fix  avoids crash, but non-quantized weight stays patched
  this fix      avoids crash AND restores non-quantized backup correctly

Closes city96#444

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@romybaby

Confirmed working on real CUDA. RTX 3060 (12GB), Windows 10, ComfyUI 0.22.2, PyTorch 2.8.0+cu129. Running Flux 2 Klein 9b, with both TE and diffusion model as GGUFs. Was facing an issue where any attempt to unload GGUF model weights (either manually or automatically, in my case triggered by normal between-run model caching) led to the same "Windows fatal exception: access violation" fault described above. After applying this patch, both manual and automatic unloads work and repeated runs are stable. LoRAs produce identical output across multiple runs with the same seed.

@romybaby

After further usage, one edge case to report: changing LoRA strength between runs produces output as if no LoRA is applied at all. Appears the quantized tensors' .patches are cleared correctly on unpatch, but not re-attached on the subsequent in-place re-patch since the model stays resident (unsure about this, though). A full model reload between strength changes fixes it.

…override

The previous override forwarded unpatch_weights=False to super to dodge the
self.model.to(device_to) walk that faults on mmap'd quantized tensors, but
that also skipped the model_loaded_weight_memory / model_offload_buffer_memory
reset super does in the same block. partially_load checks that counter
immediately after calling unpatch_model and short-circuits self.load() if it
is nonzero, so subsequent LoRA strength changes left .patches cleared without
re-attaching anything (no LoRA applied at all on re-runs).

Inline the full unpatch_weights body from base ModelPatcher minus that one
device walk: hooks/pin/lowvram cleanup, backup restore, uuid reset, memory
counters, comfy_patched_weights deletion. Upstream tracked at
Comfy-Org/ComfyUI#14142.

@Booyaka101 changed the title from "fix(nodes): restore non-quantized backups locally and skip super's mmap walk" to "nodes: skip self.model.to(device_to) walk in unpatch_model on mmap'd weights" on May 27, 2026

@Booyaka101
Author

@romybaby thanks for catching this, the regression is real and caused by the previous version of this PR.

What happened: forwarding unpatch_weights=False to base unpatch_model dodged the self.model.to(device_to) crash, but it also skipped the model_loaded_weight_memory = 0 and model_offload_buffer_memory = 0 lines that live in the same block. partially_load reads model_loaded_weight_memory right after unpatch_model and short-circuits self.load() if it is nonzero, so on a LoRA strength change the quantized .patches got cleared but never re-attached. Result: no LoRA applied on the next run, exactly what you saw. Full model reload worked because cold load does not go through that check.
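
A stub makes the short-circuit concrete. This is a rough sketch under assumptions, not ComfyUI's actual partially_load: `StubPatcher`, the `reset_counters` switch, and the `strength` argument are hypothetical and exist only to contrast the previous override with the current one.

```python
class StubPatcher:
    """Hypothetical patcher modeling only the counter check described above."""

    def __init__(self):
        self.model_loaded_weight_memory = 0
        self.patches_attached = False
        self.strength = None

    def load(self, strength):
        # Attaches patches at the requested strength and records loaded memory.
        self.patches_attached = True
        self.strength = strength
        self.model_loaded_weight_memory = 1024

    def unpatch_model(self, reset_counters):
        # Patches are always cleared; the counter reset is what the old override skipped.
        self.patches_attached = False
        if reset_counters:
            self.model_loaded_weight_memory = 0

    def partially_load(self, strength, reset_counters):
        self.unpatch_model(reset_counters)
        if self.model_loaded_weight_memory > 0:
            return  # short-circuit: load() is never reached
        self.load(strength)


old = StubPatcher()
old.load(strength=1.0)
old.partially_load(strength=0.5, reset_counters=False)

new = StubPatcher()
new.load(strength=1.0)
new.partially_load(strength=0.5, reset_counters=True)

print("old override:", old.patches_attached, old.strength)
print("new override:", new.patches_attached, new.strength)
```

In the first case the patches are cleared and nothing re-attaches them, which matches the "no LoRA applied" symptom. In the second, load() runs and picks up the new strength.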

Just pushed 32414a1 which inlines the whole unpatch_weights=True body minus the offending device walk. That keeps the memory counters zeroed so load() runs and re-attaches .patches at the new strength. The upstream feature request that would let this override collapse to a one-liner is at Comfy-Org/ComfyUI#14142.

Worth retesting your strength-change workflow with this commit if you have a moment.

@romybaby

Thanks for the quick response! The commit works perfectly on my end, changing strengths between runs now correctly reapplies the LoRA at the new strength level.

Development

Successfully merging this pull request may close these issues.

[Bug Report] Windows fatal exception: access violation when node has multiple outputs
