nodes: skip self.model.to(device_to) walk in unpatch_model on mmap'd weights #445
Booyaka101 wants to merge 2 commits into
…ap walk

The override forwards unpatch_weights=True to comfy core's unpatch_model, which then calls self.model.to(device_to). Walking that .to() over the GGUF-quantized tensors that are still mmap-backed crashes with a Windows access violation (city96#444). The crash surfaces whenever ComfyUI actually invokes unpatch_model; the easiest repro is wrapping UnetLoaderGGUF in a node that returns multiple outputs, which flushes the cache and triggers the unpatch path.

The maintainer mitigated this in Sep 2024 (commit 6dbb4ba) by passing device_to=None to super, then reverted in 717a0e1 because that broke VRAM estimation.

Reporter's proposed fix (passing unpatch_weights=False to super) avoids the crash but skips comfy core's non-quantized backup restoration: patch_weight_to_device's `if key not in self.backup` guard then never re-saves the original weight, and the next run's temp_weight = weight.to(...) reads the still-patched value, so LoRA deltas compound across runs.

Restore the non-quantized backups in the override, mirroring comfy core's loop (copy_to_param / set_attr_param + backup.clear() + comfy_patched_weights cleanup), then forward unpatch_weights=False so super skips its weight-restore + model.to(device_to) block. Keeps the crash fix without regressing LoRA correctness or VRAM accounting.

Verified with a stub-based reproducer (verify_unpatch.py) covering three scenarios:

- pre-fix crashes on simulated mmap .to(), matching city96#444
- reporter fix avoids crash, but non-quantized weight stays patched
- this fix avoids crash AND restores non-quantized backup correctly

Closes city96#444

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
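The compounding described in this commit message can be illustrated with a small stub. This is only a sketch under stated assumptions: plain floats stand in for tensors, and `StubPatcher`, `patch`, `unpatch`, and `weight_after_two_runs` are hypothetical names, not ComfyUI code or the contents of verify_unpatch.py. It models only the behaviour described above: the backup guard, and a patch step that reads the live weight.

```python
# Stub sketch: floats stand in for tensors. Nothing here is ComfyUI code;
# StubPatcher and its methods are hypothetical names for illustration.
class StubPatcher:
    def __init__(self, weight, delta):
        self.weights = {"w": weight}  # live weights, possibly patched
        self.delta = delta            # stand-in for a LoRA delta
        self.backup = {}              # originals saved before patching

    def patch(self):
        for key in self.weights:
            # Same shape as the guard quoted above: save the original
            # only if no backup entry exists yet.
            if key not in self.backup:
                self.backup[key] = self.weights[key]
            # The patch is computed from the live weight, not the backup.
            self.weights[key] = self.weights[key] + self.delta

    def unpatch(self, restore_backups):
        if not restore_backups:
            return  # models the unpatch_weights=False path: nothing restored
        for key, original in self.backup.items():
            self.weights[key] = original
        self.backup.clear()


def weight_after_two_runs(restore_backups):
    patcher = StubPatcher(weight=10.0, delta=1.0)
    patcher.patch()                   # run 1
    patcher.unpatch(restore_backups)
    patcher.patch()                   # run 2
    return patcher.weights["w"]


print(weight_after_two_runs(restore_backups=False))  # 12.0: delta applied twice
print(weight_after_two_runs(restore_backups=True))   # 11.0: delta applied once
```

Without the restore step the second run patches an already-patched weight; with it, each run starts from the original value.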
Confirmed working on real CUDA. RTX 3060 (12GB), Windows 10, ComfyUI 0.22.2, PyTorch 2.8.0+cu129. Running Flux 2 Klein 9b, with both TE and diffusion model as GGUFs. Was facing an issue where any attempt to unload GGUF model weights (either manually or automatically, in my case triggered by normal between-run model caching) led to the same "Windows fatal exception: access violation" fault described above. After applying this patch, both manual and automatic unloads work and repeated runs are stable. LoRAs produce identical output across multiple runs with the same seed.
After further usage, one edge case to report: changing LoRA strength between runs produces output as if no LoRA is applied at all. It appears the quantized tensors' .patches are cleared correctly on unpatch, but not re-attached on the subsequent in-place re-patch since the model stays resident (unsure about this, though). A full model reload between strength changes fixes it.
…override

The previous override forwarded unpatch_weights=False to super to dodge the self.model.to(device_to) walk that faults on mmap'd quantized tensors, but that also skipped the model_loaded_weight_memory / model_offload_buffer_memory reset super does in the same block. partially_load checks that counter immediately after calling unpatch_model and short-circuits self.load() if it is nonzero, so subsequent LoRA strength changes left .patches cleared without re-attaching anything (no LoRA applied at all on re-runs).

Inline the full unpatch_weights body from base ModelPatcher minus that one device walk: hooks/pin/lowvram cleanup, backup restore, uuid reset, memory counters, comfy_patched_weights deletion.

Upstream tracked at Comfy-Org/ComfyUI#14142.
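The short-circuit described in this commit message can be sketched with a stub. This is a hedged illustration, not ComfyUI's implementation: the class, the `reset_counters` flag, and the `patches_attached` field are hypothetical, and the stub models only the stated behaviour (the counter is checked right after `unpatch_model`, and `load()` is skipped when it is nonzero).

```python
# Stub sketch of the described control flow. StubPatcher, reset_counters and
# patches_attached are hypothetical; only the counter check is modelled.
class StubPatcher:
    def __init__(self):
        self.model_loaded_weight_memory = 0
        self.patches_attached = False

    def load(self):
        # Stand-in for re-attaching patches and loading weights.
        self.patches_attached = True
        self.model_loaded_weight_memory = 100  # arbitrary nonzero size

    def unpatch_model(self, reset_counters):
        self.patches_attached = False          # patches are always cleared
        if reset_counters:
            self.model_loaded_weight_memory = 0

    def partially_load(self, reset_counters):
        self.unpatch_model(reset_counters)
        if self.model_loaded_weight_memory != 0:
            return                             # short-circuit: load() skipped
        self.load()


def lora_applied_after_strength_change(reset_counters):
    patcher = StubPatcher()
    patcher.load()                             # first run
    patcher.partially_load(reset_counters)     # strength changed, re-patch
    return patcher.patches_attached


print(lora_applied_after_strength_change(reset_counters=False))  # False
print(lora_applied_after_strength_change(reset_counters=True))   # True
```

With the counter left nonzero, the stub ends up with patches cleared and nothing re-attached, which matches the "no LoRA applied" symptom reported in the thread.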
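@romybaby thanks for catching this, the regression is real and caused by the previous version of this PR. What happened: forwarding `unpatch_weights=False` to super also skipped the `model_loaded_weight_memory` / `model_offload_buffer_memory` reset, so `partially_load` short-circuited `self.load()` and nothing was re-attached. Just pushed 32414a1 which inlines the whole `unpatch_weights` body from base `ModelPatcher` minus the device walk. Worth retesting your strength-change workflow with this commit if you have a moment.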
Thanks for the quick response! The commit works perfectly on my end; changing strengths between runs now correctly reapplies the LoRA at the new strength level.
Resolves #444.
`GGUFModelPatcher.unpatch_model` forwards to base `ModelPatcher.unpatch_model`, which finishes its `unpatch_weights=True` block with `self.model.to(device_to)`. That walk uses `nn.Module.to` and iterates every parameter, ignoring `comfy_cast_weights = True` on `GGMLLayer`. The walk hits the still-mmap'd quantized tensors and faults with `EXCEPTION_ACCESS_VIOLATION` on Windows. Repro: any workflow that triggers a cached-patcher flush during sampling (e.g. a `UnetLoaderGGUF` wired into a multi-output node).

The cleanest fix is upstream, since the same conflict will hit any custom op that uses `comfy_cast_weights` with tensors that cannot survive `nn.Module.to`. Tracked at Comfy-Org/ComfyUI#14142.

In the meantime this override inlines base's `unpatch_weights=True` block and skips just that one line. The block has to run in full because `partially_load` reads `model_loaded_weight_memory` immediately after `unpatch_model` and short-circuits `self.load()` if it is nonzero, so a partial mirror (the previous attempt at this PR) left `.patches` cleared without re-attaching anything on LoRA strength changes (caught by @romybaby on this thread). The lines this override now runs map one-for-one to comfy's `model_patcher.py` body:

- `eject_model`
- `unpatch_hooks`, `unpin_all_weights`
- `move_weight_functions` / `wipe_lowvram_weight`
- `current_weight_patches_uuid = None`, `backup.clear()`
- `model_loaded_weight_memory = 0`, `model_offload_buffer_memory = 0`
- `comfy_patched_weights` deletion per module
- `super().unpatch_model(unpatch_weights=False)` for `eject_model` (idempotent) and `object_patches_backup` restoration

Drift risk noted: if upstream adds something to that block, this override needs a sync. Easier to spot since the structure now matches line for line.
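A rough shape of the override, as a stub-only sketch rather than the actual diff: `StubBasePatcher` and `StubGGUFPatcher` are hypothetical stand-ins, each step is recorded as a string instead of calling real ComfyUI methods (whose signatures are not reproduced here), and the position of the device walk inside the base block is an assumption.

```python
# Stub sketch of the override's structure. All class names are hypothetical;
# steps are logged as strings so no real ComfyUI signature is implied.
class StubBasePatcher:
    def __init__(self):
        self.calls = []
        self.backup = {"w": 1.0}
        self.current_weight_patches_uuid = "some-uuid"
        self.model_loaded_weight_memory = 64
        self.model_offload_buffer_memory = 8

    def _step(self, name):
        self.calls.append(name)

    def unpatch_model(self, device_to=None, unpatch_weights=True):
        self._step("eject_model")
        if unpatch_weights:
            self._step("unpatch_hooks")
            self._step("unpin_all_weights")
            self._step("move_weight_functions / wipe_lowvram_weight")
            self._step("restore backups")
            self.current_weight_patches_uuid = None
            self.backup.clear()
            self._step("model.to(device_to)")  # the walk that faults on mmap
            self.model_loaded_weight_memory = 0
            self.model_offload_buffer_memory = 0
            self._step("delete comfy_patched_weights")
        self._step("restore object_patches_backup")


class StubGGUFPatcher(StubBasePatcher):
    def unpatch_model(self, device_to=None, unpatch_weights=True):
        if unpatch_weights:
            # Inlined copy of the base block, minus the device walk.
            self._step("eject_model")
            self._step("unpatch_hooks")
            self._step("unpin_all_weights")
            self._step("move_weight_functions / wipe_lowvram_weight")
            self._step("restore backups")
            self.current_weight_patches_uuid = None
            self.backup.clear()
            self.model_loaded_weight_memory = 0
            self.model_offload_buffer_memory = 0
            self._step("delete comfy_patched_weights")
        # Base still runs eject_model (idempotent) and the object patch restore.
        return super().unpatch_model(device_to=device_to, unpatch_weights=False)


gguf = StubGGUFPatcher()
gguf.unpatch_model()
print("model.to(device_to)" in gguf.calls)  # False: the device walk is skipped
print(gguf.model_loaded_weight_memory)      # 0: counters still reset
```

Keeping the inlined block in the same order as the base block is what makes an upstream drift easy to spot in a side-by-side diff.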
Credit @stonerabit for the original repro and diagnosis on #444, and @romybaby for catching the LoRA strength-change regression in the previous iteration.