feat: Time-to-Move sampling #13707
Conversation
📝 Walkthrough

Adds latent-space video compositing and a step-windowed KSampler workflow, plus RGB→latent mask conversion. New functions in comfy_extras/nodes_custom_sampler.py: video_latent_composite (blends 5D latents with spatial offsets and an optional mask), time_to_move_sample (iterative per-step KSampler with optionally recomputed noisy latents and compositing), and time_to_move_common_ksampler (prepares the latent and noise, sets up callbacks, and returns the updated latent). New node TimeToMoveKSamplerAdvanced registered. In comfy_extras/nodes_mask.py: convert_rgb_mask_to_latent_mask (temporal sampling + spatial nearest-resize) and node RGBMaskToLatentMask registered.

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy_extras/nodes_custom_sampler.py`:
- Around line 86-117: The loop can be skipped when start_step >= effective last
step, leaving samples undefined; fix by guarding/initializing before the loop
(check the same condition used in the for: if min(last_step, steps) - start_step
<= 0) and set samples to a valid tensor (e.g., convert latent_image to the
expected device/dtype via
comfy.model_management.intermediate_device()/intermediate_dtype() or simply
assign latent_image) and return it, or initialize samples to latent_image before
entering the loop; update the code around the for-loop that uses start_step,
last_step, steps, samples, and latent_image to ensure samples is always defined
when returned.
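A minimal sketch of the suggested guard, assuming the variable names used in the review (latent_image, start_step, last_step, steps) rather than the actual code in nodes_custom_sampler.py; the function name is hypothetical:

```python
def sample_window_sketch(latent_image, start_step, last_step, steps):
    """Hypothetical sketch of the guard; not the actual node code."""
    end_step = min(last_step, steps)
    # Initialize before the loop so `samples` is always defined,
    # even when start_step >= end_step and the loop body never runs.
    samples = latent_image
    for step in range(start_step, end_step):
        # ... the real per-step KSampler call and compositing would
        # update `samples` here
        pass
    return samples
```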
In `@comfy_extras/nodes_mask.py`:
- Around line 49-63: The temporal sampling is incorrect: don’t assume
vae.downscale_ratio is a tuple or that a stride k reconstructs the latent frame
count; instead derive the target temporal length explicitly (e.g., accept a
target_latent_T or compute it from vae.downscale_ratio whether it’s an int or a
tuple) and resample the input mask to that exact T using a proper mapping (use
torch.linspace to create target frame indices and gather/interpolate along dim=0
or use torch.nn.functional.interpolate after adding a channel dim) rather than
mask[0:1] + mask[1::k]; update convert_rgb_mask_to_latent_mask to compute
target_T robustly from vae.downscale_ratio or an explicit parameter and replace
the mask0/mask1 sampling with index-based gather/interpolate, and apply the same
fix at the other occurrence (lines ~455-457) so the returned latent_mask frame
count matches the VAE latent frames.
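A minimal sketch of the index-based temporal resampling the review suggests, with a hypothetical target_t parameter standing in for the latent frame count derived from the VAE:

```python
import torch

def resample_mask_temporally(mask: torch.Tensor, target_t: int) -> torch.Tensor:
    # Hypothetical sketch: nearest-index resampling of a (T, H, W) mask to
    # exactly `target_t` frames, replacing the stride-based mask[0:1] + mask[1::k].
    t = mask.shape[0]
    # Evenly spaced source indices spanning the full input range.
    idx = torch.linspace(0, t - 1, steps=target_t).round().long()
    return mask[idx]
```

For an 81-frame mask and target_t = 21, this selects frames [0, 4, 8, ..., 80], so the output frame count matches the VAE latent frames exactly.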
Clarifies that the node is intended to work with causal Video VAEs.
🧹 Nitpick comments (1)
comfy_extras/nodes_mask.py (1)
451-457: ⚡ Quick win: Consider adding a type guard for clearer error messages.

The description correctly documents that this node is for causal Video VAEs, but if a user accidentally wires a non-video VAE (which has a scalar downscale_ratio), they'll get a cryptic `'int' object is not subscriptable` error. A quick check would improve the UX.

💡 Optional: Add validation for VAE type

```diff
 @classmethod
 def execute(cls, mask, vae) -> IO.NodeOutput:
     # Ensure we work on a copy of the mask to remain non-destructive
     mask_copy = mask.clone()
     downscale_ratio = vae.downscale_ratio
+    if not isinstance(downscale_ratio, tuple) or len(downscale_ratio) < 3:
+        raise ValueError("RGBMaskToLatentMask requires a causal Video VAE (e.g., Wan). The provided VAE does not have a compatible downscale_ratio.")
     k = (mask.shape[0] - 1) // (downscale_ratio[0](mask.shape[0]) - 1) if (downscale_ratio[0](mask.shape[0]) - 1) > 1 else 1
     return IO.NodeOutput(convert_rgb_mask_to_latent_mask(mask_copy, k, spatial_downsample_h=downscale_ratio[1], spatial_downsample_w=downscale_ratio[2]))
```
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@comfy_extras/nodes_mask.py`:
- Around line 451-457: The execute method assumes vae.downscale_ratio is
indexable (video VAE) but a scalar (image VAE) causes "'int' object is not
subscriptable"; add a type guard at the start of execute to validate
vae.downscale_ratio is a sequence/iterable with at least three elements (or
supports __getitem__ for indices 0..2), and if not raise a clear ValueError (or
TypeError) referencing the node and value (e.g., "nodes_mask.execute expects
video VAE with downscale_ratio like (h,w,...) but got <value>"); keep using
mask.clone(), downscale_ratio, k calculation and
convert_rgb_mask_to_latent_mask/IO.NodeOutput after the guard.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@comfy_extras/nodes_mask.py`:
- Line 458: The computation for k incorrectly uses the guard
(downscale_ratio[0](mask.shape[0]) - 1) > 1 which forces k=1 when T_latent==2;
change it to only guard against division by zero by checking > 0 and call the
downscale callable once: store ratio = downscale_ratio[0](mask.shape[0]) then
compute k = (mask.shape[0] - 1) // (ratio - 1) if (ratio - 1) > 0 else 1 so
T_latent==2 yields the correct k = T - 1 and avoids the double call to
downscale_ratio[0].
```python
downscale_ratio = vae.downscale_ratio
if not isinstance(downscale_ratio, tuple) or len(downscale_ratio) < 3:
    raise ValueError("RGBMaskToLatentMask requires a causal Video VAE (e.g., Wan). The provided VAE does not have a compatible downscale_ratio.")
k = (mask.shape[0] - 1) // (downscale_ratio[0](mask.shape[0]) - 1) if (downscale_ratio[0](mask.shape[0]) - 1) > 1 else 1
```
k guard condition > 1 silently misbehaves when T_latent == 2.
The condition (downscale_ratio[0](mask.shape[0]) - 1) > 1 evaluates to False when T_latent - 1 == 1 (i.e., T_latent == 2), so k is hard-clamped to 1 instead of the correct k = T - 1. A mask with 81 input frames would be returned with all 81 temporal frames rather than the expected 2, silently producing a shape mismatch downstream. The guard only needs to avoid division by zero, so the threshold should be > 0.
🐛 Proposed fix

```diff
-k = (mask.shape[0] - 1) // (downscale_ratio[0](mask.shape[0]) - 1) if (downscale_ratio[0](mask.shape[0]) - 1) > 1 else 1
+t_latent = downscale_ratio[0](mask.shape[0])
+k = (mask.shape[0] - 1) // (t_latent - 1) if (t_latent - 1) > 0 else 1
```

The refactor also avoids calling the downscale_ratio[0] callable twice.
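A quick worked example of the guard bug, assuming a Wan-style causal mapping t_latent = (T - 1) // 4 + 1 (the actual callable lives on the VAE; the numbers here are illustrative):

```python
# Illustrative values, not taken from the PR.
T, t_latent = 81, 21
k = (T - 1) // (t_latent - 1)                                     # 4, fine either way

T, t_latent = 5, 2                                                # t_latent - 1 == 1
k_buggy = (T - 1) // (t_latent - 1) if (t_latent - 1) > 1 else 1  # 1 (wrong)
k_fixed = (T - 1) // (t_latent - 1) if (t_latent - 1) > 0 else 1  # 4 (correct: T - 1)
```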
Hey, have you tried my implementation in KJNodes, and if so is it lacking something? I called it
@kijai I was not aware of your implementation, thanks for letting me know. Yes, would love to see this in core. And your implementation is quite helpful in teaching me how to patch the model.
Adds native support for #10745, which is a request for https://github.com/time-to-move/TTM.
There is an implementation in the WanVideoWrapper repository, but this one should in theory work for other video models as well.
Adds two nodes:
TimeToMoveKSamplerAdvanced, which modifies KSamplerAdvanced to add inputs for a latent_mask and a time_to_move_end_at_step. Logic is as follows:
RGBMaskToLatentMask, which uses the input VAE's downsampling factor to create a compatible latent mask to feed into TimeToMoveKSamplerAdvanced. For example, a video mask that is 81 frames long will be interpolated using nearest neighbors to a 21-frame mask if the Wan VAE is the input (see the sketch after this list).
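For reference, a minimal sketch of that 81 → 21 mapping, assuming Wan's causal temporal downscale factor of 4:

```python
# Assumed causal temporal mapping: the first frame is kept as-is,
# then each group of 4 subsequent frames maps to one latent frame.
frames = 81
latent_frames = (frames - 1) // 4 + 1           # 21
stride = (frames - 1) // (latent_frames - 1)    # 4: frame 0, then every 4th frame
```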
Considerations:
Issues