
feat: Time-to-Move sampling #13707

Draft

Pizzawookiee wants to merge 10 commits into Comfy-Org:master from Pizzawookiee:time_to_move

Conversation

@Pizzawookiee Pizzawookiee commented May 4, 2026

Adds native support for #10745, which is a feature request for https://github.com/time-to-move/TTM

[image: Time-to-Move paper abstract]

There is an implementation in the WanVideoWrapper repository, but this one should, in theory, work for other video models as well.

Adds two nodes:
TimeToMoveKSamplerAdvanced, which modifies KSamplerAdvanced to add inputs for a latent_mask and a time_to_move_end_at_step. The logic is as follows:

  1. The input latent should be the VAE-encoded 'crude reference animation' mentioned in the abstract reproduced above.
  2. The latent is cloned at the start to produce a reference latent.
  3. After each sampling step in [start_at_step, time_to_move_end_at_step), the reference latent is noised and composited onto the partially sampled latent using the input latent_mask.
  4. Starting at time_to_move_end_at_step, the compositing stops and the latent is sampled as usual.
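The steps above can be sketched as follows. This is a minimal toy sketch, not the real ComfyUI API: numpy stands in for torch tensors, the denoising step is a placeholder, and the `ttm_sample` name, sigma schedule, and shapes are all illustrative.

```python
import numpy as np

def ttm_sample(x0, latent_mask, sigmas, start_step, ttm_end_step, rng):
    """Toy per-step loop: after each step in [start_step, ttm_end_step),
    renoise the cloned reference latent and composite it over the sample."""
    ref = x0.copy()                      # reference latent cloned at the start
    x = x0.copy()
    for i in range(start_step, len(sigmas)):
        x = x - 0.1 * x                  # placeholder for one real sampler step
        if i < ttm_end_step:             # compositing window
            noisy_ref = ref + rng.standard_normal(ref.shape) * sigmas[i]
            x = latent_mask * noisy_ref + (1.0 - latent_mask) * x
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((1, 4, 21, 8, 8))     # (B, C, T, H, W) video latent
mask = np.zeros_like(x0)
mask[:, :, :, :4, :] = 1.0                      # enforce reference in top rows
out = ttm_sample(x0, mask, sigmas=[1.0, 0.6, 0.3, 0.1],
                 start_step=0, ttm_end_step=2, rng=rng)
```

After time_to_move_end_at_step the mask is never applied again, so the final steps refine the whole latent freely.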

RGBMaskToLatentMask, which uses the input VAE's downsampling factor to create a compatible latent mask to feed into TimeToMoveKSamplerAdvanced. For example, an 81-frame video mask will be downsampled with nearest-neighbor interpolation to a 21-frame mask if a Wan VAE is the input.
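A minimal sketch of that conversion, assuming the Wan-style causal layout (latent frames correspond to frame 0 plus every 4th input frame); numpy stands in for torch, and the function name and strides are illustrative:

```python
import numpy as np

def rgb_mask_to_latent_mask(mask, temporal_k=4, down_h=8, down_w=8):
    """mask: (T, H, W) float array. Keep frame 0 plus every temporal_k-th
    frame, then nearest-neighbor spatial downsample by simple striding."""
    frames = np.concatenate([mask[0:1], mask[temporal_k::temporal_k]], axis=0)
    return frames[:, ::down_h, ::down_w]

m = np.ones((81, 64, 64), dtype=np.float32)   # 81-frame video mask
latent_mask = rgb_mask_to_latent_mask(m)       # 21 latent frames, 8x8 spatial
```

With 81 input frames and temporal_k=4, the kept indices are 0, 4, 8, ..., 80, i.e. 21 latent frames, matching the Wan VAE's (T - 1) / 4 + 1 temporal layout.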

Considerations:

  1. For Wan Video I2V workflows, the conditionings from the WanImageToVideo node are required, as those conditionings include a VAE-encoded latent containing the start image. However, we don't use the latent from WanImageToVideo; instead, we use the encoded reference animation as the input latent. This is likely to be the case for other video models as well.

Issues

  1. The loop I use to retrieve the latent image after each sampling step interferes with the visual progress bar on top of the node, which jumps to 100 percent after the first step.
  2. While I believe this matches the compositing logic in the original Time-to-Move repo, I am unable to get particularly impressive results in a Wan 2.2 low-step workflow (compared to simply using a later start_at_step to begin sampling with the reference animation latent), and I lack the time and resources to test this with the 50-step workflow used in the original Time-to-Move implementation.

@Pizzawookiee Pizzawookiee marked this pull request as ready for review May 4, 2026 19:04
coderabbitai Bot commented May 4, 2026

📝 Walkthrough

Adds latent-space video compositing, a step-windowed KSampler workflow, and RGB→latent mask conversion.

  • New functions in comfy_extras/nodes_custom_sampler.py: video_latent_composite (blends 5D latents with spatial offsets and an optional mask), time_to_move_sample (iterative per-step KSampler with optionally recomputed noisy latents and compositing), and time_to_move_common_ksampler (prepares the latent/noise and callbacks, returns the updated latent). New node TimeToMoveKSamplerAdvanced registered.
  • In comfy_extras/nodes_mask.py: convert_rgb_mask_to_latent_mask (temporal sampling plus spatial nearest-resize) and new node RGBMaskToLatentMask registered.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 6.67%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check: the title 'feat: Time-to-Move sampling' directly and specifically summarizes the main change.
  • Description check: the description is detailed and directly related to the changeset, explaining the new nodes (TimeToMoveKSamplerAdvanced and RGBMaskToLatentMask), their behavior, and known issues.
  • Linked Issues check: skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: skipped because no linked issues were found for this pull request.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@comfy_extras/nodes_custom_sampler.py`:
- Around line 86-117: The loop can be skipped when start_step >= effective last
step, leaving samples undefined; fix by guarding/initializing before the loop
(check the same condition used in the for: if min(last_step, steps) - start_step
<= 0) and set samples to a valid tensor (e.g., convert latent_image to the
expected device/dtype via
comfy.model_management.intermediate_device()/intermediate_dtype() or simply
assign latent_image) and return it, or initialize samples to latent_image before
entering the loop; update the code around the for-loop that uses start_step,
last_step, steps, samples, and latent_image to ensure samples is always defined
when returned.

In `@comfy_extras/nodes_mask.py`:
- Around line 49-63: The temporal sampling is incorrect: don’t assume
vae.downscale_ratio is a tuple or that a stride k reconstructs the latent frame
count; instead derive the target temporal length explicitly (e.g., accept a
target_latent_T or compute it from vae.downscale_ratio whether it’s an int or a
tuple) and resample the input mask to that exact T using a proper mapping (use
torch.linspace to create target frame indices and gather/interpolate along dim=0
or use torch.nn.functional.interpolate after adding a channel dim) rather than
mask[0:1] + mask[1::k]; update convert_rgb_mask_to_latent_mask to compute
target_T robustly from vae.downscale_ratio or an explicit parameter and replace
the mask0/mask1 sampling with index-based gather/interpolate, and apply the same
fix at the other occurrence (lines ~455-457) so the returned latent_mask frame
count matches the VAE latent frames.
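The index-based temporal resampling suggested in the comment above can be sketched like this, using numpy as a stand-in for the torch.linspace/gather approach; the function name and shapes are illustrative:

```python
import numpy as np

def resample_temporal(mask, target_t):
    """Nearest-index temporal resample of a (T, H, W) mask to target_t frames,
    mapping frame 0 -> index 0 and frame T-1 -> the last target frame."""
    src_t = mask.shape[0]
    idx = np.rint(np.linspace(0, src_t - 1, target_t)).astype(int)
    return mask[idx]

# 81 frames, each filled with its own frame index, resampled to 21 frames
m = np.arange(81, dtype=np.float32)[:, None, None] * np.ones((1, 4, 4), np.float32)
lat = resample_temporal(m, 21)
```

Because target_t is passed explicitly, this works regardless of whether vae.downscale_ratio is an int or a tuple; the caller derives target_t first.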

ℹ️ Review info

Review profile: CHILL. Reviewing files that changed from the base of the PR and between c33d26c and ae54d7a.

📒 Files selected for processing (2)
  • comfy_extras/nodes_custom_sampler.py
  • comfy_extras/nodes_mask.py


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
comfy_extras/nodes_mask.py (1)

451-457: ⚡ Quick win

Consider adding a type guard for clearer error messages.

The description correctly documents that this node is for causal Video VAEs, but if a user accidentally wires a non-video VAE (which has a scalar downscale_ratio), they'll get a cryptic "'int' object is not subscriptable" error. A quick check would improve the UX.

💡 Optional: Add validation for VAE type
     @classmethod
     def execute(cls, mask, vae) -> IO.NodeOutput:
         # Ensure we work on a copy of the mask to remain non-destructive
         mask_copy = mask.clone()
         downscale_ratio = vae.downscale_ratio
+        if not isinstance(downscale_ratio, tuple) or len(downscale_ratio) < 3:
+            raise ValueError("RGBMaskToLatentMask requires a causal Video VAE (e.g., Wan). The provided VAE does not have a compatible downscale_ratio.")
         k = (mask.shape[0] - 1) // (downscale_ratio[0](mask.shape[0]) - 1) if (downscale_ratio[0](mask.shape[0]) - 1) > 1 else 1
         return IO.NodeOutput(convert_rgb_mask_to_latent_mask(mask_copy, k, spatial_downsample_h = downscale_ratio[1], spatial_downsample_w = downscale_ratio[2]))

ℹ️ Review info

Review profile: CHILL. Reviewing files that changed from the base of the PR and between 8dd41ef and d56a093.

📒 Files selected for processing (1)
  • comfy_extras/nodes_mask.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1


ℹ️ Review info

Review profile: CHILL. Reviewing files that changed from the base of the PR and between d56a093 and de97192.

📒 Files selected for processing (1)
  • comfy_extras/nodes_mask.py

        downscale_ratio = vae.downscale_ratio
        if not isinstance(downscale_ratio, tuple) or len(downscale_ratio) < 3:
            raise ValueError("RGBMaskToLatentMask requires a causal Video VAE (e.g., Wan). The provided VAE does not have a compatible downscale_ratio.")
        k = (mask.shape[0] - 1) // (downscale_ratio[0](mask.shape[0]) - 1) if (downscale_ratio[0](mask.shape[0]) - 1) > 1 else 1

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

k guard condition > 1 silently misbehaves when T_latent == 2.

The condition (downscale_ratio[0](mask.shape[0]) - 1) > 1 evaluates to False when T_latent - 1 == 1 (i.e., T_latent == 2), so k is hard-clamped to 1 instead of the correct k = T - 1. A mask with 81 input frames would be returned with all 81 temporal frames rather than the expected 2, silently producing a shape mismatch downstream. The guard only needs to avoid division by zero, so the threshold should be > 0.

🐛 Proposed fix
-        k = (mask.shape[0] - 1) // (downscale_ratio[0](mask.shape[0]) - 1) if (downscale_ratio[0](mask.shape[0]) - 1) > 1 else 1
+        t_latent = downscale_ratio[0](mask.shape[0])
+        k = (mask.shape[0] - 1) // (t_latent - 1) if (t_latent - 1) > 0 else 1

The refactor also avoids calling the downscale_ratio[0] callable twice.
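The difference between the two guards can be checked with a quick pure-Python worked example, where t_latent stands in for the value returned by downscale_ratio[0](T):

```python
def k_buggy(t, t_latent):
    # original guard: "> 1" clamps k to 1 whenever t_latent == 2
    return (t - 1) // (t_latent - 1) if (t_latent - 1) > 1 else 1

def k_fixed(t, t_latent):
    # corrected guard: only protect against division by zero
    return (t - 1) // (t_latent - 1) if (t_latent - 1) > 0 else 1

# 81 input frames, 21 latent frames: both agree, k == 4
# 81 input frames, 2 latent frames: buggy gives 1, fixed gives 80
```

So for T_latent == 2 the fixed version keeps only frames 0 and 80, as expected, while the buggy version keeps all 81 temporal frames.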


@Pizzawookiee Pizzawookiee marked this pull request as draft May 4, 2026 20:15
Collaborator

kijai commented May 4, 2026

Hey, have you tried my implementation in KJNodes, and if so, is it lacking something? I called it LatentInpaintTTM because it's really just a slightly different inpaint method, and it can be implemented with much less code in general. If it's adequate, it could be added to core with little effort.

Author

Pizzawookiee commented May 4, 2026

LatentInpaintTTM

@kijai I was not aware of your implementation, thanks for letting me know. Yes, would love to see this in core. And your implementation is quite helpful in teaching me how to patch the model.
