Transfer motion from a reference video onto a still image, producing a new short video where the image moves like the reference. Built on top of Lightricks LTX-2.3 using the ICLoraPipeline with IC-LoRA conditioning.
subject image reference video output video
┌──────────┐ ┌──────────────┐ ┌─────────────┐
│ 🧑 │ + │ ↻ motion ↺ │ ───────▶ │ 🧑 + motion │
└──────────┘ └──────────────┘ └─────────────┘
(silent .mp4)
This project is a derivative of Lightricks LTX-2 and is governed by the LTX-2 Community License Agreement. By cloning or using this repo you agree to those terms.
- Free for personal and small-business use. Entities with annual revenue ≥ USD 10,000,000 must obtain a paid commercial license from Lightricks at https://ltx.io/model/licensing before any use. Unauthorized commercial use triggers liquidated damages equal to 2× the commercial fee (LICENSE §2).
- Use restrictions (LICENSE Attachment A) — prohibited: deepfakes / impersonation without consent, content harming minors, disinformation, automated decision-making affecting legal rights, discrimination, harassment, and the full list in LICENSE.
- Derivative obligations (LICENSE §3) — anyone who forks or modifies this repo must redistribute under the same LTX-2 Community License, ship the full LICENSE + NOTICE, mark modified files prominently, and retain attribution.
- Outputs — you own the generated videos but must still comply with Attachment A.
- The Gemma 3 text encoder is © Google, used under the Gemma Terms of Use.
See NOTICE for the full third-party attribution and modified-file list.
- Subject image — locks in who/what appears in the output (the person, clothes, background style).
- Reference video — supplies how things move (motion, pose, camera path).
- Text prompt — guides the overall scene description.
- The model — combines all three into a new video, in two stages: a fast low-res pass to lock in motion + composition, then a refinement pass that upsamples to the target resolution.
The output is a silent MP4 (audio is stripped automatically).
# 1. Clone
git clone git@github.com:PrachiUkey/motion_transfer.git
cd motion_transfer
# 2. Install Python deps (pick ONE of the two options below — see "Install" section)
uv sync --frozen && source .venv/bin/activate # option A — uv (recommended)
# -- or --
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu129 \
&& pip install -e packages/ltx-core packages/ltx-pipelines # option B — pip
# 3. Download model weights (~67 GB). Needs a HuggingFace token; see "Models" section.
python download_models.py
# 4. Generate
python main.py path/to/your_image.png
# → outputs/motion_transfer_<image_stem>.mp4The default reference video is assets/idle_avatar_15_reverse.mp4 (a 5-second idle-avatar clip). Override it with --video your_motion.mp4.
| Python | ≥ 3.10 |
| GPU | NVIDIA with ≥ 24 GB VRAM (e.g. RTX 4090, A100, H100). FP8 quantization is used to fit the 22B model in 24 GB. |
| CUDA | 12.9 (PyTorch wheels are pinned to this) |
| RAM | ≥ 32 GB (Gemma 3 text encoder runs on CPU to keep VRAM free) |
| Disk | ~70 GB for model weights + ~1 MB per output video |
You can use either uv (faster, single command, recommended) or plain pip.
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --frozen
source .venv/bin/activateuv sync --frozen reads uv.lock and installs the exact resolved versions (including the right PyTorch CUDA 12.9 wheel and the local ltx-core + ltx-pipelines workspace packages).
# Create and activate a venv (recommended)
python3 -m venv .venv && source .venv/bin/activate
# Install all third-party deps, using PyTorch's CUDA 12.9 index for torch wheels
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu129
# Install the two local packages in editable mode
pip install -e packages/ltx-core packages/ltx-pipelinesrequirements.txt is auto-generated from uv.lock (uv export --no-dev --no-emit-workspace) so the pinned versions stay in sync with the uv flow. The --extra-index-url is required so pip picks the CUDA 12.9 PyTorch wheel rather than the CPU-only default.
download_models.py fetches four sets of weights into ./models/:
| Subdir | File | Source | Size |
|---|---|---|---|
distilled/ |
ltx-2.3-22b-distilled.safetensors |
Lightricks/LTX-2.3 | ~43 GB |
upscaler/ |
ltx-2.3-spatial-upscaler-x2-1.0.safetensors |
Lightricks/LTX-2.3 | ~1 GB |
ic-lora/ |
ltx-2.3-22b-ic-lora-motion-track-control-ref0.5.safetensors |
Lightricks/LTX-2.3-22b-IC-LoRA-Motion-Track-Control | ~0.3 GB |
gemma/ |
Gemma 3 12B (5 shards + tokenizer) | google/gemma-3-12b-it-qat-q4_0-unquantized — gated | ~23 GB |
Gemma 3 is a gated Google model. Before running download_models.py:
- Sign in at https://huggingface.co.
- Visit https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized, click "Acknowledge license", and submit the form.
- Create a Read token at https://huggingface.co/settings/tokens.
Provide the token any of these ways (script checks in order):
export HF_TOKEN=hf_...export HUGGINGFACE_HUB_TOKEN=hf_...huggingface-cli login(caches token in~/.cache/huggingface/token)- Interactive prompt (the script asks if none of the above are set)
The token stays on your machine. It is not committed to the repo.
python main.py IMAGE [--video VIDEO] [--prompt PROMPT] [--output PATH] [options]| Flag | Default | Purpose |
|---|---|---|
IMAGE |
(required) | Subject image (PNG / JPG). Convert paletted/grayscale PNGs to RGB first. |
--video |
assets/idle_avatar_15_reverse.mp4 |
Reference motion video. |
--prompt |
"A person facing the camera with subtle head and shoulder movement…" | Text prompt for the output. |
--output |
outputs/motion_transfer_<image_stem>.mp4 |
Where to write the silent MP4. |
--height / --width |
768 / 768 |
Output resolution (must be multiples of 64). |
--num-frames |
121 |
Frame count (≈ 5 s at 25 fps). |
--frame-rate |
25.0 |
Output fps. |
--seed |
42 |
Reproducibility. |
--lora-strength |
0.8 |
How strongly the IC-LoRA influences output. Try 0.6–1.0. |
--video-strength |
1.0 |
How strongly the reference motion drives output. |
--image-strength |
1.0 |
How strongly the subject image anchors appearance. |
Internally, main.py:
- Sets
LTX_TEXT_ENCODER_CPU=1so Gemma runs on CPU (needed on 24 GB GPUs). - Invokes
python -m ltx_pipelines.ic_lorawith FP8 quantization. - Strips the audio stream losslessly via PyAV (no
ffmpegbinary required).
- Motion too weak → raise
--video-strength(try 1.0–1.5) or--lora-strength. - Motion too rigid / face distorts → lower
--lora-strengthto 0.5–0.7. - Subject drifts from the image → raise
--image-strength, keep the image anchored at frame 0. - Out of VRAM → lower
--height/--widthto 512, or--num-framesto 81. FP8 quantization is already on. - Faster preview → temporarily set
--num-frames 49 --height 512 --width 512for a 2-second 512² draft.
.
├── main.py # generate a single silent video
├── download_models.py # fetch all weights into models/
├── assets/
│ └── idle_avatar_15_reverse.mp4 # default reference motion
├── models/ # downloaded weights (gitignored)
├── outputs/ # generated MP4s (gitignored)
└── packages/
├── ltx-core/ # LTX-2 model implementation
└── ltx-pipelines/ # ICLoraPipeline + shared utils
GatedRepoError: 403during Gemma download → your account hasn't accepted the Gemma 3 license yet. See the HF token section.ValueError: Expected numpy array with ndim 3during run → your input image is grayscale or palette-indexed. Convert to RGB first:from PIL import Image; Image.open("img.png").convert("RGB").save("img.png")
CUBLAS_STATUS_ALLOC_FAILED→ out of GPU memory. Confirm no other processes are holding the card withnvidia-smi. Try lower resolution.
See the License & restrictions callout at the top of this README, the full LICENSE, and NOTICE for the complete attribution and modified-file list.
Pipeline source: packages/ltx-pipelines/src/ltx_pipelines/ic_lora.py.