chen-yy20/SmartDiffusion

Smart-Diffusion

Chinese version | Why Smart-Diffusion?

License: Apache 2.0 | Python 3.12+ | CUDA 12.4+

Smart-Diffusion is a high-performance diffusion model inference framework built on Chitu. It combines fast inference with flexible scheduling for AI-generated content (AIGC) workloads.

Overview

Smart-Diffusion is a standalone, streamlined edition of Chitu-Diffusion, developed by the PACMAN team from Tsinghua University and QingCheng.ai. It aims to support the rapidly growing diffusion ecosystem by restructuring DiT models around Chitu's API and scheduling philosophy, keeping scheduling flexible while delivering high performance.

Key Features

  • 🚀 High Performance: Optimized diffusion inference with advanced parallelism strategies
  • 🔧 Flexible Architecture: Support for multiple attention backends (FlashAttention, SageAttention, SpargeAttention)
  • 💾 Memory Efficient: Low memory mode with model offloading and VAE tiling
  • 📊 Feature Cache: Unified FlexCache API for TeaCache, PAB, and DiTango
  • 🎯 Easy to Use: Simple API with per-request parameter configuration
  • 🌐 Multi-Model: Currently supports Wan-T2V series (1.3B, 14B, A14B) with more coming soon

Design Philosophy

Smart-Diffusion follows three core pillars:

  1. Parallelism: Context parallelism (CP), CFG parallelism, and data parallelism
  2. Kernels: Optimized attention implementations with quantization support
  3. Algorithms: Feature reuse and caching strategies for acceleration

See Why Smart-Diffusion? for detailed design philosophy.

Installation

Prerequisites

  • Python 3.12+
  • CUDA 12.4+ (recommended: 12.8)
  • NVIDIA GPU with compute capability 8.0 or higher (Ampere: 8.0, Hopper: 9.0, Blackwell: 10.0 or 12.0)
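To check these prerequisites before installing, you can run a small Python script. This is a sketch and is not part of the repository. The helper name `check_prereqs` is hypothetical. The script uses `torch` only if it is already importable, relying on the standard PyTorch attributes `torch.version.cuda` and `torch.cuda.get_device_capability`. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which can differ from the `nvcc` toolkit on your PATH.

```python
import sys
from typing import List, Optional, Tuple


def check_prereqs(
    python_version: Tuple[int, int],
    cuda_version: Optional[Tuple[int, int]],
    capability: Optional[Tuple[int, int]],
) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if python_version < (3, 12):
        problems.append(f"Python {python_version[0]}.{python_version[1]} < 3.12")
    if cuda_version is None:
        problems.append("CUDA not detected")
    elif cuda_version < (12, 4):
        problems.append(f"CUDA {cuda_version[0]}.{cuda_version[1]} < 12.4")
    if capability is None:
        problems.append("No CUDA GPU detected")
    elif capability < (8, 0):
        problems.append(f"Compute capability {capability[0]}.{capability[1]} < 8.0")
    return problems


if __name__ == "__main__":
    cuda, cap = None, None
    try:
        import torch  # optional: only used if it is already installed

        if torch.version.cuda:  # None for CPU-only builds
            major, minor = torch.version.cuda.split(".")[:2]
            cuda = (int(major), int(minor))
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
    except ImportError:
        pass
    issues = check_prereqs(sys.version_info[:2], cuda, cap)
    print("OK" if not issues else "\n".join(issues))
```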

Quick Start with uv

We recommend using uv for a smoother installation experience.

1. Clone the repository

git clone git@github.com:chen-yy20/SmartDiffusion.git
cd SmartDiffusion

1.1 Clone the submodules

Option 1: Clone all submodules (sage-attn, sparge-attn and vbench)

git submodule update --init --recursive 

Option 2: Clone only the submodules you need. For example, to clone only the SageAttention and SpargeAttention submodules:

git submodule update --init third_party/sage_attn
git submodule update --init third_party/sparge_attn

2. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

See uv documentation for more details.

3. Configure build settings

Check your CUDA version:

nvcc --version

Edit pyproject.toml to match your CUDA version. For CUDA 12.8:

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu128" }
torchvision = { index = "pytorch-cu128" }
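The index name and URL follow the CUDA release reported by `nvcc --version` (12.8 maps to `cu128`). The sketch below derives both from the `nvcc` output. `cuda_index_name` is a hypothetical helper. The script assumes `nvcc` prints a line such as `Cuda compilation tools, release 12.8, V12.8.93`. It does not check whether PyTorch actually publishes wheels for that CUDA version.

```python
import re
import shutil
import subprocess


def cuda_index_name(nvcc_output: str) -> str:
    """Map `nvcc --version` output to an index name such as 'pytorch-cu128'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release number in nvcc output")
    major, minor = match.groups()
    return f"pytorch-cu{major}{minor}"


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
        name = cuda_index_name(out)
        # Print the two values to paste into [[tool.uv.index]]
        print(f'name = "{name}"')
        print(f'url = "https://download.pytorch.org/whl/{name.removeprefix("pytorch-")}"')
```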

Configure the target GPU architectures in pyproject.toml:

# Set TORCH_CUDA_ARCH_LIST according to your GPU
# Ampere: 8.0, Hopper: 9.0, Blackwell: 10.0 (data center) or 12.0 (RTX 50 series)
[tool.uv.extra-build-variables.sageattention]
EXT_PARALLEL = "4"
NVCC_APPEND_FLAGS = "--threads 8"
MAX_JOBS = "32"
TORCH_CUDA_ARCH_LIST = "8.0;9.0"

[tool.uv.extra-build-variables.spas_sage_attn]
EXT_PARALLEL = "4"
NVCC_APPEND_FLAGS = "--threads 8"
MAX_JOBS = "32"
TORCH_CUDA_ARCH_LIST = "8.0;9.0"
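To build TORCH_CUDA_ARCH_LIST for the GPUs actually installed, format each device's compute capability as `major.minor` and join the values with semicolons. Semicolons are the separator used in the example above. This is a sketch. `arch_list` is a hypothetical helper name, and the detection step assumes PyTorch is already installed.

```python
from typing import Iterable, Tuple


def arch_list(capabilities: Iterable[Tuple[int, int]]) -> str:
    """Deduplicate and sort compute capabilities into a TORCH_CUDA_ARCH_LIST string."""
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


if __name__ == "__main__":
    try:
        import torch

        caps = [torch.cuda.get_device_capability(i) for i in range(torch.cuda.device_count())]
    except ImportError:
        caps = []
    print(arch_list(caps) if caps else "no CUDA devices detected")
```

On a machine mixing A100 and H100 GPUs, this yields `8.0;9.0`, the same value as the example.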

4. Install dependencies

# Required: base dependencies from [project.dependencies] (may take about 30 minutes)
uv sync -v 2>&1 | tee uv_sync.log

# Optional extras from [project.optional-dependencies]
# SageAttention
uv sync -v --extra sage 2>&1 | tee build_sage.log

# SpargeAttention
uv sync -v --extra sparge 2>&1 | tee build_sparge.log

# VBench evaluation toolkit
uv sync -v --extra vbench 2>&1 | tee build_vbench.log

# Evaluation metrics (FID/FVD/PSNR/SSIM/LPIPS)
uv sync -v --extra eval 2>&1 | tee build_eval.log

# Or install all extras at once (sage + sparge + vbench + eval)
uv sync -v --all-extras 2>&1 | tee build_full.log
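After syncing, you can confirm which optional backends and tools are importable. This sketch uses `importlib.util.find_spec`, which locates a top-level module without importing it. The import names (`flash_attn`, `sageattention`, `spas_sage_attn`, `vbench`) are assumptions based on the package names above and may differ in your build.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed import names for the optional backends/extras; adjust if your build differs.
DEFAULT_MODULES = ("flash_attn", "sageattention", "spas_sage_attn", "vbench")


def available_modules(names: Iterable[str] = DEFAULT_MODULES) -> Dict[str, bool]:
    """Report, for each top-level module name, whether it can be found on sys.path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in available_modules().items():
        print(f"{name:16s} {'found' if ok else 'missing'}")
```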

Manual Installation

# Install from requirements.txt
pip install -r requirements.txt

# Install in editable mode
pip install -e .

Note: FlashAttention can also be installed from a prebuilt wheel published on its GitHub releases page.

Supported Models

Smart-Diffusion currently supports the Wan-T2V series:

| Model ID | Parameters | Description |
|---|---|---|
| Wan-AI/Wan2.1-T2V-1.3B | 1.3B | Lightweight text-to-video model |
| Wan-AI/Wan2.1-T2V-14B | 14B | High-quality text-to-video model |
| Wan-AI/Wan2.2-T2V-A14B | 14B | Advanced two-stage text-to-video model |

More models are being added continuously. Stay tuned!

Usage

Basic Example

Create a test script test_generate.py:

from chitu_diffusion import chitu_init, chitu_generate, chitu_start
from chitu_diffusion.task import DiffusionUserParams, DiffusionTask, DiffusionTaskPool
from hydra import compose, initialize

# Initialize with configuration
initialize(config_path="config", version_base=None)
args = compose(config_name="wan")

# Set model checkpoint path
args.models.ckpt_dir = "/path/to/your/model/checkpoint"

# Initialize backend
chitu_init(args)
chitu_start()

# Create generation task
user_params = DiffusionUserParams(
    role="user1",
    prompt="A cat walking on grass.",
    num_inference_steps=50,
    height=480,
    width=848,
    num_frames=81,
    guidance_scale=7.0,
)

task = DiffusionTask.from_user_request(user_params)
DiffusionTaskPool.add(task)

# Generate
while not DiffusionTaskPool.all_finished():
    chitu_generate()

print(f"Video saved to: {task.buffer.save_path}")

Launch Scripts

Only Slurm launch via srun is currently supported.

  1. Edit system_config.yaml to configure the model path, system parameters, and cfp (the CFG parallel size).
  2. Run the unified launcher:
bash run.sh system_config.yaml

Optional runtime overrides:

bash run.sh system_config.yaml --num-nodes 2 --gpus-per-node 8 --cfp 2

Runtime notes:

  • parallel.cfp (or --cfp) must be 1 or 2; the launcher maps it to infer.diffusion.cfg_size.
  • infer.diffusion.cp_size is auto-derived as (num_nodes * gpus_per_node) / cfp.
  • launch.tag is exported as CHITU_RUN_TAG and prefixes output run directory names.
  • launch.enable_launch_log=true writes launcher logs to output.root_dir/launch_<timestamp>.log.
  • CHITU_PYTHON_BIN overrides the Python interpreter used at runtime; otherwise the launcher tries .venv/bin/python, then python, then python3.
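The parallel-size rules above can be restated as a short function. This sketch mirrors the documented arithmetic and is not the launcher's code. `derive_parallel_sizes` is a hypothetical name, and the even-divisibility check is an assumption added for safety.

```python
from typing import Dict


def derive_parallel_sizes(num_nodes: int, gpus_per_node: int, cfp: int) -> Dict[str, int]:
    """Apply the documented rules: cfg_size = cfp, cp_size = total GPUs / cfp."""
    if num_nodes < 1 or gpus_per_node < 1:
        raise ValueError("num_nodes and gpus_per_node must be >= 1")
    if cfp not in (1, 2):
        raise ValueError(f"cfp must be 1 or 2, got {cfp}")
    world_size = num_nodes * gpus_per_node
    if world_size % cfp != 0:  # assumption: GPUs must split evenly across CFG branches
        raise ValueError(f"{world_size} GPUs cannot be split evenly across cfp={cfp}")
    return {
        "infer.diffusion.cfg_size": cfp,
        "infer.diffusion.cp_size": world_size // cfp,
    }


if __name__ == "__main__":
    # Same values as: bash run.sh system_config.yaml --num-nodes 2 --gpus-per-node 8 --cfp 2
    print(derive_parallel_sizes(2, 8, 2))
```

For the override example (2 nodes × 8 GPUs, cfp 2), cp_size is 16 / 2 = 8.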

Recommended system_config.yaml output section:

output:
  root_dir: outputs
  enable_run_log: true
  enable_timer_dump: true
  hydra_dump_mode: "off"   # one of: default, video_dir, off

hydra_dump_mode=video_dir relocates Hydra .hydra metadata to the video output directory. When enable_timer_dump=true, timer statistics are dumped as time_stats.csv in each run directory.

Advanced Configuration

Configuration is split into three levels:

  1. Model Parameters (Static): Defined in chitu_core/config/models/<model>.yaml
  2. User Parameters (Dynamic): Set per-request via DiffusionUserParams
  3. System Parameters (Semi-static): Set in system_config.yaml

Example: Using a different attention backend

python test_generate.py \
    models.ckpt_dir=/path/to/checkpoint \
    infer.attn_type=sage \
    infer.diffusion.low_mem_level=2

Key Parameters

Attention Backend

Control your attention implementation with infer.attn_type:

| Type | Description | Performance |
|---|---|---|
| flash_attn | Default. FlashAttention: high-performance full attention without accuracy loss | Baseline |
| sage | SageAttention (NeurIPS 2025 spotlight): training-free quantized attention | ~2x speedup |
| sparge | SpargeAttention (ICML 2025): training-free sparse attention | ~3x speedup |
| auto | Automatically chooses the best backend | - |

Example:

python test_generate.py infer.attn_type=sage
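When scripting runs, you can build the Hydra-style key=value overrides shown above as an argument list. This lets `subprocess.run` launch the script without shell-quoting issues. This is a sketch. `build_command` is hypothetical, and the set of valid `attn_type` values is taken from the table above.

```python
import shlex
from typing import List, Optional

ATTN_TYPES = ("flash_attn", "sage", "sparge", "auto")


def build_command(attn_type: str = "flash_attn", low_mem_level: Optional[int] = None,
                  ckpt_dir: Optional[str] = None) -> List[str]:
    """Build an argv list for test_generate.py with key=value overrides."""
    if attn_type not in ATTN_TYPES:
        raise ValueError(f"unknown attn_type {attn_type!r}; expected one of {ATTN_TYPES}")
    argv = ["python", "test_generate.py"]
    if ckpt_dir is not None:
        argv.append(f"models.ckpt_dir={ckpt_dir}")
    argv.append(f"infer.attn_type={attn_type}")
    if low_mem_level is not None:
        if low_mem_level < 0:
            raise ValueError("low_mem_level must be >= 0")
        argv.append(f"infer.diffusion.low_mem_level={low_mem_level}")
    return argv


if __name__ == "__main__":
    # Pass the list to subprocess.run(...) to launch; here we only print it.
    print(shlex.join(build_command("sage", low_mem_level=2, ckpt_dir="/path/to/checkpoint")))
```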

Low Memory Mode

Control GPU memory usage with infer.diffusion.low_mem_level:

| Level | Behavior |
|---|---|
| 0 | All models loaded to GPU |
| 1 | VAE tiling enabled |
| 2 | T5 encoder offloaded to CPU |
| ≥3 | DiT model offloaded to CPU |

Example:

python test_generate.py infer.diffusion.low_mem_level=2
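The level table reads naturally as a set of cumulative thresholds, and the sketch below encodes that reading. The cumulative behavior is an assumption: the table does not say whether, for example, level 2 also keeps VAE tiling on. Check the source before relying on it. `low_mem_plan` is a hypothetical helper.

```python
from typing import Dict


def low_mem_plan(level: int) -> Dict[str, bool]:
    """Describe memory-saving measures for a low_mem_level.

    ASSUMPTION: each level also enables everything from the lower levels.
    """
    if level < 0:
        raise ValueError("low_mem_level must be >= 0")
    return {
        "vae_tiling": level >= 1,
        "t5_on_cpu": level >= 2,
        "dit_on_cpu": level >= 3,
    }


if __name__ == "__main__":
    for lvl in range(4):
        print(lvl, low_mem_plan(lvl))
```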

FlexCache

Enable feature reuse acceleration with infer.diffusion.enable_flexcache=true:

| cache_type | Method | Description |
|---|---|---|
| teacache | TeaCache | CVPR 2025 highlight. "Timestep Embedding Tells" |
| pab | Pyramid Attention Broadcast | ICLR 2025. Pyramid attention broadcasting |
| ditango | DiTango | ASE + anchor-gated grouped reuse |

DiTango behavior notes (current implementation):

  • Local partition is always computed each step and merged separately for stability.
  • Anchor decision is step-level and synchronized across CFG positive/negative branches.
  • cache_ratio controls both anchor trigger aggressiveness and global ASE-threshold quantile update.
  • Strategy implementation is in chitu_diffusion/flex_cache/strategy/ditango/ditango.py.
  • A merged decision visualization is emitted to <output_dir>/ditango_policy_step_layer_group.ppm.

Unified per-request API:

from chitu_diffusion.task import DiffusionUserParams, FlexCacheParams

user_params = DiffusionUserParams(
    prompt="A cat walking on grass.",
    flexcache_params=FlexCacheParams(
        strategy="teacache",  # teacache / pab / ditango
        cache_ratio=0.4,      # 0 = quality-first, 1 = speed-first
        warmup=5,
        cooldown=5,
    ),
)

Legacy style is still supported:

user_params = DiffusionUserParams(
    prompt="A cat walking on grass.",
    flexcache="teacache",
    # ... other params
)
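The sketch below is a simplified TeaCache-style decision loop in plain Python. It shows how warmup, cooldown, and a reuse threshold interact, but it is not Smart-Diffusion's implementation. It assumes warmup and cooldown are the numbers of leading and trailing steps that are always fully computed. It takes the per-step relative L1 change of the timestep-modulated input as given. It also exposes a raw threshold instead of cache_ratio, because the mapping from cache_ratio to a threshold is not documented here. TeaCache itself additionally rescales these distances with a fitted polynomial, which is omitted.

```python
from typing import List, Sequence


def teacache_schedule(rel_l1_changes: Sequence[float], threshold: float,
                      warmup: int = 5, cooldown: int = 5) -> List[bool]:
    """Return one flag per denoising step: True = run the DiT, False = reuse cached output.

    rel_l1_changes[i] is the relative L1 change of the (timestep-modulated) model
    input between step i-1 and step i; step 0 is always computed.
    """
    num_steps = len(rel_l1_changes)
    compute: List[bool] = []
    accumulated = 0.0
    for step, change in enumerate(rel_l1_changes):
        if step == 0 or step < warmup or step >= num_steps - cooldown:
            compute.append(True)  # always compute at the start and the end
            accumulated = 0.0
            continue
        accumulated += change
        if accumulated < threshold:
            compute.append(False)  # little drift since the last full compute: reuse
        else:
            compute.append(True)  # drift too large: recompute and reset the accumulator
            accumulated = 0.0
    return compute


if __name__ == "__main__":
    flags = teacache_schedule([0.0] + [0.1] * 9, threshold=0.25, warmup=2, cooldown=2)
    print(flags, f"reused {flags.count(False)} of {len(flags)} steps")
```

A higher threshold reuses more steps, trading fidelity for speed. This is the same trade-off the cache_ratio comment describes.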

Evaluation

Enable automatic evaluation with eval.eval_type, which accepts a list of one or more methods:

python test_generate.py 'eval.eval_type=[vbench,fid,psnr]' eval.reference_path=/path/to/reference_videos

Supported evaluation methods:

  • vbench: VBench custom-mode evaluation
  • fid: Frechet Inception Distance (requires reference_path)
  • fvd: Frechet Video Distance (requires reference_path)
  • psnr: Peak Signal-to-Noise Ratio (requires reference_path)
  • ssim: Structural Similarity Index (requires reference_path)
  • lpips: Learned Perceptual Image Patch Similarity (requires reference_path)
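PSNR is 10·log10(MAX² / MSE), where MAX is the largest possible pixel value (255 for 8-bit frames) and MSE is the mean squared error against the reference. The minimal pure-Python version below works over flat pixel sequences and is shown only to make the formula concrete. The framework's own metric code may differ in how it handles frame alignment, color space, and averaging over frames.

```python
import math
from typing import Sequence


def psnr(generated: Sequence[float], reference: Sequence[float], max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel sequences."""
    if len(generated) != len(reference) or len(generated) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((g - r) ** 2 for g, r in zip(generated, reference)) / len(generated)
    if mse == 0:
        return math.inf  # identical inputs have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)
```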

Behavior notes:

  • eval.eval_type=[] or null disables evaluation.
  • Metrics requiring references are skipped with a warning if eval.reference_path is missing or invalid.
  • Results are saved under ./vbench_out/ (vbench) and ./eval_out/ (other metrics).
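The selection rules above can be sketched as a function that splits the requested metrics into those to run and those to skip. This is a hypothetical restatement, not the framework's code. It treats an "invalid" reference path as one that does not exist on disk, which is an assumption. Raising an error on unknown metric names is also a choice made by this sketch.

```python
import os
import warnings
from typing import List, Optional, Sequence, Tuple

NEEDS_REFERENCE = {"fid", "fvd", "psnr", "ssim", "lpips"}
KNOWN_METRICS = NEEDS_REFERENCE | {"vbench"}


def plan_metrics(eval_type: Optional[Sequence[str]],
                 reference_path: Optional[str]) -> Tuple[List[str], List[str]]:
    """Split requested metrics into (to_run, skipped) following the behavior notes."""
    if not eval_type:  # None or [] disables evaluation
        return [], []
    has_reference = reference_path is not None and os.path.exists(reference_path)
    to_run: List[str] = []
    skipped: List[str] = []
    for metric in eval_type:
        if metric not in KNOWN_METRICS:
            raise ValueError(f"unknown eval_type entry: {metric!r}")
        if metric in NEEDS_REFERENCE and not has_reference:
            warnings.warn(f"skipping {metric}: eval.reference_path is missing or invalid")
            skipped.append(metric)
        else:
            to_run.append(metric)
    return to_run, skipped
```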

Contributing

We welcome contributions! Smart-Diffusion is in active development.

To contribute:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with proper documentation
  4. Submit a pull request

Please see our Developer Guide for parameter taxonomy and best practices.

Roadmap

  • Support for more diffusion models (e.g., Flux2, LongCat-Video, FireRed)
  • More acceleration algorithms
  • More parallelism strategies
  • Better operator implementations
  • Production-ready serving framework
  • Comprehensive benchmarks

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Citation

If you use Smart-Diffusion in your research, please cite:

@software{smart_diffusion2025,
  title={Smart-Diffusion: High-Performance Diffusion Model Inference Framework},
  author={{PACMAN Team, Tsinghua University} and {QingCheng.ai}},
  year={2025},
  url={https://github.com/chen-yy20/SmartDiffusion}
}

Note: Smart-Diffusion is currently in testing and development phase. We're working hard to make it better! Join us in building the future of AIGC acceleration. 🚀

About

Fast, flexible and easy diffusion framework for everyone.
