Smart-Diffusion

Smart-Diffusion is a high-performance diffusion model inference framework built on Chitu. It provides extreme performance and flexible scheduling for AI-generated content (AIGC) workloads.

Overview

Smart-Diffusion is the streamlined edition of Chitu-Diffusion, developed by the PACMAN team at Tsinghua University and QingCheng.ai. It supports the rapidly growing diffusion ecosystem by restructuring DiT models around Chitu's API and scheduling philosophy, keeping scheduling flexible while delivering extreme performance.

Key Features

🚀 High Performance: Optimized diffusion inference with advanced parallelism strategies
🔧 Flexible Architecture: Support for multiple attention backends (FlashAttention, SageAttention, SpargeAttention)
💾 Memory Efficient: Low memory mode with model offloading and VAE tiling
📊 Feature Cache: Unified FlexCache API for TeaCache, PAB, and DiTango
🎯 Easy to Use: Simple API with per-request parameter configuration
🌐 Multi-Model: Currently supports Wan-T2V series (1.3B, 14B, A14B) with more coming soon

Design Philosophy

Smart-Diffusion follows three core pillars:

Parallelism: Context parallelism (CP), CFG parallelism, and data parallelism
Kernels: Optimized attention implementations with quantization support
Algorithms: Feature reuse and caching strategies for acceleration
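
As background for the CFG parallelism entry above: classifier-free guidance evaluates the model twice per denoising step, once with the prompt and once without, and blends the two predictions. The sketch below shows that standard blend on plain Python lists. The name `cfg_combine` is illustrative and is not part of the Smart-Diffusion API.

```python
def cfg_combine(cond, uncond, guidance_scale):
    """Blend conditional and unconditional predictions element-wise.

    Standard classifier-free guidance: uncond + scale * (cond - uncond).
    Illustrative helper only; not taken from the Smart-Diffusion code base.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# The two branches are independent until this blend, so each one can be
# computed on a different group of devices before the results are combined.
blended = cfg_combine([1.0, 2.0], [0.0, 1.0], guidance_scale=7.0)
```

Because there are exactly two branches, a CFG-parallel degree larger than 2 would have nothing further to split, which is consistent with the launcher restricting cfp to 1 or 2.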

See Why Smart-Diffusion? for detailed design philosophy.

Installation

Prerequisites

Python 3.12+
CUDA 12.4+ (recommended: 12.8)
NVIDIA GPU with compute capability 8.0+ (Ampere) or 9.0+ (Hopper/Blackwell)

Quick Start with uv

We recommend using uv for a smoother installation experience.

1. Clone the repository

git clone git@github.com:chen-yy20/SmartDiffusion.git
cd SmartDiffusion

1.1 Clone the submodules

Option 1: Clone all submodules (sage-attn, sparge-attn and vbench)

git submodule update --init --recursive

Option 2: Clone only specific submodules. For example, to clone only the sage_attn and sparge_attn submodules:

git submodule update --init third_party/sage_attn
git submodule update --init third_party/sparge_attn

2. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

See uv documentation for more details.

3. Configure build settings

Check your CUDA version:

nvcc --version

Edit pyproject.toml to match your CUDA version. For CUDA 12.8:

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu128" }
torchvision = { index = "pytorch-cu128" }

Configure GPU architecture in pyproject.toml:

[tool.uv.extra-build-variables]
# Set TORCH_CUDA_ARCH_LIST according to your GPU
# Ampere: 8.0, Hopper: 9.0, Blackwell: 9.0
sageattention = { EXT_PARALLEL = "4", NVCC_APPEND_FLAGS = "--threads 8", MAX_JOBS = "32", TORCH_CUDA_ARCH_LIST = "8.0;9.0" }
spas_sage_attn = { EXT_PARALLEL = "4", NVCC_APPEND_FLAGS = "--threads 8", MAX_JOBS = "32", TORCH_CUDA_ARCH_LIST = "8.0;9.0" }
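
If you are unsure which value to use for TORCH_CUDA_ARCH_LIST, the following sketch queries the compute capability of the local GPUs through PyTorch and formats it in the semicolon-separated form used above. The helper names `format_arch_list` and `detect_capabilities` are made up for this example; it assumes PyTorch is already installed in the environment where you run it.

```python
def format_arch_list(capabilities):
    """Turn (major, minor) pairs into a TORCH_CUDA_ARCH_LIST string."""
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


def detect_capabilities():
    """Return the compute capability of every visible CUDA device."""
    import torch  # imported lazily so the formatter works without PyTorch

    return [
        torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    ]


# On a GPU machine: print(format_arch_list(detect_capabilities()))
example = format_arch_list([(9, 0), (8, 0), (8, 0)])
```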

4. Install dependencies

# Required installation (base dependencies in [project.dependencies]); takes about 30 minutes
uv sync -v 2>&1 | tee uv_sync.log

# Optional extras from [project.optional-dependencies]
# SageAttention
uv sync -v --extra sage 2>&1 | tee build_sage.log

# SpargeAttention
uv sync -v --extra sparge 2>&1 | tee build_sparge.log

# VBench evaluation toolkit
uv sync -v --extra vbench 2>&1 | tee build_vbench.log

# Evaluation metrics (FID/FVD/PSNR/SSIM/LPIPS)
uv sync -v --extra eval 2>&1 | tee build_eval.log

# One-command extension install (sage + sparge + vbench + eval)
uv sync -v --all-extras 2>&1 | tee build_full.log

Manual Installation

# Install from requirements.txt
pip install -r requirements.txt

# Install in editable mode
pip install -e .

Note: Flash Attention can be installed via wheel from GitHub releases.

Supported Models

Smart-Diffusion currently supports the Wan-T2V series:

Model ID	Parameters	Description
`Wan-AI/Wan2.1-T2V-1.3B`	1.3B	Lightweight text-to-video model
`Wan-AI/Wan2.1-T2V-14B`	14B	High-quality text-to-video model
`Wan-AI/Wan2.2-T2V-A14B`	14B	Advanced two-stage text-to-video model

More models are being added continuously. Stay tuned!

Usage

Basic Example

Create a test script test_generate.py:

from chitu_diffusion import chitu_init, chitu_generate, chitu_start
from chitu_diffusion.task import DiffusionUserParams, DiffusionTask, DiffusionTaskPool
from hydra import compose, initialize

# Initialize with configuration
initialize(config_path="config", version_base=None)
args = compose(config_name="wan")

# Set model checkpoint path
args.models.ckpt_dir = "/path/to/your/model/checkpoint"

# Initialize backend
chitu_init(args)
chitu_start()

# Create generation task
user_params = DiffusionUserParams(
    role="user1",
    prompt="A cat walking on grass.",
    num_inference_steps=50,
    height=480,
    width=848,
    num_frames=81,
    guidance_scale=7.0,
)

task = DiffusionTask.from_user_request(user_params)
DiffusionTaskPool.add(task)

# Generate
while not DiffusionTaskPool.all_finished():
    chitu_generate()

print(f"Video saved to: {task.buffer.save_path}")

Launch Scripts

Only launching through srun is supported.

Edit system_config.yaml to configure the model path, system parameters, and cfp (the CFG parallelism degree).
Then run the unified launcher:

bash run.sh system_config.yaml

Optional runtime overrides:

bash run.sh system_config.yaml --num-nodes 2 --gpus-per-node 8 --cfp 2

Runtime notes:

parallel.cfp (or --cfp) must be 1 or 2; the launcher maps it to infer.diffusion.cfg_size.
infer.diffusion.cp_size is auto-derived as (num_nodes * gpus_per_node) / cfp.
launch.tag is exported as CHITU_RUN_TAG and prefixes output run directory names.
launch.enable_launch_log=true writes launcher logs to output.root_dir/launch_<timestamp>.log.
CHITU_PYTHON_BIN can force the runtime Python; default order is .venv/bin/python -> python -> python3.
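
The size derivation in the notes above can be sketched as a small function. This is an illustration of the stated rule, not the launcher's actual code: the function name `derive_parallel_sizes` is hypothetical, and the divisibility check is an assumption added so that cp_size is always a whole number.

```python
def derive_parallel_sizes(num_nodes, gpus_per_node, cfp):
    """Derive cfg_size and cp_size from the launch parameters."""
    if cfp not in (1, 2):
        raise ValueError("cfp must be 1 or 2")
    world_size = num_nodes * gpus_per_node
    # Assumption: the total GPU count must divide evenly by cfp.
    if world_size % cfp != 0:
        raise ValueError("num_nodes * gpus_per_node must be divisible by cfp")
    return {"cfg_size": cfp, "cp_size": world_size // cfp}


# Matches the override example: 2 nodes x 8 GPUs with cfp=2.
sizes = derive_parallel_sizes(num_nodes=2, gpus_per_node=8, cfp=2)
```

With 2 nodes of 8 GPUs and cfp=2, the 16 GPUs are split into two CFG groups of 8, so cp_size is 8.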

Recommended system_config.yaml output section:

output:
  root_dir: outputs
  enable_run_log: true
  enable_timer_dump: true
  hydra_dump_mode: off   # default/video_dir/off

hydra_dump_mode=video_dir relocates Hydra .hydra metadata to the video output directory. When enable_timer_dump=true, timer statistics are dumped as time_stats.csv in each run directory.

Advanced Configuration

Configuration is split into three levels:

Model Parameters (Static): Defined in chitu_core/config/models/<model>.yaml
User Parameters (Dynamic): Set per-request via DiffusionUserParams
System Parameters (Semi-static): Set in system_config.yaml

Example: Using different attention backend

python test_generate.py \
    models.ckpt_dir=/path/to/checkpoint \
    infer.attn_type=sage \
    infer.diffusion.low_mem_level=2
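
The command-line overrides above use dotted keys to address nested configuration values. The sketch below imitates that idea on a plain dictionary to show which nested field each override touches. It is not how Hydra is implemented: `apply_overrides` is a hypothetical helper, and unlike Hydra it leaves every value as a string.

```python
def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' style overrides to a nested dict, in place."""
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate levels when they do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value  # kept as a string in this simplified sketch
    return config


config = {"infer": {"attn_type": "flash_attn", "diffusion": {"low_mem_level": "0"}}}
apply_overrides(config, ["infer.attn_type=sage", "infer.diffusion.low_mem_level=2"])
```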

Key Parameters

Attention Backend

Control your attention implementation with infer.attn_type:

Type	Description	Performance
`flash_attn`	Default FlashAttention. High-performance full attention without accuracy loss	Baseline
`sage`	SageAttention (NeurIPS25 spotlight). Training-free quantized attention	~2x speedup
`sparge`	SpargeAttention (ICML25). Training-free sparse attention	~3x speedup
`auto`	Automatically choose the best backend	-

Example:

python test_generate.py infer.attn_type=sage

Low Memory Mode

Control GPU memory usage with infer.diffusion.low_mem_level:

Level	Behavior
0	All models loaded to GPU
1	VAE enables tiling
2	T5 encoder offloaded to CPU
≥3	DiT model offloaded to CPU

Example:

python test_generate.py infer.diffusion.low_mem_level=2
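
One way to read the level table is as a set of switches that turn on as the level rises. The sketch below encodes that reading. It assumes the levels are cumulative (level 2 also enables VAE tiling, and so on), which the table suggests but does not state outright; `low_mem_plan` is a hypothetical name, not a function in the code base.

```python
def low_mem_plan(level):
    """Map a low_mem_level value to the memory-saving measures it enables."""
    if level < 0:
        raise ValueError("low_mem_level must be non-negative")
    # Assumption: each level keeps the measures of the levels below it.
    return {
        "vae_tiling": level >= 1,
        "offload_t5": level >= 2,
        "offload_dit": level >= 3,
    }


plan = low_mem_plan(2)
```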

FlexCache

Enable feature reuse acceleration with infer.diffusion.enable_flexcache=true:

cache_type	Method	Description
`teacache`	TeaCache	CVPR24 spotlight. Timestep-embedding-based caching ("Timestep Embedding Tells")
`pab`	Pyramid Attention Broadcast	ICLR25. Pyramid attention broadcasting
`ditango`	DiTango	ASE + anchor-gated grouped reuse
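
To give a feel for how threshold-based feature reuse of the TeaCache kind works, here is a generic sketch: it accumulates the relative L1 change of a per-step signal and recomputes the model only once the accumulated change crosses a threshold. This is a simplified illustration of the general idea, not the implementation shipped in Smart-Diffusion. The class name `ThresholdCache` and the `threshold` parameter are invented for the example.

```python
def relative_l1(current, previous):
    """Relative L1 distance between two equally sized vectors."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous)
    return diff / norm if norm else float("inf")


class ThresholdCache:
    """Decide per step whether to recompute or reuse cached features."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.accumulated = 0.0
        self.previous = None

    def should_recompute(self, signal):
        if self.previous is None:
            recompute = True  # nothing cached yet on the first step
        else:
            self.accumulated += relative_l1(signal, self.previous)
            recompute = self.accumulated >= self.threshold
        if recompute:
            self.accumulated = 0.0  # start a fresh accumulation window
        self.previous = list(signal)
        return recompute


cache = ThresholdCache(threshold=0.1)
signals = [[1.0, 1.0], [1.01, 1.01], [1.02, 1.02], [1.5, 1.5]]
decisions = [cache.should_recompute(s) for s in signals]
```

In this toy run the two small changes are absorbed by the cache and only the large jump at the end forces a recompute.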

DiTango behavior notes (current implementation):

Local partition is always computed each step and merged separately for stability.
Anchor decision is step-level and synchronized across CFG positive/negative branches.
cache_ratio controls both anchor trigger aggressiveness and global ASE-threshold quantile update.
Strategy implementation is in chitu_diffusion/flex_cache/strategy/ditango/ditango.py.
A merged decision visualization is emitted to <output_dir>/ditango_policy_step_layer_group.ppm.

Unified per-request API:

from chitu_diffusion.task import DiffusionUserParams, FlexCacheParams

user_params = DiffusionUserParams(
  prompt="A cat walking on grass.",
  flexcache_params=FlexCacheParams(
    strategy="teacache",  # teacache / pab / ditango
    cache_ratio=0.4,       # 0 quality-first, 1 speed-first
    warmup=5,
    cooldown=5,
  ),
)

Legacy style is still supported:

user_params = DiffusionUserParams(
    prompt="A cat walking on grass.",
    flexcache='teacache',
    # ... other params
)
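
The warmup and cooldown fields in FlexCacheParams are not described further here. A common convention in feature-cache methods is to compute the first and last few denoising steps in full and allow reuse only in between. The sketch below shows that convention; whether Smart-Diffusion interprets the fields exactly this way is an assumption, and `reuse_eligible_steps` is a hypothetical helper.

```python
def reuse_eligible_steps(num_steps, warmup, cooldown):
    """Return one flag per step: True where cached features may be reused."""
    # Assumption: the first `warmup` and last `cooldown` steps always
    # run the full model, and only the steps in between may reuse the cache.
    return [warmup <= step < num_steps - cooldown for step in range(num_steps)]


flags = reuse_eligible_steps(num_steps=10, warmup=2, cooldown=3)
```

Under this reading, a 50-step run with warmup=5 and cooldown=5 leaves 40 steps where reuse is allowed.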

Evaluation

Enable automatic evaluation with eval.eval_type (multi-select):

python test_generate.py 'eval.eval_type=[vbench,fid,psnr]' eval.reference_path=/path/to/reference_videos

Supported evaluation methods:

vbench: VBench custom-mode evaluation
fid: Frechet Inception Distance (requires reference_path)
fvd: Frechet Video Distance (requires reference_path)
psnr: Peak Signal-to-Noise Ratio (requires reference_path)
ssim: Structural Similarity Index (requires reference_path)
lpips: Learned Perceptual Image Patch Similarity (requires reference_path)

Behavior notes:

eval.eval_type=[] or null disables evaluation.
Metrics that require references are skipped with a warning if eval.reference_path is missing or invalid.
Results are saved under ./vbench_out/ (vbench) and ./eval_out/ (other metrics).
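
To illustrate why the reference-based metrics need eval.reference_path, here is the textbook PSNR formula applied to two flattened frames: the generated values are compared against reference values of the same size. This is a plain-Python sketch of the standard definition, not the evaluation code used by the toolkit; the function name and the `max_value` default of 255 (8-bit pixels) are choices made for the example.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized sequences."""
    if not reference or len(reference) != len(generated):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


score = psnr([0, 255], [0, 0])
```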

Documentation

Why Smart-Diffusion? - Design philosophy and architecture
API Reference - Detailed API documentation
Configuration Guide - Complete configuration options

Contributing

We welcome contributions! Smart-Diffusion is in active development.

To contribute:

Fork the repository
Create a feature branch
Make your changes with proper documentation
Submit a pull request

Please see our Developer Guide for parameter taxonomy and best practices.

Community

Issues: GitHub Issues
Discussions: GitHub Discussions

Roadmap

More diffusion model support (Flux2, Longcat-Video, FireRed etc.)
More acceleration algorithms
More parallelism strategies
Better operator implementations
Production-ready serving framework
Comprehensive benchmarks

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Citation

If you use Smart-Diffusion in your research, please cite:

@software{smart_diffusion2025,
  title={Smart-Diffusion: High-Performance Diffusion Model Inference Framework},
  author={{PACMAN Team, Tsinghua University} and {QingCheng.ai}},
  year={2025},
  url={https://github.com/chen-yy20/SmartDiffusion}
}

Acknowledgments

Chitu - Base inference framework
xDiT - Scalable Inference Engine for Diffusion Transformers
SGLang-Diffusion - Image/Video Generation Framework
SageAttention - Quantized attention implementation
SpargeAttention - Sparse+Sage attention implementation
FlashAttention - Efficient attention implementation
TeaCache - Feature cache strategy
PyramidAttentionBroadcast - PAB algorithm

Note: Smart-Diffusion is currently in testing and development phase. We're working hard to make it better! Join us in building the future of AIGC acceleration. 🚀
