tksmatsubara/PredicatedDiffusion

Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models (CVPR 2024 Highlight)

Official paper page: CVPR OpenAccess

Abstract

Diffusion models have achieved remarkable success in generating high-quality, diverse, and creative images. However, in text-based image generation, they often struggle to accurately capture the intended meaning of the text. For instance, a specified object might not be generated, or an adjective might incorrectly alter unintended objects. Moreover, we found that relationships indicating possession between objects are frequently overlooked. Despite the diversity of users' intentions in text, existing methods often focus on only some aspects of these intentions. In this paper, we propose Predicated Diffusion, a unified framework designed to more effectively express users' intentions. It represents the intended meaning as propositions using predicate logic and treats the pixels in attention maps as fuzzy predicates. This approach provides a differentiable loss function that offers guidance for the image generation process to better fulfill the propositions. Comparative evaluations with existing methods demonstrated that Predicated Diffusion excels in generating images faithful to various text prompts while maintaining high image quality, as validated by human evaluators and pretrained image-text models.
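To make the idea of "pixels in attention maps as fuzzy predicates" concrete, here is a minimal sketch of how two propositions could be turned into losses. It assumes product fuzzy logic and attention values already normalized to [0, 1]. The function names `existence_loss` and `implication_loss` are hypothetical, and this is not the code in this repository. Plain Python lists are used for readability; an actual guidance loss would operate on differentiable tensors.

```python
import math

EPS = 1e-12  # guards against log(0); an assumption of this sketch


def existence_loss(attention):
    """Loss for the proposition "there exists a pixel x where P(x) holds".

    `attention` is a flat list of values in [0, 1], read as the fuzzy
    truth value of P at each pixel.
    """
    # Product logic: truth of "exists x. P(x)" is 1 - prod_x (1 - P(x)).
    absent_everywhere = math.prod(1.0 - a for a in attention)
    truth = 1.0 - absent_everywhere
    return -math.log(max(truth, EPS))


def implication_loss(attention_p, attention_q):
    """Loss for the proposition "for all pixels x, P(x) implies Q(x)".

    Per pixel, the implication is scored as 1 - P(x) * (1 - Q(x)); the
    universal quantifier multiplies the scores, so the negative log
    becomes a sum over pixels.
    """
    return -sum(
        math.log(max(1.0 - p * (1.0 - q), EPS))
        for p, q in zip(attention_p, attention_q)
    )
```

The existence loss falls to zero as soon as one pixel has attention close to 1, and the implication loss penalizes pixels where the first map is active but the second is not.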

Highlights

Predicated Diffusion overview (Hokkaido SAIL page)

Citation

If you find this work useful, please cite:

@InProceedings{Sueyoshi_2024_CVPR,
    author    = {Kota Sueyoshi and Takashi Matsubara},
    title     = {Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8651-8660}
}

Usage

Run python run.py with the following options:

  • --prompt: Text prompt to generate an image from.
  • --attention_exist_indices: List of token indices whose objects should be present in the image.
    • Preventing missing objects.
    • Ensuring co-existence.
  • --attention_corr_indices: List of [modifier_token_idx, head_token_idx] pairs in which the modifier should apply to the head.
    • Ensuring the modification.
  • --attention_leak_indices: List of token-index pairs that should not overlap in attention.
    • Preventing attribute leakage.
    • Ensuring one-to-one correspondence.
  • --attention_possession_indices: List of [possessor_token_idx, possession_token_idx] pairs to encourage the possessor to have the possession.
    • Preventing possession failure.

Notes:

  • Indices refer to positions in the tokenizer output for the prompt.
  • Use empty lists to disable options.
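As a quick way to look up indices, the sketch below lists word positions for a prompt. It assumes that index 0 is taken by a start-of-text token and that every whitespace-separated word maps to exactly one token. This matches the indices used in the example options in this README, but a real tokenizer may split uncommon words into several tokens, so verify against the actual tokenizer output for other prompts. The helper name `approximate_token_indices` is hypothetical and is not part of this repository.

```python
def approximate_token_indices(prompt):
    """Return {index: word}, assuming one token per word.

    Assumption: index 0 is a start-of-text token, so the first word of
    the prompt gets index 1.
    """
    return {index: word for index, word in enumerate(prompt.split(), start=1)}


indices = approximate_token_indices("A brown dog and a yellow bowl")
for index, word in indices.items():
    print(index, word)
```

Under these assumptions, "brown" and "dog" sit at indices 2 and 3, and "yellow" and "bowl" at indices 6 and 7.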

Example Options

For Experiment (i), to generate an image of "a dog and a bowl," run

python run.py --prompt "A dog and a bowl" --attention_corr_indices [] --attention_exist_indices [2,5] --attention_leak_indices [] --attention_possession_indices []

For Experiment (ii), to generate an image of "a brown dog and a yellow bowl," run

python run.py --prompt "A brown dog and a yellow bowl" --attention_corr_indices [[2,3],[6,7]] --attention_exist_indices [3,7] --attention_leak_indices [[2,7],[6,3]] --attention_possession_indices []

For Experiment (iii), to generate an image of "a frog wearing a hat," run

python run.py --prompt "A frog wearing a hat" --attention_corr_indices [] --attention_exist_indices [2,5] --attention_leak_indices [] --attention_possession_indices [[2,5]]
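Square brackets are glob characters in many shells: zsh, for example, rejects an unquoted [2,5] when no file matches it, and bash may expand it if a matching file exists. One way to avoid this is to build the command in Python and let `shlex` handle the quoting. The sketch below is only a convenience wrapper around the options documented above; the helper name `build_command` is hypothetical and not part of this repository.

```python
import json
import shlex


def build_command(prompt, exist=(), corr=(), leak=(), possession=()):
    """Build the argument list for run.py from Python lists."""

    def as_arg(value):
        # Compact separators keep the "[2,5]" form used in this README.
        return json.dumps(list(value), separators=(",", ":"))

    return [
        "python", "run.py",
        "--prompt", prompt,
        "--attention_corr_indices", as_arg(corr),
        "--attention_exist_indices", as_arg(exist),
        "--attention_leak_indices", as_arg(leak),
        "--attention_possession_indices", as_arg(possession),
    ]


command = build_command("A dog and a bowl", exist=[2, 5])
# shlex.join quotes each argument so it can be pasted into a POSIX shell.
print(shlex.join(command))
```

The list returned by `build_command` can also be passed directly to `subprocess.run`, which bypasses the shell entirely.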

Disclaimers

This code was built by modifying the official implementation of Chefer et al., "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models," SIGGRAPH, 2023.
