Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models (CVPR 2024 Highlight)

Abstract

Diffusion models have achieved remarkable success in generating high-quality, diverse, and creative images. However, in text-based image generation, they often struggle to accurately capture the intended meaning of the text. For instance, a specified object might not be generated, or an adjective might incorrectly alter unintended objects. Moreover, we found that relationships indicating possession between objects are frequently overlooked. Despite the diversity of users' intentions in text, existing methods often focus on only some aspects of these intentions. In this paper, we propose Predicated Diffusion, a unified framework designed to more effectively express users' intentions. It represents the intended meaning as propositions using predicate logic and treats the pixels in attention maps as fuzzy predicates. This approach provides a differentiable loss function that offers guidance for the image generation process to better fulfill the propositions. Comparative evaluations with existing methods demonstrated that Predicated Diffusion excels in generating images faithful to various text prompts while maintaining high image quality, as validated by human evaluators and pretrained image-text models.
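
To make the idea of "pixels as fuzzy predicates" more concrete, here is a minimal sketch of how a logical proposition could become a differentiable loss. It assumes product fuzzy logic (AND as multiplication, NOT as 1 - value) and treats each attention value in [0, 1] as the truth value of "this pixel shows the object". The function names and the toy attention values are illustrative assumptions, and the losses used in this repository may differ in detail.

```python
import math

def existence_loss(attention, eps=1e-12):
    """Loss for 'some pixel shows the object': -log(1 - prod(1 - a))."""
    # Truth value of "no pixel shows the object" under product logic.
    none_present = math.prod(1.0 - a for a in attention)
    return -math.log(max(1.0 - none_present, eps))

def implication_loss(attn_p, attn_q, eps=1e-12):
    """Loss for 'wherever P holds, Q holds': -sum(log(1 - p * (1 - q)))."""
    # p * (1 - q) is the truth value of "P holds but Q does not" at one pixel.
    return -sum(
        math.log(max(1.0 - p * (1.0 - q), eps))
        for p, q in zip(attn_p, attn_q)
    )

# Toy flattened attention maps (made-up numbers, not real model output).
dog_missing = [0.01, 0.02, 0.01, 0.03]
dog_present = [0.01, 0.90, 0.80, 0.03]

# A map with a strong response should incur a lower existence loss.
print(existence_loss(dog_missing) > existence_loss(dog_present))
```

In a real pipeline the same expressions would be written with a tensor library so that gradients can flow back to the latent image; plain Python is used here only to keep the sketch self-contained.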

Highlights

Citation

If you find this work useful, please cite:

@InProceedings{Sueyoshi_2024_CVPR,
    author    = {Kota Sueyoshi and Takashi Matsubara},
    title     = {Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8651-8660}
}

Usage

Run python run.py with the following options:

--prompt: Text prompt to generate.
--attention_exist_indices: List of token indices that should be present.
- Preventing missing objects
- Ensuring the co-existence
--attention_corr_indices: List of [modifier_token_idx, head_token_idx] pairs; each modifier (e.g., an adjective) should apply to its head token.
- Ensuring the modification
--attention_leak_indices: List of token-index pairs that should not overlap in attention.
- Preventing the attribute leakage
- Ensuring the one-to-one correspondence
--attention_possession_indices: List of [possessor_token_idx, possession_token_idx] pairs to encourage the possessor to have the possession.
- Preventing possession failure

Notes:

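Indices refer to token positions in the tokenizer output for the prompt. In the examples below, the first word of the prompt has index 1 (for instance, "dog" in "A dog and a bowl" is index 2), which suggests that index 0 is taken by a start-of-text token.
Pass an empty list ([]) to disable an option.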
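
As a rough aid for working out indices by hand, the following sketch maps words to positions. It rests on two assumptions that are not guaranteed: index 0 is a start-of-text token, and every whitespace-separated word becomes exactly one token. The helper name word_indices is hypothetical and not part of this repository. Real tokenizers can split rare words into several sub-tokens, so please verify the result against the tokenizer the pipeline actually uses.

```python
def word_indices(prompt, words):
    """Return the assumed token index of each requested word.

    Assumptions: index 0 is a start-of-text token, and each
    whitespace-separated word is exactly one token.
    """
    tokens = ["<start>"] + prompt.lower().split()
    # list.index returns the first match, so repeated words are ambiguous.
    return [tokens.index(word.lower()) for word in words]

print(word_indices("A dog and a bowl", ["dog", "bowl"]))  # [2, 5]
print(word_indices("A brown dog and a yellow bowl",
                   ["brown", "dog", "yellow", "bowl"]))  # [2, 3, 6, 7]
```

The printed indices agree with the ones used for the sample prompts in Example Options.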

Example Options

For Experiment (i), to generate the image of "a dog and a bowl," run

python run.py --prompt "A dog and a bowl" --attention_corr_indices [] --attention_exist_indices [2,5] --attention_leak_indices [] --attention_possession_indices []

For Experiment (ii), to generate the image of "a brown dog and a yellow bowl," run

python run.py --prompt "A brown dog and a yellow bowl" --attention_corr_indices [[2,3],[6,7]] --attention_exist_indices [3,7] --attention_leak_indices [[2,7],[6,3]] --attention_possession_indices []

For Experiment (iii), to generate the image of "a frog wearing a hat," run

python run.py --prompt "A frog wearing a hat" --attention_corr_indices [] --attention_exist_indices [2,5] --attention_leak_indices [] --attention_possession_indices [[2,5]]
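
When launching many runs from a script, it can help to build the argument list programmatically rather than typing the brackets by hand. The sketch below assembles the Experiment (ii) command from Python lists. The helper build_command is a hypothetical convenience, not part of this repository, and it only relies on the flags and bracket syntax shown in the examples above.

```python
import json
import shlex

def build_command(prompt, exist=(), corr=(), leak=(), possession=()):
    """Assemble the argument list for run.py from Python lists."""
    def fmt(indices):
        # Compact JSON matches the bracket syntax of the examples,
        # e.g. [[2,3],[6,7]] with no spaces.
        return json.dumps(list(indices), separators=(",", ":"))

    return [
        "python", "run.py",
        "--prompt", prompt,
        "--attention_corr_indices", fmt(corr),
        "--attention_exist_indices", fmt(exist),
        "--attention_leak_indices", fmt(leak),
        "--attention_possession_indices", fmt(possession),
    ]

# Experiment (ii): "brown" modifies "dog", "yellow" modifies "bowl".
cmd = build_command(
    "A brown dog and a yellow bowl",
    exist=[3, 7],
    corr=[[2, 3], [6, 7]],
    leak=[[2, 7], [6, 3]],
)
print(shlex.join(cmd))  # shell-quoted form of the same command
```

The list can be handed to subprocess.run(cmd, check=True) from the repository root. Passing a list instead of a single string keeps the shell from interpreting the square brackets.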

Disclaimers

This code was built by modifying the official implementation of Chefer et al., "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models," SIGGRAPH, 2023.
