Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models (CVPR 2024 Highlight)
Official paper page: CVPR OpenAccess
Diffusion models have achieved remarkable success in generating high-quality, diverse, and creative images. However, in text-based image generation, they often struggle to accurately capture the intended meaning of the text. For instance, a specified object might not be generated, or an adjective might incorrectly alter unintended objects. Moreover, we found that relationships indicating possession between objects are frequently overlooked. Despite the diversity of users' intentions in text, existing methods often focus on only some aspects of these intentions. In this paper, we propose Predicated Diffusion, a unified framework designed to more effectively express users' intentions. It represents the intended meaning as propositions using predicate logic and treats the pixels in attention maps as fuzzy predicates. This approach provides a differentiable loss function that offers guidance for the image generation process to better fulfill the propositions. Comparative evaluations with existing methods demonstrated that Predicated Diffusion excels in generating images faithful to various text prompts while maintaining high image quality, as validated by human evaluators and pretrained image-text models.
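The core idea of reading attention values as fuzzy truth values and turning a proposition into a differentiable loss can be illustrated with a small NumPy sketch. It assumes product fuzzy logic (conjunction as a product, the existential quantifier as a probabilistic sum over pixels) and a loss of -log(truth). These choices, the helper names f_and, f_exists, and nll, and the toy 4x4 maps are illustrative assumptions, not the repository's code. Per-pixel cross-attention weights come from a softmax over tokens, so each lies in [0, 1] and can serve as a fuzzy truth value.

```python
import numpy as np

# Fuzzy connectives under product logic (an assumption for this sketch).
def f_and(a, b):
    """Conjunction: product t-norm."""
    return a * b

def f_exists(p):
    """'Some pixel x satisfies P(x)': probabilistic sum 1 - prod(1 - p(x))."""
    return 1.0 - np.prod(1.0 - p)

def nll(truth, eps=1e-8):
    """Loss that is small when the proposition is close to true."""
    return -np.log(truth + eps)

# Toy attention maps for the tokens "dog" and "bowl" (hypothetical values).
dog = np.zeros((4, 4))
dog[1, 1] = 0.9               # one pixel attends strongly to "dog"
bowl = np.full((4, 4), 0.01)  # "bowl" is barely attended anywhere

# Proposition: (exists x. Dog(x)) AND (exists y. Bowl(y))
truth = f_and(f_exists(dog), f_exists(bowl))
print(f"dog: {nll(f_exists(dog)):.3f}, bowl: {nll(f_exists(bowl)):.3f}, both: {nll(truth):.3f}")
```

The bowl term dominates the loss, so gradient guidance on this loss would push attention toward the missing object. Under product logic, -log of a conjunction is the sum of the individual losses, so co-existence of several objects reduces to adding one term per object.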
If you find this work useful, please cite:
@InProceedings{Sueyoshi_2024_CVPR,
author = {Kota Sueyoshi and Takashi Matsubara},
title = {Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {8651-8660}
}

Run python run.py with the following options:
--prompt: Text prompt to generate.
--attention_exist_indices: List of token indices that should be present.
  - Preventing missing objects
  - Ensuring co-existence
--attention_corr_indices: List of [modifier_token_idx, head_token_idx] pairs to ensure the modification.
  - Ensuring the modification
--attention_leak_indices: List of token-index pairs that should not overlap in attention.
  - Preventing attribute leakage
  - Ensuring one-to-one correspondence
--attention_possession_indices: List of [possessor_token_idx, possession_token_idx] pairs to encourage the possessor to have the possession.
  - Preventing possession failure
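Each index option can be read as a proposition over attention maps. The sketch below combines the exist, corr, and leak options into one loss under stated assumptions: product logic, the Reichenbach implication 1 - a + a*b for "the modifier implies the head", "no pixel attends to both tokens" for leakage, and per-pixel log losses averaged (not summed) over the map. The function name predicate_loss, the dict-of-maps input, and the averaging are hypothetical. The README does not specify the logical form of the possession proposition, so --attention_possession_indices is left out.

```python
import numpy as np

EPS = 1e-8  # keeps log() finite when a truth value reaches 0

def exists(p):
    """Fuzzy 'some pixel x satisfies P(x)': 1 - prod over x of (1 - p(x))."""
    return 1.0 - np.prod(1.0 - p)

def predicate_loss(maps, exist=(), corr=(), leak=()):
    """Sum of -log truth values for the propositions named by the index options.

    maps: dict from token index to an attention map with values in [0, 1].
    """
    total = 0.0
    for i in exist:
        # exists x. Obj(x): the object token is attended somewhere.
        total += -np.log(exists(maps[i]) + EPS)
    for m, h in corr:
        # forall x. Mod(x) -> Head(x), using the implication 1 - a + a*b.
        implies = 1.0 - maps[m] + maps[m] * maps[h]
        total += -np.mean(np.log(implies + EPS))
    for a, b in leak:
        # not exists x. A(x) and B(x), i.e. forall x. not (A(x) and B(x)).
        total += -np.mean(np.log(1.0 - maps[a] * maps[b] + EPS))
    return float(total)

# Toy maps for "A brown dog and a yellow bowl" (indices from Experiment (ii)).
left = np.zeros((4, 4))
left[:, :2] = 0.8    # dog region
right = np.zeros((4, 4))
right[:, 2:] = 0.8   # bowl region
good = {2: left, 3: left, 6: right, 7: right}  # brown on dog, yellow on bowl
bad = {2: right, 3: left, 6: left, 7: right}   # colors on the wrong objects
args = dict(exist=[3, 7], corr=[[2, 3], [6, 7]], leak=[[2, 7], [6, 3]])
print(predicate_loss(good, **args), predicate_loss(bad, **args))
```

With the same index lists as Experiment (ii), the correctly bound maps score a much lower loss than the swapped ones, because both the implication terms and the leakage terms penalize color attention that falls on the wrong object.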
Notes:
- Indices refer to positions in the tokenizer output for the prompt. Index 0 is the start-of-text token, so the first word of the prompt has index 1.
- Use empty lists to disable options.
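For short prompts, the indices can be worked out by hand: position 0 holds the start-of-text token, so the first piece of the prompt has index 1. The hypothetical helper word_indices below automates this, assuming a rough imitation of CLIP-style pre-tokenization (runs of letters, single digits, and runs of other symbols) and assuming each piece is a single token, which holds for the example prompts below. Words that the tokenizer splits into several sub-word pieces break this assumption, so check such prompts against the actual tokenizer, for example CLIPTokenizer from Hugging Face transformers.

```python
import re

# Rough imitation of CLIP-style pre-tokenization (an assumption): runs of
# letters, single digits, and runs of other non-space symbols are pieces.
_PIECE = re.compile(r"[a-z]+|[0-9]|[^\sa-z0-9]+")

def word_indices(prompt: str) -> dict:
    """Map each piece of the prompt to its position(s) in the token sequence.

    Assumes every piece is a single token and that index 0 holds the
    start-of-text token, so the first piece has index 1.
    """
    positions = {}
    for idx, piece in enumerate(_PIECE.findall(prompt.lower()), start=1):
        positions.setdefault(piece, []).append(idx)
    return positions

if __name__ == "__main__":
    print(word_indices("A brown dog and a yellow bowl"))
```

Punctuation shifts the indices: in "A dog, and a bowl", the comma takes position 3, so "bowl" moves from index 5 to index 6.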
For Experiment (i), to generate the image of "a dog and a bowl," run:
python run.py --prompt "A dog and a bowl" --attention_corr_indices [] --attention_exist_indices [2,5] --attention_leak_indices [] --attention_possession_indices []

For Experiment (ii), to generate the image of "a brown dog and a yellow bowl," run:
python run.py --prompt "A brown dog and a yellow bowl" --attention_corr_indices [[2,3],[6,7]] --attention_exist_indices [3,7] --attention_leak_indices [[2,7],[6,3]] --attention_possession_indices []

For Experiment (iii), to generate the image of "a frog wearing a hat," run:
python run.py --prompt "A frog wearing a hat" --attention_corr_indices [] --attention_exist_indices [2,5] --attention_leak_indices [] --attention_possession_indices [[2,5]]

This code was built by modifying the official implementation of Chefer et al., "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models," SIGGRAPH, 2023.
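The experiment commands can also be assembled from Python when scripting several runs. Passing an argument list to subprocess bypasses the shell entirely, and shlex.join produces a quoted command line for copy-pasting. This matters in shells such as zsh, where an unquoted [2,5] is treated as a glob pattern. build_command and launch are hypothetical helpers, not part of the repository, and they assume run.py accepts the list arguments exactly as written in the commands above.

```python
import json
import shlex
import subprocess
import sys

def build_command(prompt, exist=(), corr=(), leak=(), possession=()):
    """Assemble an argv list for run.py (option order follows the README examples)."""
    def as_arg(indices):
        # Serialize without spaces, e.g. [[2, 3], [6, 7]] -> "[[2,3],[6,7]]".
        return json.dumps(list(indices), separators=(",", ":"))

    return [
        sys.executable, "run.py",
        "--prompt", prompt,
        "--attention_corr_indices", as_arg(corr),
        "--attention_exist_indices", as_arg(exist),
        "--attention_leak_indices", as_arg(leak),
        "--attention_possession_indices", as_arg(possession),
    ]

def launch(cmd):
    """Run the command from the repository root; raises if run.py fails."""
    return subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Experiment (ii): "A brown dog and a yellow bowl"
    cmd = build_command(
        "A brown dog and a yellow bowl",
        exist=[3, 7], corr=[[2, 3], [6, 7]], leak=[[2, 7], [6, 3]],
    )
    print(shlex.join(cmd))  # shell-safe form; the bracketed lists are quoted
```

Calling launch(cmd) from the repository root would start the run; printing first makes it easy to confirm the indices before spending GPU time.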
