Simple end-to-end RLHF (Reinforcement Learning from Human Feedback) for diffusion models (DDPO) on personal hardware.
Updated Feb 26, 2025 - Python
Comparative study of six diffusion model personalization methods (DreamBooth, LoRA, Textual Inversion, Custom Diffusion, LCM, DDPO) using HuggingFace Diffusers on Stable Diffusion v1.5 | NVIDIA H100 | IU Quartz HPC