TongLiu-github/focalpo

FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings

Official code for the paper FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings (ACL 2025 long paper).

Core contribution

Chen et al. (2024) empirically find that DPO training rarely corrects misranked preference pairs, even though its gradient emphasizes these cases. We add a simple factor to the DPO loss so that training focuses on "more correct" samples (see gradient curve). With the introduced hyperparameter held fixed (we do not want to over-rely on hyperparameter tuning), FocalPO consistently outperforms DPO on Arena-Hard and Alpaca Eval.
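To make the gradient argument concrete, here is a minimal sketch in plain Python. It relies on the standard DPO result that the gradient of `-log sigmoid(beta * margin)` with respect to the margin has magnitude `beta * sigmoid(-beta * margin)`, which is largest for misranked pairs (negative margin). The factor `p ** gamma`, the function names, and the default values of `beta` and `gamma` are assumptions chosen for illustration only. This README does not spell out the exact factor, so treat the paper as the reference for the form FocalPO actually uses.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dpo_grad_weight(margin: float, beta: float = 0.1) -> float:
    """Magnitude of d(-log sigmoid(beta * margin)) / d(margin).

    `margin` is the implicit reward difference between the chosen and the
    rejected response; a negative margin means the pair is misranked.
    """
    return beta * sigmoid(-beta * margin)


def scaled_grad_weight(margin: float, beta: float = 0.1, gamma: float = 1.0) -> float:
    """DPO gradient weight multiplied by an ASSUMED factor p ** gamma.

    p = sigmoid(beta * margin) is the model's probability of ranking the pair
    correctly. The factor is treated as a constant per-sample weight here.
    This is a hypothetical stand-in, not necessarily the factor from the paper.
    """
    p = sigmoid(beta * margin)
    return (p ** gamma) * dpo_grad_weight(margin, beta)


if __name__ == "__main__":
    # Compare how much each scheme weights pairs from badly misranked to well ranked.
    for margin in (-40.0, -10.0, 0.0, 10.0, 40.0):
        print(
            f"margin={margin:+6.1f}  "
            f"dpo={dpo_grad_weight(margin):.4f}  "
            f"scaled={scaled_grad_weight(margin):.4f}"
        )
```

Under these assumptions, plain DPO puts its largest weight on the most misranked pair, while the scaled variant shrinks that weight and shifts emphasis toward pairs the model already ranks roughly correctly. Setting `gamma=0.0` recovers the plain DPO weight.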

[Two figures from the original README, including the gradient curve]

Released Models

Mistral

We release the following model, built on top of the Mistral-Base SFT (7B) model by training with FocalPO on the UltraFeedback dataset.

| Model | Alpaca Eval 2.0 LC | Arena-Hard (AH) WR |
| --- | --- | --- |
| tongliuphysics/Mistral-7B-Base-SFT-FocalPO | 23.9 | 17.1 |

Llama

We release the following model, built on top of the Llama-3-Instruct (8B) model by training with FocalPO on the on-policy Llama3-ultrafeedbackarmorm dataset.

| Model | Alpaca Eval 2.0 LC | Arena-Hard (AH) WR |
| --- | --- | --- |
| tongliuphysics/Llama-3-8B-Instruct-FocalPO | 54.7 | 34.6 |

BibTeX

@article{liu2025focalpo,
  title={FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings},
  author={Liu, Tong and Yu, Xiao and Zhou, Wenxuan and Gu, Jindong and Tresp, Volker},
  journal={arXiv preprint arXiv:2501.06645},
  year={2025}
}
