[2026/2/19] Now that the double-blind review process is over, the benchmark has been released. Check out the FREAK benchmark on Hugging Face!
[2026/1/26] 🎉🎉 Our paper has been accepted to ICLR 2026!
We are finalizing the paper, and the camera-ready version will be uploaded to arXiv and OpenReview shortly.
FREAK is a comprehensive multimodal benchmark designed for fine-grained hallucination assessment in MLLMs. Using high-quality photorealistic images with fine-grained counter-commonsense (CCS) edits, FREAK evaluates hallucination in the detailed visual perception of MLLMs. Extensive experiments on FREAK show that SOTA models hallucinate severely on detailed visual perception. The highlights of FREAK are:
- High-quality CCS image inputs that truly challenge the perception capability of advanced MLLMs.
- Comprehensive improvement over existing multimodal hallucination benchmarks, with diverse image and question content.
- A first discussion of how reasoning MLLMs compare with non-reasoning MLLMs in visual perception hallucination.
git clone https://github.com/Hans-M-Yin/FREAK.git
cd FREAK
First, install the requirements.
conda create -n freak python=3.12
conda activate freak
pip install -r requirements
In FREAK, free-form questions rely on an external judge model, so you need to specify the API key and address of the judge model first. Feel free to try different judge models.
export OPENAI_API_KEY=<your api key>
export OPENAI_BASE_URL=<your base url>
For local MLLMs, we recommend installing vLLM for model deployment. See scripts/deploy.sh for details.
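Before launching an evaluation, it can help to confirm that the judge-model settings exported above are actually present. The following is a minimal sketch and is not part of the FREAK codebase: the helper name `missing_judge_vars` is hypothetical, and only the two variable names come from the export commands in this README.

```python
import os

# Variable names taken from the export commands in this README.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL")


def missing_judge_vars(env=None):
    """Return the names of required judge variables that are unset or blank.

    `env` is any mapping of variable names to values; it defaults to the
    current process environment. This helper is hypothetical and only
    illustrates a pre-flight check.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_judge_vars()` with no arguments inspects the current environment; an empty list means both settings are present.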
Finally, change the parameters in scripts/run.sh and run it:
bash scripts/run.sh
Please refer to the paper for the detailed performance of different MLLMs.
The leaderboard reported in the paper will be released soon. Meanwhile, we are integrating the FREAK benchmark into VLMEvalKit. Please stay tuned :)
If you find our paper and code useful for your research and applications, please cite using this BibTeX:
@inproceedings{
yin2026freak,
title={{FREAK}: A Fine-grained Hallucination Evaluation Benchmark for Advanced {MLLM}s},
author={Zhihan Yin and Jianxin Liang and Yueqian Wang and Yifeng Yao and Huishuai Zhang and Dongyan Zhao},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=YeagC09j2K}
}
This project is licensed under the MIT License - see the LICENSE file for details.
