Uncertainty-aware multi-caption weighting framework for robust BLIP fine-tuning under noisy captions.
deep-learning transformers pytorch image-captioning uncertainty-estimation blip multimodal-learning robust-training-algorithm vision-language-models caption-noise robust-training
-
Updated
May 10, 2026 - Python