Accio-Lab/SwimBird


SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

Jintao Tong1, Shilin Yan2†‡, Hongwei Xue2, Xiaojun Tang2, Kunyu Shi2,
Guannan Zhang2, Ruixuan Li1‡, Yixiong Zou1‡

† Project Leader, ‡ Corresponding author

1Huazhong University of Science and Technology, 2Accio Team, Alibaba Group

🔥 News

  • 2026.02.14 🚀 Evaluation code is available!
  • 2026.02.06 🚀 Model and dataset are released!
  • 2026.02.05 🚀 Training code is available!
  • 2026.02.05 📝 We release our latest work, SwimBird!

🌟 Method

We introduce SwimBird, a hybrid autoregressive MLLM that dynamically switches among three reasoning modes conditioned on the input: (1) text-only reasoning, (2) vision-only reasoning (continuous hidden states as visual thoughts), and (3) interleaved vision–text reasoning. By enabling flexible, query-adaptive mode selection, SwimBird preserves strong textual logic while substantially improving performance on vision-dense tasks.
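
To make the three modes concrete, here is a minimal sketch that labels a reasoning trace by the kinds of segments it contains. The segment labels `"text"` and `"latent"` and the function `reasoning_mode` are hypothetical names chosen for illustration; they are not taken from the SwimBird code, and the sketch does not model how the network itself decides to switch.

```python
from typing import List

# Hypothetical segment labels: "text" for ordinary text tokens,
# "latent" for continuous hidden states used as visual thoughts.
VALID_KINDS = {"text", "latent"}


def reasoning_mode(segments: List[str]) -> str:
    """Classify a reasoning trace into one of the three modes."""
    kinds = set(segments)
    if not kinds:
        raise ValueError("a reasoning trace needs at least one segment")
    unknown = kinds - VALID_KINDS
    if unknown:
        raise ValueError(f"unknown segment kinds: {sorted(unknown)}")
    if kinds == {"text"}:
        return "text-only"
    if kinds == {"latent"}:
        return "vision-only"
    # Both kinds appear in the same trace.
    return "interleaved vision-text"


if __name__ == "__main__":
    print(reasoning_mode(["text", "latent", "text"]))
```

Under these assumptions, a trace made only of text segments is text-only, a trace made only of latent segments is vision-only, and any mixture counts as interleaved.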

👀 Cases

SwimBird dynamically switches among three reasoning modes conditioned on the input: (1) text-only reasoning, (2) vision-only reasoning, and (3) interleaved vision–text reasoning.

🛠 Preparation

git clone https://github.com/Accio-Lab/SwimBird.git
cd SwimBird

pip install -r requirements.txt
pip install qwen-vl-utils
pip install flash-attn --no-build-isolation
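
After installation, a quick check like the following can confirm that the two extra packages are importable. This is a sketch, not part of the repository: the helper `missing_modules` is a hypothetical name, and the import names `qwen_vl_utils` and `flash_attn` are assumed to correspond to the `qwen-vl-utils` and `flash-attn` distributions.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    missing = missing_modules(["qwen_vl_utils", "flash_attn"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All extra dependencies are importable.")
```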

🎯 Training

To train the model, follow these steps:

  1. Replace Qwen3-VL's chat_template.json with ours.
  2. Download the training dataset SwimBird-SFT-92K, then prepend the dataset's absolute directory path to every image path in the JSON files:

python data_process.py absolute_path_to_dataset

Example:

python data_process.py /abs_path/SwimBird-ZebraCoT/
python data_process.py /abs_path/SwimBird-MathCanvas/
python data_process.py /abs_path/SwimBird-ThinkMorph/
python data_process.py /abs_path/SwimBird-OpenMMReasoner/

  3. Run the training script:

bash scripts/train.sh
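
The path-prefixing step in the list above can be pictured with the sketch below. It is not the repository's data_process.py. It assumes, purely for illustration, that each JSON record stores its image path or paths under an `"image"` key, and `prefix_image_paths` is a hypothetical helper name.

```python
import posixpath
from typing import Any, Dict, List


def _prefix(path: str, root: str) -> str:
    """Prepend root unless the path is already absolute."""
    return path if posixpath.isabs(path) else posixpath.join(root, path)


def prefix_image_paths(
    records: List[Dict[str, Any]], dataset_root: str, key: str = "image"
) -> List[Dict[str, Any]]:
    """Return copies of the records with dataset_root prepended to image paths.

    The "image" key is an assumed schema; adjust it to the real JSON layout.
    """
    result = []
    for record in records:
        updated = dict(record)  # shallow copy so the input is left untouched
        value = updated.get(key)
        if isinstance(value, str):
            updated[key] = _prefix(value, dataset_root)
        elif isinstance(value, list):
            updated[key] = [_prefix(item, dataset_root) for item in value]
        result.append(updated)
    return result


if __name__ == "__main__":
    sample = [{"image": "images/0001.png"}, {"image": ["a.png", "b.png"]}]
    print(prefix_image_paths(sample, "/abs_path/SwimBird-ZebraCoT"))
```

Skipping paths that are already absolute makes the helper safe to run twice on the same records, which is a design choice of this sketch rather than a documented property of the actual script.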

📖 Evaluation

We adopt VLMEvalKit to conduct the evaluation. You can get started as follows:

1. Setup

cd VLMEvalKit
pip install -e .

2. Inference

We evaluate our model with an LLM-based API judge rather than exact matching, for a more accurate and reliable assessment. The default judge model is gpt-4o-0806; you can replace it with your own.

bash test.sh

The path to our model: VLMEvalKit/vlmeval/vlm/swimbird

See [QuickStart | 快速开始] for more details about arguments.
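
As a rough illustration of why a judge-based setting can be more reliable than exact matching, the sketch below contrasts a strict string comparison with a prompt that could be sent to a judge model. Both `exact_match` and `build_judge_prompt` are hypothetical helpers, and the prompt wording is an assumption; neither reflects VLMEvalKit's actual implementation or prompts.

```python
def exact_match(prediction: str, reference: str) -> bool:
    """Strict comparison after trimming whitespace and lowercasing."""
    return prediction.strip().lower() == reference.strip().lower()


def build_judge_prompt(question: str, reference: str, prediction: str) -> str:
    """Build an illustrative grading prompt for an LLM judge (assumed wording)."""
    return (
        "You are grading a model's answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {prediction}\n"
        "Reply with 'correct' if the model answer matches the reference "
        "in meaning, otherwise reply with 'incorrect'."
    )


if __name__ == "__main__":
    question = "What is 1/2 as a decimal?"
    reference = "0.5"
    prediction = "The answer is 0.5"
    # Exact matching rejects a correct but verbose answer.
    print(exact_match(prediction, reference))
    # A judge model would instead receive the full context below.
    print(build_judge_prompt(question, reference, prediction))
```

Here the strict comparison rejects a correct but verbose answer, which is the kind of case a judge model is meant to handle.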

✉️ Contact

  • If you have any questions about this project, please feel free to contact: tattoo.ysl@gmail.com.
  • We are actively seeking self-motivated researchers and research interns to join our team!

📌 Citation

  • If you find this project useful in your research, please consider citing:
@article{tong2026swimbird,
  title={SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs},
  author={Tong, Jintao and Yan, Shilin and Xue, Hongwei and Tang, Xiaojun and Shi, Kunyu and Zhang, Guannan and Li, Ruixuan and Zou, Yixiong},
  journal={arXiv preprint arXiv:2602.06040},
  year={2026}
}

👍 Acknowledgment
