GitHub - Accio-Lab/SwimBird

SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

Jintao Tong¹, Shilin Yan²†‡, Hongwei Xue², Xiaojun Tang², Kunyu Shi²,
Guannan Zhang², Ruixuan Li¹‡, Yixiong Zou¹‡

†Project Leader ‡Corresponding author

¹Huazhong University of Science and Technology, ²Accio Team, Alibaba Group

🔥 News

2025.02.14 🚀 Evaluation Code is available!
2025.02.06 🚀 Model and Dataset are released!
2025.02.05 🚀 Training Code is available!
2025.02.05 📝 We release our latest work SwimBird!

🌟 Method

We introduce SwimBird, a hybrid autoregressive MLLM that dynamically switches among three reasoning modes conditioned on the input: (1) text-only reasoning, (2) vision-only reasoning (continuous hidden states as visual thoughts), and (3) interleaved vision–text reasoning. By enabling flexible, query-adaptive mode selection, SwimBird preserves strong textual logic while substantially improving performance on vision-dense tasks.
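To make the three modes concrete, the sketch below labels a reasoning trace by the kinds of segments it contains. It is purely illustrative and is not SwimBird's implementation: the `ReasoningMode` enum, the `classify_trace` helper, and the `"text"` / `"latent"` segment labels are all hypothetical names introduced here.

```python
from enum import Enum


class ReasoningMode(Enum):
    """The three reasoning modes described above (names are hypothetical)."""
    TEXT_ONLY = "text-only"
    VISION_ONLY = "vision-only"
    INTERLEAVED = "interleaved vision-text"


def classify_trace(segment_kinds):
    """Label a trace given the kind of each reasoning segment.

    segment_kinds is a sequence of "text" or "latent" strings, where
    "latent" stands for a span of continuous hidden states (an assumed
    encoding, not the model's actual output format).
    """
    kinds = set(segment_kinds)
    if not kinds or not kinds <= {"text", "latent"}:
        raise ValueError("expected a non-empty sequence of 'text'/'latent'")
    if kinds == {"text"}:
        return ReasoningMode.TEXT_ONLY
    if kinds == {"latent"}:
        return ReasoningMode.VISION_ONLY
    return ReasoningMode.INTERLEAVED
```

Under these assumptions, a trace of `["text", "latent", "text"]` would be labelled as interleaved, while `["text", "text"]` would be text-only.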

👀 Cases

SwimBird dynamically switches among three reasoning modes conditioned on the input: (1) text-only reasoning, (2) vision-only reasoning, and (3) interleaved vision–text reasoning.

🛠 Preparation

git clone https://github.com/Accio-Lab/SwimBird.git
cd SwimBird

pip install -r requirements.txt
pip install qwen-vl-utils
pip install flash-attn --no-build-isolation
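After installation, a quick sanity check along the following lines may help confirm that the key packages are importable. This is a minimal sketch: the module list is an assumption (`qwen_vl_utils` and `flash_attn` are the import names of the packages installed above, and `torch` is assumed to be pulled in by requirements.txt), and `missing_modules` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED_MODULES = ["torch", "qwen_vl_utils", "flash_attn"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules without importing them, so it does not verify that flash-attn was built against a compatible CUDA toolchain.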

🎯 Training

To train the model, follow these steps:

Replace Qwen3-VL's chat_template.json with the one provided in this repository.
Download the training dataset SwimBird-SFT-92K and prepend the dataset's absolute directory path to every image path in the JSON files:

python data_process.py absolute_path_to_dataset

Example:

python data_process.py /abs_path/SwimBird-ZebraCoT/
python data_process.py /abs_path/SwimBird-MathCanvas/
python data_process.py /abs_path/SwimBird-ThinkMorph/
python data_process.py /abs_path/SwimBird-OpenMMReasoner/
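For readers curious about what this prefixing step amounts to, here is a rough sketch of the idea. It is not the repository's data_process.py: it assumes each JSON file holds a list of records whose image path(s) sit under an `"image"` key (a string or a list of strings), and the function names are hypothetical.

```python
import json
from pathlib import Path


def prefix_image_paths(record, root):
    """Prepend root to each relative image path in one record.

    Assumes the paths live under an "image" key (hypothetical schema).
    Absolute paths are left untouched, so the step can be re-run safely.
    """
    images = record.get("image")
    if images is None:
        return record
    as_list = [images] if isinstance(images, str) else list(images)
    fixed = [p if Path(p).is_absolute() else str(Path(root) / p) for p in as_list]
    record["image"] = fixed[0] if isinstance(images, str) else fixed
    return record


def process_dataset(root):
    """Rewrite every top-level *.json file under root in place."""
    root = Path(root).resolve()
    for json_file in sorted(root.glob("*.json")):
        records = json.loads(json_file.read_text(encoding="utf-8"))
        if not isinstance(records, list):
            continue  # skip files that do not match the assumed layout
        records = [prefix_image_paths(r, root) for r in records]
        json_file.write_text(
            json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
        )
```

Calling `process_dataset("/abs_path/SwimBird-ZebraCoT/")` would mirror the first command above under these assumptions. Inspect the actual JSON schema before relying on it.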

Run the training script with the following command:

bash scripts/train.sh
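The first step above (swapping in the chat template) can be done by hand, or scripted roughly as follows. This is a sketch under stated assumptions: the Qwen3-VL checkpoint is assumed to live in a local directory that holds its own chat_template.json, and `install_chat_template` is a hypothetical helper that keeps a `.bak` copy of the original file.

```python
import shutil
from pathlib import Path


def install_chat_template(template_path, model_dir):
    """Copy template_path to model_dir/chat_template.json.

    Any existing template is first saved as chat_template.json.bak
    so the original can be restored later.
    """
    template_path = Path(template_path)
    target = Path(model_dir) / "chat_template.json"
    if target.exists():
        backup = target.with_name(target.name + ".bak")
        shutil.copy2(target, backup)
    shutil.copy2(template_path, target)
    return target
```

For example, `install_chat_template("chat_template.json", "/abs_path/to/Qwen3-VL-checkpoint")`, where the checkpoint path is a placeholder for your local model directory.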

📖 Evaluation

We adopt VLMEvalKit to conduct the evaluation. You can get started as follows:

1. Setup

cd VLMEvalKit
pip install -e .

2. Inference

We evaluate our model with an LLM-based API judge rather than exact matching, which gives a more accurate and reliable assessment. The default judge model is gpt-4o-0806; you can replace it with your own.

bash test.sh
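The difference between exact matching and judge-based scoring can be illustrated with a small sketch. This is not VLMEvalKit's implementation: the prompt wording, the `score` function, and the `judge` callable (any function that takes a prompt string and returns the judge model's reply) are assumptions made for illustration.

```python
def exact_match(prediction, reference):
    """Strict comparison after trimming whitespace and lowercasing."""
    return prediction.strip().lower() == reference.strip().lower()


def build_judge_prompt(question, prediction, reference):
    """Compose a hypothetical grading prompt for an LLM judge."""
    return (
        "Decide whether the prediction matches the reference answer.\n"
        f"Question: {question}\n"
        f"Reference: {reference}\n"
        f"Prediction: {prediction}\n"
        "Reply with Yes or No."
    )


def score(question, prediction, reference, judge=None):
    """Try exact matching first, then defer to the judge if one is given.

    judge is any callable mapping a prompt string to a reply string,
    for example a thin wrapper around your own API client.
    """
    if exact_match(prediction, reference):
        return True
    if judge is None:
        return False
    verdict = judge(build_judge_prompt(question, prediction, reference))
    return verdict.strip().lower().startswith("yes")
```

With exact matching alone, a prediction of "1/2" against a reference of "0.5" is marked wrong, whereas a judge model can recognize the two as equivalent.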

The path to our model: VLMEvalKit/vlmeval/vlm/swimbird

See the VLMEvalKit [QuickStart] guide for more details about arguments.

✉️ Contact

If you have any questions about this project, please feel free to contact: tattoo.ysl@gmail.com.
We are actively seeking self-motivated researchers and research interns to join our team!

📌 Citation

If you find this project useful in your research, please consider citing:

@article{tong2026swimbird,
  title={SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs},
  author={Tong, Jintao and Yan, Shilin and Xue, Hongwei and Tang, Xiaojun and Shi, Kunyu and Zhang, Guannan and Li, Ruixuan and Zou, Yixiong},
  journal={arXiv preprint arXiv:2602.06040},
  year={2026}
}

👍 Acknowledgment

We sincerely thank Qwen-VL-Series-Finetune, Skila and others for their contributions, which have provided valuable insights.

