Difference between NVILA-8B and NVILA-8B-Video Model #269

Hi VILA team, thank you for the amazing work!

I have recently been adapting NVILA as the video understanding model for a research project, and I noticed that NVILA-8B and NVILA-8B-Video differ by more than the extra video instruction-tuning stage mentioned in https://github.com/NVlabs/VILA/issues/167. **The two models have different visual sampling feature dimensions, and their base model architectures also differ: one is annotated as qwen2vl and the other as qwen2.5-vl.** Is any information about these models missing from the repository?
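For anyone who wants to reproduce the comparison, here is a minimal sketch of how two checkpoint configs could be diffed key by key. It assumes each checkpoint ships a JSON config file that can be loaded into a dict. The helper names (`flatten`, `diff_configs`, `load_config`) are my own, and the two sample dicts contain placeholder keys and values for illustration only, not the real contents of either model's config.

```python
import json
from pathlib import Path


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} so nested configs compare key by key."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    flat_a, flat_b = flatten(a), flatten(b)
    missing = "<missing>"
    return {
        key: (flat_a.get(key, missing), flat_b.get(key, missing))
        for key in sorted(flat_a.keys() | flat_b.keys())
        if flat_a.get(key, missing) != flat_b.get(key, missing)
    }


def load_config(path):
    """Hypothetical helper: read a JSON config file from a local checkpoint folder."""
    return json.loads(Path(path).read_text())


# Placeholder configs (NOT the real NVILA values); swap in load_config(...) results.
config_a = {"model_type": "arch-a", "vision": {"feature_dim": 1, "patch": 14}}
config_b = {"model_type": "arch-b", "vision": {"feature_dim": 2, "patch": 14}}

differences = diff_configs(config_a, config_b)
for key, (left, right) in differences.items():
    print(f"{key}: {left!r} != {right!r}")
```

Running this on the actual config files of the two checkpoints should list every field where they disagree, which is how the differences described above can be checked.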

@Lyken17
