Difference between NVILA-8B and NVILA-8B-Video Model #269

@LaBaZh

Hi VILA team, thank you for the amazing work!

I have recently been adapting NVILA as the video understanding model for a research project, and I noticed that NVILA-8B and NVILA-8B-Video differ by more than the extra video instruction-tuning stage mentioned in #167. The two models have different visual sampling feature dimensions, and their base model architectures also differ: one is annotated as qwen2vl and the other as qwen2.5-vl. Is some information about these models missing from the repository?

@Lyken17
