SigLIP2 in UniPic-1 #22

Description

@wavinflaghxm

Similar to #9, I'm also confused about why SigLIP2 isn't used in the code.

The paper says, 'Image understanding is performed using a SigLIP2 encoder to extract rich visual features, which are subsequently passed to an LLM for autoregressive text generation.'

However, the code in image2text.py uses VAE+MAR.
https://github.com/SkyworkAI/UniPic/blob/main/UniPic-1/scripts/image2text.py#L64

Also, the loss calculation does not involve SigLIP2.
https://github.com/SkyworkAI/UniPic/blob/main/UniPic-1/src/models/skywork_unipic_dev.py#L334
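
One way to double-check this would be to scan the loaded model's parameter names for anything that looks like a SigLIP2 encoder. The snippet below is only a minimal sketch: the helper `params_matching` and the example names are hypothetical, and it assumes a SigLIP2 module would have `siglip` somewhere in its parameter names, which may not match the repo's actual naming.

```python
from typing import Iterable, List


def params_matching(names: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return the parameter names that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [n for n in names if any(k in n.lower() for k in lowered)]


# Hypothetical parameter names, for illustration only. In practice, pass
# `model.state_dict().keys()` from the loaded PyTorch model instead.
example_names = [
    "vae.encoder.conv_in.weight",
    "mar.blocks.0.attn.qkv.weight",
    "llm.model.layers.0.mlp.up_proj.weight",
]

hits = params_matching(example_names, ["siglip"])
print(hits or "no SigLIP2-like parameters found")
```

An empty result on the real parameter names would suggest that no SigLIP2 weights are loaded, although an encoder registered under a different name would not be caught by this check.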
