SigLIP2 in UniPic-1 #22

Similar to #9, I'm also confused about why SigLIP2 isn't used in the code.

The paper says, 'Image understanding is performed using a SigLIP2 encoder to extract rich visual features, which are subsequently passed to an LLM for autoregressive text generation.'

However, the code in image2text.py uses VAE+MAR.
https://github.com/SkyworkAI/UniPic/blob/main/UniPic-1/scripts/image2text.py#L64

Also, the loss calculation does not involve SigLIP2.
https://github.com/SkyworkAI/UniPic/blob/main/UniPic-1/src/models/skywork_unipic_dev.py#L334
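
For anyone who wants to double-check this, one rough approach is to scan the model's parameter names for anything SigLIP-related. This is only a sketch: the helper `find_names_containing` is my own name, and the parameter names in the example are made up for illustration, not taken from the UniPic checkpoint.

```python
from typing import Iterable, List


def find_names_containing(names: Iterable[str], keyword: str) -> List[str]:
    """Return the names that contain `keyword`, case-insensitively."""
    needle = keyword.lower()
    return [name for name in names if needle in name.lower()]


# Made-up parameter names for illustration only (an assumption, not the
# real UniPic-1 parameter names).
example_names = [
    "vae.encoder.conv_in.weight",
    "mar.blocks.0.attn.qkv.weight",
    "llm.model.layers.0.mlp.up_proj.weight",
]

# No SigLIP-related entry appears among these made-up names.
print(find_names_containing(example_names, "siglip"))
```

With a loaded PyTorch model, the names could come from `(n for n, _ in model.named_parameters())`. An empty result would not be conclusive on its own, since the encoder could be registered under a different name.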
