Mismatch between the paper's approach and the README pretraining command #6

Description

@ndhuynh02

The pretraining phase in this code seems to differ from my understanding of the paper. According to the two images below, only the image modality is trained with an intra-modal contrastive objective, aided by a semantic module.
Image
Image

However, the recommended pretraining command in the README enables both --separate_text and --separate_image. If I understand the paper correctly, only --separate_image should be used.

python -m main.run --logs="path/to/logs" --save-frequency 2 --report-to wandb --wandb-project-name="sample_project" --train-data="path/to/cc12m" --train-num-samples 10030127 --warmup 10000 --batch-size=512 --lr=1e-3 --wd=0.1 --epochs=30 --workers=2 --model "ViT-B-16" --precision amp --dataset-type webdataset --clip-inModality-loss --clip-loss --alpha=1 --beta=0.5 --nl_semantic_supervision --train-num-samples 10030127 --dataset-type webdataset --separate_text --separate_image
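
For comparison, this is the command I would expect from my reading of the paper. It is the README command with --separate_text dropped and nothing else changed. I am assuming that omitting the flag is enough to disable the text-side intra-modal term; I have not verified this against the code.

```shell
# README pretraining command with --separate_text removed (assumption: this
# leaves only the image modality with the intra-modal contrastive loss).
python -m main.run --logs="path/to/logs" --save-frequency 2 --report-to wandb --wandb-project-name="sample_project" --train-data="path/to/cc12m" --train-num-samples 10030127 --warmup 10000 --batch-size=512 --lr=1e-3 --wd=0.1 --epochs=30 --workers=2 --model "ViT-B-16" --precision amp --dataset-type webdataset --clip-inModality-loss --clip-loss --alpha=1 --beta=0.5 --nl_semantic_supervision --train-num-samples 10030127 --dataset-type webdataset --separate_image
```

Could you confirm which of the two commands matches the setup used for the results in the paper?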
