Skip to content

write imagenet to beton takes 6 hours #400

@Si1kyRiver

Description

@Si1kyRiver

Hi, by using the code below, it takes > 6hours to write imagenet1k to beton, which is nontrivial.

from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField

# Your dataset (`torch.utils.data.Dataset`) of (image, label) pairs
my_dataset = make_my_dataset()
write_path = '/output/path/for/converted/ds.beton'

# Pass a type for each data field
writer = DatasetWriter(write_path, {
    # Tune options to optimize dataset size, throughput at train-time
    'image': RGBImageField(
        max_resolution=256
    ),
    'label': IntField()
})

# Write dataset
writer.from_indexed_dataset(my_dataset)

Is there a solution or any tricks?
Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions