This repository was archived by the owner on Jun 11, 2026. It is now read-only.

Reddit Data #41

Description

@KeremTurgutlu

Data preparation involves downloading Reddit comment and submission data from https://files.pushshift.io/reddit/, and the instructions state that the total data is around 700GB. However, the actual size of the data is ~2TB. For training GODEL, up to which YYYY-MM of Reddit data did you use?

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests