How to map human_annotated_data to raw subsets? #6

@yx-lu

Thanks a lot for releasing this dataset.

I’m trying to use human_annotated_data as supervision labels and map it to the raw review text from the other subsets (ICLR2024, NIPS2023, F1000Journal, SemanticWebJournal), but I’m stuck on the join.

What I observed:

- human_annotated_data rows contain only:
  - paper_id
  - submitted_at
  - metrics (keys like review__Overall_Quality, etc.)
- The four raw subsets use different ID formats:
  - ICLR2024 / NIPS2023: OpenReview-style IDs (e.g., zzv4Bf50RW)
  - SemanticWebJournal: IDs like 3654-4868
  - F1000Journal: no obvious paper_id field aligned with human_annotated_data

The human_annotated_data.paper_id values (small numeric strings such as 42 and 120) do not match the IDs in any of those raw subsets.
Could you clarify the intended mapping?

Is there an official mapping file from human_annotated_data.paper_id to raw paper/review records?
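
For reference, the mismatch described above can be checked with something along the lines of the sketch below. It is only an illustration: the rows are hypothetical placeholders built from the example IDs in this issue, not actual dataset contents, and the `paper_id` field name on the raw subsets and the `id_overlap` helper are assumptions.

```python
def id_overlap(annotated_rows, raw_subsets, key="paper_id"):
    """Return, per raw subset, the annotated IDs that also appear in it."""
    # Compare as strings so 42 and "42" count as the same ID.
    annotated_ids = {str(row[key]) for row in annotated_rows}
    overlap = {}
    for name, rows in raw_subsets.items():
        # Rows without the key (e.g. a subset lacking paper_id) are skipped.
        raw_ids = {str(row[key]) for row in rows if key in row}
        overlap[name] = annotated_ids & raw_ids
    return overlap


# Hypothetical placeholder rows using the ID formats mentioned in this issue.
annotated = [{"paper_id": "42"}, {"paper_id": "120"}]
raw = {
    "ICLR2024": [{"paper_id": "zzv4Bf50RW"}],
    "SemanticWebJournal": [{"paper_id": "3654-4868"}],
    "F1000Journal": [{"title": "placeholder"}],  # assumed: no paper_id field
}

result = id_overlap(annotated, raw)
for name, shared in result.items():
    print(f"{name}: {len(shared)} shared IDs")
```

With these placeholder rows every subset reports zero shared IDs, which is the situation described above. A direct join on `paper_id` would need at least one non-empty intersection to be usable.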
