How to map human_annotated_data to raw subsets?

Thanks a lot for releasing this dataset.

I’m trying to use human_annotated_data as supervision labels and map it to the raw review text from the other subsets (ICLR2024, NIPS2023, F1000Journal, SemanticWebJournal), but I’m stuck on the join.

What I observed:

human_annotated_data rows contain only:

- paper_id
- submitted_at
- metrics (keys like review_<reviewer>_Overall_Quality, etc.)

The four raw subsets use different ID formats:

- ICLR2024 / NIPS2023: OpenReview-style IDs (e.g., zzv4Bf50RW)
- SemanticWebJournal: IDs like 3654-4868
- F1000Journal: no obvious paper_id field aligned with human_annotated_data
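
For context, here is a minimal sketch of how the metrics keys could be split into a reviewer and a metric name. It assumes the reviewer token itself contains no underscore, which I have not verified; the helper name and the sample reviewer `r1` are made up for illustration and do not come from the dataset.

```python
import re

# Assumed key shape: review_<reviewer>_<metric>, where <reviewer> has no
# underscore. This is a guess based on keys like
# review_<reviewer>_Overall_Quality, not a documented format.
KEY_PATTERN = re.compile(r"^review_(?P<reviewer>[^_]+)_(?P<metric>.+)$")


def split_metric_key(key):
    """Return (reviewer, metric) for a metrics key, or None if it does not match."""
    match = KEY_PATTERN.match(key)
    if match is None:
        return None
    return match.group("reviewer"), match.group("metric")


# "r1" is a hypothetical reviewer token used only for this example.
parsed = split_metric_key("review_r1_Overall_Quality")
print(parsed)
```

If reviewer tokens can contain underscores, this split would be ambiguous, which is another reason an official mapping would help.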

The human_annotated_data.paper_id values (small numeric strings such as 42 and 120) do not match any ID in those raw subsets.
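
A minimal sketch of the overlap check, assuming the IDs from each subset have already been collected into plain Python lists. The function name is illustrative, and the sample values are only the example IDs quoted above, not data loaded from the dataset.

```python
def id_overlap(annotated_ids, raw_ids_by_subset):
    """Count how many annotated paper_ids appear verbatim in each raw subset."""
    # Compare as strings so that 42 and "42" are treated as the same ID.
    annotated = {str(i) for i in annotated_ids}
    return {
        subset: len(annotated & {str(i) for i in ids})
        for subset, ids in raw_ids_by_subset.items()
    }


# Example IDs taken from this issue; real runs would use the full ID columns.
annotated_ids = ["42", "120"]
raw_ids_by_subset = {
    "ICLR2024": ["zzv4Bf50RW"],
    "SemanticWebJournal": ["3654-4868"],
}

overlap = id_overlap(annotated_ids, raw_ids_by_subset)
print(overlap)
```

A count of zero for every subset means an exact-match join on paper_id cannot work without an extra mapping.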
Could you clarify the intended mapping?

Is there an official mapping file from human_annotated_data.paper_id to raw paper/review records?
