Thanks a lot for releasing this dataset.
I’m trying to use human_annotated_data as supervision labels and map it to the raw review text from the other subsets (ICLR2024, NIPS2023, F1000Journal, SemanticWebJournal), but I’m stuck on the join.
What I observed:
human_annotated_data rows contain only:
- paper_id
- submitted_at
- metrics (keys like review__Overall_Quality, etc.)
The four raw subsets use different ID formats:
- ICLR2024 / NIPS2023: OpenReview-style IDs (e.g., zzv4Bf50RW)
- SemanticWebJournal: IDs like 3654-4868
- F1000Journal: no obvious paper_id field aligned with human_annotated_data
The human_annotated_data.paper_id values (small numeric strings such as 42 or 120) do not match any ID in those raw subsets.
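To illustrate the mismatch, here is a minimal sketch of an overlap check that uses only the example IDs quoted above. The helper name id_overlap is hypothetical, and in practice the ID lists would be loaded from each subset rather than hard-coded.

```python
def id_overlap(annotated_ids, raw_ids_by_subset):
    """For each raw subset, return the annotated IDs that also appear in it."""
    # Normalize everything to stripped strings so 42 and "42 " compare equal.
    annotated = {str(i).strip() for i in annotated_ids}
    return {
        name: annotated & {str(i).strip() for i in raw_ids}
        for name, raw_ids in raw_ids_by_subset.items()
    }


# Example values copied from the post; real lists would come from the dataset.
annotated_ids = ["42", "120"]
raw_ids_by_subset = {
    "ICLR2024 / NIPS2023": ["zzv4Bf50RW"],
    "SemanticWebJournal": ["3654-4868"],
}

overlap = id_overlap(annotated_ids, raw_ids_by_subset)
for name, shared in overlap.items():
    print(f"{name}: {len(shared)} shared IDs")
```

An empty intersection for every subset is what makes a direct join on paper_id impossible without a separate mapping.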
Could you clarify the intended mapping?
Is there an official mapping file from human_annotated_data.paper_id to raw paper/review records?