[AI-generated news construction pipeline]
LEDE is a large-scale benchmark dataset for AI-generated news detection, comprising over 337K articles and approximately 4.3M sentences. It addresses the limitations of existing benchmarks by providing broader generator diversity and news-specific coverage across 21 state-of-the-art LLMs, two languages, and 17 news categories. LEDE serves as a valuable resource for advancing research on AI-generated text detection, cross-model generalization, multilingual robustness, and domain-aware evaluation.

The dataset repository includes AI-generated news articles spanning multiple prompting strategies and news categories. For access to the full dataset, please refer to the Hugging Face repository: https://huggingface.co/datasets/NeurIPS-2026-LEDE/LEDE-dataset
| Dataset | Venue | Including News | # News | # LLMs | # Category | # Language |
|---|---|---|---|---|---|---|
| M4 [paper] | EACL 2024 | ✓ (N%) | 12,000 | 2 | ✗ | 3 |
| MAGE [paper] | ACL 2024 | ✓ (N%) | 58,391 | 27 | ✗ | 1 |
| M4GT-Bench [paper] | ACL 2024 | ✓ (N%) | 19,100 | 4 | ✗ | 6 |
| RAID [paper] | ACL 2024 | ✓ (N%) | 726,240 | 11 | 5 | 1 |
| DetectRL [paper] | NeurIPS 2024 D&B | ✓ (N%) | 33,600 | 4 | ✗ | 1 |
| Beemo [paper] | NAACL 2025 | ✗ | -- | -- | -- | -- |
| M-DAIGT [paper] | RANLP 2025 Shared Task | ✓ (N%) | 7,000 | 6 | ✗ | 2 |
| LEDE | -- | ✓ (100%) | 337,322 | 21 | 17 | 2 |
LEDE is a large-scale multilingual benchmark for AI-generated news detection, designed to support robust evaluation across diverse LLMs, news categories, generation strategies, and languages.
- # of LLMs: 21
- # of Languages: 2 (Eng, Kor)
- # of Articles: 337,322
- # of Sentences: 4,309,153
- # of News Categories: 17
- # of Generation Strategies: 4 (sc, ib, ng, we)
- # of English Sentences: 2,393,518
- # of Korean Sentences: 1,915,635
| Field | Description |
|---|---|
| human_rid | Identifier for the original human-written article. • AIHub datasets: uses the official AIHub dataset ID • English datasets: constructed as {first 4 words}-{last 4 words} from the original article |
| human_fid | Identifier for the corresponding fake/generated counterpart. • AIHub datasets: uses the official AIHub dataset ID • English datasets: constructed as {first 4 words}-{last 4 words} from the original article |
| title | Title of the AI-generated news article |
| summary | Summary of the AI-generated news article |
| ai_article | Full text of the AI-generated news article |
| category | News category/domain of the article (17 categories in total; e.g., politics, health, law, economy, sports) |
| model | Large Language Model (LLM) used for article generation (21 models in total) |
| strategy | Generation strategy used for article creation (sc, ib, ng, we) |
| language | Language of the generated article (Kor or Eng) |
| num_sentences | Number of sentences in the generated article |
| num_words | Number of words in the generated article |
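
As a rough illustration of how these fields might be used, the sketch below reads rows with the columns listed in the field table and counts articles per language and strategy. It assumes the data is stored as CSV with one column per field, which may not match the released files exactly. The two sample rows, the helper names `load_rows` and `count_by`, and the values `model-a` and `model-b` are made-up placeholders, not real LEDE data.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows that follow the field table; NOT real LEDE data.
SAMPLE_CSV = """human_rid,human_fid,title,summary,ai_article,category,model,strategy,language,num_sentences,num_words
id-001,id-001,Placeholder title A,Placeholder summary A,Placeholder body A.,politics,model-a,sc,Eng,1,3
id-002,id-002,Placeholder title B,Placeholder summary B,Placeholder body B.,sports,model-b,ng,Kor,1,3
"""


def load_rows(handle):
    """Read rows from an open CSV handle and cast the two count fields to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["num_sentences"] = int(row["num_sentences"])
        row["num_words"] = int(row["num_words"])
        rows.append(row)
    return rows


def count_by(rows, *fields):
    """Count rows grouped by the given field names (hypothetical helper)."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


rows = load_rows(io.StringIO(SAMPLE_CSV))
counts = count_by(rows, "language", "strategy")
for key, n in sorted(counts.items()):
    print(key, n)
```

To run this against an actual file, pass an open file handle to `load_rows` in place of the in-memory sample.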
To access the LEDE dataset, please visit the following link.
The LEDE dataset is available under the Creative Commons Attribution-NonCommercial 4.0 International Public License. Any violation of this license agreement may result in legal action. By downloading the LEDE dataset, the user agrees to the terms of the CC BY-NC 4.0 license.
Please download all of the following datasets and store them in the human-written/ directory.
Each human-written article is aligned with its corresponding AI-generated article using the human_rid field.
- AI-Hub datasets: The original dataset ID is used directly.
- English datasets: IDs are constructed in the format {first 4 words}-{last 4 words} from the original article.
This mapping enables direct and consistent comparison between human-written and AI-generated texts during evaluation.
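
The ID scheme for English datasets can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: it assumes words are split on whitespace and rejoined with single spaces, and the function names `make_article_id` and `align_pairs` are hypothetical. Any normalization the authors apply (casing, punctuation) is not specified here and is not modeled.

```python
def make_article_id(article_text: str, n: int = 4) -> str:
    """Build an ID of the form {first n words}-{last n words}.

    Assumption: words are whitespace-separated tokens, joined with spaces.
    """
    words = article_text.split()
    first = " ".join(words[:n])
    last = " ".join(words[-n:])
    return f"{first}-{last}"


def align_pairs(human_articles, ai_rows):
    """Pair each AI-generated row with its human-written source via human_rid.

    human_articles: iterable of original article texts (English datasets).
    ai_rows: iterable of dicts with at least 'human_rid' and 'ai_article'.
    Rows whose human_rid has no matching human article are skipped.
    """
    by_id = {make_article_id(text): text for text in human_articles}
    return [
        (by_id[row["human_rid"]], row["ai_article"])
        for row in ai_rows
        if row["human_rid"] in by_id
    ]


# Usage with a placeholder sentence (not taken from any dataset).
human = ["The quick brown fox jumps over the lazy dog today"]
ai = [{"human_rid": make_article_id(human[0]), "ai_article": "Generated text."}]
pairs = align_pairs(human, ai)
print(make_article_id(human[0]))
```

For AI-Hub datasets no construction step is needed, since the original dataset ID is used directly as the key.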
Run baseline model evaluation using either a single CSV file or a CSV directory. Below are sample commands for running zero-shot baseline evaluations.
$ git clone https://github.com/DSAIL-SKKU/LEDE.git
2-1. Fast-DetectGPT
Installation
- You can follow the official Fast-DetectGPT GitHub repository for installation details.
- Python 3.8
- PyTorch 1.10.0
Evaluate a CSV Directory
$ cd src/baselines/fast-detect-gpt
$ bash scripts/eval.sh --csv_dir /path/to/csv_dir
Each file prints metrics in the following format:
n_pairs: XXXX
ROC AUC (criterion): 0.XXXX
PR AUC (criterion): 0.XXXX
The aggregated per-file metrics are saved to ./outputs/batch_eval/roc/ by default.
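
For readers unfamiliar with the ROC AUC figure in this output, the sketch below computes it from two lists of detector scores using the pairwise-ranking definition: the fraction of (AI, human) score pairs in which the AI-generated text scores higher, with ties counted as half. It assumes a higher criterion value means "more likely AI-generated". The function name `roc_auc` and the example scores are illustrative only, and this is not the evaluation script's own code.

```python
def roc_auc(human_scores, ai_scores):
    """ROC AUC as the probability that a random AI-generated text
    receives a higher score than a random human-written text.

    Assumption: higher score = more likely AI-generated.
    Runs in O(len(human_scores) * len(ai_scores)); fine for a sketch.
    """
    if not human_scores or not ai_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5  # ties contribute half a win
    return wins / (len(ai_scores) * len(human_scores))


# Illustrative scores only; not results from any detector.
example_auc = roc_auc(human_scores=[0.1, 0.4], ai_scores=[0.35, 0.8])
print(f"ROC AUC: {example_auc:.4f}")
```

A value of 0.5 corresponds to chance-level separation, and 1.0 means every AI-generated text scored above every human-written one.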
2-2. Binoculars
Installation
- You can follow the official Binoculars GitHub repository for installation details.
- Python 3.8
- PyTorch 1.10.0
Evaluate a Single CSV File
$ cd src/baselines/Binoculars/
$ bash eval.sh --csv_path /path/to/file.csv
Evaluate a CSV Directory
$ cd src/baselines/Binoculars/
$ bash eval.sh --csv_dir /path/to/csv_dir
Each file prints metrics in the following format:
[OK] <file>.csv | n=<rows> (eval=<evaluated_rows>) | ACC=0.XXXX ROC_AUC=0.XXXX PR_AUC=0.XXXX
The aggregated per-file metrics are saved to binoculars_csv_folder_metrics.csv by default.
In addition to the two baseline models described above, other AI-generated text detection models can be explored through their official GitHub repositories.
Zero-shot Models
Supervised Models
The LEDE dataset is available under the Creative Commons Attribution-NonCommercial 4.0 International Public License: https://creativecommons.org/licenses/by-nc/4.0/. The code is released under the MIT license.