ENIMD: European National Implementing Measures Dataset


Pairing EU directives and their national implementing measures: A dataset for semantic search

Roger Ferrod, Denys Amore Bondarenko, Davide Audrito, and Giovanni Siragusa

Published in Computer Law & Security Review, Volume 51, 2023.


📌 Overview

European Directives (EUDs) are binding upon Member States regarding the results to be achieved, but they leave the choice of form and methods to national authorities. Member States adopt ad hoc National Implementing Measures (NIMs) to transpose EUDs into domestic legislation, a process known as "legal harmonization".

ENIMD is a multilingual dataset designed for legal semantic search and harmonization analysis. It pairs EUDs with their corresponding NIMs across five Member States: Italy, France, Spain, Ireland, and Austria.

The dataset supports training models that automatically identify the national laws implementing an EU directive and distinguish them from domestic legislation that does not.

⚖️ Legal Harmonization Task

The primary task enabled by ENIMD is Semantic Search / Retrieval:

  1. Query: An article from an EU Directive.
  2. Target: The specific article(s) in National Law that implement that directive.
  3. Challenge: The model must retrieve the correct implementation from a pool of ~900k national articles, most of which are unrelated.
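The retrieval setup above can be illustrated with a minimal sketch. It ranks a toy pool of national articles against a directive article and scores the ranking with recall@k. The bag-of-words cosine ranker, the `nat-*` identifiers, and the example texts are all assumptions made for illustration; they are not the models or data described in the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately naive stand-in)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, corpus):
    """Return corpus ids sorted by decreasing similarity to the query."""
    q = Counter(tokenize(query))
    scores = {doc_id: cosine(q, Counter(tokenize(text)))
              for doc_id, text in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant ids that appear in the top-k ranked ids."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)


# Hypothetical toy pool: one implementing article, two unrelated ones.
corpus = {
    "nat-1": "the controller shall process personal data lawfully",
    "nat-2": "road tax is payable annually by vehicle owners",
    "nat-3": "fishing licences are issued by the regional authority",
}
query = "member states shall ensure personal data is processed lawfully"

ranked = rank(query, corpus)
print(ranked[0], recall_at_k(ranked, {"nat-1"}, k=1))
```

In a realistic experiment the lexical ranker would be replaced by a stronger retrieval model, while the recall@k scoring stays the same.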

💾 Dataset Structure

The dataset is organized into three components, catering to different research needs:

1. ML-dataset

A shuffled, machine-learning-ready collection of articles split into Train and Test sets.

  • Content: Pairs of EUD articles (Queries) and National Law articles (Documents).
  • Labels: Includes positive examples (NIMs) and negative examples (irrelevant national laws).
  • Metadata: Articles are labeled with the CELEX number, country code, and transposition hash.
  • Preprocessing: Filtered using an IDF-based method to remove boilerplate text (e.g., entry into force dates, financial clauses).
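The exact filtering procedure is described in the paper and is not reproduced here. As a rough sketch of the general idea behind an IDF-based filter, the code below assumes that boilerplate articles consist mostly of tokens that recur across many articles, so their mean IDF is low. The function names, the scoring rule, and the threshold are illustrative assumptions.

```python
import math
from collections import Counter


def idf_table(articles):
    """Inverse document frequency per token, computed over all articles."""
    n = len(articles)
    doc_freq = Counter()
    for text in articles:
        doc_freq.update(set(text.lower().split()))
    return {tok: math.log(n / count) for tok, count in doc_freq.items()}


def mean_idf(text, idf):
    """Average IDF of the tokens in one article (0.0 for empty text)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(idf.get(tok, 0.0) for tok in tokens) / len(tokens)


def filter_boilerplate(articles, threshold):
    """Keep articles whose mean IDF reaches the (assumed) threshold."""
    idf = idf_table(articles)
    return [a for a in articles if mean_idf(a, idf) >= threshold]


# Hypothetical corpus: the same entry-into-force clause repeated in three
# laws, plus one substantive provision.
clause = "this decree enters into force on the day after publication"
substantive = "operators shall notify the authority of any data breach within 72 hours"
articles = [clause, clause, clause, substantive]

kept = filter_boilerplate(articles, threshold=0.5)
print(kept)
```

The repeated clause scores low because nearly all of its tokens occur in most articles, so only the substantive provision survives at this threshold.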

2. filtered

The parsed collection of articles from which irrelevant or boilerplate provisions have been removed using the method described in the paper. This component is useful for analysis without the noise of administrative clauses.

3. raw

The full parsed collection of Directives and National Laws in their original structure (articles/paragraphs), without any filtering.


📊 Statistics

The dataset covers legislation from five EU Member States:

Country   Language       EUD Articles (Queries)   National Corpus Articles
Italy     Italian        11,514                   135,221
France    French         11,386                   236,762
Spain     Spanish        11,249                   209,795
Ireland   English        11,344                   157,601
Austria   German         11,837                   199,781
Total     Multilingual   57,330                   939,160

  • Total Directives: 906
  • Total National Documents: 9,016
  • Challenge Ratio: ~88% of the national corpus consists of "irrelevant" laws (negative examples), providing a realistic and robust retrieval challenge.
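As a quick sanity check, the totals can be recomputed from the per-country figures. The sketch below also turns the approximate 88% ratio into an estimated number of negative articles; since the ratio is rounded, that count is only an estimate and not a figure reported by the authors.

```python
# (EUD articles, national corpus articles) per country, copied from the table.
per_country = {
    "Italy": (11_514, 135_221),
    "France": (11_386, 236_762),
    "Spain": (11_249, 209_795),
    "Ireland": (11_344, 157_601),
    "Austria": (11_837, 199_781),
}

total_queries = sum(queries for queries, _ in per_country.values())
total_corpus = sum(corpus for _, corpus in per_country.values())

# Rough estimate only: 0.88 is the rounded ratio quoted in the text.
estimated_negatives = round(0.88 * total_corpus)

print(total_queries)        # 57330
print(total_corpus)         # 939160
print(estimated_negatives)  # 826461
```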

💻 Usage

You can easily load the dataset using the Hugging Face datasets library:

from datasets import load_dataset

# Load the ML-ready dataset
dataset = load_dataset("rogerferrod/ENIMD", data_dir="ML-dataset")

# Example: Inspect the first training example
print(dataset['train'][0])
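Once the data is loaded, a first step is often to count examples per value of some field. The helper below is a minimal sketch that works on any iterable of dict-like records; the `country` and `label` keys in the sample are hypothetical and may not match the actual column names.

```python
from collections import Counter


def count_by(records, key):
    """Count how many records share each value of the given key."""
    return Counter(record[key] for record in records)


# Hypothetical records standing in for rows of the loaded dataset.
sample = [
    {"country": "IT", "label": 1},
    {"country": "FR", "label": 0},
    {"country": "IT", "label": 0},
]

counts = count_by(sample, "country")
print(counts["IT"], counts["FR"])  # 2 1
```

With the real data, check `dataset["train"].column_names` first, then pass `dataset["train"]` and one of the listed column names to `count_by`.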

Alternatively, you can download the raw files directly from Mendeley Data.


📖 Citation

If you use this dataset in your research, please cite the original paper:

@article{FERROD2023105862,
  title = {Pairing EU directives and their national implementing measures: A dataset for semantic search},
  journal = {Computer Law \& Security Review},
  volume = {51},
  pages = {105862},
  year = {2023},
  issn = {2212-473X},
  author = {Roger Ferrod and Denys Amore Bondarenko and Davide Audrito and Giovanni Siragusa}
}
