GitHub - hanyeonjee/evidence-units: Official evaluation code & QA pairs for "Evidence Units: Ontology-Grounded Document Organization for Parser-Independent Retrieval"


Evidence Units is a parser-independent document organization framework that groups visual assets with their contextual text into semantically complete retrieval units, yielding consistent retrieval gains across the document parsers evaluated (MinerU, DocuSee, and Docling).

🔍 What is an Evidence Unit?

An Evidence Unit (EU) is a semantically complete document unit that groups visual assets (tables, charts, figures) with their contextual text (captions, headers, labels, paragraphs). EUs are constructed through ontology-grounded normalization, which works regardless of which document parser produced the input.

```text
┌─────────────────────────────────────┐
│  section_header  "2.2 Methods"      │
│  table           [HTML data]        │  ← Evidence Unit
│  unit_label      "(Unit: mg/L)"     │
│  support_para    "As shown above…"  │
└─────────────────────────────────────┘
```
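
To make the grouping in the diagram above concrete, here is a minimal Python sketch of an EU as a data structure. It is illustrative only: the class names, the `role` strings (borrowed from the diagram), `VISUAL_ROLES`, and the `retrieval_text` helper are hypothetical and not part of the released code, which does not include the construction pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical role names borrowed from the diagram; not the repo's ontology.
VISUAL_ROLES = {"table", "figure", "chart"}


@dataclass
class Element:
    role: str      # e.g. "table", "section_header", "unit_label"
    content: str   # HTML for tables, plain text for everything else


@dataclass
class EvidenceUnit:
    # Member elements kept in reading order.
    elements: List[Element] = field(default_factory=list)

    def anchor(self) -> Optional[Element]:
        """Return the first visual asset in the unit, or None if there is none."""
        return next((el for el in self.elements if el.role in VISUAL_ROLES), None)

    def retrieval_text(self) -> str:
        """Join all members into one string so the unit is indexed as a whole."""
        return "\n".join(el.content for el in self.elements)


eu = EvidenceUnit([
    Element("section_header", "2.2 Methods"),
    Element("table", "<table>...</table>"),
    Element("unit_label", "(Unit: mg/L)"),
    Element("support_para", "As shown above..."),
])
print(eu.anchor().role)  # table
```

Indexing `retrieval_text()` rather than each element separately is what lets a single retrieval hit return the table together with its header, unit label, and supporting paragraph.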

Key property: EU spatial footprints converge across parsers (DocuSee, MinerU, Docling, etc.) even when individual bounding boxes differ, which makes downstream retrieval parser-independent.
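
One way to see why unit footprints can agree when element boxes do not: if a unit's footprint is taken to be the smallest box enclosing all of its members (an assumption for this sketch; the exact definition is not given in this README), then two parsers that split the same content differently still produce the same footprint. The helper names are hypothetical and the coordinates are toy values, not real parser output.

```python
from typing import Iterable, Tuple

# (x0, y0, x1, y1) in page coordinates; toy values below, not real parser output.
Box = Tuple[float, float, float, float]


def footprint(boxes: Iterable[Box]) -> Box:
    """Smallest axis-aligned box enclosing every member of a unit (assumed definition)."""
    boxes = list(boxes)
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def area(b: Box) -> float:
    """Area of a box; degenerate (inverted) boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical parser A: header, table, unit label and paragraph as four boxes.
parser_a = [(100, 100, 500, 120), (100, 130, 500, 400),
            (100, 405, 300, 420), (100, 430, 500, 500)]
# Hypothetical parser B: merges header+table and label+paragraph into two boxes.
parser_b = [(100, 100, 500, 400), (100, 405, 500, 500)]

print(iou(parser_a[1], parser_b[0]))                  # element level
print(iou(footprint(parser_a), footprint(parser_b)))  # unit level
```

With these toy boxes the two table boxes overlap at IoU 0.9, while the unit footprints coincide exactly (IoU 1.0).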

📦 This Repository

This repo releases the evaluation code and QA pairs used in the paper.

| File | Description |
|---|---|
| `eval_retrieval.py` | Retrieval evaluation script (LCS, Recall@K, MinK) |
| `qas_en.json` | 1,551 QA pairs generated from OmniDocBench v1.0 |

The full EU construction pipeline is not included in this release.

🚀 Quick Start

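```bash
git clone https://github.com/hanyeonjee/evidence-units
cd evidence-units
pip install sentence-transformers numpy

# Baseline evaluation (GT annotations, element-level)
python eval_retrieval.py \
    --gt   OmniDocBench.json \
    --qas  qas_en.json \
    --output results/

# Cross-parser evaluation with pre-computed EU outputs
python eval_retrieval.py \
    --gt              OmniDocBench.json \
    --qas             qas_en.json \
    --output          results/ \
    --docling-eu-dir  path/to/eu_docling \
    --mineru-eu-dir   path/to/eu_mineru
```

`OmniDocBench.json` (the OmniDocBench v1.0 ground-truth annotations) is not part of this repository; point `--gt` at your local copy. Because the EU construction pipeline is not released, `--docling-eu-dir` and `--mineru-eu-dir` must point to EU outputs you already have.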

📊 Results on OmniDocBench (471 English-primary pages · 1,551 QA pairs, Strict protocol)

| Method | Avg LCS ↑ | Recall@1 ↑ | MinK ↓ |
|---|---|---|---|
| w/o EU (baseline) | 0.501 | 0.150 | 2.58 |
| w/ EU (ours) | 0.797 | 0.507 | 1.74 |
| Δ | +0.296 | +0.357 | −0.84 |
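
The sketch below shows one plausible reading of Recall@K and MinK, under stated assumptions: each ranked result is the set of ground-truth node IDs it contains (one node for element-level retrieval, several for an EU); Recall@K is the fraction of a QA pair's `evidence_node_ids` covered by the top K results; and MinK is the smallest K that covers all of them. `eval_retrieval.py` may define or aggregate these differently, and the function names here are hypothetical.

```python
from typing import Optional, Sequence, Set


def recall_at_k(ranked: Sequence[Set[str]], gold: Set[str], k: int) -> float:
    """Fraction of gold evidence node IDs contained in the top-k results."""
    if not gold:
        raise ValueError("gold evidence set must be non-empty")
    covered: Set[str] = set()
    for item in ranked[:k]:
        covered |= item
    return len(covered & gold) / len(gold)


def min_k(ranked: Sequence[Set[str]], gold: Set[str]) -> Optional[int]:
    """Smallest k whose top-k results cover every gold node ID, or None if never."""
    seen: Set[str] = set()
    for k, item in enumerate(ranked, start=1):
        seen |= item
        if gold <= seen:
            return k
    return None


gold = {"node_012", "node_013", "node_014"}
# Element-level ranking: each result is a single node.
elements = [{"node_013"}, {"node_099"}, {"node_012"}, {"node_014"}]
# EU-level ranking: the first result is a unit holding all three evidence nodes.
units = [{"node_012", "node_013", "node_014"}, {"node_099"}]

print(recall_at_k(elements, gold, 1), min_k(elements, gold))  # recall 1/3, MinK 4
print(recall_at_k(units, gold, 1), min_k(units, gold))        # recall 1.0, MinK 1
```

Under these assumptions, averaging the per-query values over all QA pairs gives table-level numbers, and a single retrieved EU covering several evidence nodes at rank 1 is the mechanism by which grouping would raise Recall@1 and lower MinK.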

Cross-parser consistency: ΔLCS ranges from +0.23 to +0.30 across MinerU, DocuSee, and Docling, and the downstream DocVQA ANLS gain ranges from +0.05 to +0.10 across all three parsers.

🗂️ QA Pair Format

```json
{
  "qa_id": "omnidoc_table_0042",
  "type": "table",
  "question": "Table 1. Water quality in the experiments.",
  "evidence_node_ids": ["node_012", "node_013", "node_014"],
  "page_id": "scihub_page_002"
}
```

`type` is one of `table`, `figure`, or `text`.
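
A minimal loader that checks records against the format above. It assumes `qas_en.json` holds a top-level JSON array of such records; the helper name `load_qas` is hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict, List

REQUIRED_KEYS = {"qa_id", "type", "question", "evidence_node_ids", "page_id"}
QA_TYPES = {"table", "figure", "text"}


def load_qas(text: str) -> List[Dict[str, Any]]:
    """Parse QA records (assumed to be a JSON array) and validate each one."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of QA records")
    for rec in records:
        missing = REQUIRED_KEYS - set(rec)
        if missing:
            raise ValueError(f"{rec.get('qa_id', '?')}: missing keys {sorted(missing)}")
        if rec["type"] not in QA_TYPES:
            raise ValueError(f"{rec['qa_id']}: unknown type {rec['type']!r}")
    return records


# The example record from above, wrapped in a one-element array.
sample = json.dumps([{
    "qa_id": "omnidoc_table_0042",
    "type": "table",
    "question": "Table 1. Water quality in the experiments.",
    "evidence_node_ids": ["node_012", "node_013", "node_014"],
    "page_id": "scihub_page_002",
}])
qas = load_qas(sample)
print(Counter(q["type"] for q in qas))  # Counter({'table': 1})
```

For the released file, read it first: `with open("qas_en.json", encoding="utf-8") as f: qas = load_qas(f.read())`.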
