Evidence Units is a parser-independent document organization framework that groups visual assets with their contextual text into semantically complete retrieval units, yielding consistent retrieval gains across the parsers evaluated (MinerU, DocuSee, and Docling).
An Evidence Unit (EU) is a semantically complete document unit that groups visual assets (tables, charts, figures) with their contextual text (captions, headers, labels, paragraphs) — constructed through ontology-grounded normalization that works regardless of which document parser you use.
┌─────────────────────────────────────┐
│ section_header "2.2 Methods" │
│ table [HTML data] │ ← Evidence Unit
│ unit_label "(Unit: mg/L)" │
│ support_para "As shown above…" │
└─────────────────────────────────────┘
Key property: EU spatial footprints converge across parsers (DocuSee, MinerU, Docling, etc.) even when individual bounding boxes differ — making downstream retrieval parser-independent.
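To make the convergence property concrete, here is a minimal sketch. It assumes an EU's spatial footprint is the smallest box enclosing its member elements; the repo does not define the footprint this way, and the names `Element`, `EvidenceUnit`, `footprint`, and `iou` are hypothetical. The coordinates are toy values, not data from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed convention: (x0, y0, x1, y1) in page coordinates.
BBox = Tuple[float, float, float, float]


@dataclass
class Element:
    role: str   # e.g. "section_header", "table", "unit_label", "support_para"
    bbox: BBox


@dataclass
class EvidenceUnit:
    elements: List[Element]

    def footprint(self) -> BBox:
        """Smallest box enclosing every member element (assumed definition)."""
        return (
            min(e.bbox[0] for e in self.elements),
            min(e.bbox[1] for e in self.elements),
            max(e.bbox[2] for e in self.elements),
            max(e.bbox[3] for e in self.elements),
        )


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Toy example: parser A emits the unit label as its own box,
# parser B folds it into the table box.
eu_a = EvidenceUnit([
    Element("section_header", (50, 100, 550, 120)),
    Element("table", (50, 130, 550, 400)),
    Element("unit_label", (400, 405, 550, 420)),
    Element("support_para", (50, 430, 550, 480)),
])
eu_b = EvidenceUnit([
    Element("section_header", (50, 100, 550, 120)),
    Element("table", (50, 130, 550, 420)),
    Element("support_para", (50, 430, 550, 480)),
])

print(iou(eu_a.footprint(), eu_b.footprint()))
```

In this toy case the element boxes disagree between the two parsers, yet both footprints cover the same region, which is the behavior the key property describes.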
This repo releases the evaluation code and QA pairs used in the paper.
| File | Description |
|---|---|
| `eval_retrieval_combined.py` | Retrieval evaluation script (LCS, Recall@K, MinK) |
| `qas_en.json` | 1,551 QA pairs generated from OmniDocBench v1.0 |
The full EU construction pipeline is not included in this release.
git clone https://github.com/hanyeonjee/evidence-units
cd evidence-units
pip install sentence-transformers numpy

# Baseline evaluation (GT annotations, element-level)
python eval_retrieval.py \
--gt OmniDocBench.json \
--qas qas_en.json \
--output results/

# Cross-parser evaluation with pre-computed EU outputs
python eval_retrieval.py \
--gt OmniDocBench.json \
--qas qas_en.json \
--output results/ \
--docling-eu-dir path/to/eu_docling \
--mineru-eu-dir path/to/eu_mineru

| Method | Avg LCS | Recall@1 | MinK ↓ |
|---|---|---|---|
| w/o EU (baseline) | 0.501 | 0.150 | 2.58 |
| w/ EU (ours) | 0.797 | 0.507 | 1.74 |
| Δ | +0.296 | +0.357 | −0.84 |
Cross-parser consistency: ΔLCS between +0.23 and +0.30 across MinerU, DocuSee, and Docling. Downstream gain on DocVQA ANLS: +0.05 to +0.10 across all three parsers.
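The README names the three metrics but does not define them, so the following is only a sketch under assumptions: LCS is taken as the token-level longest common subsequence length divided by the reference length, Recall@K as a hit when any gold evidence node appears in the top K, and MinK as the smallest rank at which a gold node appears. The function names are hypothetical and may not match the evaluation script.

```python
from typing import Optional, Sequence, Set


def lcs_ratio(pred: Sequence[str], ref: Sequence[str]) -> float:
    """LCS length normalized by reference length (assumed normalization)."""
    if not ref:
        return 0.0
    prev = [0] * (len(ref) + 1)
    for p in pred:
        curr = [0]
        for j, r in enumerate(ref, start=1):
            if p == r:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(ref)


def recall_at_k(ranked_ids: Sequence[str], gold_ids: Set[str], k: int) -> float:
    """1.0 if any gold node is among the top-k retrieved ids, else 0.0."""
    return 1.0 if any(i in gold_ids for i in ranked_ids[:k]) else 0.0


def min_k(ranked_ids: Sequence[str], gold_ids: Set[str]) -> Optional[int]:
    """Smallest 1-based rank holding a gold node; None if never retrieved."""
    for rank, node_id in enumerate(ranked_ids, start=1):
        if node_id in gold_ids:
            return rank
    return None


# Toy ranking for one query (ids are illustrative only).
ranked = ["node_003", "node_012", "node_020"]
gold = {"node_012", "node_013"}

print(recall_at_k(ranked, gold, 1), recall_at_k(ranked, gold, 2), min_k(ranked, gold))
print(lcs_ratio("water quality in experiments".split(),
                "water quality in the experiments".split()))
```

Under these definitions a lower MinK is better, which matches the ↓ in the table header; averaging each metric over all QA pairs would give per-method figures comparable in form to the ones reported.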
{
"qa_id": "omnidoc_table_0042",
"type": "table",
"question": "Table 1. Water quality in the experiments.",
"evidence_node_ids": ["node_012", "node_013", "node_014"],
"page_id": "scihub_page_002"
}

`type` is one of `table` · `figure` · `text`.
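A short sketch of how entries in this shape could be checked and tallied by type. It assumes `qas_en.json` is a JSON array of objects with exactly the fields shown above; the helpers `validate_qa` and `count_by_type` are hypothetical and not part of the released code.

```python
import json
from collections import Counter
from typing import Dict, List

REQUIRED_FIELDS = {"qa_id", "type", "question", "evidence_node_ids", "page_id"}
ALLOWED_TYPES = {"table", "figure", "text"}


def validate_qa(entry: Dict) -> None:
    """Raise ValueError if an entry does not match the schema shown above."""
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if entry["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unexpected type: {entry['type']!r}")
    if not isinstance(entry["evidence_node_ids"], list) or not entry["evidence_node_ids"]:
        raise ValueError("evidence_node_ids must be a non-empty list")


def count_by_type(entries: List[Dict]) -> Counter:
    """Validate every entry, then count QA pairs per type."""
    for entry in entries:
        validate_qa(entry)
    return Counter(entry["type"] for entry in entries)


# Inline sample standing in for the file; with the real data one would use
# json.load(open("qas_en.json", encoding="utf-8")), assuming a top-level array.
sample = json.loads("""
[
  {
    "qa_id": "omnidoc_table_0042",
    "type": "table",
    "question": "Table 1. Water quality in the experiments.",
    "evidence_node_ids": ["node_012", "node_013", "node_014"],
    "page_id": "scihub_page_002"
  }
]
""")

print(count_by_type(sample))
```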