Parsing tables in document images with cell detection models
- Classification model (wired / wireless)
- Cell detection model with different weights for each class
Uses ONNX weights downloaded automatically from Hugging Face on first use.
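
The two-stage flow described above (classify the table, then run the cell detector whose weights match that class) could be sketched roughly as follows. This is an illustration only and not the cells2table API: `TwoStageTableParser`, the stub callables, and the `has_ruling_lines` flag are hypothetical stand-ins, and it assumes "wired" refers to tables drawn with ruling lines.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


@dataclass
class TwoStageTableParser:
    """Hypothetical wrapper: classify the table, then run the matching detector."""

    classify: Callable[[dict], str]                    # returns "wired" or "wireless"
    detectors: Dict[str, Callable[[dict], List[BBox]]]  # one detector per class

    def parse(self, image: dict) -> Tuple[str, List[BBox]]:
        table_class = self.classify(image)
        if table_class not in self.detectors:
            raise ValueError(f"no detector registered for class {table_class!r}")
        # Dispatch to the detector that was set up for this table class.
        return table_class, self.detectors[table_class](image)


# Stub models standing in for the real ONNX sessions (assumption for the sketch).
parser = TwoStageTableParser(
    classify=lambda image: "wired" if image["has_ruling_lines"] else "wireless",
    detectors={
        "wired": lambda image: [(0, 0, 50, 20)],
        "wireless": lambda image: [(0, 0, 40, 18), (40, 0, 80, 18)],
    },
)

wired_result = parser.parse({"has_ruling_lines": True})
wireless_result = parser.parse({"has_ruling_lines": False})
```

Keeping one detector per class behind a single entry point means the caller never has to know which set of weights was used.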
With uv, add to your project with:
```shell
uv add cells2table
```

| Optional | Description |
|---|---|
| docling | For docling usage |
| huggingface | For downloading models |
cells2table only extracts structural information from tables. Another library is needed to extract the content of the cells.
A docling plugin is provided to integrate cells2table into a complete pipeline.
Usage example:
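```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import (
    DocumentConverter,
    ImageFormatOption,
    PdfFormatOption,
)

from cells2table.docling import CustomDoclingTableStructureOptions

# External plugins must be allowed for docling to load the cells2table plugin.
pipeline_options = PdfPipelineOptions(
    allow_external_plugins=True,
    table_structure_options=CustomDoclingTableStructureOptions(),
)
# Use the same pipeline options for PDFs and images.
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
        InputFormat.IMAGE: ImageFormatOption(pipeline_options=pipeline_options),
    }
)
result = converter.convert("path/to/document.pdf")
print(result.document.export_to_markdown())
```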
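
Because cells2table returns only structure, text from another source has to be mapped onto the detected cells. As a rough illustration of that step outside docling, the sketch below assigns OCR words to cells by checking whether each word's center falls inside a cell box. The `Cell` and `Word` types, the row/column indices, and `fill_cells` are hypothetical and do not reflect the actual output format of cells2table or of any OCR library.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Cell:  # hypothetical structural output: grid position plus bounding box
    row: int
    col: int
    bbox: BBox


@dataclass
class Word:  # hypothetical OCR output: recognized text plus bounding box
    text: str
    bbox: BBox


def contains_center(cell_box: BBox, word_box: BBox) -> bool:
    """Return True if the center of word_box lies inside cell_box."""
    cx = (word_box[0] + word_box[2]) / 2
    cy = (word_box[1] + word_box[3]) / 2
    return cell_box[0] <= cx <= cell_box[2] and cell_box[1] <= cy <= cell_box[3]


def fill_cells(cells: List[Cell], words: List[Word]) -> List[List[str]]:
    """Build a row-major grid of cell texts from cell boxes and OCR words."""
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        inside = [w for w in words if contains_center(cell.bbox, w.bbox)]
        # Naive reading order: sort by top edge, then by left edge.
        inside.sort(key=lambda w: (w.bbox[1], w.bbox[0]))
        grid[cell.row][cell.col] = " ".join(w.text for w in inside)
    return grid


cells = [
    Cell(0, 0, (0, 0, 50, 20)), Cell(0, 1, (50, 0, 100, 20)),
    Cell(1, 0, (0, 20, 50, 40)), Cell(1, 1, (50, 20, 100, 40)),
]
words = [
    Word("Part", (5, 5, 20, 15)), Word("name", (22, 5, 45, 15)),
    Word("Qty", (55, 5, 75, 15)),
    Word("Bolt", (5, 25, 28, 35)), Word("12", (55, 25, 65, 35)),
]
table = fill_cells(cells, words)
```

Center-point containment is a deliberately simple matching rule. Words that straddle a cell border or sit on slightly uneven baselines would need an overlap-based rule and a more careful reading-order sort.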