Skip to content

jspast/cells2table

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

55 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cells2table

Parsing tables in document images with cell detection models

Implemented pipelines

PaddlePaddle

  • Classification model (wired / wireless)
  • Cell detection model with different weights for each class

Uses ONNX weights downloaded automatically from Hugging Face on first use.

Instalation

With uv, add to your project with:

uv add cells2table
Optional Description
docling For docling usage
huggingface For downloading models

Usage

cells2table only extract structural information from the tables. Another library is needed to extract content from the cells.

Docling

A docling plugin is provided to allow integrating cells2table in a complete pipeline.

Usage example:

from cells2table.docling import CustomDoclingTableStructureOptions

pipeline_options = PdfPipelineOptions(
    allow_external_plugins=True,
    table_structure_options=CustomDoclingTableStructureOptions(),
)

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
        InputFormat.IMAGE: ImageFormatOption(pipeline_options=pipeline_options),
    }
)

result = converter.convert("path/to/document.pdf")
print(result.document.export_to_markdown())

About

Table image parsing with cell detection models

Topics

Resources

License

Stars

Watchers

Forks

Contributors