😋 Optical Character Recognition (OCR)

Check the CHANGELOG file for a global overview of the latest modifications! 😋

Project structure

├── architectures            : utilities for model architectures
│   ├── layers               : custom layer implementations
│   ├── transformers         : transformer architecture implementations
│   ├── common_blocks.py     : defines common blocks (e.g., Conv + BN + ReLU)
│   ├── crnn_arch.py         : CRNN architecture
│   ├── east_arch.py         : EAST architecture
│   ├── generation_utils.py  : utilities for text and sequence generation
│   ├── hparams.py           : hyperparameter management
│   ├── simple_models.py     : defines classical models such as CNN / RNN / MLP and siamese
│   └── yolo_arch.py         : YOLOv2 architecture
├── custom_train_objects     : custom objects used in training / testing
├── loggers                  : logging utilities for tracking experiment progress
├── models                   : main directory for model classes
│   ├── detection            : detector implementations
│   │   ├── base_detector.py : abstract base class for all detectors
│   │   ├── east.py          : EAST implementation for text detection
│   │   └── yolo.py          : YOLOv2 implementation for general object detection
│   ├── interfaces           : directory for interface classes
│   ├── ocr                  : OCR implementations
│   │   ├── base_ocr.py      : abstract base class for all OCR models
│   │   └── crnn.py          : CRNN implementation for OCR
│   └── weights_converter.py : utilities to convert weights between different models
├── tests                    : unit and integration tests for model validation
├── utils                    : utility functions for data processing and visualization
├── LICENCE                  : project license file
├── ocr.ipynb                : notebook demonstrating model creation + OCR features
├── README.md                : this file
└── requirements.txt         : required packages

Check the main project for more information about the modules that are not extended here, the overall structure, and the main classes.

Check the detection project for more information about the detection module and the EAST Scene-Text Detection model.

Available features

OCR (module `models.ocr`):

Feature	Function / class	Description
OCR	`ocr`	Performs OCR on the given image(s)

You can check the ocr notebook for a concrete demonstration.

Available models

Model architectures

Available architectures:

detection:
- EAST
OCR:
- CRNN
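A CRNN emits one character distribution per horizontal position of its feature map, and the original CRNN paper trains it with Connectionist Temporal Classification (CTC). The sketch below shows greedy (best-path) CTC decoding with NumPy. It is an illustration of the technique, not this repository's decoder: the name `ctc_greedy_decode` is hypothetical, and it assumes that class 0 is the CTC blank and that class `i` maps to `charset[i - 1]`.

```python
import numpy as np

def ctc_greedy_decode(probs, charset):
    """Greedy (best-path) CTC decoding.

    probs   : array of shape (timesteps, num_classes), one score vector per timestep
    charset : string of characters; class 0 is assumed to be the CTC blank and
              class i (i >= 1) maps to charset[i - 1] (hypothetical layout)
    """
    best_path = np.argmax(probs, axis=-1)  # most likely class at each timestep
    chars, prev = [], None
    for idx in best_path:
        idx = int(idx)
        # CTC collapsing rule: merge consecutive repeats, then drop blanks
        if idx != prev and idx != 0:
            chars.append(charset[idx - 1])
        prev = idx
    return "".join(chars)

# Toy example: 8 timesteps, 4 classes (blank + "a", "b", "c")
probs = np.eye(4)[[1, 1, 0, 1, 2, 2, 0, 3]]
print(ctc_greedy_decode(probs, "abc"))  # prints aabc
```

The blank between the two runs of "a" is what lets CTC output a doubled letter; without it, the repeats would be merged into a single "a". Beam-search decoding is a common, slower alternative that can be more accurate.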

Model weights

Classes	Dataset	Architecture	Trainer	Weights

Models should be unzipped in the pretrained_models/ directory!

The pretrained CRNN models come from the EasyOCR library. Their weights are automatically downloaded based on the language or the model name, then converted to Keras! The easyocr library is therefore not required, but PyTorch is needed to load the original weights during the conversion.
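Converting PyTorch weights to Keras largely comes down to reordering tensor axes, because the two frameworks store kernels differently. A minimal NumPy sketch for convolution and dense layers follows. The helper names are hypothetical, and this is not necessarily how `weights_converter.py` performs the conversion.

```python
import numpy as np

def torch_conv2d_to_keras(weight):
    """(out_ch, in_ch, kh, kw) PyTorch Conv2d weight -> (kh, kw, in_ch, out_ch) Keras Conv2D kernel."""
    return np.transpose(weight, (2, 3, 1, 0))

def torch_linear_to_keras(weight):
    """(out_features, in_features) PyTorch Linear weight -> (in_features, out_features) Keras Dense kernel."""
    return np.transpose(weight)

# Biases have the same 1-D layout in both frameworks and can be copied as-is.
conv_w = np.random.rand(64, 3, 3, 3)          # e.g. a 3x3 conv from 3 to 64 channels
print(torch_conv2d_to_keras(conv_w).shape)    # (3, 3, 3, 64)
linear_w = np.random.rand(10, 256)
print(torch_linear_to_keras(linear_w).shape)  # (256, 10)
```

In practice, the arrays would come from a PyTorch `state_dict` (via `tensor.detach().cpu().numpy()`) and be assigned with the Keras `layer.set_weights([kernel, bias])` method. Recurrent layers need extra care: for example, a PyTorch `LSTM` stores two bias vectors (`bias_ih_l0` and `bias_hh_l0`) while a Keras `LSTM` has a single one, so the two would typically be summed.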

The pretrained version of EAST can be downloaded from this project. It should be placed in pretrained_models/pretrained_weights/east_vgg16.pth (torch is required to convert the weights: pip install torch).

Installation and usage

See the installation guide for a step-by-step installation 😄

Here is a summary of the installation procedure, if you already have a working Python environment:

1. Clone this repository: `git clone https://github.com/yui-mhcp/ocr.git`
2. Go to the root of this repository: `cd ocr`
3. Install the requirements: `pip install -r requirements.txt`
4. Open the `ocr` notebook and follow the instructions!
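Before opening the notebook, it can help to confirm that the frameworks mentioned above are importable. The short sketch below assumes the relevant top-level packages are `keras` (for the models) and `torch` (for weight conversion only); the exact list in `requirements.txt` may differ, and `check_packages` is a hypothetical helper.

```python
import importlib.util

def check_packages(names=("keras", "torch")):
    """Map each top-level package name to True if it can be imported (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, installed in check_packages().items():
        status = "OK" if installed else f"missing (try: pip install {name})"
        print(f"{name}: {status}")
```

Using `importlib.util.find_spec` avoids actually importing heavy frameworks just to check whether they are installed.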

TO-DO list:

Make the TO-DO list
Convert the CRNN architecture / weights from the easyocr library to tensorflow
Convert the CRNN + attention architecture from this repo to tensorflow
Add examples to initialize pretrained models (both EAST and CRNN)
Add an example to perform OCR on image (with text detection)
Add an example to perform OCR on camera
Allow combining texts into lines / paragraphs (as EAST detects individual words)
Take into account the text rotation in the combination procedure
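Since EAST detects individual words, combining them into lines could start from a simple heuristic: two boxes belong to the same line when their vertical extents overlap enough. The sketch below is hypothetical (`group_into_lines` and the `min_overlap` threshold are illustrative choices, not part of this project). It handles axis-aligned boxes only and therefore ignores the rotation issue raised in the last item.

```python
def group_into_lines(boxes, min_overlap=0.5):
    """Group word boxes into text lines.

    boxes: list of (x_min, y_min, x_max, y_max, text) tuples, with y growing downwards.
    Returns a list of lines sorted top-to-bottom, each line being a list of boxes
    sorted left-to-right.
    """
    lines = []  # each entry: [line_y_min, line_y_max, boxes_in_line]
    # Process boxes by vertical center so that lines are built top-down
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        _, y0, _, y1, _ = box
        for ln in lines:
            overlap = min(y1, ln[1]) - max(y0, ln[0])
            # Same line if the vertical overlap covers enough of the smaller height
            if overlap >= min_overlap * min(y1 - y0, ln[1] - ln[0]):
                ln[0], ln[1] = min(y0, ln[0]), max(y1, ln[1])
                ln[2].append(box)
                break
        else:
            lines.append([y0, y1, [box]])
    lines.sort(key=lambda ln: ln[0])
    return [sorted(ln[2], key=lambda b: b[0]) for ln in lines]

words = [(60, 10, 100, 30, "world"), (0, 12, 50, 30, "hello"),
         (0, 50, 40, 70, "second"), (45, 52, 90, 72, "line")]
print([" ".join(b[4] for b in line) for line in group_into_lines(words)])
# prints ['hello world', 'second line']
```

Grouping lines into paragraphs could reuse the same idea along the other axis, e.g. by comparing the vertical gap between consecutive lines to their height.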

Notes and references

GitHub projects

The code for the CRNN architecture is heavily inspired by the easyocr repo:

EasyOCR library: official repo of the easyocr library

The code for the EAST part of this project is heavily inspired by this repo:

SakuraRiven PyTorch implementation: PyTorch implementation of the EAST paper.
Awesome-OCR : A curated list of OCR resources
Tesseract OCR : The official Tesseract repository
Deep Text Recognition Benchmark : A comprehensive benchmark of Scene Text Recognition models
CRAFT-pytorch : Character Region Awareness for Text Detection
mmocr : OpenMMLab Text Detection, Recognition and Understanding Toolbox

Papers

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition : the original CRNN paper
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis : a great benchmark of OCR models + an open-source repository with pretrained models and datasets
U-Net: Convolutional Networks for Biomedical Image Segmentation : the original U-Net paper
EAST: An Efficient and Accurate Scene Text Detector : text detection (with possibly rotated bounding-boxes) with a segmentation model (U-Net).
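In EAST's rotated-box (RBOX) geometry, each pixel of the score map predicts its distances to the four edges of the enclosing text box plus a rotation angle. Turning one such prediction into corner coordinates can be sketched as follows. The (top, right, bottom, left) ordering, the angle sign convention, and the default `scale=4` (the EAST output map is a quarter of the input resolution) are assumptions that vary between implementations, and `decode_rbox` is a hypothetical helper.

```python
import numpy as np

def decode_rbox(x, y, dists, angle, scale=4.0):
    """Turn one EAST RBOX prediction into the 4 corners of a rotated box.

    x, y  : pixel position in the output score map
    dists : (top, right, bottom, left) distances from the pixel to the box edges,
            assumed to be expressed in input-image pixels
    angle : rotation angle in radians (sign convention is an assumption)
    scale : ratio between the input image and the score-map resolution
    Returns an array of shape (4, 2): top-left, top-right, bottom-right, bottom-left.
    """
    top, right, bottom, left = dists
    cx, cy = x * scale, y * scale  # pixel position in input-image coordinates
    # Corner offsets in the box's own (unrotated) frame
    offsets = np.array([[-left, -top], [right, -top], [right, bottom], [-left, bottom]], dtype=float)
    cos, sin = np.cos(angle), np.sin(angle)
    rot = np.array([[cos, -sin], [sin, cos]])
    # Rotate every offset around the pixel, then translate to the pixel position
    return offsets @ rot.T + np.array([cx, cy])

corners = decode_rbox(10, 5, (2, 3, 4, 1), angle=0.0)
print(corners.tolist())  # [[39.0, 18.0], [43.0, 18.0], [43.0, 24.0], [39.0, 24.0]]
```

A full decoder would apply this to every pixel whose score exceeds a threshold, then merge the resulting overlapping boxes with (locality-aware) non-maximum suppression, as described in the EAST paper.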

Datasets

COCO Text: an extension of COCO for text detection
ICDAR 2015: a standard dataset for text detection and recognition
Synthetic Word Dataset: synthetic word dataset for OCR training

Tutorials

A Comprehensive Guide to OCR with Tesseract, OpenCV and Python : A great introduction to classical OCR approaches
Scene Text Detection with OpenCV : Tutorial on implementing EAST text detector
Attention Mechanisms in OCR : How attention mechanisms improve OCR accuracy

Contacts and licence

Contacts:

Mail: yui-mhcp@tutanota.com
Discord: yui0732

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENCE file for details.

This license allows you to use, modify, and distribute the code, as long as you include the original copyright and license notice in any copy of the software/source. Additionally, if you modify the code and distribute it, or run it on a server as a service, you must make your modified version available under the same license.

For more information about the AGPL-3.0 license, please visit the official website.

Citation

If you find this project useful in your work, please add this citation to give it more visibility! 😋

@misc{yui-mhcp,
    author  = {yui},
    title   = {A Deep Learning projects centralization},
    year    = {2021},
    publisher   = {GitHub},
    howpublished    = {\url{https://github.com/yui-mhcp}}
}
