Check the CHANGELOG file to have a global overview of the latest modifications!
├── architectures : utilities for model architectures
│   ├── layers : custom layer implementations
│   ├── transformers : transformer architecture implementations
│   ├── common_blocks.py : defines common blocks (e.g., Conv + BN + ReLU)
│   ├── crnn_arch.py : CRNN architecture
│   ├── east_arch.py : EAST architecture
│   ├── generation_utils.py : utilities for text and sequence generation
│   ├── hparams.py : hyperparameter management
│   ├── simple_models.py : defines classical models such as CNN / RNN / MLP and siamese
│   └── yolo_arch.py : YOLOv2 architecture
├── custom_train_objects : custom objects used in training / testing
├── loggers : logging utilities for tracking experiment progress
├── models : main directory for model classes
│   ├── detection : detector implementations
│   │   ├── base_detector.py : abstract base class for all detectors
│   │   ├── east.py : EAST implementation for text detection
│   │   └── yolo.py : YOLOv2 implementation for general object detection
│   ├── interfaces : directories for interface classes
│   ├── ocr : OCR implementations
│   │   ├── base_ocr.py : abstract base class for all OCR models
│   │   └── crnn.py : CRNN implementation for OCR
│   └── weights_converter.py : utilities to convert weights between different models
├── tests : unit and integration tests for model validation
├── utils : utility functions for data processing and visualization
├── LICENCE : project license file
├── ocr.ipynb : notebook demonstrating model creation + OCR features
├── README.md : this file
└── requirements.txt : required packages

Check the main project for more information about the unextended modules / structure / main classes.
Check the detection project for more information about the detection module and the EAST Scene-Text Detection model.
- OCR (module `models.ocr`) :

| Feature | Function / class | Description |
|---|---|---|
| OCR | `ocr` | Performs OCR on the given image(s) |
You can check the ocr notebook for a concrete demonstration.
Available architectures :
| Classes | Dataset | Architecture | Trainer | Weights |
|---|---|---|---|---|
Models should be unzipped in the pretrained_models/ directory!
The pretrained CRNN models come from the EasyOCR library. Weights are automatically downloaded given the language or the model name, and converted to Keras. The `easyocr` library itself is therefore not required, but PyTorch is still required to load the weights for conversion.
The pretrained version of EAST can be downloaded from this project. It should be placed in pretrained_models/pretrained_weights/east_vgg16.pth (torch is required to convert the weights: pip install torch).
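Before attempting the EAST conversion, it can help to confirm that both prerequisites named above are in place. The following is a minimal sketch, assuming it is run from the repository root; the helper `check_east_prerequisites` is hypothetical and not part of this project, and only the weights path is taken from the text above.

```python
from importlib.util import find_spec
from pathlib import Path

# Relative path quoted in the paragraph above.
EAST_WEIGHTS = Path("pretrained_models") / "pretrained_weights" / "east_vgg16.pth"


def check_east_prerequisites(root="."):
    """Report what is still missing to convert the EAST weights.

    `root` is assumed to be the repository root. This helper is
    hypothetical: it is not provided by the project.
    """
    weights_path = Path(root) / EAST_WEIGHTS
    return {
        "weights_path": str(weights_path),
        "weights_found": weights_path.is_file(),
        # find_spec only looks the package up; it does not import torch.
        "torch_installed": find_spec("torch") is not None,
    }


if __name__ == "__main__":
    report = check_east_prerequisites()
    if not report["weights_found"]:
        print(f"Missing weights: expected {report['weights_path']}")
    if not report["torch_installed"]:
        print("torch is not installed: pip install torch")
    if report["weights_found"] and report["torch_installed"]:
        print("EAST weights and torch are both available.")
```

The check only verifies that the file exists and that `torch` can be located; it does not validate the content of the checkpoint.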
See the installation guide for a step-by-step installation.

Here is a summary of the installation procedure, if you have a working python environment :
- Clone this repository: `git clone https://github.com/yui-mhcp/ocr.git`
- Go to the root of this repository: `cd ocr`
- Install requirements: `pip install -r requirements.txt`
- Open the `ocr` notebook and follow the instructions!
- Make the TO-DO list
- Convert the `CRNN` architecture / weights from the `easyocr` library to `tensorflow`
- Convert the `CRNN + attention` architecture from this repo to `tensorflow`
- Add examples to initialize pretrained models (both EAST and CRNN)
- Add an example to perform OCR on image (with text detection)
- Add an example to perform OCR on camera
- Allow combining texts into lines / paragraphs (as EAST detects individual words)
- Take into account the text rotation in the combination procedure
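
To illustrate what combining word-level detections into lines involves, here is one possible approach, sketched under explicit assumptions: each word is a dict with a `text` and an axis-aligned `box` of the form `(x_min, y_min, x_max, y_max)`, and rotation is ignored. The function name, the input format, and the `min_overlap` threshold are all hypothetical; this is not the procedure implemented in this project.

```python
def group_words_into_lines(words, min_overlap=0.5):
    """Group word boxes into text lines, read left to right, top to bottom.

    Each word is {"text": str, "box": (x_min, y_min, x_max, y_max)}.
    This format is an assumption made for the sketch. Boxes are treated
    as axis-aligned, so rotated text is not handled.
    """
    lines = []  # each entry: {"y_min", "y_max", "words"}
    for word in sorted(words, key=lambda w: w["box"][1]):
        _, y_min, _, y_max = word["box"]
        for line in lines:
            # Vertical overlap, relative to the shorter of the two heights.
            overlap = min(y_max, line["y_max"]) - max(y_min, line["y_min"])
            shortest = min(y_max - y_min, line["y_max"] - line["y_min"])
            if shortest > 0 and overlap / shortest >= min_overlap:
                line["words"].append(word)
                line["y_min"] = min(line["y_min"], y_min)
                line["y_max"] = max(line["y_max"], y_max)
                break
        else:
            # No existing line overlaps enough: start a new one.
            lines.append({"y_min": y_min, "y_max": y_max, "words": [word]})

    lines.sort(key=lambda line: line["y_min"])
    return [
        " ".join(w["text"] for w in sorted(line["words"], key=lambda w: w["box"][0]))
        for line in lines
    ]


words = [
    {"text": "world", "box": (60, 10, 110, 30)},
    {"text": "Hello", "box": (0, 12, 50, 32)},
    {"text": "again", "box": (0, 50, 50, 70)},
]
print(group_words_into_lines(words))  # → ['Hello world', 'again']
```

Measuring overlap against the shorter box keeps small words (such as punctuation) attached to their line. Handling rotated text would require comparing boxes along the text direction instead of the vertical axis.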
The code for the CRNN architecture is heavily inspired by the `easyocr` repo:
- EasyOCR library: official repo of the `easyocr` library
The code for the EAST part of this project is heavily inspired by this repo:
- SakuraRiven pytorch implementation: pytorch implementation of the EAST paper.
- Awesome-OCR : A curated list of OCR resources
- Tesseract OCR : The official Tesseract repository
- Deep Text Recognition Benchmark : A comprehensive benchmark of Scene Text Recognition models
- CRAFT-pytorch : Character Region Awareness for Text Detection
- mmocr : OpenMMLab Text Detection, Recognition and Understanding Toolbox
- An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition : the original CRNN paper
- What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis : a great benchmark of OCR models + an open-source repository with pretrained models and datasets
- U-Net: Convolutional Networks for Biomedical Image Segmentation : the original U-Net paper
- EAST: An Efficient and Accurate Scene Text Detector : text detection (with possibly rotated bounding-boxes) with a segmentation model (U-Net).
- COCO Text: an extension of COCO for text detection
- ICDAR 2015: a standard dataset for text detection and recognition
- Synthetic Word Dataset: synthetic word dataset for OCR training
- A Comprehensive Guide to OCR with Tesseract, OpenCV and Python : A great introduction to classical OCR approaches
- Scene Text Detection with OpenCV : Tutorial on implementing EAST text detector
- Attention Mechanisms in OCR : How attention mechanisms improve OCR accuracy
Contacts:
- Mail: `yui-mhcp@tutanota.com`
- Discord: yui0732
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for details.
This license allows you to use, modify, and distribute the code, as long as you include the original copyright and license notice in any copy of the software/source. Additionally, if you modify the code and distribute it, or run it on a server as a service, you must make your modified version available under the same license.
For more information about the AGPL-3.0 license, please visit the official website.
If you find this project useful in your work, please add this citation to give it more visibility!
@misc{yui-mhcp,
author = {yui},
title = {A Deep Learning projects centralization},
year = {2021},
publisher = {GitHub},
howpublished = {\url{https://github.com/yui-mhcp}}
}