
MaxHalford/orc


orc 🧌

orc is a tool for parsing structured information from (messy) OCR outputs. This toolkit doesn't use fancy deep learning models. It focuses on simple and efficient algorithms that are practical enough to be used in battle.

Usage

fuzz: fuzzy string matching 😶‍🌫️

This module focuses on approximate string matching. Not only does it calculate distances between words, it also records the operations that were performed to transform one word into another.
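
To make the idea concrete, here is a minimal sketch of an edit distance that also records its operations. It is not orc's actual API: the function name `edit_ops` and the `(kind, source_char, target_char)` tuple layout are assumptions made for illustration, and the algorithm shown is plain Levenshtein distance with a backtrace.

```python
from typing import List, Tuple

# Hypothetical operation record: (kind, source_char, target_char).
Op = Tuple[str, str, str]


def edit_ops(source: str, target: str) -> Tuple[int, List[Op]]:
    """Return the Levenshtein distance and one list of edits achieving it."""
    m, n = len(source), len(target)

    # dist[i][j] is the edit distance between source[:i] and target[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete from source
                dist[i][j - 1] + 1,         # insert into source
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from the bottom-right corner to recover the operations.
    ops: List[Op] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if source[i - 1] == target[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                if cost:
                    ops.append(("substitute", source[i - 1], target[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", source[i - 1], ""))
            i -= 1
        else:
            ops.append(("insert", "", target[j - 1]))
            j -= 1

    ops.reverse()
    return dist[m][n], ops


# A typical OCR confusion: lowercase "l" read in place of uppercase "I".
distance, ops = edit_ops("lnvoice", "Invoice")
print(distance, ops)
```

Keeping the operations around, rather than only the distance, is what makes it possible to see which character confusions occur in a given OCR output.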

spell: spell checking 📝

ocr: optical character recognition 🔬

lines: line segmentation 📏

Development

git clone https://github.com/MaxHalford/orc
cd orc
pip install poetry
poetry install
poetry shell
pytest

License

The MIT License (MIT). Please see the license file for more information.
