Programmable OCR with Tesseract

Parse documents, extract their text, and run machine learning on the results as part of a Python or JVM machine learning pipeline.

Python Libraries for Working with OCR

Tesseract Tools and User Interfaces

Basic Tesseract OCR

Install and Test

On a Macintosh running Homebrew
brew install tesseract 

For CentOS Users
/usr/bin/yum --enablerepo epel-testing install tesseract.x86_64 tesseract-langpack-en.noarch
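
To test the install from Python, one option is to call the tesseract binary directly. The following is a minimal sketch, not a definitive implementation. The helper names (tesseract_version, build_ocr_command, ocr_image) are hypothetical and chosen for illustration. It assumes the binary is on the PATH under the name tesseract and that the English language data ("eng") is installed. It relies only on the standard command-line form `tesseract <image> stdout -l <lang>`, which prints the recognized text to standard output.

```python
import shutil
import subprocess
from typing import List, Optional


def tesseract_version(binary: str = "tesseract") -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if the binary is not found."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract releases write the version banner to stderr rather than stdout,
    # so fall back to stderr when stdout is empty.
    banner = (result.stdout or result.stderr).strip()
    return banner.splitlines()[0] if banner else None


def build_ocr_command(
    image_path: str, lang: str = "eng", binary: str = "tesseract"
) -> List[str]:
    """Build a command line that prints the recognized text to stdout."""
    return [binary, image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image and return the recognized text.

    Raises subprocess.CalledProcessError if Tesseract exits with an error.
    """
    result = subprocess.run(
        build_ocr_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    version = tesseract_version()
    print(version or "tesseract not found on PATH")
```

With Tesseract installed, calling ocr_image on the path of a scanned page (any image format your build supports) returns its text as a string, which can then be passed to the rest of a pipeline.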

Tesseract with Apache NiFi

Machine Learning and Deep Learning with Tesseract


