Welcome to OCR utils’s documentation!
OCR utils
Python tools for interacting with Tesseract
Features
Detects tables in PDFs and images and performs OCR on each cell
Performs OCR on a PDF and generates an SVG image
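To make the second feature concrete, here is a minimal sketch of how recognized words and their bounding boxes could be written out as SVG text elements. It is not the library's implementation: the `WordBox` class and the `boxes_to_svg` function are hypothetical names introduced for illustration, and the sketch assumes the OCR step has already produced pixel coordinates for each word.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET


@dataclass
class WordBox:
    """Hypothetical OCR result: one recognized word and its pixel bounding box."""
    text: str
    left: int
    top: int
    width: int
    height: int


def boxes_to_svg(boxes, page_width, page_height):
    """Render word boxes as <text> elements and return the SVG as a string."""
    svg = ET.Element(
        "svg",
        xmlns="http://www.w3.org/2000/svg",
        width=str(page_width),
        height=str(page_height),
    )
    for box in boxes:
        # SVG positions text by its baseline, so use the bottom edge of the box.
        node = ET.SubElement(
            svg,
            "text",
            x=str(box.left),
            y=str(box.top + box.height),
        )
        # Assumption: the box height is a reasonable stand-in for the font size.
        node.set("font-size", str(box.height))
        node.text = box.text
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    words = [WordBox("Hello", 10, 20, 50, 12), WordBox("world", 70, 20, 50, 12)]
    print(boxes_to_svg(words, page_width=200, page_height=100))
```

Keeping the words as text elements rather than rasterizing them is what would make such an output searchable and selectable, which is a plausible reason to target SVG.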
Quick Start
from ocr_utils import pdf_to_svg

pdf_to_svg(
    input_filename='in.pdf',
    output_filename='out.svg',
    detect_tables=True,
    lang='eng',
)
Installation
Stable Release: pip install tesseract_ocr_utils
Development Head: pip install git+https://github.com/envinorma/ocr_utils.git
This library is built upon pytesseract and pdf2image, which have non-pip requirements. Visit these libraries' installation pages to install their dependencies.
For example, on Ubuntu, the following packages need to be installed:
apt-get install libarchive13
apt-get install tesseract-ocr
apt-get install poppler-utils
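After installing the system packages, a quick check that the executables are reachable can save debugging time later. The sketch below is not part of ocr_utils. It assumes that the relevant binaries are `tesseract` (the executable pytesseract calls) and `pdftoppm` (shipped with poppler-utils and used by pdf2image); `missing_binaries` is a hypothetical helper name.

```python
import shutil

# Assumption: these are the executables the Python wrappers look for.
# `tesseract` is called by pytesseract; `pdftoppm` comes from poppler-utils
# and is called by pdf2image.
REQUIRED_BINARIES = ("tesseract", "pdftoppm")


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("All system dependencies found.")
```

If the script reports a missing name, installing the corresponding package from the list above (or its equivalent on another platform) should resolve it.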
Documentation
For full package documentation, please visit envinorma.github.io/ocr_utils.