Welcome to Tesserwrap’s documentation!

Tesserwrap is a ctypes/capi wrapper for Tesseract OCR.

class tesserwrap.Tesseract(datadir='', lang='eng')

Tesseract OCR object.

  • datadir – Path to the Tesseract data directory containing the training data.
  • lang – Language of the image(s) to be OCR’d.

A simple example:

>>> from tesserwrap import Tesseract
>>> from PIL import Image

>>> img = Image.open("test.png")
>>> tr = Tesseract()
>>> tr.ocr_image(img)
'The quick brown fox jumps ove\n\n'
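The returned string keeps Tesseract's trailing newlines, as the output above shows. If you need tidier text for display or comparison, a small post-processing step can help. The sketch below uses only the standard library; `normalize_ocr_text` is a hypothetical helper and is not part of tesserwrap.

```python
import re


def normalize_ocr_text(text: str) -> str:
    """Tidy raw OCR output (hypothetical helper, not part of tesserwrap).

    Strips trailing whitespace from each line, collapses runs of blank
    lines into a single blank line, and trims leading/trailing whitespace.
    """
    lines = [line.rstrip() for line in text.splitlines()]
    cleaned = "\n".join(lines)
    # Three or more newlines means two or more blank lines; keep just one.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip()


print(repr(normalize_ocr_text("The quick brown fox jumps ove\n\n")))
# 'The quick brown fox jumps ove'
```

Keeping single blank lines preserves paragraph breaks that Tesseract detected, while dropping the padding at the end of the page.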

Clear the Tesseract image and clean up any Tesseract run data.


Return the current page layout analysis mode from Tesseract.


Get the bounding rectangle within the image that Tesseract is currently examining.


Get the text of the OCR’d image as a byte string.


Get the text of the OCR’d image as a string.

This function is kept for backwards compatibility with version 0.0 of tesserwrap.
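The two getters above differ only in return type: one yields raw bytes, the other a decoded string. If you hold the byte-string form and want text, decoding is a one-liner. This sketch assumes the bytes are UTF-8 encoded; `decode_ocr_bytes` is a hypothetical helper, not a tesserwrap function.

```python
def decode_ocr_bytes(raw: bytes) -> str:
    """Decode OCR output bytes into a str (hypothetical helper).

    Assumes UTF-8. errors="replace" substitutes U+FFFD for any invalid
    byte sequence instead of raising UnicodeDecodeError.
    """
    return raw.decode("utf-8", errors="replace")


print(decode_ocr_bytes("Café\n".encode("utf-8")))
```

Using `errors="replace"` trades strictness for robustness; use the default `errors="strict"` if you would rather fail loudly on malformed output.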


ocr_image(image)

OCR an image, returning the UTF-8 text data.

Parameters: image – PIL Image to be OCR’d by Tesseract.

Takes a PIL Image and loads it into Tesseract for further operations.

Note: This function automatically converts the image to grayscale.

Parameters: image – PIL Image to load into Tesseract.
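PIL documents the ITU-R 601-2 luma transform it applies when converting an RGB image to mode "L": L = R * 299/1000 + G * 587/1000 + B * 114/1000. Whether tesserwrap performs its grayscale conversion through `Image.convert("L")` is an assumption here. The sketch only shows what that transform does to a single pixel, using integer arithmetic, so its rounding may differ from PIL's by one level.

```python
def rgb_to_luma(r: int, g: int, b: int) -> int:
    """Approximate the ITU-R 601-2 luma transform used by PIL's "L" mode.

    Sketch only: integer arithmetic with truncation, so results may be
    off by one compared with PIL's own fixed-point implementation.
    """
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("channel values must be in 0..255")
    return (r * 299 + g * 587 + b * 114) // 1000


print(rgb_to_luma(255, 255, 255))  # 255
print(rgb_to_luma(0, 0, 0))        # 0
```

The weights favour green, so green text on a white page ends up lighter (lower contrast) than red or blue text of the same intensity, which can matter for OCR accuracy.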

Set the page layout analysis mode.

Parameters: mode – integer The page layout analysis mode. See the PageSegMode class for options.
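Because the mode is a plain integer, it can help to validate it before passing it to the setter described above. The class below mirrors the numbering of Tesseract's C++ `PageSegMode` enum for modes 0 to 10; it is a hypothetical illustration, not tesserwrap's own PageSegMode class, and newer Tesseract releases define additional modes beyond 10.

```python
from enum import IntEnum


class PSM(IntEnum):
    """Page segmentation modes numbered as in Tesseract's PageSegMode enum.

    Hypothetical mirror for illustration; not tesserwrap's PageSegMode.
    """
    OSD_ONLY = 0
    AUTO_OSD = 1
    AUTO_ONLY = 2
    AUTO = 3
    SINGLE_COLUMN = 4
    SINGLE_BLOCK_VERT_TEXT = 5
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    CIRCLE_WORD = 9
    SINGLE_CHAR = 10


def checked_mode(mode: int) -> int:
    """Return mode as a plain int; PSM(mode) raises ValueError if unknown."""
    return int(PSM(mode))


print(checked_mode(PSM.SINGLE_LINE))  # 7
```

Converting back to a plain `int` keeps the value compatible with an API that expects an integer, assuming tesserwrap forwards the number to Tesseract unchanged.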
set_rectangle(left, top, width, height)

Set the OCR detection bounding-box.

  • left – integer Offset in pixels from the left edge of the image.
  • top – integer Offset in pixels from the top edge of the image.
  • width – integer Width of the bounding box in pixels.
  • height – integer Height of the bounding box in pixels.
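PIL describes regions as (left, upper, right, lower) boxes (for example in `Image.crop`), while `set_rectangle` takes a width and height. A small conversion that also clamps the box to the image bounds might look like this; `box_to_rectangle` is a hypothetical helper, not part of tesserwrap.

```python
from typing import Tuple


def box_to_rectangle(
    box: Tuple[int, int, int, int], image_size: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Convert a PIL-style (left, upper, right, lower) box into
    (left, top, width, height), clamped to the image bounds.

    Hypothetical helper; not part of tesserwrap.
    """
    img_w, img_h = image_size
    left, upper, right, lower = box
    # Clamp every edge into the image so the region never extends past it,
    # and keep right >= left and lower >= upper so sizes are never negative.
    left = max(0, min(left, img_w))
    upper = max(0, min(upper, img_h))
    right = max(left, min(right, img_w))
    lower = max(upper, min(lower, img_h))
    return left, upper, right - left, lower - upper


print(box_to_rectangle((10, 20, 110, 70), (640, 480)))  # (10, 20, 100, 50)
```

With a PIL image `img` and a Tesseract object `tr`, the result could be unpacked as `tr.set_rectangle(*box_to_rectangle(box, img.size))`, since `Image.size` is `(width, height)`.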
