comp.org.uk

OCR in Ubuntu using Tesseract and gImageReader

Scanning an image and converting it to text is relatively straightforward in Linux, provided you have the right software installed. I plumped for Tesseract, as it was reputedly the best command-line OCR program, but I also wanted a graphical user interface, so I used gImageReader as a front-end to it.

Here’s how to install both of them for Optical Character Recognition in Ubuntu.

Firstly, install tesseract (and the associated language files if needed):

sudo apt-get install tesseract-ocr

Install a language file by adding its three-letter code to the package name (e.g. -eng, -deu, -fra, -ita, -nld, -por, -spa, …):

sudo apt-get install tesseract-ocr-eng
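
If you need several languages, the package names all follow the same tesseract-ocr-CODE pattern, so they can be generated rather than typed out. The following is only a minimal sketch: lang_packages is a hypothetical helper name of my own, not something shipped with Tesseract or apt, and it assumes a package actually exists for each code you pass in.

```shell
#!/bin/sh
# lang_packages is a hypothetical helper (not part of Tesseract or apt).
# It prints one Ubuntu package name per Tesseract language code given,
# assuming the tesseract-ocr-CODE naming pattern used above.
lang_packages() {
    for code in "$@"; do
        printf 'tesseract-ocr-%s\n' "$code"
    done
}

# Example (not run here), installing English, German and French together:
#   sudo apt-get install $(lang_packages eng deu fra)
```

The helper only builds names; apt-get will still report an error if one of the resulting packages is not available in your repositories.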

Next, install gImageReader as a front-end to Tesseract.

Add the application repository:

sudo add-apt-repository ppa:sandromani/gimagereader

Update the repository sources:

sudo apt-get update

Install the application:

sudo apt-get install gimagereader

Now you should be ready to go. gImageReader should appear under the Graphics section of your applications menu. Happy Character Recognising!
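
Tesseract can also be run directly from a terminal, which is a quick way to check that the installation and language files work before opening the GUI. The sketch below wraps the basic tesseract IMAGE OUTPUTBASE -l LANG invocation in a small function. The name ocr_image and the file scan.png are placeholders of my own, and the function assumes the language file you ask for is already installed.

```shell
#!/bin/sh
# ocr_image is a hypothetical helper name; scan.png below is a placeholder.
# Usage: ocr_image IMAGE [LANG]   (LANG defaults to eng)
ocr_image() {
    image=$1
    lang=${2:-eng}

    # Stop early if the input image does not exist.
    if [ ! -f "$image" ]; then
        echo "ocr_image: no such file: $image" >&2
        return 1
    fi

    # Stop early if tesseract is not on the PATH.
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "ocr_image: tesseract is not installed" >&2
        return 127
    fi

    # tesseract appends .txt to the output base name itself,
    # so scan.png should produce scan.txt next to the image.
    tesseract "$image" "${image%.*}" -l "$lang"
}

# Example (not run here):
#   ocr_image scan.png deu
```

To see which language files Tesseract can find, tesseract --list-langs prints them.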


Published on Sat 23 April 2016 by Gary Hall in Linux with tag(s): ubuntu tesseract gimagreader ocr