Free OCR Software (Optical Character Recognition)

Convert Scanned Images with Text to Pure Text Documents


Free OCR programs take an image file containing text (words) and generate a text document containing those words. You usually get such images when you scan a document with a scanner. In general, these programs do poorly when the text on the page does not stand out clearly from its background, or when the fonts used are highly stylised.
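To see why contrast matters, here is a minimal sketch of one common preprocessing idea: reducing a grayscale image to ink and background with a fixed threshold. It is an illustration only, not the method used by any program listed on this page. The pixel grids, the function names and the threshold of 128 are all assumptions made for the example.

```python
def binarize(gray_rows, threshold=128):
    """Map each grayscale pixel (0 = black, 255 = white) to 1 (ink) or 0 (background)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]


def contrast(gray_rows):
    """Difference between the lightest and darkest pixel in the grid."""
    pixels = [pixel for row in gray_rows for pixel in row]
    return max(pixels) - min(pixels)


# Hypothetical 3x4 patches showing the same shape (a letter "C").
# clear_patch: near-black text on near-white paper.
clear_patch = [
    [250, 20, 20, 250],
    [250, 20, 250, 250],
    [250, 20, 20, 250],
]
# faint_patch: dark grey text on a slightly lighter dark grey background.
faint_patch = [
    [110, 90, 90, 110],
    [110, 90, 110, 110],
    [110, 90, 90, 110],
]

if __name__ == "__main__":
    for name, patch in (("clear", clear_patch), ("faint", faint_patch)):
        print(name, "contrast:", contrast(patch))
        for row in binarize(patch):
            print("".join("#" if bit else "." for bit in row))
```

With the clear patch, the shape of the letter survives the threshold. With the faint patch, every pixel falls on the same side of the threshold, so the letter disappears into a solid block and there is nothing left to recognise.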

Some OCR programs can be trained. That is, you get the program to scan some text and then teach it what those characters are. In this way, the program learns the shape of each character, even in unusual fonts. Many, if not most, OCR programs also consult a dictionary of words for the language when converting.
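The dictionary step can be pictured with a small sketch like the one below. It is not taken from any of the programs on this page. It simply assumes a tiny, hypothetical word list and uses Python's standard difflib module to replace a misread word with the closest dictionary word when one is close enough. The names WORDS, correct_word and correct_text, and the cutoff of 0.8, are made up for the example.

```python
import difflib

# Hypothetical, tiny word list. A real OCR program would use a full
# dictionary for the language being recognised.
WORDS = ["character", "optical", "recognition", "scanner", "software"]


def correct_word(raw, words=WORDS, cutoff=0.8):
    """Return the closest dictionary word to `raw` (compared in lower case),
    or `raw` unchanged if no word is similar enough."""
    matches = difflib.get_close_matches(raw.lower(), words, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


def correct_text(text):
    """Apply correct_word to every whitespace-separated word in `text`."""
    return " ".join(correct_word(word) for word in text.split())


if __name__ == "__main__":
    # Typical misreads: the digit 0 for the letter o, and c for e.
    print(correct_text("0ptical charactcr recogniti0n"))
```

Run as a script, this prints "optical character recognition". A word that resembles nothing in the list is left as it is, which is also why a dictionary cannot rescue names or unusual terms.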

Note that OCR software often comes free with your scanner or all-in-one machine (i.e., printer, scanner and copier combined), so you may want to check whether you already have such a program before rushing out to download one. The programs bundled with scanners are usually limited versions of commercial software, and can sometimes work better than the free ones listed here (or as well as OCR programs can be expected to work given the current state of the technology).

If you are looking for full-blown commercial OCR software, one possibility is to check out OmniPage Professional.

Tesseract OCR (Windows, Linux)

Currently sponsored by Google and originally developed by Hewlett-Packard, this open source OCR program works under Windows and Linux. It can recognize 6 languages, is fully UTF-8 capable, can distinguish fixed pitch from proportional pitch fonts, and can be trained. It takes a TIFF (.tif) image file as input (if you need to, you can convert your images from other formats using one of the free image and photo editing programs available). At the time of writing, the program can only handle text in a single column.
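If you want to call Tesseract from a script, a sketch along the following lines may help. It assumes the command line form "tesseract image outputbase -l lang", where the program writes its result to outputbase.txt, so check the documentation for your version before relying on it. The file names and the helper names tesseract_command and run_ocr are hypothetical.

```python
import shutil
import subprocess


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <output base> -l <lang>.

    Assumes the classic command line form; verify it against your version.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run Tesseract if it is installed.

    Returns the name of the text file Tesseract is expected to write,
    or None if no 'tesseract' executable is found on PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    return output_base + ".txt"


if __name__ == "__main__":
    # "page.tif" and "page" are placeholder names for this example.
    print(" ".join(tesseract_command("page.tif", "page")))
```

Building the argument list separately from running it makes it easy to print the command and try it by hand first.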

GOCR (Linux, Windows, OS/2)

GOCR is an OCR program that converts scanned images of text into a text file. It is multiplatform and is released under the open source GNU General Public License. Executables (or binaries) are available for Linux, Windows and OS/2. This is a command line program.

Ocropus (Linux)

Ocropus is a document analysis and OCR system that uses plugins for its character recognition engine. It offers layout analysis, statistical natural language modelling and multi-lingual capabilities. The OCR engine uses Tesseract (see elsewhere on this page). It comes in source code form, so you will have to compile it yourself.

Ocrad: The GNU OCR (Linux)

Ocrad is a command line OCR utility that accepts image files in the pbm, pgm or ppm formats. It can handle multi-column text or blocks of text. The program is available only in source code form.
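The pbm, pgm and ppm formats are very simple, which is part of why command line tools favour them. As a rough illustration, the sketch below writes a grid of grayscale values as a plain (ASCII, "P2") pgm file: a magic number, the width and height, the maximum grey value, then the pixel values. It assumes the program reading the file accepts the plain variant of the format. The function names and the file name are made up for the example.

```python
def to_plain_pgm(gray_rows, max_value=255):
    """Return the text of a plain ('P2') PGM image for a grid of grey values.

    Each row of the grid becomes one line of text. The format asks for
    lines of at most 70 characters, so this sketch only suits narrow images.
    """
    height = len(gray_rows)
    width = len(gray_rows[0])
    if any(len(row) != width for row in gray_rows):
        raise ValueError("all rows must have the same width")
    lines = ["P2", f"{width} {height}", str(max_value)]
    lines += [" ".join(str(pixel) for pixel in row) for row in gray_rows]
    return "\n".join(lines) + "\n"


def save_pgm(path, gray_rows):
    """Write the grid to `path` as a plain PGM file."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(to_plain_pgm(gray_rows))


if __name__ == "__main__":
    # A hypothetical 2x2 checkerboard: 0 is black, 255 is white.
    print(to_plain_pgm([[0, 255], [255, 0]]), end="")
```

In practice you would normally let an image editor or converter produce these files from your scans. The point here is only to show how little there is to the format.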

Ocre (Linux)

This open source tool runs from the command line, and you are supposed to be able to integrate it with a spell checker. The program accepts pgm and pbm files as input and sends its output to stdout (the terminal window).

Microsoft Office Document Imaging (Windows, Mac OS X)

If you use Microsoft Office, you will probably already have this tool on your system. (Although it doesn't have a separate free download, it is listed here since many people already have this software on their system, and are not aware of the existence of this utility.) Windows users can find it in "Microsoft Office\Microsoft Office Tools" on the Start menu.


thefreecountry.com Free Programmers, Webmasters and Security Software

 


 
