Free OCR Software (Optical Character Recognition)

Convert Scanned Images with Text to Pure Text Documents


Free OCR programs take an image file containing text (words) and generate a text document containing those words. You usually get such images when you scan a document using a scanner. In general, these programs do poorly if the text on your page does not stand out clearly from its background, or if the fonts used are highly stylised.
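
To illustrate why contrast matters, here is a minimal sketch in Python of a thresholding step that reduces a greyscale image to pure black and white before recognition. It is only an assumption about what such preprocessing might look like, not the method used by any program listed on this page; the function name `binarize`, the threshold of 128 and the sample pixel values are all made up for the example.

```python
def binarize(pixels, threshold=128):
    """Map each greyscale value (0 = black, 255 = white) to 1 (ink) or 0 (paper)."""
    return [[1 if value < threshold else 0 for value in row] for row in pixels]


# Hypothetical 3x4 greyscale patch: dark text on a light background.
clear_patch = [
    [250, 20, 30, 245],
    [240, 25, 240, 250],
    [250, 15, 20, 245],
]

# The same patch with poor contrast: text and background are both mid-grey.
faint_patch = [
    [150, 130, 135, 150],
    [145, 132, 148, 150],
    [150, 129, 131, 149],
]

print(binarize(clear_patch))  # the dark strokes survive as 1s
print(binarize(faint_patch))  # every pixel lands on the "paper" side
```

With the clear patch the strokes come through intact, while the faint patch loses its text entirely because no pixel is darker than the threshold.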

Some OCR programs can be trained. That is, you get the program to scan some text, and then you teach it what those characters are. In this way, the program is able to learn the shape of each character, even in unusual fonts. Many, if not most, OCR programs also consult a dictionary of words for the language in question when converting.
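
The dictionary step can be pictured with a small Python sketch. This is an illustration under stated assumptions, not how any of the programs on this page actually work: the word list, the helper name `correct_word` and the similarity cutoff of 0.8 are all hypothetical, and the matching is done with the standard library's `difflib` module.

```python
import difflib

# Hypothetical miniature word list; a real program would load a full dictionary.
DICTIONARY = ["character", "optical", "recognition", "scanner", "software"]


def correct_word(word, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary word, or the word unchanged if none is close."""
    matches = difflib.get_close_matches(word.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# Typical misreadings: "0" for "o", "o" for "c", "l" for "i".
raw_words = ["0ptical", "charaoter", "recognltion", "xyzzy"]
print([correct_word(w) for w in raw_words])
```

Words that closely resemble a dictionary entry are replaced by it, while a word with no close match (such as "xyzzy" here) is left as the recogniser produced it.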

Note that OCR software often comes free with your scanner or all-in-one machine (ie, printer, scanner and copier combined), so you may want to see if you already have such a program before rushing out to download one. The ones bundled with your scanner are usually limited versions of commercial software, and can sometimes work better than the free ones listed here (or as well as OCR programs can be expected to work given the current state of technology).

If you are looking for full-blown commercial OCR software, one of the best-known programs is ABBYY FineReader. Another possibility is OmniPage Professional.

Disclaimer

The information provided on this page comes without any warranty whatsoever. Use it at your own risk. Just because a program, book, document or service is listed here or has a good review does not mean that I endorse or approve of the program or of any of its contents. All the other standard disclaimers also apply.

Free OCR Software (Optical Character Recognition)

TopOCR: Free OCR for Digital Cameras (Windows)

This free OCR program is designed especially for recognizing text from the poorer quality images that come from digital cameras or smartphones, since such images can have variable lighting conditions. Your camera needs to have a minimum resolution of 3 megapixels, though. It can, of course, also be used for scanned images (ie, obtained from scanners). TopOCR supports 11 languages: English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish and Swedish. It is able to obtain images directly from your scanner or camera, or, if you wish, you can drag and drop files onto the application window. Supported file formats include JPEG, TIFF, GIF and BMP. The program is able to handle images containing a mixture of text and graphics. This is a Windows program.

Tesseract OCR (Windows, Linux)

Currently sponsored by Google and originally developed by Hewlett Packard, this open source OCR program works under Windows and Linux. It can recognize 6 languages, is fully UTF-8 capable, is able to detect fixed pitch vs proportional pitch fonts, and can be trained. It takes a TIF image file as input (but if you need to, you can always convert your images from other formats using one of the free image and photo editing programs available). At the time I write this, the program can only handle text in a single column.

GOCR (Linux, Windows, OS/2)

GOCR is an OCR program that converts scanned images of text into a text file. It is multiplatform and is released under the open source GNU General Public License. Executables (or binaries) are available for Linux, Windows and OS/2. This is a command line program.

Ocropus (Linux)

Ocropus is a document analysis and OCR system that uses plugins for its character recognition engine and offers layout analysis, statistical natural language modelling and multi-lingual capabilities. The OCR engine uses Tesseract (see elsewhere on this page). It comes in source code form, so you will have to compile it yourself.

Ocrad: The GNU OCR (Linux)

Ocrad is a command line OCR utility that accepts files in pbm, pgm or ppm format. It is able to handle multi-column texts or blocks of text. The program is available only in source code form.
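
For readers unfamiliar with these formats, the following Python sketch writes a tiny greyscale image in the plain (ASCII) variant of pgm, which consists of the magic number `P2`, the width and height, the maximum grey value, and then the pixel values. The helper name `to_plain_pgm` and the sample pixels are invented for illustration. Whether a particular OCR program accepts the plain variant as well as the binary one is an assumption you should check against its documentation.

```python
def to_plain_pgm(pixels, max_value=255):
    """Serialise rows of greyscale values as a plain (ASCII) PGM document."""
    height = len(pixels)
    width = len(pixels[0])
    # Header: magic number, dimensions, then the largest grey value used.
    lines = ["P2", f"{width} {height}", str(max_value)]
    for row in pixels:
        lines.append(" ".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical 3x2 image: 0 is black, 255 is white.
sample = [[255, 0, 255], [0, 0, 0]]
print(to_plain_pgm(sample), end="")
```

In practice you would normally let an image editor or converter produce these files; the sketch is only meant to show how simple the format is.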

Ocre (Linux)

This open source tool runs from the command line and you're supposed to be able to integrate it with a spell checker. The program accepts pgm and pbm files as input and sends the output to stdout (the terminal window).
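
Because the output goes to stdout, another program can capture it instead of letting it appear in the terminal. The Python sketch below shows one way to do that with the standard `subprocess` module. It does not assume anything about the real command line of any OCR tool: the helper name `run_and_capture` is made up, and a small Python command stands in for the OCR program.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run a command and return whatever it wrote to stdout as a string."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in for an OCR tool: a tiny Python command that prints some text.
# A real call might look like run_and_capture(["some-ocr-tool", "page.pgm"]),
# where both the tool name and its arguments are hypothetical.
stand_in = [sys.executable, "-c", "print('recognised text')"]
text = run_and_capture(stand_in)
print(text, end="")
```

The captured string could then be saved to a file or passed on to a spell checker.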

Microsoft Office Document Imaging (Windows, Mac OS X)

If you use Microsoft Office, you will probably already have this tool on your system. (Although it doesn't have a separate free download, it is listed here since many people already have this software on their system, and are not aware of the existence of this utility.) Windows users can find it in "Microsoft Office\Microsoft Office Tools" on the Start menu.

thefreecountry.com Free Programmers, Webmasters and Security Resource Site