Turn any scanned document into searchable text using OCR.


Do not let the name fool you: VietOCR.NET is not limited to recognizing Vietnamese text in a scanned document and turning it into searchable text. It works with any language that Tesseract supports.

This open-source tool is a GUI for Tesseract, an open-source OCR engine that supports dozens of languages. All you have to do is download the corresponding language module from within the program and select it for the text you need to recognize.
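To give a rough idea of what a front end like this hands to the engine, the sketch below builds a Tesseract command line in Python. It is only an illustration and not VietOCR.NET's own code: the helper names `build_tesseract_command` and `run_ocr` and the file names are hypothetical, and actually running the command assumes that the `tesseract` executable and the relevant language data (for example `vie` for Vietnamese) are installed.

```python
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, output_base: str,
                            lang: str = "eng",
                            psm: Optional[int] = None) -> List[str]:
    """Build the argument list for the tesseract CLI (hypothetical helper).

    Follows the basic CLI form: tesseract IMAGE OUTPUTBASE -l LANG [--psm N]
    """
    cmd = ["tesseract", image_path, output_base, "-l", lang]
    if psm is not None:
        # Page segmentation mode; older 3.x releases spelled this flag "-psm".
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image_path: str, output_base: str, lang: str = "eng") -> str:
    """Run Tesseract and return the path of the text file it writes."""
    subprocess.run(build_tesseract_command(image_path, output_base, lang),
                   check=True)
    # Tesseract appends ".txt" to the output base for plain-text output.
    return output_base + ".txt"


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(build_tesseract_command("scan.png", "result", lang="vie"))
```

Switching the recognition language is then just a matter of changing the `-l` argument, which is essentially what the language drop-down in a GUI does for you.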

The developer is apparently Vietnamese, the program ships with Vietnamese and English as its only preinstalled languages, and it is presented as an OCR tool for the Vietnamese language, but none of that should stop you from trying this free OCR utility. Tesseract recognizes more than 100 languages, including all the most widely used ones. Note, however, that to turn scanned documents into PDF files, regardless of the language, you will need to have GPL Ghostscript installed on your computer.

Praising the quality of the text rendered by the OCR engine would be praising the many qualities of Tesseract (the most renowned open-source OCR engine out there), which I think is outside the scope of this review. As a GUI for that engine, VietOCR.NET is a neat piece of work, though it still needs polishing in certain areas, such as the OCR language selection, whose drop-down menu mixes up languages and sometimes even disappears from view.

The tool opens in a two-panel interface: the panel on the left shows the original document, while the panel on the right presents the results of the OCR operation. Each panel has its own options to help you reach the best result. If you have a scanner connected to your computer, you can scan a document directly into the program, and you can then zoom in and out of the original image, deskew it, remove speckles, and rotate it for a better OCR result. Once the text appears in the right-hand panel, you can find and replace text (and thus check the quality of the OCR output), spell-check the contents, remove line breaks, and change the case and the font.
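Removing line breaks is the kind of clean-up that OCR output often needs, because the engine preserves the line layout of the scanned page. The Python sketch below shows one simple way to do it. It is not taken from VietOCR.NET; the function name `remove_line_breaks` is hypothetical, and it assumes that paragraphs are separated by blank lines.

```python
import re


def remove_line_breaks(text: str) -> str:
    """Join the lines of each paragraph into one line (hypothetical helper)."""
    # Blank lines are treated as paragraph boundaries and preserved.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Rejoin words split by a hyphen at the end of a line. Caveat: this
        # also merges genuinely hyphenated words that happen to be broken
        # at the hyphen.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Turn the remaining single line breaks into spaces.
        para = re.sub(r"\s*\n\s*", " ", para)
        cleaned.append(para.strip())
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "This is a scan-\nned page\nof text.\n\nSecond\nparagraph."
    print(remove_line_breaks(sample))
    # Prints:
    # This is a scanned page of text.
    #
    # Second paragraph.
```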

The program also offers other useful tools that make this free utility well worth installing: a PDF-to-TIFF converter, a TIFF merger and splitter, several page segmentation modes, various OCR engine modes, and even the option of “watching” a directory so that, as soon as a new file is saved to it, the program converts it into text or PDF for you.
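The idea behind a watched directory can be sketched in a few lines of Python. This is a simple polling loop written for illustration, not the mechanism VietOCR.NET actually uses; the names `find_new_files` and `watch`, the suffix list, and the `handle` callback (where an OCR call would go) are all assumptions.

```python
import time
from pathlib import Path

# File types treated as OCR input in this sketch (an assumption).
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".pdf"}


def find_new_files(folder, seen):
    """Return matching files in `folder` not seen before, and remember them."""
    new_files = []
    for path in sorted(Path(folder).iterdir()):
        if (path.is_file()
                and path.suffix.lower() in IMAGE_SUFFIXES
                and path not in seen):
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(folder, handle, interval=2.0, max_polls=None):
    """Poll `folder` and call `handle(path)` once for every new file.

    `max_polls=None` means poll forever; a number stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(folder, seen):
            handle(path)  # e.g. run OCR on the file and save the result
        polls += 1
        time.sleep(interval)


if __name__ == "__main__":
    # Print the name of each new scan that appears in the current directory.
    watch(".", lambda p: print("new file:", p.name), max_polls=3)
```

Polling keeps the sketch dependency-free; a real application would more likely rely on file-system change notifications from the operating system.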

Pros

  • Uses Tesseract as the OCR engine
  • Works with any language supported by Tesseract
  • Includes some basic text formatting options
  • Offers find and replace for the recognized text

Cons

  • The interface shows some design flaws, e.g., in the OCR language selection
  • Requires GPL Ghostscript to create PDF files
This program received 2 awards
Specifications
Developer: Quan Nguyen
License type: Open source