OCR

OCR (Optical Character Recognition) converts pictures to text. Available tools include gocr, ocrad and tesseract.

In general, pictures scanned in lineart mode (black and white) can be used.

tesseract and gImageReader

tesseract originated at HP and is quite advanced. It reads many image formats such as PNG and supports languages (for example deu for German), so it recognizes characters such as äöü as well as German words. The second argument is the output base name, to which tesseract appends .txt, so the call below produces <name>.txt.

tesseract <name>.png <name> -l deu
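The tesseract call above can be repeated for several scans with a small wrapper. The following is a minimal sketch, not a definitive script: ocr_png, OCR_LANG and DRY_RUN are names invented here for illustration and are not part of tesseract. It assumes the input files end in .png.

```shell
#!/bin/sh
# Sketch: run tesseract on every PNG given on the command line.
# OCR_LANG and DRY_RUN are illustrative variables, not tesseract settings.
OCR_LANG="${OCR_LANG:-deu}"

ocr_png() {
    img="$1"
    base="${img%.png}"      # strip the extension; tesseract appends .txt
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # only show the command that would be executed
        echo "tesseract $img $base -l $OCR_LANG"
    else
        tesseract "$img" "$base" -l "$OCR_LANG"
    fi
}

for f in "$@"; do
    ocr_png "$f"
done
```

Saved under a hypothetical name such as ocr-all.sh, it could be called as sh ocr-all.sh page1.png page2.png. Setting DRY_RUN=1 prints the commands instead of running them.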

gImageReader is a graphical front end for tesseract and is available in GTK and Qt versions. To start the GTK version:

gimagereader-gtk

A resolution of 200 DPI is good enough. Very high resolutions such as 1200 DPI might make the application hang.
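To get a feeling for why high resolutions are heavy, the pixel count of a scan can be estimated. The sketch below assumes an A4 page (210 x 297 mm), which the text does not state. page_pixels is a helper name made up for this example.

```shell
#!/bin/sh
# Sketch: pixel dimensions of an A4 page (210 x 297 mm) at a given DPI.
# page_pixels is an illustrative helper; the A4 size is an assumption.
page_pixels() {
    dpi="$1"
    # 1 inch = 25.4 mm; scale by 10 to stay in integer arithmetic
    w=$(( 210 * dpi * 10 / 254 ))
    h=$(( 297 * dpi * 10 / 254 ))
    echo "$w $h"
}

for dpi in 200 600 1200; do
    set -- $(page_pixels "$dpi")
    echo "$dpi DPI: $1 x $2 pixels, about $(( $1 * $2 / 1000000 )) megapixels"
done
```

The pixel count grows with the square of the resolution, so a 1200 DPI scan has 36 times as many pixels as a 200 DPI scan of the same page.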

ocrad

ocrad --format=utf8 <name>.pbm

gocr

With gocr, test different resolutions. Too high a resolution does not give the best results, and too low obviously does not either. Around 600-800 DPI is OK. Choose a format that gocr understands, such as PNM, even though it produces rather large files.

gocr <name>.pnm > <name>.txt
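If the scan exists only as PNG, it has to be converted to PNM before gocr can read it. The sketch below assumes that pngtopnm from the netpbm package is installed, which is not mentioned in the text above. gocr_png and DRY_RUN are names invented for this example.

```shell
#!/bin/sh
# Sketch: convert a PNG scan to PNM and run gocr on the result.
# Assumes pngtopnm (netpbm) is installed; DRY_RUN is an illustrative switch.
gocr_png() {
    img="$1"
    base="${img%.png}"      # file name without the .png extension
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # only show the two steps that would be executed
        echo "pngtopnm $img > $base.pnm"
        echo "gocr $base.pnm > $base.txt"
        return 0
    fi
    # run gocr only if the conversion succeeded
    pngtopnm "$img" > "$base.pnm" &&
    gocr "$base.pnm" > "$base.txt"
}
```

Calling gocr_png page.png would leave page.pnm and page.txt next to the original file. The intermediate PNM file can be deleted afterwards, since it is rather large.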

