[sane-devel] scanning for archival and OCR

Tue Jan 22 18:24:51 UTC 2013

The Perl application gscan2pdf will probably do what you need:
http://gscan2pdf.sourceforge.net/

I use a shell script, "bscan", to scan to pnm and then convert to e.g. pdf.
Since my scanner scans better in 8-bit grayscale than in 2-bit B&W,
I scan in 8-bit grayscale at 300 dpi and then convert to bitonal black and
white using djvu wavelet compression (option -BW in my script):
bscan --mode=8-bit --shades=2 --page=Legal --comp=lzw -BW FILE
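
For anyone without bscan, the same idea (scan in 8-bit gray, reduce to
bitonal, encode as DjVu) can be sketched roughly with scanimage, ImageMagick
and DjVuLibre's bitonal encoder cjb2. This is not the bscan script: the
function names, the 50% threshold and the DRY_RUN switch are made up for
illustration, and the scanimage --mode and --resolution values depend on the
backend, so check "scanimage --help" for your device first.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the bscan script.
# With DRY_RUN=1 the commands are printed instead of executed.

# run CMD...: execute CMD, or just print it in dry-run mode.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# run_to FILE CMD...: like run, but with stdout redirected to FILE.
run_to() {
    target=$1; shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$* > $target"
    else
        "$@" > "$target"
    fi
}

# gray_to_bw_djvu BASENAME [DPI]
gray_to_bw_djvu() {
    base=$1
    dpi=${2:-300}
    # 1. Scan in grayscale; "Gray" is a common mode name but backend-dependent.
    run_to "$base.pgm" scanimage --mode Gray --resolution "$dpi" --format=pnm || return 1
    # 2. Threshold to 1-bit PBM. 50% is an arbitrary starting point.
    #    (ImageMagick 7 calls this command "magick" instead of "convert".)
    run convert "$base.pgm" -threshold 50% "$base.pbm" || return 1
    # 3. Encode the bitonal image as DjVu.
    run cjb2 -dpi "$dpi" "$base.pbm" "$base.djvu"
}
```

Running "DRY_RUN=1; gray_to_bw_djvu page1" prints the three commands without
touching the scanner, which is a cheap way to check the options before a long
scanning session.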

Sometimes I may need to use a photo scanner with high optical resolution (e.g. 
an Epson with 24-bit grayscale). If I need to scan in color, I usually scan to 
pnm then convert to djvu using c44, e.g.:
bscan --mode=color --shades=truecolor --page=Letter -c44 --djvutopdf=25 FILE
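
Without bscan, a color scan can be taken through roughly the same steps with
scanimage plus DjVuLibre's c44 and ddjvu. Again this is only a sketch under
assumptions: the function names and the DRY_RUN switch are invented here, the
scanimage options are backend-dependent, and treating the 25 from
--djvutopdf=25 as a JPEG quality for the PDF output is a guess, not something
the bscan usage confirms.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the bscan script.
# With DRY_RUN=1 the commands are printed instead of executed.

# run CMD...: execute CMD, or just print it in dry-run mode.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# run_to FILE CMD...: like run, but with stdout redirected to FILE.
run_to() {
    target=$1; shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$* > $target"
    else
        "$@" > "$target"
    fi
}

# color_to_djvu_pdf BASENAME [DPI] [PDF_JPEG_QUALITY]
color_to_djvu_pdf() {
    base=$1
    dpi=${2:-300}
    quality=${3:-25}   # assumed meaning of the 25; adjust to taste
    # 1. Scan in color; "Color" is a common mode name but backend-dependent.
    run_to "$base.ppm" scanimage --mode Color --resolution "$dpi" --format=pnm || return 1
    # 2. Wavelet-encode the PPM as DjVu.
    run c44 -dpi "$dpi" "$base.ppm" "$base.djvu" || return 1
    # 3. Render the DjVu file to PDF with lossy JPEG compression.
    run ddjvu -format=pdf -quality="$quality" "$base.djvu" "$base.pdf"
}
```

The intermediate .ppm is kept on purpose, so the encoding step can be redone
with other c44 settings without rescanning.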

http://www.acjlaw.net:8080/~jeremy/Ricoh/usage_bscan.html

I haven't had much luck with any of the open source OCR programs. At best I
get about 90% accuracy, and only on straight B/W text with no logos, rules or
underlines, where all text is horizontal and of the same font weight and shape.