manual file types

From: Antonio Carlini <>
Date: Mon Jan 31 17:53:46 2005

> TIFF file (big)

The general consensus seems to be that bi-level scanning
at a resolution of at least 300dpi, but preferably 400dpi,
is adequate (although I tend to use 600dpi). G4-encoded TIFF is
pretty good space-wise (obviously lousy compared to text).
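
To put rough numbers on the resolutions mentioned above, here is a minimal sketch of the raw (uncompressed) size of a bi-level page at 1 bit per pixel. The US Letter page size and the function name are assumptions for illustration only. G4 encoding will shrink these figures by an amount that depends on the page content, so treat them as upper bounds rather than as actual file sizes.

```python
# Raw size of a bi-level (1 bit per pixel) scan, before any G4 compression.
# Assumption: US Letter page, 8.5 x 11 inches; change the defaults as needed.

def bilevel_raw_bytes(dpi, width_in=8.5, height_in=11.0):
    """Bytes for a 1-bit-per-pixel page, with each row padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8  # round up to a whole byte per row
    return bytes_per_row * height_px

if __name__ == "__main__":
    for dpi in (300, 400, 600):
        kib = bilevel_raw_bytes(dpi) / 1024
        print(f"{dpi} dpi: {kib:.0f} KiB uncompressed")
```

Doubling the resolution roughly quadruples the raw size, which is the trade-off behind choosing between 300, 400 and 600dpi.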

> OCRed ASCII text (ugly)

OCR is (almost) certain to introduce errors. You'll need a
significant investment in proof-reading to fix this!

> compressed PostScript of OCRed text (depending on OCR, could be nice).

If you can OCR, then any format that can represent that text in
whatever fonts and layout the original document used (and uses
an efficient openly-documented format) should do. Most of my
text documents are PDF. You can turn PDF into text (or HTML, I guess)
where appropriate.

But you cannot OCR (or at least, I bet you cannot OCR without
introducing errors).


Antonio Carlini
Received on Mon Jan 31 2005 - 17:53:46 GMT
