manual file types

From: Paul Koning <>
Date: Tue Feb 1 09:30:17 2005

>>>>> "John" == John Allain <> writes:

>> The general consensus seems to be that bi-level scanning with a
>> resolution of at least 300dpi but preferably 400dpi

 John> Not Here. I was once fooled by the following logic: 'well it
 John> was printed in bilevel, so there's no sense in going above
 John> that'. What's wrong there is that there's need for subpixel
 John> information like for things that don't sit exactly on a pixel
 John> but a fraction to the side of it, and certainly for this:
 John> curved letters. I choose 300dpi grayscale but if I had to-had
 John> to use bilevel I'd go above 600dpi possibly all the way to
 John> 1200.

Exactly right. That's not just for OCR, but also for human
consumption. A 300 dpi grayscale scan is far easier to read than a
300 dpi bitonal scan. Obviously, grayscale scans don't compress quite
as well as bitonal ones, but with modern disks, so what?
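
To make the sub-pixel point concrete, here is a toy sketch in Python.
It assumes a one-dimensional "scan line" where ink covers everything to
the left of an edge, and each pixel records the fraction of its area
covered by ink. The function names and the 0.5 threshold are my own
choices for illustration, not anything a real scanner does exactly.

```python
def sample_edge(edge_pos, width=6):
    """Sample a 1-D ink edge; ink covers [0, edge_pos).

    Returns the ink coverage of each pixel as a fraction 0.0..1.0.
    """
    row = []
    for px in range(width):
        covered = min(max(edge_pos - px, 0.0), 1.0)
        row.append(covered)
    return row

def to_gray(row):
    # 8-bit grayscale: 0 = solid ink, 255 = bare paper
    return [round(255 * (1.0 - c)) for c in row]

def to_bilevel(row, threshold=0.5):
    # 1 = ink, 0 = paper (assumed threshold, for illustration only)
    return [1 if c >= threshold else 0 for c in row]

# Two edges that differ by a quarter of a pixel.
a = sample_edge(2.625)
b = sample_edge(2.875)

print("bilevel:", to_bilevel(a), to_bilevel(b))
print("gray:   ", to_gray(a), to_gray(b))
```

The two bilevel rows come out identical, so the quarter-pixel
difference is gone; the grayscale rows still differ in the edge pixel,
which is the information a reader's eye (or an OCR engine) can use.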

A quick post-processing pass to do "levels" adjustment on a grayscale
scan can turn the background into almost all true white pixels (rather
than a mix of lots of light grays). That improves contrast, which
makes for easier reading. It also dramatically improves the
compression ratio.
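
A rough sketch of such a "levels" pass, in plain Python with only the
standard library. The black and white points (60 and 200) and the
synthetic noisy page are assumptions made up for the demonstration;
real scans need values picked from their own histogram. zlib stands in
here for the Deflate compression used by PNG.

```python
import random
import zlib

def adjust_levels(pixels, black=60, white=200):
    """Linearly stretch [black, white] to [0, 255], clipping the rest."""
    scale = 255.0 / (white - black)
    out = bytearray()
    for p in pixels:
        v = round((p - black) * scale)
        out.append(min(max(v, 0), 255))
    return bytes(out)

# Synthetic "page": noisy light-gray paper with a few dark strokes.
rng = random.Random(0)
width, height = 200, 100
page = bytearray(rng.randint(205, 245) for _ in range(width * height))
for y in range(20, 80, 10):
    for x in range(20, 180):
        page[y * width + x] = rng.randint(10, 40)

cleaned = adjust_levels(page)

raw_size = len(zlib.compress(bytes(page), 9))
clean_size = len(zlib.compress(cleaned, 9))
print("before levels:", raw_size, "bytes; after:", clean_size, "bytes")
```

Everything at or above the white point becomes 255, so the paper noise
that the compressor previously had to encode pixel by pixel collapses
into long runs of identical bytes.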

Oh yes, never never never use JPEG for compressing text or line art
scans. For one thing, it butchers the image, and besides, the
compression isn't very effective. A scan with the page background
cleaned up to mostly-white will produce a SMALLER file if compressed
losslessly in TIFF, PNG, or GIF than it does when compressed with the
lossy JPEG. (The "P" in JPEG stands for "Photographic" -- and indeed
JPEG is ONLY fit for photographs and similar continuous-tone images,
and not for any other kind.)
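
For the lossless side, here is a minimal sketch of writing an 8-bit
grayscale PNG using only Python's standard library. It uses filter
type 0 on every scanline, which a real encoder would usually improve
on, and the test page (white with one black rule) is a made-up
example. It does not compare against JPEG, since that needs an
external library; it only shows how small a cleaned-up page gets.

```python
import struct
import zlib

def _chunk(tag, data):
    # PNG chunk layout: 4-byte length, 4-byte type, data, CRC-32 of
    # type + data.
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

def encode_gray_png(pixels, width, height):
    """Encode row-major 8-bit grayscale pixels as PNG bytes."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline is prefixed with filter type 0 (None).
    raw = b"".join(
        b"\x00" + bytes(pixels[y * width:(y + 1) * width])
        for y in range(height)
    )
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw, 9))
            + _chunk(b"IEND", b""))

# Hypothetical test page: all white, with one black horizontal rule.
width, height = 300, 200
page = bytearray([255]) * (width * height)
for x in range(30, 270):
    page[100 * width + x] = 0

png_bytes = encode_gray_png(page, width, height)
print(len(page), "raw bytes ->", len(png_bytes), "PNG bytes")
```

Writing png_bytes to a file opened in binary mode gives a file that
ordinary image viewers should open.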

Received on Tue Feb 01 2005 - 09:30:17 GMT
