Sheet-feed scanners

From: Eric Smith <eric_at_brouhaha.com>
Date: Tue Dec 16 16:55:34 2003

Tom Jennings wrote:
> I store stuff at 150 dpi jpeg, maybe that's not enough resolution, but
> it prints out OK and reproduces detail OK.

While I very much appreciate the stuff you've put online, I respectfully
disagree with your "reproduces detail OK", at least in the general case.

For text and line art, JPEG *completely* sucks. It makes the text and
line art blurry. JPEG was intended for continuous-tone images, and
throws away high-frequency content. And guess what. Text and line
art have a LOT of high-frequency content. That's why it blurs them.
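To see why the blurring happens, here is a minimal 1-D sketch in pure-stdlib Python. It is a simplification and not JPEG itself. Real JPEG applies a 2-D DCT to 8x8 blocks and quantizes the coefficients with a table rather than zeroing them outright, but coarse quantization has much the same effect on the high-frequency terms. The function names are hypothetical, and the 8-pixel row is an assumed example of a black stroke edge on white paper.

```python
import math

def _scale(k, n):
    # Orthonormal DCT scale factor for coefficient k of an n-point transform.
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct(x):
    """Orthonormal 1-D DCT-II of a sequence of samples."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]

def drop_high_frequencies(samples, keep):
    """Keep only the first `keep` DCT coefficients, zero the rest, reconstruct."""
    coeffs = dct(samples)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct(truncated)

# One 8-pixel row crossing the edge of a black stroke on white paper.
edge = [255, 255, 255, 255, 0, 0, 0, 0]
blurred = drop_high_frequencies(edge, keep=3)
print([round(v) for v in blurred])
```

Keeping all eight coefficients reproduces the row exactly; keeping only the low-frequency ones turns the crisp 255-to-0 step into a ramp of intermediate greys with some over- and undershoot, which is the blur and ringing visible around JPEG-compressed text.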

For text and line art, lossless bilevel compression schemes such as
ITU-T T.4 and T.6 (better known as Fax Group 3 and Group 4) are much
better, as are JBIG and JBIG2. G3 and G4 can be packaged in TIFF files.
G3, G4, and JBIG can be packaged in PDF files. JBIG and JBIG2 have
slightly better compression than G4, but they are patented, so I avoid
them. G4 works quite well.
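Part of why fax coding suits text is that a bilevel scanline is mostly long white runs broken by a few short black runs. The sketch below shows only the run-length step that T.4 one-dimensional coding starts from. The Modified Huffman code tables that turn the runs into bits are omitted, as is the two-dimensional coding that G4 (T.6) uses to code each line relative to the line above. The function name and the 0 = white, 1 = black convention are assumptions for illustration.

```python
def run_lengths(scanline):
    """Split one bilevel scanline into alternating white/black run lengths.

    Pixels are 0 for white and 1 for black (an assumption for this sketch).
    As in ITU-T T.4 one-dimensional coding, the first run is always white,
    so a line that starts with black begins with a zero-length white run.
    """
    runs = []
    current = 0  # colour of the run being counted; every line starts white
    count = 0
    for pixel in scanline:
        if pixel == current:
            count += 1
        else:
            runs.append(count)  # close the previous run
            current = pixel
            count = 1
    runs.append(count)  # close the final run
    return runs

# A 50-pixel row crossing two thin black strokes.
row = [0] * 20 + [1] * 3 + [0] * 10 + [1] * 2 + [0] * 15
print(run_lengths(row))  # [20, 3, 10, 2, 15]
```

Fifty pixels reduce to five small numbers, each of which then gets a short variable-length code. Nothing is discarded, so stroke edges stay perfectly sharp.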

For pages that contain ONLY text and line art, there is literally
NO advantage to JPEG compression. It yields less compression and
worse results.

The thing that's really annoying when processing scanned documents
is dealing with pages that have a mix of text or line art and continuous-
tone images. Although using JPEG for such pages is not optimal, it is
certainly the path of least resistance, so I don't complain about it.
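The per-page decision described above (G4 for pages that are only text and line art, JPEG otherwise) could be automated with a crude histogram test on a greyscale scan: a text page is almost all near-black or near-white pixels, while a photo has many midtones. This is a hypothetical heuristic, not something from the original post. The function names are made up, and the thresholds (64, 192, 5% midtones) are illustrative assumptions, not tuned values. Anti-aliased stroke edges contribute some midtones even on pure text pages, which is why the cutoff is not zero.

```python
def looks_bilevel(gray_pixels, dark=64, light=192, max_midtone_fraction=0.05):
    """Guess whether a greyscale page (values 0..255) holds only text/line art.

    A pixel strictly between `dark` and `light` counts as a midtone; the
    page is treated as bilevel if midtones are a small enough fraction.
    All thresholds are illustrative assumptions.
    """
    total = 0
    midtones = 0
    for p in gray_pixels:
        total += 1
        if dark < p < light:
            midtones += 1
    if total == 0:
        return True  # an empty page compresses fine either way
    return midtones / total <= max_midtone_fraction

def choose_codec(gray_pixels):
    """Pick G4 for text-only pages, JPEG for anything with continuous tone."""
    return "G4" if looks_bilevel(gray_pixels) else "JPEG"

text_page = [255] * 950 + [0] * 45 + [128] * 5   # mostly paper and ink
photo_page = list(range(256)) * 4                # smooth grey ramp
print(choose_codec(text_page), choose_codec(photo_page))
```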

A better solution (but much more difficult and/or time-consuming) is
to composite the final page image from a G4 image of the text and line
art and a JPEG of the image(s). However, there are no freely available
tools to do this, and I'm not even aware of anything commercial that
can do this automatically. I've experimented some with putting support
for this form of compositing in my "tumble" program, but without
software to automatically identify the images, it would still be very
time-consuming.
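As a rough illustration of both halves of that problem, here is a hypothetical sketch that finds "image" regions automatically with a per-tile midtone count, splits the page into a bilevel layer (for G4) and image tiles (for JPEG), and composites them back. It is not tumble's code and not a real mixed-raster encoder: the tile size, thresholds, and function names are all assumptions, image regions are forced onto a fixed tile grid, and the actual G4 and JPEG encoding steps are left out. Pages are lists of rows of 0..255 greyscale values.

```python
def find_image_tiles(page, tile=32, dark=64, light=192, min_midtone_fraction=0.25):
    """Return (top, left) origins of tiles that are mostly midtones (assumed images)."""
    height, width = len(page), len(page[0])
    tiles = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            block = [page[y][x]
                     for y in range(top, min(top + tile, height))
                     for x in range(left, min(left + tile, width))]
            midtones = sum(1 for p in block if dark < p < light)
            if midtones / len(block) >= min_midtone_fraction:
                tiles.append((top, left))
    return tiles

def split_layers(page, tile=32):
    """Split a page into a bilevel text layer and a dict of image tiles."""
    height, width = len(page), len(page[0])
    # Threshold everything to pure black/white for the G4 layer.
    text = [[0 if p < 128 else 255 for p in row] for row in page]
    images = {}
    for top, left in find_image_tiles(page, tile):
        images[(top, left)] = [row[left:left + tile] for row in page[top:top + tile]]
        # Blank image areas in the text layer so G4 doesn't code dithered noise.
        for y in range(top, min(top + tile, height)):
            for x in range(left, min(left + tile, width)):
                text[y][x] = 255
    return text, images

def composite(text, images):
    """Rebuild the page: start from the bilevel layer, paste image tiles on top."""
    page = [row[:] for row in text]
    for (top, left), block in images.items():
        for dy, row in enumerate(block):
            page[top + dy][left:left + len(row)] = row
    return page

# A 64x64 test page: a black stroke at top left, a grey "photo" at bottom right.
page = [[255] * 64 for _ in range(64)]
for y in range(10, 13):
    for x in range(5, 26):
        page[y][x] = 0
for y in range(32, 64):
    for x in range(32, 64):
        page[y][x] = 100 + (x + y) % 50

text_layer, image_tiles = split_layers(page)
print(sorted(image_tiles))  # [(32, 32)]
```

On a real scan the detection step is the hard part: halftoned photos, coloured backgrounds, and heavy bold type all confuse a simple midtone count, which is why doing this reliably without manual help is so much harder than the compositing itself.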

Eric
Received on Tue Dec 16 2003 - 16:55:34 GMT
