Technique for scanning in documentation

From: Eric Smith <eric_at_brouhaha.com>
Date: Thu Jan 16 02:53:02 2003

Tim Mann wrote:
> G4 compression is pretty poor. I really like DjVu format instead of
> PDF. It uses much better compression technology (JBIG2 for bitonal
> black/white and fractal compression for color images) and gives better
> legibility as well.

Actually, DjVu doesn't use JBIG2; it uses JB2, which was AT&T's rejected
proposal to the JBIG committee. (That doesn't mean that
there's anything wrong with it.) Note that JB2 as normally used is
NOT lossless even for bilevel images. It *can* be used for lossless
encoding, but then it doesn't do much better than G4. So all the available
DjVu tools use lossy encoding.

The "better legibility" is debatable. Note that it is fundamentally
impossible for a lossy compression algorithm to give a more accurate
reproduction than a lossless one; however, more accurate reproduction
does not necessarily imply better legibility. In general I prefer to
use lossless compression in the hope that OCR technology will eventually
be good enough to be useful. It's always possible to run the file
through a legibility filter later, but once you've used lossy compression
you can't later decide you prefer accuracy.
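
To make the point about irreversibility concrete, here is a minimal Python
sketch. It is a toy under stated assumptions, not G4 or JB2: rle_encode and
rle_decode are a plain run-length coder for one bilevel scan line (G4 is a
far more elaborate two-dimensional scheme), and despeckle is a hypothetical
stand-in for a "legibility filter". All three names are made up for
illustration.

```python
from typing import List

Row = List[int]  # one scan line of a bilevel image: 0 = white, 1 = black


def rle_encode(row: Row) -> List[int]:
    """Losslessly encode a row as alternating run lengths.

    The first run is always white, so a row that starts with a black
    pixel gets a leading run of length 0.
    """
    runs: List[int] = []
    current, length = 0, 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)
            current, length = pixel, 1
    runs.append(length)
    return runs


def rle_decode(runs: List[int]) -> Row:
    """Exactly invert rle_encode."""
    row: Row = []
    colour = 0
    for length in runs:
        row.extend([colour] * length)
        colour ^= 1  # runs alternate white, black, white, ...
    return row


def despeckle(row: Row) -> Row:
    """Hypothetical lossy filter: flip any pixel whose two neighbours
    agree with each other but not with it (an isolated speck)."""
    out = list(row)
    for i in range(1, len(row) - 1):
        if row[i - 1] == row[i + 1] and row[i] != row[i - 1]:
            out[i] = row[i - 1]
    return out


if __name__ == "__main__":
    scan = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
    # Lossless: the original scan line comes back bit for bit.
    print(rle_decode(rle_encode(scan)) == scan)
    # Lossy: two different originals collapse to the same output, so
    # nothing run afterwards can tell which one was scanned.
    print(despeckle([0, 0, 1, 0, 0]) == despeckle([0, 0, 0, 0, 0]))
```

The filter can be applied to the decoded lossless data at any later time,
but once only the filtered row has been stored, the speck (which might have
been the dot of an "i") is gone for good.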

I've compared scanned docs encoded in G4 and JB2/DjVu, and when printed
out the G4 looked slightly better to me. On-screen the difference was
less noticeable. It may be that the commercial DjVu viewer does a
better job of scaling for display than Acrobat Reader, but that's entirely
orthogonal to the compression algorithm used.

My main problems with DjVu are:

1) Everyone already has Acrobat. Expecting people to install a new
    application to view documents I scan seems rather presumptuous.

2) The compression in DjVu is patented. There is a somewhat wishy-washy
    patent license for GPL'd software that may cover the AT&T patents;
    however, the base compression technology is encumbered by other non-AT&T
    patents on arithmetic coding and such. All the patents on G4
    compression have long since lapsed.

3) The free software to encode files into DjVu does NOT have anywhere
    near as good an implementation of JB2 as the commercial version, so
    the compression ratios are NOT the great ones claimed on the DjVu
    sites.

4) The company that now owns the DjVu technology has withdrawn the
    commercial versions from distribution. Their web site suggests that
    it's possible to buy them directly, but no pricing is given, nor is
    there on-line ordering. The old prices were quite high as compared
    to Acrobat.

5) PDF format is well documented, and there are MANY tools for dealing
    with it in various ways. For DjVu, there's not much.

DjVu does have one major advantage for web access, which is that it is
better for online browsing, *if* the user already has a viewer
application installed.

In my opinion, even if there is a slight technical advantage to JB2 and
DjVu, it is outweighed by the disadvantages I've listed above.

> I think Acrobat 5 may also have JBIG2 compression, though earlier
> versions didn't,

A recent version of the PDF file format added JBIG compression, but I'm
not sure about JBIG2. I would avoid both and stick to G4.