Many things

From: Eric Smith <eric_at_brouhaha.com>
Date: Fri Jan 28 18:20:38 2005

Michael wrote:
> The problem is that PS recreated from PDF can never be as good as the
> original PS from which the PDF was made.

I wrote:
> Huh? If you have a PDF containing nothing but G4 fax compressed bilevel
> images, and convert it to PS, you get the same G4 images with a
> different wrapper. How is that any worse?

Jules wrote:
> of course it's still a bilevel image rather than greyscale, so you mess
> up a lot of diagrams and images within the documentation, and go a long
> way toward ruining the chances of someone running a useful OCR scan over
> the data at some future point :-(

We were comparing the merits of PDF vs. Postscript, and what happens
when you convert between them. The point was that most scanned manuals
are in lossless bilevel form, such as G4. That's a native image format
in both Postscript and PDF, so there should be no loss of data or
usefulness converting in either direction. (If there is, the conversion
program is arguably broken.)
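Lossless means the decoder recovers every pixel exactly, so moving the same coded data from a PDF wrapper to a Postscript wrapper (or back) cannot change the image. Here is a minimal sketch of that property. It uses plain run-length coding of a single scanline as a simplified stand-in. It is NOT CCITT G4 itself, which codes each line relative to the line above. The function names and the sample scanline are hypothetical.

```python
from itertools import groupby


def encode_runs(row):
    """Encode a row of 0/1 pixels as (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


def decode_runs(runs):
    """Expand (value, run_length) pairs back into a row of pixels."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


# Hypothetical scanline: white (0) margin, a black (1) stroke, more white.
scanline = [0] * 20 + [1] * 5 + [0] * 15
runs = encode_runs(scanline)
print(runs)  # [(0, 20), (1, 5), (0, 15)]

# Decoding gives back exactly the original pixels: nothing is lost.
assert decode_runs(runs) == scanline
```

Whatever container carries `runs`, decoding it yields the identical scanline. A converter only loses data if it decodes and re-encodes with a lossy method.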

Whether documents should be scanned in greyscale is an entirely
separate discussion. Both PDF and Postscript support the common
lossy and lossless greyscale and color encodings, so there should
not be any loss of data or usefulness converting those in either
direction.

Converting PS -> PDF -> PS is a bad idea when the original Postscript
file contains a lot of complex native Postscript rendering. In that
case the conversion to PDF "flattens" it, and converting back may yield
suboptimal results. This does not occur when archiving scanned
documents.

> I can't remember exactly what documentation's being discussed; bilevel's
> great for documents that are still pretty common but if it's something
> rare then it doesn't do much on the preservation front :-(

Lossless bilevel encoding at a high enough resolution (generally at least
300 DPI) is suitable for text and line art, and can still be OCRed just
fine. If the document has very small text or very detailed line art,
higher than 300 DPI resolution may be required. I've seen few documents
with text or line art that wasn't reproduced well with a 400 DPI bilevel
scan.
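Some quick arithmetic shows why small text benefits from more than 300 DPI. Postscript and PDF both use a point of 1/72 inch, so a feature of height h points spans h / 72 * DPI pixels. This is a sketch. The 6 pt type size and the 0.25 pt hairline are hypothetical examples, and a type size in points is the em height, so individual lowercase letters are smaller still.

```python
POINTS_PER_INCH = 72  # Postscript/PDF point is 1/72 inch


def pixels_for_height(points, dpi):
    """Approximate number of scan pixels spanned by a height in points."""
    return points / POINTS_PER_INCH * dpi


for dpi in (200, 300, 400, 600):
    # 6 pt stands in for "very small text"; 0.25 pt for a fine rule.
    text_px = pixels_for_height(6, dpi)
    rule_px = pixels_for_height(0.25, dpi)
    print(f"{dpi} DPI: 6 pt type = {text_px:.1f} px, "
          f"0.25 pt rule = {rule_px:.2f} px")
```

At 300 DPI a 0.25 pt rule comes out at about one pixel wide, so slight misregistration in a bilevel scan can break it up. At 400 DPI it gets a little more margin.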

If the document has continuous-tone images such as photographs, lossless
bilevel encoding is usually NOT adequate. For archival purposes, such
images should ideally be saved in a lossless greyscale or color encoding,
though for working documents lossy compression such as JPEG is OK.
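The reason bilevel falls short for photographs is that reducing a grey pixel to black or white is many-to-one. Once many shades collapse into two, the original values cannot be recovered, no matter how losslessly the bilevel result is then compressed. The sketch below assumes a simple fixed threshold at the midpoint of 8-bit grey. Real scanners often dither or halftone instead, which approximates tone with pixel patterns but still discards the original grey values. The function name and sample data are hypothetical.

```python
def threshold(grey_row, cutoff=128):
    """Map 8-bit grey values (0 = black, 255 = white) to pure black or white."""
    return [0 if value < cutoff else 255 for value in grey_row]


# Hypothetical smooth gradient, as might appear in a photograph.
gradient = [0, 32, 64, 96, 128, 160, 192, 224, 255]
bilevel = threshold(gradient)
print(bilevel)  # [0, 0, 0, 0, 255, 255, 255, 255, 255]

# Nine distinct shades went in; only two come out.
print(len(set(gradient)), "->", len(set(bilevel)))  # 9 -> 2
```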

If there's one thing I can't stand, it is text and line art compressed
with JPEG. Text and line art have sharp edges, which is high-frequency
information. If you apply JPEG compression, the high frequencies are
discarded, making the text and line art blurry. People who advocate
this usually say something like "but I set the 'quality' to 95%". That
*still* results in blurry text, and yet the files are larger than with
lossless bilevel compression.
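Here is a rough one-dimensional illustration of the high-frequency argument. JPEG transforms 8x8 blocks with a discrete cosine transform (DCT) and then quantizes the coefficients, and the high-frequency ones are typically quantized most coarsely. In this simplified sketch, "quantization" is just zeroing the upper half of the coefficients of a single 8-pixel row that contains a sharp black-to-white edge. Real JPEG divides each coefficient by a quantization table entry and rounds, with the "quality" setting typically scaling that table. The function names are hypothetical.

```python
import math


def _scale(k, n):
    """Orthonormal DCT scale factor for coefficient k of an n-point transform."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(x):
    """Orthonormal 1-D DCT-II (JPEG applies the 2-D form to 8x8 blocks)."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n))
            for i in range(n)]


edge = [0, 0, 0, 0, 255, 255, 255, 255]  # sharp edge, as in text or line art
coeffs = dct(edge)

# Crude stand-in for coarse quantization: discard the high frequencies.
kept = [c if k < 4 else 0.0 for k, c in enumerate(coeffs)]
blurred = [round(v) for v in idct(kept)]
print(blurred)
```

With all eight coefficients, `idct(dct(edge))` reproduces the edge (up to floating-point error). With the high frequencies discarded, the pixels next to the edge become intermediate greys and the flat areas pick up ripples. That is the blur and ringing seen around JPEG-compressed text.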

Eric

This archive was generated by hypermail 2.3.0 : Fri Oct 10 2014 - 23:37:45 BST