Many things

From: Jim Leonard <>
Date: Mon Jan 31 06:22:13 2005

Eric Smith wrote:
>>That is one of the MISuses of PDF. PDF should not be used as a container
>>for bitmap images.
> Why? What better open-standard file format can store a lot of pages using
> lossless bilevel compression? PDF can store the original bitmaps (well-
> compressed) together with the OCR results, so that you can have
> mostly-searchable files that still look like the original document. (As
> opposed to typical OCR files that are completely screwed up and lose
> information.)
> And PDF can support a mix of bilevel and greyscale or monochrome in
> the same document, or even on the same page.

I maintain that PDF should not be used merely as a container for existing
graphics files because there is normally no easy free way to extract the image
data and use it in another program. I know that it *can* do it, but the
majority of users who do this screw it up massively (I'm thinking 150 DPI JPGs
of scanned text).

>>In case it wasn't obvious, PDF *is* Postscript! It's *portable*
> Speaking as someone who has written software to read and write both
> Postscript and PDF, I can tell you in no uncertain terms that PDF is
> NOT Postscript. PDF happens to use a subset of the Postscript
> imaging model, and has superficially similar syntax in some areas,
> but that's about as close as they get.

I am familiar with the internals of PDF as well, which is why I wrote "portable".
Portable does not imply complete. Perhaps I shouldn't have used the "*is*".

> Since PDF can do the same things, there seems to be little advantage
> to using DjVu instead.

DjVu has other advantages, such as local/window/viewport decoding of images
with ludicrously high dimensions/resolutions, but I understand your point.

Where are the tools to create DjVu-like PDF files? The best Acrobat can do is
OCR text but still leave the source bitmap in place... If I scan in a page
with a background color image with B&W text foreground, where are the PDF tools
to properly handle layer separation? (Not CMYK separation, you know what I
mean :-)
