manuals in pdf (resolution, compression)

From: Antonio Carlini <a.carlini_at_ntlworld.com>
Date: Sun Jun 27 14:40:11 2004

> I tend to do that too. I'd much rather use my favourite image
> viewer to flick through images than deal with a pdf file, plus I can
> just put everything in a tar or zip archive if I do need to distribute
> as a single file.

I have to admit that I prefer pdf, just because
all the platforms I use have a decent pdf reader.
(Well, xpdf on OpenVMS VAX is slow, but then I
guess my expectations are at fault there :-))

> I never thought I'd say it, but maybe wrapping the data up in *simple*
> HTML markup is the best way - at least then it is readable in a
> plain-text editor, and finding a machine with a web browser is
> probably easier than finding a machine with Word installed.

If you've OCRed the data, HTML is probably fine for
pure text. Once you start to have text + images then
you have a bunch of files to keep tied together. You
could zip them up but PDF works well for me here too.

I believe (but have not tried) that you can go
from PDF to text in this case without any great
difficulty (I don't recall what happens to images).

> Actually, I suppose separate images can help here too as people can
> navigate straight away to what they want, plus they don't need to
> download the whole of a huge pdf file before they can start reading.

I prefer to grab the whole thing anyway. Today I might
just want the frobozz pinout, but tomorrow I'm almost
certain to need the lead engineer's middle initial,
by which time I'll have forgotten where I found the
docs in the first place.

> > I know a lot of you expressed concerns about JPEGs, but I haven't
> > been able to get anywhere near the compression using other methods,
> > for greyscale images. Am I overlooking any options?
>
> Probably not. JPEG is lossy after all, so I expect it'll always do
> better than a non-lossy format. It's a tradeoff between size and
> quality.

As has been pointed out (quite a few times on this list, I think),
JPEG is very poor for text and line drawings. JPEG is better
than nothing, but the archived version should be in something
more appropriate (and lossless).

> problem is that you need to be *really* sure that your OCR versions
> are good before you can risk taking the raw scans offline, which
> means having a lot of

Once I've generated a raw scan (or picked up someone else's)
I expect to keep it around essentially forever. OCR has improved
immensely in the last few years, but not to the point where
I can throw a scan of a poor quality photocopy at it and expect
something that looks like the original with zero errors.
(The Module/Options list that Eric Smith scanned would be
an excellent torture test for any candidate "perfect" OCR program).

Another point is that if you have high quality scans, why
keep them to yourself? By all means have low-res versions
available for those who just need a page or two or just
need to look something up quickly and don't care about
the artefacts, but make the "masters" available too. If you
don't have the space yourself, there are people on this list
who seem to have no problem with online disk space.

Antonio

-- 
---------------
Antonio Carlini arcarlini_at_iee.org
Received on Sun Jun 27 2004 - 14:40:11 BST
