manuals in pdf (resolution, compression)

From: Jules Richardson <julesrichardsonuk_at_yahoo.co.uk>
Date: Sun Jun 27 15:45:53 2004

On Sun, 2004-06-27 at 19:40, Antonio Carlini wrote:
> (Well, xpdf on OpenVMS VAX is slow, but then I
> guess my expectations are at fault there :-))

It seems to be bloody horrible on modern versions of Linux, too :-(
(well, at least on Red Hat 9). Slow as hell, plus the rendering quality
is pretty awful.

True, most systems have PDF viewers - but they're even more likely to
have an image viewer and a text editor ;-)

> I believe (but have not tried) that you can go
> from PDF to text in this case without any great
> difficulty (I don't recall what happens to images).

What's the current licensing situation for PDF tools? I've pretty much
avoided the format since the days when the reader was free but anything
(at least from Adobe) which created or manipulated PDF files cost $$$.

I believe the data format itself was copyrighted - but presumably that's
no longer an issue these days, what with all the 3rd-party viewers out
there?

> > Actually, I suppose separate images can help here too as people can
> > navigate straight away to what they want, plus they don't need to
> > download the whole of a huge pdf file before they can start reading.
>
> I prefer to grab the whole thing anyway. Today I might
> just want the frobozz pinout, but tomorrow I'm almost
> certain to need the lead engineer's middle initial,
> by which time I'll have forgotten where I found the
> docs in the first place.

Oh sure, me too. I make use of wget an awful lot to create local copies
of useful bits of websites, for instance. But if I'm looking for
something then and there, it's nice to be able to at least look at the
navigation up-front (particularly to see whether the whole thing's
actually relevant anyway!) and quickly start reading the most useful
bits whilst downloading the whole lot as a background job.
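
(When the pages are separate images that sort of thing is easy enough
to script, too - something like this, with the site layout and URLs
obviously made up, grabs just the pages you want first:)

    from urllib.request import urlretrieve

    # Hypothetical layout: one scanned page per image file on the server.
    base = "http://example.com/manuals/frobozz/page-%02d.png"

    # Grab just the pages wanted right now...
    for page in (12, 13, 14):
        urlretrieve(base % page, "page-%02d.png" % page)

    # ...and leave the full mirror fetching in the background (wget, say).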

> > problem is that
> > you need to be *really* sure that your OCR versions are good
> > before you
> > can risk taking the raw scans offline, which means having a lot of
>
> Once I've generated a raw scan (or picked up someone else's)
> I expect to keep it around essentially forever. OCR has improved
> immensely in the last few years, but not to the point where
> I can throw a scan of a poor quality photocopy at it and expect
> something that looks like the original with zero errors.
> (The Module/Options list that Eric Smith scanned would be
> an excellent torture test for any candidate "perfect" OCR program).
>
> Another point is that if you have high quality scans, why
> keep them to yourself? By all means have low-res versions
> available for those who just need a page or two or just
> need to look something up quickly and don't care about
> the artefacts, but make the "masters" available too. If you
> don't have the space yourself, there are people on this list
> who seem to have no problem with online disk space.

Fair point. I'd never completely delete high-quality scans - but as you
say, there are quite a few people around who seem to be set up for
hosting huge amounts of data!
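
(Generating the low-res versions from the masters is easy enough to
script as well - something along these lines with the Python Imaging
Library, assuming the masters are big TIFF scans; filenames made up
again:)

    from PIL import Image

    # Make a smaller copy of a high-resolution master scan for casual
    # browsing; the master itself stays untouched on disk.
    master = Image.open("frobozz-page-12.tif")   # placeholder filename
    master.thumbnail((1275, 1650))               # ~150dpi for a letter-size page
    master.save("frobozz-page-12-lowres.png")

The thumbnail() call keeps the aspect ratio, so odd-sized fold-out pages
just come out smaller rather than squashed.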

Hmm, how editable are PDF files, by the way? On the OCR front, I'd
expect anyone OCRing anything to proofread it afterwards and correct the
mistakes (which is of course vital for technical data anyway - technical
data with mistakes in it is useless!). So unless word-processor-like
tools exist for editing PDF files, I wouldn't think they're much good as
an intermediate format, because people need to be able to go in there
and easily correct the mistakes made by the OCR software.
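
(Plain text per page seems like the obvious intermediate to me -
something like this, where the OCR command is just a stand-in for
whichever OCR program you actually use; then anyone can proofread and
fix page-12.txt in any old text editor:)

    import glob
    import subprocess

    # One plain-text file per scanned page, so corrections are just
    # text edits in whatever editor people like.
    # "some-ocr-tool" is a stand-in, not a real program name.
    for scan in sorted(glob.glob("page-*.png")):
        with open(scan.replace(".png", ".txt"), "w") as out:
            subprocess.check_call(["some-ocr-tool", scan], stdout=out)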

cheers

Jules