manuals in pdf (resolution, compression)

From: Jules Richardson <julesrichardsonuk_at_yahoo.co.uk>
Date: Mon Jun 28 11:43:24 2004

On Mon, 2004-06-28 at 14:33, Paul Koning wrote:
> The trick is to look for a threshold setting for the black vs. white
> threshold that results in minimal pixels on the page, but not so high
> that the letters lose their shape. This is a compromise -- the edge
> of a printed letter is not really sharp in a scan, so as you raise the
> threshold some of the outer pixels change from black to white -- your
> letter gets "thinner". If you can't get both a clean page and an
> acceptable letter shape, then the source material isn't good enough to
> support bitonal scanning. If so, you'll need a grayscale scan and
> you'll have to put up with the larger file sizes that result.
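
The thinning effect described above can be seen on a single row of pixels. What follows is only a minimal sketch: the pixel values are invented for illustration, and it assumes each pixel is stored as "ink darkness" (0 = blank paper, 255 = solid ink) and counts as black when its darkness is at or above the threshold. Real scanner software may use the opposite convention.

```python
def binarize(row, threshold):
    """Return 1 (black) where ink darkness >= threshold, else 0 (white)."""
    return [1 if darkness >= threshold else 0 for darkness in row]


# Hypothetical scan line across one printed stroke, as ink darkness 0..255:
# paper noise, a soft edge, the solid core, a soft edge, paper noise.
row = [10, 25, 60, 130, 220, 250, 250, 220, 130, 60, 25, 10]

for threshold in (20, 50, 100, 200, 240):
    bits = binarize(row, threshold)
    picture = "".join("#" if b else "." for b in bits)
    # Each higher threshold turns more of the soft edge white,
    # so the stroke is drawn with fewer black pixels.
    print(f"threshold {threshold:3d}: {picture}  ({sum(bits)} black)")
```

A low threshold keeps the paper noise next to the stroke, while a high one eats into the stroke itself; the compromise is somewhere in between.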

Personally I'd make sure I had copies of the *original* non-processed
scans archived, though. It's really easy to lose quality when messing
around with threshold settings for 1bpp scans or tweaking
brightness/contrast for greyscale scans. Typically diagrams tend to
suffer more than text, and it's very hard to "proof read" those after
processing to make sure they're spot-on - it's all too easy to miss
something.

Remember that on old documents the original page quality and contrast
can vary quite a bit, plus some pages may have aged differently to
others - so it's not like you can pick a setting that works for one page
and apply it to all.
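
One way to illustrate the per-page problem is to pick a threshold from each page's own histogram. The sketch below uses Otsu's method, which is a technique swapped in here for illustration and not something mentioned in this thread. The two "pages" are invented pixel lists, and the darkness convention (0 = blank paper, 255 = solid ink, black when darkness >= threshold) is an assumption.

```python
def otsu_threshold(pixels):
    """Pick a threshold T so that 'darkness >= T' best separates ink from paper.

    Plain Otsu's method: try every split of the 0..255 histogram and keep
    the one that maximises the between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    w_paper = 0      # number of pixels at or below the candidate split
    sum_paper = 0    # sum of their darkness values
    best_split, best_var = 0, -1.0

    for split in range(256):
        w_paper += hist[split]
        sum_paper += split * hist[split]
        if w_paper == 0:
            continue            # nothing on the paper side yet
        w_ink = total - w_paper
        if w_ink == 0:
            break               # everything is on the paper side
        mean_paper = sum_paper / w_paper
        mean_ink = (sum_all - sum_paper) / w_ink
        var_between = w_paper * w_ink * (mean_paper - mean_ink) ** 2
        if var_between > best_var:
            best_var, best_split = var_between, split

    return best_split + 1       # pixels strictly above the split count as ink


# Hypothetical pages: a clean one, and a yellowed one with faded print.
fresh_page = [15] * 40 + [20] * 40 + [25] * 10 + [225] * 5 + [235] * 5
aged_page = [85] * 40 + [95] * 40 + [100] * 10 + [140] * 5 + [155] * 5

t_fresh = otsu_threshold(fresh_page)
t_aged = otsu_threshold(aged_page)
print("fresh page threshold:", t_fresh)
print("aged page threshold: ", t_aged)

# Reusing the fresh page's threshold on the aged page turns the paper black too.
print("aged page, black pixels with fresh threshold:",
      sum(1 for d in aged_page if d >= t_fresh))
print("aged page, black pixels with its own threshold:",
      sum(1 for d in aged_page if d >= t_aged))
```

An automatic choice like this is only a starting point; diagrams and faint annotations still need checking by eye.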

Personally I'd only want to be doing the scanning once as it's such a
time-consuming job (and OCRing is even worse!). Plus (as I'm sure is the
case with others on the list) I have some original documents where the
number of surviving copies worldwide is probably in single figures. I
wouldn't want a scan to exist where information is missing, and the
original source document is impossible to track down. For more common
documents it's less of an issue - but it's still a pain to re-scan
anything (and it means there are then both a fixed and a broken version
floating around out there too!)

I just do everything as greyscale, save and archive the scans with no
processing whatsoever, and only tweak
colour/brightness/contrast/threshold/whatever settings just prior to
feeding into OCR software. Storage really isn't an issue these days (I
keep data on both tape and hard disk).
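
The workflow above (archive the raw scan untouched, derive a processed copy only when needed) might be sketched as follows. Everything here is hypothetical: the file names, the headerless one-byte-per-pixel "raw" format, the darkness convention (0 = paper, 255 = ink) and the helper names are all assumptions made for the example, and a real setup would use a proper image format.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Checksum used to confirm the archived scan has not been altered."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def derive_bitonal(raw_path, derived_path, threshold):
    """Write a thresholded copy for OCR; the archived file is only read."""
    raw = Path(raw_path).read_bytes()
    # Assumption: one byte per pixel, ink darkness 0..255, no header.
    bitonal = bytes(255 if d >= threshold else 0 for d in raw)
    Path(derived_path).write_bytes(bitonal)


with tempfile.TemporaryDirectory() as tmp:
    archive_dir = Path(tmp) / "archive"   # untouched greyscale scans
    work_dir = Path(tmp) / "work"         # disposable processed copies
    archive_dir.mkdir()
    work_dir.mkdir()

    original = archive_dir / "page001.raw"
    original.write_bytes(bytes([10, 60, 130, 250, 25]))  # stand-in scan data

    before = sha256_of(original)
    processed = work_dir / "page001_1bpp.raw"
    derive_bitonal(original, processed, threshold=100)
    after = sha256_of(original)

    unchanged = before == after
    derived = processed.read_bytes()

print("archived scan unchanged:", unchanged)
print("derived pixels:", list(derived))
```

If a threshold later turns out to be wrong, the derived copy can simply be regenerated from the archive with no re-scanning.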

cheers,

Jules
Received on Mon Jun 28 2004 - 11:43:24 BST