Scanning old manuals

From: Stephen Dauphin
Date: Tue Mar 9 13:05:40 1999

On Tue, 9 Mar 1999, Pete Turnbull wrote:

> That's not what I'd call "high". That means that on average, you have to
> correct or interpret every tenth character. I'd call less than 99% "low",
> not high. Our Department looked at this a few years ago, and rejected
> anything less than 95%, I think. Even that means correcting (or as one
> person put it, "clicking on") one character in every twenty.
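
The arithmetic behind those figures can be sketched in a few lines. This
is only an illustration: the function names are made up for the example,
and the word-level estimate assumes character errors are independent and
words average five characters, neither of which was measured here.

```python
def chars_per_error(accuracy: float) -> float:
    """Average number of characters per error for an accuracy in [0, 1)."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return 1.0 / (1.0 - accuracy)


def word_error_rate(accuracy: float, word_length: int = 5) -> float:
    """Chance that a word contains at least one wrong character.

    Assumes independent character errors and a fixed word length
    (both are simplifying assumptions, not properties of any OCR package).
    """
    return 1.0 - accuracy ** word_length


if __name__ == "__main__":
    for acc in (0.90, 0.95, 0.99, 0.995):
        print(f"{acc:.1%} accurate: one error per "
              f"{chars_per_error(acc):.0f} characters, roughly "
              f"{word_error_rate(acc) * 100:.1f} bad words per hundred")
```

Under those assumptions, character accuracy and "words needing
correction" are different yardsticks, so a figure quoted per word will
look worse than the same scan quoted per character.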

That's not what I meant. I did not study the results closely and so I
wrote "high 90%" as a disclaimer to mean something like 98, 98.5, 99,
99.5, or 99.9. Perhaps I should have used the word "range". It seemed to
me that somewhere between fewer than one and no more than two words per
hundred needed correcting, and I don't remember any punctuation or numerical

William Donzelli wrote:

> The best solution for this is to keep the scans AND the OCR'd text. That
> way, with a simple database, one could do searches on the text, and get
> most of the hits, yet actually read the images.

A good observation, which raises the question of whether anyone has
database templates and which database they are using. How does one deal
with separate text like sidebars and captions? Should you save an image
of the page and individual images in the database along with the text?
That rules out much legacy db software. Perhaps keep individual files
and catalog the directory in the database?
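
As a rough sketch of the "keep the files, catalog them in a database"
idea, here is one possible layout using SQLite from Python. Everything in
it is an assumption made for illustration: the table and column names,
the region kinds (body, sidebar, caption), and the sample path are all
hypothetical, and the search is a plain substring match rather than a
real full-text index.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Create (or open) a catalog with one row per page and per text region."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS page (
            id         INTEGER PRIMARY KEY,
            manual     TEXT NOT NULL,
            page_no    INTEGER NOT NULL,
            image_path TEXT NOT NULL      -- scan stays on disk as a file
        );
        CREATE TABLE IF NOT EXISTS region (
            id       INTEGER PRIMARY KEY,
            page_id  INTEGER NOT NULL REFERENCES page(id),
            kind     TEXT NOT NULL,       -- 'body', 'sidebar' or 'caption'
            ocr_text TEXT NOT NULL
        );
    """)
    return conn


def add_page(conn, manual, page_no, image_path, regions):
    """Record one scanned page and its OCR'd regions as (kind, text) pairs."""
    cur = conn.execute(
        "INSERT INTO page (manual, page_no, image_path) VALUES (?, ?, ?)",
        (manual, page_no, image_path))
    page_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO region (page_id, kind, ocr_text) VALUES (?, ?, ?)",
        [(page_id, kind, text) for kind, text in regions])
    conn.commit()


def search(conn, term):
    """Return (manual, page_no, kind, image_path) for regions containing term.

    Uses LIKE, so '%' and '_' in term act as wildcards (not escaped here).
    """
    rows = conn.execute(
        """SELECT page.manual, page.page_no, region.kind, page.image_path
           FROM region JOIN page ON page.id = region.page_id
           WHERE region.ocr_text LIKE '%' || ? || '%'
           ORDER BY page.manual, page.page_no, region.id""",
        (term,))
    return rows.fetchall()
```

A search hit hands back the path of the page image, so imperfect OCR
text only has to be good enough to find the page; the reading is done
from the scan itself.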

Is anyone using document management software? There seem to be three or
four low-priced packages for Windows, a couple for the Mac, maybe
something for another platform (anything for Linux?), and everything
else is stratospherically priced.

Chuck McManis wrote:

> 300 DPI B&W is good for most printed manuals _without_ graphics because it
> is a 1:1 ratio with what most printers can print. 200 DPI gives you a 2:3
> ratio of real pixels to printer pixels and I've seen that introduce banding
> on the printed output.
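
One way to see where a 2:3 ratio could cause trouble is to count how many
printer dots each scanned pixel ends up covering. The sketch below
assumes plain nearest-neighbour scaling, which is only a guess at what a
given printer driver does; the function name is made up for the example.

```python
def replication_pattern(scan_dpi, printer_dpi, pixels=6):
    """Count the printer dots covered by each of the first `pixels` scan pixels.

    Assumes nearest-neighbour scaling: printer dot k takes its value from
    scan pixel floor(k * scan_dpi / printer_dpi).
    """
    counts = [0] * pixels
    dot = 0
    while True:
        src = dot * scan_dpi // printer_dpi
        if src >= pixels:
            break
        counts[src] += 1
        dot += 1
    return counts


if __name__ == "__main__":
    for scan_dpi in (300, 150, 200):
        print(scan_dpi, "->", replication_pattern(scan_dpi, 300))
```

At 300 and 150 dpi every scan pixel covers the same number of dots on a
300 dpi printer. At 200 dpi the pixels alternate between two dots and
one, and that uneven pattern repeating across the page is one plausible
source of visible banding.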

300 dpi is ideal for print. Is that the best ultimate goal - scan at
some multiple of 300 (over or under) in order to optimize for eventual

                                                  -- Stephen Dauphin
Received on Tue Mar 09 1999 - 13:05:40 GMT