Manuals to PDF advice sought from Gunther Schadow on 2001-06-13 (2001-June)

From: Gunther Schadow <gunther_at_aurora.regenstrief.org>
Date: Wed Jun 13 21:19:57 2001

Jeff Hellige wrote:

> >Re: OCR -- Trouble I've had is (and this is just pickiness, if the
> >actual info's all you care about then it's no prob) you invariably
> >lose the font and other aspects of the original appearance of the
> >document, which is a bummer. I converted a PDF of Sun Remarketing's
> >Lisa DIY guide into HTML with images because I wanted search engines
> >to be able to index the content.
>
> The ability to search the PDF would be nice, but I think the
> amount of work required to do the OCR and then do all the formatting
> and such would outweigh that benefit, though the OCR'd PDFs tend to
> be smaller as well. I'd prefer to keep the original layout, fonts
> and all, though.

One relief, if you have the right software: you can produce a PDF
with two layers, a searchable OCRed text layer and a viewable pixel
layer. You view the pixels and search the OCRed text, so the original
layout and fonts are kept. I have been told it works nicely.
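
To make the two-layer idea concrete, here is a rough sketch in Python.
It is only a toy model of the concept, not how any particular PDF or
OCR product does it; the names Word, Page and search are made up for
illustration, and the word boxes are invented sample data. The pixel
layer is kept untouched for viewing, while searches run against the
OCRed words and return where on the scan each hit sits.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    # One OCRed word plus its position on the scan (hypothetical structure).
    text: str
    box: tuple  # (x0, y0, x1, y1) in page pixels


@dataclass
class Page:
    image: bytes                               # pixel layer: the scan as shown to the reader
    words: list = field(default_factory=list)  # OCR layer: never displayed, only searched

    def search(self, term):
        """Return the boxes of OCRed words containing term (case-insensitive)."""
        needle = term.lower()
        return [w.box for w in self.words if needle in w.text.lower()]


# Invented sample data standing in for a scanned manual page.
page = Page(
    image=b"<scanned page bitmap>",
    words=[
        Word("Lisa", (40, 30, 90, 48)),
        Word("guide", (96, 30, 150, 48)),
    ],
)

# A viewer would highlight these rectangles on top of the pixel layer.
hits = page.search("lisa")
```

Since the viewer only ever draws the image, OCR mistakes cost you a
missed search hit at worst; they never change what the page looks like.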

-Gunther

-- 
Gunther Schadow, M.D., Ph.D.                    gschadow_at_regenstrief.org
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org

Received on Wed Jun 13 2001 - 21:19:57 BST
