Jeff Hellige wrote:
> >Re: OCR -- Trouble I've had is (and this is just pickiness, if the
> >actual info's all you care about then it's no prob) you invariably
> >lose the font and other aspects of the original appearance of the
> >document, which is a bummer. I converted a PDF of Sun Remarketing's
> >Lisa DIY guide into HTML with images because I wanted search engines
> >to be able to index the content.
>
> The ability to search the PDF would be nice, but I think the
> amount of work required to do the OCR and then do all the formatting
> and such would outweigh that benefit, though the OCR'd PDFs tend to
> be smaller as well. I'd prefer to keep the original layout, fonts
> and all, though.
There is some relief, if you have the right software: a PDF can
carry two layers, a searchable OCR'd text layer and a viewable pixel
layer. You view the pixels and search the OCR'd text, so the original
layout and fonts stay intact. I have been told it works nicely.
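
For the curious, here is a rough sketch of how such a two-layer page
can be put together at the PDF level. It is only an illustration, not
how any particular product does it: the function names, the image name
"Im0", the font name "F1" and the word positions are all made up for
the example. The one real piece is PDF text rendering mode 3 ("3 Tr"),
which draws text invisibly, so it can be searched but never seen.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def two_layer_content_stream(image_name, page_w, page_h, words):
    """Build a page content stream: scanned image plus invisible text.

    words is a list of (text, x, y, font_size) tuples in PDF points,
    roughly what an OCR engine might report (hypothetical format).
    """
    ops = []

    # Visible layer: scale the scanned image to fill the whole page.
    ops.append("q")
    ops.append(f"{page_w} 0 0 {page_h} 0 0 cm")
    ops.append(f"/{image_name} Do")
    ops.append("Q")

    # Searchable layer: rendering mode 3 means neither fill nor stroke,
    # so the text is present for search and copy but not painted.
    ops.append("BT")
    ops.append("3 Tr")
    for text, x, y, size in words:
        ops.append(f"/F1 {size} Tf")          # assumed font resource name
        ops.append(f"1 0 0 1 {x} {y} Tm")     # place the word on the page
        ops.append(f"({escape_pdf_string(text)}) Tj")
    ops.append("ET")

    return "\n".join(ops)


if __name__ == "__main__":
    # Made-up OCR result for a US Letter page (612 x 792 points).
    sample_words = [("Lisa", 72, 700, 12), ("DIY", 100, 700, 12)]
    print(two_layer_content_stream("Im0", 612, 792, sample_words))
```

This only produces the page's drawing instructions. A real file also
needs the image and font objects, the page tree and so on, which is
what the OCR software takes care of.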
-Gunther
--
Gunther Schadow, M.D., Ph.D. gschadow_at_regenstrief.org
Medical Information Scientist Regenstrief Institute for Health Care
Adjunct Assistant Professor Indiana University School of Medicine
tel:1(317)630-7960 http://aurora.regenstrief.org
Received on Wed Jun 13 2001 - 21:19:57 BST