paper -> HTML (and The First PC)

From: Sam Ismail <dastar_at_ncal.verio.com>
Date: Wed Dec 30 03:15:10 1998

On Wed, 30 Dec 1998, Doug Yowza wrote:

> I struggled for a bit trying to convert paper to HTML, but found it an
> awkward task. I'm sure the state of the art has advanced beyond:
<...>
> I don't much like PDF for web docs, so an HTML solution would be best. It
> looks like the "pro" version of Xerox's OCR software might automate the
> task somewhat. Any recommendations?

I'm having the same trouble. I'm using TextBridge Classic (which is a
lame way of saying "pared-down version"). It has lots of trouble just
scanning the characters as it sees them on the page and putting them in a
text file. Instead, it tries to make some wacky sense out of columnar
text on a page and turns ASCII art graphics into garbage, throwing text
sections all over the place. It doesn't like the zeroes in my Apple
Pascal Operating System Manual. It inserts tabs where it should just put
spaces. I don't see any options anywhere to fix these problems.
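
The stray tabs, at least, can be undone after the OCR pass. A minimal
sketch, assuming the output is a plain text file and that the tabs were
meant to land on 8-column tab stops (the tab width is a guess, not
something TextBridge documents; the function name is made up for
illustration):

```python
def tabs_to_spaces(text: str, tabsize: int = 8) -> str:
    """Replace every tab with spaces up to the next tab stop.

    tabsize=8 is an assumption; adjust it until columns line up
    with the printed page.
    """
    # str.expandtabs counts columns from the start of each line,
    # so text that was aligned with tabs stays aligned.
    return "\n".join(line.expandtabs(tabsize) for line in text.split("\n"))


if __name__ == "__main__":
    sample = "PROGRAM\tDEMO;\n\tBEGIN\n\tEND."
    print(tabs_to_spaces(sample))
```

This only fixes the whitespace. It does nothing about misread zeroes or
scrambled columns.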

I'm also having trouble getting a clean scan of a typewritten page, black
text on white. I'm doing it at 300dpi and the page comes out looking
like shit on the computer. There are all sorts of spots and blotches
everywhere that are clearly not on the original, and the glass of the
scanner is crystal clear. I don't see an easy way of cleaning the page up
with the software I'm using (PaintShop Pro 3.0, I just downloaded 4.12 but
am still getting used to it).
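
One common way to clean up this kind of scan is to threshold the image
to pure black and white and then drop isolated black pixels. The sketch
below is only an illustration under stated assumptions: the page is
already loaded as rows of 8-bit grey values (reading the scanner's file
format is not shown), and the cutoff of 128 and the neighbour rule are
arbitrary starting points, not settings taken from any particular
program.

```python
from typing import List

Image = List[List[int]]  # rows of grey values, 0 = black, 255 = white


def threshold(img: Image, cutoff: int = 128) -> Image:
    """Force every pixel to pure black (0) or pure white (255)."""
    return [[0 if px < cutoff else 255 for px in row] for row in img]


def despeckle(img: Image, min_neighbors: int = 1) -> Image:
    """Whiten black pixels that have fewer than min_neighbors black
    pixels among their 8 surrounding pixels.

    Expects an already thresholded image. With the default of 1, only
    completely isolated black pixels are removed.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]  # copy so counting uses the original
    for y in range(h):
        for x in range(w):
            if img[y][x] != 0:
                continue
            black = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 0:
                        black += 1
            if black < min_neighbors:
                out[y][x] = 255
    return out


if __name__ == "__main__":
    page = [
        [250, 250, 250, 250, 250],
        [250,  40, 250, 250, 250],  # lone speck
        [250, 250, 250,  30, 250],  # top of a vertical stroke
        [250, 250, 250,  20, 250],
        [250, 250, 250,  10, 250],
    ]
    for row in despeckle(threshold(page)):
        print("".join("#" if px == 0 else "." for px in row))
```

Raising min_neighbors removes larger specks but starts eating the ends
of thin strokes, so it is a trade-off against the OCR's accuracy.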

There was a discussion a short while ago on the techniques people were
using to successfully scan in docs. I wish I'd saved that.

When the text is just plain running text, the OCR does remarkably well. But
I need an OCR suite smarter than Xerox's TextBridge Classic. I also need
some good post-processing software, or at least need to know how to scan a
simple black & white document without the scanner introducing blotches and
crap. Any suggestions?

Sellam                                 Alternate e-mail: dastar_at_siconic.com
------------------------------------------------------------------------------
Always being hassled by the man.

                  Coming in 1999: Vintage Computer Festival 3.0
                   See http://www.vintage.org/vcf for details!
                        [Last web site update: 12/07/98]
Received on Wed Dec 30 1998 - 03:15:10 GMT