20,046 page doc archive still available
 
>>>>> "ed" == ed sharpe <esharpe_at_uswest.net> writes:
 ed> Paul, who made the engine for the plugin? IRIS, or ...?

I have no idea.  I just grabbed it when I saw it and put it to work.
The biggest issue is that cleanup is a real pain.  Acrobat lets you
"edit" the text, but only line by line.  Yuck.  And the OCR does a
fair job of picking up changes in font, but it does get it wrong some
of the time, so you can end up with a fairly ugly mix of Courier and
Times.  If the goal is to make searchable text and a smaller file
(goal #1), that isn't a big deal.  If the goal is to make a clean
document (goal #2), that's different.

I did two projects: a 400 or so page A-10 flight manual with goal #1,
and the Ethernet standard, 90 or so pages, goal #2.  That second one
was quite a lot of work.  It's arguable whether it was worth the
trouble.  Unfortunately, I did not find Al Kossow's archive until
after I was finished...
      paul
 ----- Original Message -----
 From: "Paul Koning" <pkoning_at_equallogic.com>
 To:
 >> I have had good success with Adobe's OCR plugin for Acrobat --
 >> free for the download with a 50 page at a time limit.  (It will do
 >> bigger docs, in 50 page pieces.)  It worked well enough to produce
 >> useful output from a manual full of pictures (a flight manual).
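
For a document longer than the plugin's 50-page limit, the page ranges for each pass can be worked out mechanically. A minimal sketch in Python, assuming 1-based inclusive page numbers; the function name page_batches is hypothetical, the 400-page count is only the approximate size of the flight manual, and nothing here drives Acrobat itself:

```python
def page_batches(total_pages, batch_size=50):
    """Return (first, last) 1-based inclusive page ranges covering the document."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size >= 1")
    return [
        (first, min(first + batch_size - 1, total_pages))
        for first in range(1, total_pages + 1, batch_size)
    ]


if __name__ == "__main__":
    # 400 pages is an assumed round figure for the flight manual.
    for first, last in page_batches(400):
        print(f"pages {first}-{last}")
```

The last range is clipped to the document length, so a 90-page document would come out as two pieces rather than two full 50-page batches.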
Received on Thu Sep 09 2004 - 09:54:56 BST