Any problem with Document Archive in Bitsavers.org ?

From: Paul Koning <pkoning_at_equallogic.com>
Date: Tue Jun 29 13:30:08 2004

>>>>> "Patrick" == Patrick Finnegan <pat_at_computer-refuge.org> writes:

 Patrick> On Tuesday 29 June 2004 12:16, Paul Williams wrote:
>> Al Kossow wrote:
>> > As soon as bitsavers came on line again, google crawlers started
>> > downloading EVERYTHING from multiple IP adrs.
>>
>> Put this in your robots.txt:
>>
>> User-agent: Googlebot
>> Disallow: /*.pdf$

 Patrick> Grr. Don't do this. I really hate it when people disallow
 Patrick> google to index content. It always makes it harder to find
 Patrick> stuff. The only time I'd consider doing it is if the
 Patrick> "webserver" is on a dialup connection or something that
 Patrick> won't stay at the same IP address.

There's a second reason, which applies here -- there IS no content
that Google can index, because those files are all bitmap -- no text.

If there were text files or OCRed PDF files, that would be different,
but as it is, Google will find absolutely nothing. So Al might as
well tell it not to try.
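
For anyone unsure what the quoted rule would actually block, here is a
rough sketch in Python. It assumes Google's wildcard extensions to
robots.txt, where "*" matches any run of characters and a trailing "$"
anchors the end of the URL. The function names and the sample paths are
made up for illustration, and this is not how any real crawler is
implemented.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with '*' and '$' into a regex.

    Assumes Google-style wildcard semantics; the original robots.txt
    convention only did plain prefix matching.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin with '.*' where '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path: str, pattern: str) -> bool:
    # Rules are matched from the start of the URL path.
    return rule_to_regex(pattern).match(path) is not None

# Hypothetical paths, for illustration only.
print(is_disallowed("/pdf/dec/handbook.pdf", "/*.pdf$"))
print(is_disallowed("/pdf/dec/index.html", "/*.pdf$"))
```

Under those assumptions the rule covers only URLs ending in ".pdf", so
any directory listings and text files would still be crawled.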

     paul
Received on Tue Jun 29 2004 - 13:30:08 BST
