Internet Archive (Wayback) and robots.txt (was: Re: Liberty Freedom One Plus)

From: Philip Pemberton <philpem_at_dsl.pipex.com>
Date: Tue Aug 31 03:05:43 2004

In message <Pine.LNX.4.33.0408301711540.21716-100000_at_siconic.com>
          Vintage Computer Festival <vcf_at_siconic.com> wrote:

> I share the sentiment but why the hell did archive.org nuke the previous
> web pages? That makes *no* sense. You might want to contact them and
> bring it to their attention. It might've been a glitch. If it's an
> actual practice then it needs to be adjusted.

Okay then. Go to <http://web.archive.org/*/www.leus.com> and pick the October
1st, 2000 archive entry. You'll get a "This site has moved to
www.libertyus.com" page. Now feed the URL "www.libertyus.com" to Wayback
(which should dump you at <http://web.archive.org/*/www.libertyus.com>). Note
the message - "Access to this site has been blocked by the site owner via
robots.txt."
Now open the robots.txt that Wayback has on record alongside
<http://www.libertyus.com/robots.txt> and compare them...

The juicy bit is at <http://www.archive.org/exclude.php>... Looks like
they've got Wayback rigged to destroy any historical data if the current site
owner decides to put up a robots.txt file that blocks IA/Wayback. I consider
this to be severely bugged, but %DEITY knows if they do. In any case, Liberty
Electronics seem to be no more and no-one seems to have taken a mirror of
their website either. And manuals for the terminals seem to be like hen's
teeth :-/
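
For anyone who wants to check what a given robots.txt would do, here is a
rough sketch using Python's standard urllib.robotparser. It only shows how a
robots.txt rule set is evaluated for one crawler; it says nothing about how
Wayback itself is implemented. Assumptions: the Archive's crawler is matched
under the user-agent name "ia_archiver" (my reading of the exclude.php page),
and the function name blocks_agent and the sample rules are made up purely
for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, agent: str = "ia_archiver", path: str = "/") -> bool:
    """Return True if the given robots.txt text disallows `agent` from `path`.

    "ia_archiver" is an assumed user-agent name, not something confirmed here.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


# Hypothetical rules of the kind a site owner might publish to exclude one crawler.
sample = """User-agent: ia_archiver
Disallow: /
"""

print(blocks_agent(sample))                        # True: the named crawler is shut out
print(blocks_agent(sample, agent="SomeOtherBot"))  # False: other crawlers are unaffected
```

Paste the text of any robots.txt into blocks_agent() to see whether it shuts
out that one crawler while leaving everyone else alone.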

I've contacted the Internet Archive regarding this - %DEITY knows if I'll get
a reply.

Later.
-- 
Phil.                              | Acorn Risc PC600 Mk3, SA202, 64MB, 6GB,
philpem_at_dsl.pipex.com              | ViewFinder, 10BaseT Ethernet, 2-slice,
http://www.philpem.dsl.pipex.com/  | 48xCD, ARCINv6c IDE, SCSI
... Can I stop typing in taglines now please?
Received on Tue Aug 31 2004 - 03:05:43 BST
