The debate on what per say is a mini... from Mike Kenzie on 2000-12-16 (2000-December)

From: Mike Kenzie <KenzieM_at_sympatico.ca>
Date: Sat Dec 16 15:50:10 2000

----- Original Message -----
From: "Pete Turnbull" <pete_at_dunnington.u-net.com>
To: <classiccmp_at_classiccmp.org>
Sent: Friday, December 15, 2000 8:27 PM
Subject: Re: The debate on what per say is a mini...

> To solve the problem properly (after adding a second PSU and moving all
> the disks to separate housings) it was decided that we'd buy a
> replacement system, from another Computer Company of high repute (this
> one with six letters in its name). The new, overkill, spec was for
> dual-processor, two banks of ECC RAM, triple redundant hot-swappable
> power supplies, hot-swappable RAID disks, two network interfaces, and a
> UPS. And just for good measure, we wanted a pair of these, linked
> directly by a crossover cable on the second network interfaces, and
> running some smart software that allowed one to mirror the other. In
> theory, if the "live" server failed, the other would adopt its IP
> address and take over.
>
> In theory, theory and practice are the same. In practice, they are
> different.
>
> In practice, our network turns out not to like duplicate IP addresses,
> that is, two devices with different MAC addresses but using the same IP
> address -- and the second machine was not always perfectly silent. In
> practice, the backup server was always a bit too enthusiastic. The live
> server would see a glitch on the RAID disks and report it, and the
> backup would try to take over. But the live one wouldn't let go, and
> they'd fight. Almost daily, partly because the RAID system was perfectly
> capable of correcting errors much of the time, but its controller was
> perfectly capable of generating them as well.
>
> In the end, we found it better to switch one off. The live one fails
> only occasionally, usually when doing an overnight backup. And we have a
> heavy box to prop open the machine room door. Or run VMware from time to
> time.
>
> Moral: there is such a thing as overkill, and such a thing as
> over-engineering.

Saw a presentation on this last week from Red Hat. Their
Piranha and STONITH protocol deal with this. We set up a
demonstration with 3 back-end web servers and 2 front-end
machines. We then alternately unplugged various machines,
and all the while the setup kept serving the web pages.

The second machine, when it detected trouble with the primary,
would power off the first machine (Shoot The Other Node In
The Head). It was quite impressive, but it had me thinking of
VMS and MVS and wishing everyone would talk to each other
instead of reinventing the wheel.
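
For anyone who hasn't seen the idea, here is a rough Python sketch of
the fence-before-takeover ordering as I understood it. This is not
Red Hat's code: the Node and PowerSwitch classes and the take_over
function are names I made up for illustration, and the "power switch"
just flips a flag where a real setup would drive a remote power
controller. The point is only that the standby claims the service
address after the old primary is confirmed off, so the two never hold
the same IP at once.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster member; not part of any real HA package."""
    name: str
    powered: bool = True
    holds_service_ip: bool = False


class PowerSwitch:
    """Hypothetical stand-in for a remotely controlled power switch."""

    def power_off(self, node: Node) -> bool:
        # A real fencing device would cut power here; this only flips flags.
        node.powered = False
        node.holds_service_ip = False  # a dead node cannot answer for the IP
        return not node.powered


def take_over(backup: Node, primary: Node, switch: PowerSwitch,
              heartbeat_ok: bool) -> bool:
    """Return True if the backup took over the service address."""
    if heartbeat_ok:
        return False  # primary looks healthy, do nothing
    if not switch.power_off(primary):
        return False  # fencing not confirmed, so do not claim the IP
    backup.holds_service_ip = True  # safe: the primary is known to be off
    return True
```

Calling take_over(standby, live, PowerSwitch(), heartbeat_ok=False)
leaves exactly one node holding the address, which is the property
the pair of servers in Pete's story never had.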
Received on Sat Dec 16 2000 - 15:50:10 GMT
