Parity (was Re: stepping mechanism of Apple Disk ][ drive)

From: Richard Erlacher <edick_at_idcomm.com>
Date: Fri Apr 9 09:40:51 1999

Yes, increasing the amount of memory by 1/8 increases the likelihood of
failure by 1/8. The inclination to bury one's head and ignore the potential
for memory or bus failure comes from the competition for price advantage in
the personal computer market, though. The argument I've heard is "if Macs
can live with it, so can PCs," which may not be true, but appears to be true
enough for the typical user.
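
(As a rough worked example, assuming independent failures: if each bit of an
8-bit byte fails at some small rate r, the byte fails at about 8r; add a
parity bit and that becomes about 9r, an increase of 9/8 - 1 = 12.5%, which
is essentially the figure Ethan quotes below.)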

Once such a memory failure is detected, there's nothing you can do about it
except become aware of the problem and endeavor NOT to save the data that
may be corrupted. I see memory parity errors about twice a year (most of
the PCs here use parity or single-bit error correction). Normally it's when
a new box is being brought up and the memories aren't seated right, or
something on that order. I don't know what that says about the memory
systems of today.
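
For anyone following along, here is a minimal sketch, in C, of what
byte-wide parity buys you; it's my own illustration, not anything lifted
from a PC BIOS. One extra bit per byte lets you detect a single flipped
bit, but it tells you nothing about which bit flipped, so there is nothing
to correct:

/* Minimal sketch: even parity over one byte, roughly what a byte-wide
 * parity DRAM scheme computes in hardware. */
#include <stdio.h>

static unsigned parity_bit(unsigned char b)    /* 1 if b has an odd number of 1s */
{
    unsigned p = 0;
    while (b) {
        p ^= b & 1u;
        b >>= 1;
    }
    return p;                                  /* this is the stored 9th bit */
}

int main(void)
{
    unsigned char data = 0x5A;
    unsigned stored = parity_bit(data);        /* written alongside the byte */

    data ^= 0x10;                              /* a single bit flips in DRAM */

    if (parity_bit(data) != stored)
        printf("parity error: byte is corrupt, but which bit is unknown\n");
    return 0;
}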

It's been a few years, but I always preferred single-bit correction over
parity in sizeable memory arrays. I designed one fairly large buffer memory
for Honeywell, 72 bits wide and 64 MB deep, which was quite a bit for that
time (1991), with single-bit correction, only to have the manager tell me
it was not needed. "Whom are we helping with this added expense?" was his
position. I pointed out that it would make memory problems a depot or even
a field repair, whereas otherwise it would be a return-to-factory job. He
insisted, though. The software lead and I agreed we'd base our memory check
on parity, which still allowed for isolation of the faulty SIMM. Since this
was not a main system memory but just a data buffer, it didn't matter that
it was defective, and firmware could rigorously isolate the faulty device.
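
To give a flavor of what that firmware check might have looked like, here
is a hypothetical sketch (not the actual Honeywell code; the bank count,
word count, and the injected fault are all made up). Because the board is
only a data buffer, firmware is free to run destructive test patterns over
each SIMM bank; the failing address names the SIMM, and written-XOR-read
names the bad data bit:

/* Hypothetical sketch only -- the real Honeywell firmware isn't in the
 * post.  Banks are simulated with arrays; bank 2 gets a stuck-at-0 bit. */
#include <stdio.h>
#include <stdint.h>

#define BANKS 4               /* made-up bank count for the illustration   */
#define WORDS 1024            /* made-up words tested per bank             */

static uint32_t bank[BANKS][WORDS];

static void write_word(int b, int i, uint32_t v)
{
    if (b == 2)
        v &= ~(1u << 17);     /* simulated hard failure: bit 17 stuck at 0 */
    bank[b][i] = v;
}

int main(void)
{
    static const uint32_t patterns[] = { 0x00000000u, 0xFFFFFFFFu,
                                         0xAAAAAAAAu, 0x55555555u };
    for (int b = 0; b < BANKS; b++)
        for (unsigned p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
            uint32_t bad_bits = 0;
            for (int i = 0; i < WORDS; i++)
                write_word(b, i, patterns[p]);     /* destructive fill      */
            for (int i = 0; i < WORDS; i++)
                bad_bits |= bank[b][i] ^ patterns[p];
            if (bad_bits)                          /* bank b names the SIMM */
                printf("bank %d, pattern 0x%08X: bad data bits 0x%08X\n",
                       b, (unsigned)patterns[p], (unsigned)bad_bits);
        }
    return 0;
}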

I'm not sure what you're saying about the relative value of the extra bit
of memory versus the risk of promulgating a transient error into infinity
by recording it as though it were correct, Ethan. You seem to suggest that
it would have been better not to have had the 60-cent memory part in place
than to find and repair it once its failure was detected by the parity
circuitry. I doubt you believe that, however. It is true that the addition
of parity circuitry means there is an elevated likelihood of failure,
proportional to the increased memory size. It is also true that
parity-checking circuitry takes time to work and can itself fail. Increased
circuit complexity does increase the statistical probability of failure.
ECC circuitry doesn't decrease the probability of memory failure. It does
decrease the amount of down-time resulting from it, and it avoids the data
loss and down-time associated with single-bit transient failures, which are
more common than hard failures.
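
For the curious, here is a toy single-error-correcting Hamming code over
one data byte (my own illustration, not the Honeywell design, which
presumably spread 8 check bits across 64 data bits in its 72-bit word).
Real SECDED hardware adds an overall parity bit on top so that double-bit
errors are detected rather than mis-corrected:

/* Toy SEC Hamming code: 8 data bits + 4 check bits in positions 1..12.
 * The syndrome (XOR of the indices of all set positions) is zero for a
 * clean word and names the flipped position after a single-bit error. */
#include <stdio.h>
#include <stdint.h>

static const int data_pos[8] = { 3, 5, 6, 7, 9, 10, 11, 12 };

static uint16_t encode(uint8_t d)       /* bit i of the result = position i */
{
    uint16_t cw = 0;
    int syn = 0;
    for (int i = 0; i < 8; i++)
        if (d & (1 << i)) {
            cw |= 1u << data_pos[i];
            syn ^= data_pos[i];
        }
    for (int k = 0; k < 4; k++)         /* check bits live at positions 1, 2, 4, 8 */
        if (syn & (1 << k))
            cw |= 1u << (1 << k);
    return cw;
}

static uint8_t decode(uint16_t cw)      /* corrects any single flipped bit */
{
    int syn = 0;
    for (int pos = 1; pos <= 12; pos++)
        if (cw & (1u << pos))
            syn ^= pos;
    if (syn)                            /* nonzero syndrome names the bad position */
        cw ^= 1u << syn;
    uint8_t d = 0;
    for (int i = 0; i < 8; i++)
        if (cw & (1u << data_pos[i]))
            d |= (uint8_t)(1 << i);
    return d;
}

int main(void)
{
    uint16_t cw = encode(0xC3);
    cw ^= 1u << 6;                      /* single-bit transient in "memory" */
    printf("recovered 0x%02X\n", (unsigned)decode(cw));   /* prints 0xC3 */
    return 0;
}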

I guess it's like automobile insurance. If you have assets you need to
protect, you buy it. If you haven't, you don't. My assessment is that Apple
started with the assumption that you don't.

Dick


-----Original Message-----
From: Ethan Dicks <ethan_dicks_at_yahoo.com>
To: Discussion re-collecting of classic computers
<classiccmp_at_u.washington.edu>
Date: Friday, April 09, 1999 6:10 AM
Subject: Parity (was Re: stepping mechanism of Apple Disk ][ drive)


>
>
>> On Thu, 8 Apr 1999, Richard Erlacher wrote:
>>
>> > My contempt for Apple begins and ends with their total disregard for
>> > the value of your data. They designed the MAC with no memory parity
>> > assuming that you'd not mind if your data was corrupted without your
>> > knowledge...
>
>Multiple studies of memory reliability (DRAM) show that parity memory is
>more prone to failure than non-parity memory. If you want reliability, you
>have to go to something like Error Correcting Codes (ECC) like the big boys
>use. We had 39-bit memory on a 32-bit VAX (11/750) because the extra seven
>bits let you *detect* two faulty bits and *correct* a single bit failure.
>The Sun Enterprise servers I babysit have ECC memory - we used to get one
>or two failures in the machine room per year, but they were logged and
>corrected without any loss of data. My Alpha board (AXP-133 "no-name"
>board) uses 72-pin *parity* SIMMs in pairs to implement ECC on a 64-bit
>memory bus.
>
>The problem with parity is that yes, you do know that you had a failure,
>but now you have 9 bits that might fail, not 8, raising your risk by 12%.
>DRAM failures are more often total rather than intermittent. A memory test
>at power-up is a better insurance policy than relying on parity to save
>your butt.
>
>I did have the parity circuit on a PeeCee cough up a lung once... it was
>even a five-slot original PC (256K on M.B.). We were using it into the
>90's because it was merely the terminal for a Northwest Instruments
>logic/CPU analyzer that we used to check for problems in our MC68000-based
>serial boards. One day, the PC would not come up. Because everything was
>socketed and because I owned an IC tester, we got a
>bottom-of-the-totem-pole tech grunt to pull each chip and test it. It was
>a faulty 4164. Labor costs: $25. Parts cost: $0.60 for a part we stocked
>thousands of for one of our older products. I still have the machine. It
>still works. I wish I had the invoice for that CPU; the company bought it
>new in 1981, around $5K, I know, but I'd like to know the exact figure.
>
>Bottom line: Apple not using parity is not a reason to trash the Mac. How
>many PCs have parity since we moved to EDO and SDRAM? It's extra cost and
>extra complexity and extra possibilities for failure. Unless you can
>correct the failure, it's not mathematically worth the extra expense and
>reduced reliability.
>
>-ethan
>
>_________________________________________________________
>Do You Yahoo!?
>Get your free _at_yahoo.com address at http://mail.yahoo.com
>
Received on Fri Apr 09 1999 - 09:40:51 BST
