Parity error recovery. Was: S/1 How Much?

From: Gary Oliver <go_at_ao.com>
Date: Sun Feb 8 21:46:29 1998

In fact, there are some useful things you can do to ACTUALLY recover
on a computer running a reasonable operating system. From the late
60s through 1981, Oregon State University ran an operating system
called, humbly, OS-3 (short for Oregon State Open Shop Operating System,
or OSOSOS), but I digress.

OS-3 ran on a Control Data 3300 that had been modified to actually
conform to the Control Data specifications for User/Supervisor
operation. The operating system was written to be as reentrant as
possible; all OS code (except a very small part pertaining to interrupt
dispatch) was PURE (not self-modifying). This is worth noting, since the
standard subroutine call (as was common in those days) altered the
first instruction of the subroutine to be a JUMP back to the calling
code; i.e., there was no "stack" mechanism in hardware.
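To see why that calling convention matters for checksumming, here is a toy model (not the real CDC 3300 instruction set) of a call that overwrites the subroutine's first word with a jump back to the caller. The 24-bit word, the 6-bit/18-bit field split, `JUMP_OPCODE`, `encode_jump`, `return_jump` and `xor_checksum` are all hypothetical, chosen only for illustration. It shows that code called this way changes itself on every call, and changes differently depending on the caller, so no checksum taken at boot could stay valid for it.

```python
from functools import reduce
from operator import xor

# Hypothetical encoding: a 24-bit word with a made-up 6-bit opcode field
# and 18-bit address field. This is NOT the real CDC 3300 format.
JUMP_OPCODE = 0o01

def encode_jump(target: int) -> int:
    """Pack a hypothetical JUMP-to-target instruction into one word."""
    return (JUMP_OPCODE << 18) | (target & 0o777777)

def return_jump(memory: list, sub_addr: int, return_addr: int) -> int:
    """Toy subroutine call: overwrite the subroutine's entry word with a
    JUMP back to the caller, then continue at the next word."""
    memory[sub_addr] = encode_jump(return_addr)
    return sub_addr + 1  # new program counter

def xor_checksum(words) -> int:
    """Exclusive-or of every word in a region of memory."""
    return reduce(xor, words, 0)

# A subroutine whose entry word lives at address 1.
memory = [0o12345670, 0o0, 0o55555555, 0o01234567]
boot_sum = xor_checksum(memory)
return_jump(memory, 1, 0o100)   # called from one site
after_first = xor_checksum(memory)
return_jump(memory, 1, 0o200)   # called from another site
after_second = xor_checksum(memory)
```

Each call leaves a different word at the subroutine's entry, so `boot_sum`, `after_first` and `after_second` all differ. Only code that never writes to itself can be protected by a fixed checksum.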

Since the monitor code was not self-modifying, the OS could compute an
exclusive-or checksum of itself at boot time and save it against the
occurrence of a parity error. Parity errors reported only the memory
location of the error, not the actual value read. So when a parity
error occurred in the reentrant section of the monitor, the OS could
recompute the value of the bad location by exclusive-oring all the
OTHER locations together and then exclusive-oring the result with the
saved monitor checksum. This reproduced the contents of the bad
location; the OS stored it back at the address reported by the
parity-check hardware and resumed from the parity error interrupt.
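A minimal sketch of that reconstruction, assuming the monitor's reentrant section can be treated as a list of integer words. The names `xor_checksum`, `recover_word` and the sample word values are hypothetical. The repair never reads the bad word itself, which matches hardware that reports only the failing address.

```python
from functools import reduce
from operator import xor

def xor_checksum(words) -> int:
    """Exclusive-or of every word in a region of memory."""
    return reduce(xor, words, 0)

def recover_word(region: list, bad_addr: int, saved_checksum: int) -> int:
    """Rebuild the word at bad_addr from the checksum saved at boot.

    XOR of all OTHER words, XORed with the saved checksum, cancels every
    word except the one at bad_addr. Valid only if the region is pure,
    i.e. no word has legitimately changed since the checksum was taken.
    """
    others = xor_checksum(w for i, w in enumerate(region) if i != bad_addr)
    return others ^ saved_checksum

# Checksum the (hypothetical) monitor region once at boot.
monitor = [0o12345670, 0o00770077, 0o55555555, 0o01234567]
saved = xor_checksum(monitor)
original = monitor[2]

monitor[2] ^= 1 << 7  # simulate a bad word; its true value is now unknown
monitor[2] = recover_word(monitor, 2, saved)  # repair in place, then resume
```

This works because XOR is its own inverse: every good word appears twice (once in the saved checksum, once in `others`) and cancels, leaving only the original value of the bad word.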

Of course, for user programs (which were seldom reentrant) the OS
just aborted the program with an error and went on about its business.
Consequently, the parity error interrupt was RARELY fatal, at least to
the system at large. (Individual users did occasionally complain when we
had a memory stack that had started to go bad but hadn't yet gotten bad
enough to show up in the overnight diagnostics.)

This was all back in the day when ECC did exist, but it was Horribly Expensive
(and this was compared to memory that cost about 1 to 2 dollars per byte).
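As a rough illustration of why plain parity could report only that a location was bad, not what it should hold, here is even parity over a single word. The helper names `parity_bit` and `parity_ok` and the sample word are hypothetical. The sketch shows that any one-bit flip is detected but not located (every single-bit flip fails the check the same way), and that a two-bit flip slips through unnoticed; locating and correcting the bad bit is what the extra check bits of ECC pay for.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of 1 bits, so that
    the word plus its parity bit always hold an even number of 1s."""
    return bin(word).count("1") & 1

def parity_ok(word: int, stored_parity: int) -> bool:
    """Check a word read back from memory against its stored parity bit."""
    return parity_bit(word) == stored_parity

w = 0o12345670
p = parity_bit(w)           # stored alongside the word when it is written

one_bit_a = w ^ (1 << 3)    # flip bit 3
one_bit_b = w ^ (1 << 20)   # flip bit 20: fails the check identically
two_bits = w ^ 0b11         # flip bits 0 and 1: parity is unchanged
```

Both `one_bit_a` and `one_bit_b` fail `parity_ok` with no indication of which bit is wrong, while `two_bits` passes.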

Gary.

At 05:24 PM 2/8/98 -0500, you wrote:
>> 'Tis true if you meant ECC parity, but this is really OVERKILL in
>> the consumer machines that we're using. My machine has been happily
>> running for years on non-parity memory, as long as the memory is the
>> top-quality kind and covered by a lifetime warranty if possible. Apple
>> has done that for years, from the first Apple II all the way to today's PCI Macs.
>
>Yes, I would not think of adding full ECC to a home computer. Doing so
>would probably add $25 to the cost of producing the machine - something
>the marketing types would scream about because it really adds nothing for
>them to sell!
>
>Anyway, it would have been nice if PeeCees were made so a parity error
>would tell the BIOS (or DOS) to try to clean up and do a graceful
>shutdown, rather than just reporting the error and halting. Many parity
>errors are soft errors, only affecting one bit of the memory, so there is
>a chance that the programs (or DOS) could react and do a little damage
>control.
>
>William Donzelli
>william_at_ans.net
>
Received on Sun Feb 08 1998 - 21:46:29 GMT