Many things from Jerome H. Fine on 2005-01-31 (2005-January)

From: Jerome H. Fine <jhfinexgs2_at_compsys.to>
Date: Mon Jan 31 09:28:59 2005

>Tony Duell wrote:

>No, I stand by what I said. If there's a bug, then that software is not
>correctly-working (even if it seems to be).
>
>The analogy for hardware would be a marginal timing problem (where,
>perhaps, every 10 years 2 signals arrive effectively in the wrong order).
>That's a mis-design, it's not working correctly.
>
>But correctly-working software carries on working. Hardware on the other
>hand can stop working if a component fails.
>
Jerome Fine replies:

The problem with design bugs is that they can sometimes
exist in both hardware and software (usually complex)
for decades before the bug is identified. In the case
of the software bug that I see in RT-11, it was found
by inspecting the code, NOT by the bug causing a failure.

PLUS, the difficulty with marginal timing problems is that
they occur so infrequently AND they are almost impossible
to duplicate. As far as I understand, that is true of BOTH
hardware and software - although you would be the one with
the knowledge about duplicating a marginal timing problem
in hardware. When there is a marginal timing problem in
software, it is usually described as a race condition.

The code was written with the assumption (almost always NOT
intentional, since the author did not realize that the race
condition exists - which is why it really is a bug in the
first place) that between instructions (a) and (b) - usually
with relatively few instructions in-between - no other code
would ever execute that relies upon the results of (a) and
(b) having taken effect at exactly the same time. The
software solution is to reduce the number of instructions
between (a) and (b) as much as possible (often to none at
all) and then to lock out interrupts before (a) and unlock
them after (b), thereby ensuring that no code which assumes
that (a) and (b) were executed at the same time can run
in-between (a) and (b).
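
As a rough illustration of the above, here is a toy model in
Python (NOT RT-11 code, and not the bug mentioned earlier). It
assumes a 32-bit count kept in two 16-bit words, where (a)
updates the low word and (b) carries into the high word. The
names Machine, poll, increment and run are made up for this
sketch, and the "interrupt" is simulated by a flag that is
checked between the two steps.

```python
class Machine:
    """Toy single-CPU model: one pending interrupt, one enable flag."""

    def __init__(self):
        self.low = 0xFFFF      # low 16-bit word of a 32-bit count
        self.high = 0x0000     # high 16-bit word
        self.interrupts_enabled = True
        self.pending = False   # an interrupt request is waiting
        self.observed = []     # values seen by the interrupt handler

    def handler(self):
        # The handler assumes low and high were updated "at the same time".
        self.observed.append((self.high << 16) | self.low)

    def poll(self):
        # Called between instructions: deliver a pending interrupt if allowed.
        if self.pending and self.interrupts_enabled:
            self.pending = False
            self.handler()

    def increment(self, lock_out, fire_between):
        if lock_out:
            self.interrupts_enabled = False       # lock out before (a)
        self.low = (self.low + 1) & 0xFFFF        # instruction (a)
        if fire_between:
            self.pending = True                   # device raises a request
        self.poll()                               # window between (a) and (b)
        if self.low == 0:
            self.high = (self.high + 1) & 0xFFFF  # instruction (b)
        if lock_out:
            self.interrupts_enabled = True        # unlock after (b)
        self.poll()                               # deferred interrupt runs now


def run(lock_out):
    """Increment once with an interrupt arriving between (a) and (b)."""
    m = Machine()
    m.increment(lock_out=lock_out, fire_between=True)
    return m.observed
```

Without the lock-out, the handler runs in the window and sees
a count that never existed (low word already wrapped, high word
not yet carried). With the lock-out, the same interrupt is held
off until after (b) and the handler sees a consistent count.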

Perhaps you could define how hardware handles a timing problem.
It would be interesting to hardware fellows like yourself, just
as my description of the software timing problem would interest
software fellows like myself (maybe).

Sincerely yours,

Jerome Fine

--
If you attempted to send a reply and the original e-mail
address has been discontinued due to a high volume of junk
e-mail, then the semi-permanent e-mail address can be
obtained by replacing the four characters preceding the
'at' with the four digits of the current year.

Received on Mon Jan 31 2005 - 09:28:59 GMT
