Many things

From: der Mouse <mouse_at_Rodents.Montreal.QC.CA>
Date: Mon Jan 31 12:30:31 2005

> Perhaps you could define how hardware handles a timing problem. It
> would be interesting to hardware fellows like yourself, just as my
> description of the software timing problem would interest software
> fellows like myself (maybe).

Well, I'm not the person you're responding to, but I was somewhat
involved when a friend of mine found a timing problem in hardware.
What's more, it was on Classic hardware. :-)

This fellow was doing robot control work. A MicroVAX-II was the main
host, but it couldn't handle both the control loop (running at
something in the 1kHz-10kHz range) and the main OS. So we got a KA620
and slapped it in as a second CPU. (The KA620 is a KA630 deliberately
mutated enough that VMS won't run on it - P0 and P1 page tables live in
physical space rather than kernel virtual space. DEC at the time made
it quite hard to get a KA630 that didn't have a machine around it; the
KA620 was their answer for people who wanted multi-CPU machines.)

The inter-processor interrupt mechanism on the KA6[23]0 takes the form
of a register in Qbus device space that any bus master (such as another
CPU) can prod; when prodded in a particular way, it produces a
distinctive "doorbell" interrupt.

But every now and then it wasn't. Wasn't producing an interrupt, that
is. He finally wrote some _really_ simple test programs - we're
talking tens of instructions - and left them running overnight. Sure
enough, some small fraction (ca. 0.1%, if memory serves) of doorbell
interrupts simply got lost.
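
For readers who want the shape of such a test, here is a minimal sketch in
Python. It is a simulation, not the original programs (which were
hand-assembled VAX code and are not reproduced in this message): one side
"rings" a doorbell a known number of times, the other counts the interrupts
it actually sees, and the difference is the number lost. The function name
`run_doorbell_test`, the `loss_probability` parameter, and the 0.1% figure
plugged in below are illustrative assumptions; on real hardware the random
draw would be replaced by an actual write to the doorbell register.

```python
import random


def run_doorbell_test(rings, loss_probability, seed=0):
    """Simulate `rings` doorbell prods and count how many interrupts arrive.

    Returns (delivered, lost, lost_fraction). The loss model is a plain
    random draw per ring; this is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    delivered = 0
    for _ in range(rings):
        # Sender side: prod the (simulated) doorbell register once.
        # Receiver side: the handler bumps a counter unless the
        # interrupt is lost.
        if rng.random() >= loss_probability:
            delivered += 1
    lost = rings - delivered
    return delivered, lost, lost / rings


if __name__ == "__main__":
    # Hypothetical run: one million rings at an assumed 0.1% loss rate.
    delivered, lost, fraction = run_doorbell_test(1_000_000, 0.001)
    print(f"delivered={delivered} lost={lost} fraction={fraction:.4%}")
```

Because the sender's count is known exactly, even a very low loss rate
shows up once the run is long enough, which is what an overnight run buys.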

We reported this to DEC, together with the test programs (which were
simple enough to hand-assemble and type in on the machine's consoles)
and they eventually found the bug.

Apparently some relevant signal's etch run ran clear across the board.
The design output impedance of the driver and the capacitance between
the etch run and the rest of the world combined such that it should
_never_ have worked, and the only reason it worked most of the time was
that they overdesigned their hardware by something approaching an order
of magnitude. They came up with an ECO/FCO for it, which I think
involved lowering the value of a pullup resistor, and as far as I know, as long as
their service organization for uVAXIIs existed, you could get it if you
knew to ask for that change order by number.
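
To see why a smaller pullup can rescue a marginal signal, a rough RC
calculation helps. The sketch below assumes a very simple model in which the
line charges from 0 V through a resistance R into a lumped capacitance C;
every component value in it is hypothetical, since the actual figures for
the KA630 signal are not given here, and `rise_time` is just an
illustrative helper name.

```python
import math


def rise_time(r_ohms, c_farads, v_supply, v_threshold):
    """Time for an RC node charging from 0 V toward v_supply to reach
    v_threshold, using v(t) = v_supply * (1 - exp(-t / (R * C)))."""
    return -r_ohms * c_farads * math.log(1.0 - v_threshold / v_supply)


if __name__ == "__main__":
    # All values below are assumptions chosen only to show the trend.
    capacitance = 50e-12   # 50 pF of trace capacitance (hypothetical)
    v_supply = 5.0         # 5 V logic supply (hypothetical)
    v_threshold = 2.0      # input-high threshold (hypothetical)
    for r in (4700.0, 1000.0):
        t = rise_time(r, capacitance, v_supply, v_threshold)
        print(f"R = {r:6.0f} ohm -> threshold reached after {t * 1e9:.1f} ns")
```

In this model the time to reach the threshold scales linearly with R, so
cutting the pullup value cuts the rise time by the same factor, at the cost
of more current through the resistor when the line is driven low.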

/~\ The ASCII der Mouse
\ / Ribbon Campaign
 X Against HTML
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Received on Mon Jan 31 2005 - 12:30:31 GMT
