DEC field service (was Re: HP's not cool? (was Re: pdp-11/60 Semi-Rescue))

From: Bill Pechter <pechter_at_pechter.dyndns.org>
Date: Mon Nov 22 17:04:39 1999

Tony --
I'm starting to sound like an old fart here. Please shoot me before I
call the DEC Field Service days the good old days.

<BANG> Oops folks... that was just my wife with a .38. Before I
end up more than a quart low on B Positive...

> > Of course, repair by module swapping is most effective if you have some
> > idea of which modules it makes sense to swap, but that would require
> > expen$ive training.

DEC invested a ton in almost unlimited training if you worked hard and
had a decent manager. I took a ton of courses because I volunteered to
take any that the married folks with lives couldn't make due to
personal commitments or other work issues.

>
> Module swapping makes sense in one particular case : When the
> tools/equipment to repair the old module are not available in the field,
> or when the module would require complex realignment after the repair. In
> that case it makes sense to swap a _known faulty_ module for a _known
> good_ one and then repair the old one later.

Of course, when you moved PDPs and VAXes from the lab to commercial
environments where downtime is REAL MONEY, the amount of time given
to troubleshoot a problem is determined by the customer, who controls
field service access to the box.

We had sites that controlled a whole plant with a pair of 11/34s, where
the only time you could do maintenance was Christmas or Easter, and
downtime was impossible to schedule until both machines flamed up.

Often small sites would run intermittent machines until 5 minutes before their
service contract coverage would run out for the day (usually either 5 pm
or midnight) and then log the DECservice call, which meant DEC had to
do continuous around-the-clock maintenance until the machine was up.

Boy, the number of 4:55pm calls on Friday used to really %R$^& me off,
including the one on a printer at Roche medical, where the customer had
shimmed the ribbon path with test tube lids and left scalpel blades in
the LP25 when I got sent out to work on it.

The printer was obviously having problems for at least a day or so
before they finished their medical test runs -- but they waited to call
until it was completely unreadable.


>
> What _never_ makes sense (IMHO) is module-swapping as a diagnostic
> technique -- that's to say replacing modules in the order specified by
> the service manual until the fault goes away. We've had this argument
> before, and you'll never convince me otherwise. There are 2 main problems
> with doing this :
>
> 1) A fault elsewhere in the machine may be damaging the module. Unless
> you find _this_ fault, the new module will die soon as well. Seen it happen

True. But this one usually damages the module (on digital, not analog
signals) pretty quickly -- usually before the diag run finishes.

I rarely saw a board slowly fail due to another board.
Got any examples on DEC hardware where this happened? I'd like to hear
of it.

This is different with analog and video stuff or power supplies.

>
> 2) A marginal voltage/timing problem may confuse things. Supposing there
> are 2 modules A and B, and some signals between them. Module A develops a
> fault, so one signal is out of spec. The system falls over. The service
> manual tells you to replace B first. You do, and by chance get one that
> can accept a wider margin on this signal. The system comes up again. But
> A continues to drift, and eventually the machine falls over again.
>

Absolutely, but again this is less likely in digital stuff than in analog
stuff like read amps, video, power amps, servo signals etc.

The CPU and communications stuff usually works correctly or it doesn't.
(There are occasional weird problems like the DMR unit that ran diags
and DECnet but had a specific pattern sensitivity in the line unit that
would go intermittent and drop out on large file transfers... but the
thing that proved that problem WAS A BOARD SWAP.)


> So you need to do tests -- proper tests -- to determine exactly which
> modules are giving problems. And by the time you've done said tests
> you've probably got enough information to know what's actually wrong with
> the module, so you might as well solder in a new chip or whatever.
>

What pissed me off no end was the customer (Naval Air Propulsion,
Trenton) who called me to ask me to change U41 on the TE16 logic and
write board because his tape drive wouldn't go on line.

I called this engineer back and told him I'd be in in about 3 weeks when
logistics got me the chip. Or I could come down and swap the LAW board
(was it an 8916 or 8912-- my memory seems to point to that number) in
fifteen minutes and have him running again.

I did scope the pin out and he was correct... but does it really matter
that much if the online pulse doesn't get from the switch to the one
board that controlled all the circuitry on the door? It's obviously that
one board if the switch contacts open and short ok.

As far as tests go... DEC and IBM seem to have the best set of diagnostics
out and the most redundant data paths to allow you to call the module
out quickly. I really was amazed when I saw Perkin Elmer (Concurrent)
3200 series stuff and realized DEC put so much more into diagnostic
engineering and maintainability. I guess AT&T and Lucent do similar
stuff since I've seen a little of the AT&T specs for DEC stuff used by
AT&T.

Most of the minicomputer companies seemed to skimp on the diags
compared to DEC. DEC's remote diag console in the 11/70 really amazed
me for a mini.

Geez. Engineers just want to bully the poor old technicians because
they can. 8-)



>
> >
>
> -tony
>
>
>


Bill

---
  bpechter_at_shell.monmouth.com|pechter_at_pechter.dyndns.org
      Three things never anger: First, the one who runs your DEC,
      The one who does Field Service and the one who signs your check.
Received on Mon Nov 22 1999 - 17:04:39 GMT