RT-11 Crashes During UNLOAD

From: Jerome Fine <jhfine_at_idirect.com>
Date: Thu Dec 31 23:47:35 1998

Before going into detail, I should say that this bug is probably
as old as V5.5, the release that first introduced extended device
drivers (I have only looked at the V5.6 code, on a friend's system).
Since V5.5 of RT-11 was released in 1989, the bug is about
10 years old and so qualifies for this list.

The bug occurs when RT-11 has been SYSGENed to support extended
device drivers and an extended device driver is installed (one that
supports more than 8 units; in my case, a DU: MSCP device driver
allowing 64 partitions). In a test case, the following sequence
produces a crash:
   1. SRUN KEX.SAV/LEVEL:n/NAME:KEXn for n = 1 through 6,
      typing CTRL/B after each to return to the background.
   2. ABORT KEXn for n = 1 through 6.
   3. UNLOAD KEXn for n = 1 through 6.
I tried this under RT11XM on V5.6 with an RL02 as the boot
device and DU: auto-installed but never referenced. In the
actual situation, DU: was being used to read specific files
from a magneto-optical disk drive, but the DU: device driver
was not LOADed because there was not enough space left in
low memory. So, although there are probably several
workarounds, the real problem is that UNLOAD has a bug
and does not work correctly in these circumstances.

As far as I can tell, UNLOAD in KMOVLY has a bug, possibly more
than one, but for now I will describe just the one I can handle.
It also appears that coordination between different parts of the
monitor was not entirely successful: the USR has the correct code
to handle the situation (a "Beq" instruction), while UNLOAD ignores
the problem. Of course, had the USR not handled it correctly, the
bug would have shown up far more often, so it was caught in the
USR and the code there is correct.

My description of the bug and of the owner tables may not be exact
in every detail; if so, please refer to the SSM (RT-11 Software
Support Manual). I believe, however, that the essential nature of
the bug is correctly described.

THE PROBLEM, as far as I can understand, is this. A system job MAY
"own" a specific device unit (assigned via a LOAD command). There is
a two-word owner table entry for each device, with 4 bits for each
of the 8 units normally associated with a device driver (DU0:
through DU7:). When a SYSGEN includes extended device drivers,
that two-word entry is too small, so it is instead used to "point"
to a 16-word (maximum size) table inside the device driver, but ONLY
when the driver is LOADed. (I presume that a .FETCH may also
allocate the same 16 words, but they would not be used.)

SO IF THE DEVICE DRIVER IS INSTALLed BUT NOT LOADed, the two-word
owner table entry has nowhere to "point", and the pointer word is
left at its default of zero. In the USR, when that word is picked up,
a "Beq" detects that the driver has not yet been LOADed, and no
owner-specific code is executed. BUT UNLOAD in KMOVLY lacks that
"Beq" and merrily assumes that the extended owner table (on a
system SYSGENed for extended drivers AND with an actual extended
driver such as DU:) starts at location zero. While disconnecting the
system job from the device driver, it treats the first 16 words of
low memory as the owner table, so the 8 vectors there (00 through
34, which obviously include the EMT vector at 30) can be "MODIFIED".
This results in RT-11 crashing.

If anyone is truly interested in this bug but does not understand
what I have stated, please inquire further; the explanation I have
given covers only a small portion of the detail.

Since this bug has not been reported before (or anyone who hit it
was unable to track it down), the situation probably does not occur
very often. When DU: is resident, it has obviously been LOADed. But
I suspect that an extended LD: which is not LOADed could cause the
same problem. The simple solution I was told about several years ago
(before I learned that UNLOAD was causing the problem) is not to
do the UNLOAD, but instead to do a BOOT, which of course
does an UNLOAD of everything.

Sincerely yours,

Jerome Fine