8086 (was Re: more talking to the press.) from Hans Franke on 2003-11-14 (2003-November)

From: Hans Franke <Hans.Franke_at_mch20.sbs.de>
Date: Fri Nov 14 15:49:52 2003

On 14 Nov 2003 16:17, Sean 'Captain Napalm' Conner wrote:
> It was thus said that the Great Hans Franke once stated:
> > > > should I do so on my PC?
> > > /* Fill a block of memory with 0's. */
> > > void foofill (start, end)
> > > {
> > > char *start, *end; /* memory pointers */
> > >
> > > while (start < end) { /* until we reach the end... */
> > > *start++= 0; /* write 0, incr pointer... */
> > > }
> > > }

> Um, that should be:

> void foofill(char *start, char *end)
> {
>   while (start < end)
>   {
>     *start++ = 0;
>   }
> }

:)

> > BTW: the above example is exactly one reason why I hate C.
> > Basically every compiler will generate a stupid loop, while
> > in assembly a REP STOSW would do the trick at the maximum
> > possible speed.

> True, but if you stick with the ANSI C library call memset(), an ANSI C
> compiler is free to replace the actual call with an inlined version, which on
> the 8086 would probably be a REP STOSW. In fact, about the only routines I
> still regularly use from ANSI C are the mem*() and str*() routines (the rest
> are ... eh).

Yes, I noticed that - and I made it one of my checkpoints before
accepting code from any programmer. Nonetheless, it's just a
workaround for a shortcoming in the basic structure of C. The C core
is modeled too closely on a specific CPU, and simplified, so a lot
of additional stuff is needed to help the compiler understand
the meaning again.
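
For illustration, the memset() route discussed above could be sketched
like this. The name foofill_memset is made up for this example and does
not appear in the thread; whether the call is inlined as REP STOSW (or
anything else) depends entirely on the compiler and target, so check
the generated assembly rather than assuming.

```c
#include <stddef.h>
#include <string.h>

/* Zero the bytes in [start, end), like foofill(), but by handing the
 * whole job to memset() so the compiler can pick the fill strategy.
 * foofill_memset is a hypothetical name used only for this sketch. */
void foofill_memset(char *start, char *end)
{
    if (start < end) {
        /* end - start is positive here, so the cast to size_t is safe */
        memset(start, 0, (size_t)(end - start));
    }
}
```

The guard keeps an empty or reversed range a no-op, which matches the
behaviour of the "while (start < end)" loop.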

> Also, on modern x86 architectures, using REP STOS* may be *slower* than
> using the stupid loop (GCC generated four instructions for the inner loop,
> using the sequence that an assembly programmer would use if you can't use
> REP STOSW) but again, a compiler (like GCC) can be programmed with such
> optimization assumptions built in. I ran the following through GCC:

You're right, at least for some of the complex operations (enter,
leave, jcxz, loop) the simple operations became as fast as or
faster than the complex ones.

I always admired the memory interface used in the Siemens X-CPUs,
a family of /370-compatible machines. While the CPU itself was
doing a byte operation, the memory interface was built so that
the microcode could squeeze out the last bit of speed by aligning
all operations as much as possible to the memory access size
(which was 16 or 32 bytes at that time).
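
A rough software analogue of that alignment idea, as a sketch only: it
is not how the Siemens microcode worked, and aligned_zero and the
4-byte word size are assumptions chosen for the example. The fill
handles unaligned leading bytes one at a time, then writes whole
aligned words, then finishes the tail byte by byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero n bytes at p, using word-sized stores where p is aligned.
 * aligned_zero is a hypothetical helper, not from the thread. */
void aligned_zero(unsigned char *p, size_t n)
{
    const uint32_t zero = 0;

    /* Head: single bytes until p sits on a word boundary. */
    while (n > 0 && ((uintptr_t)p % sizeof zero) != 0) {
        *p++ = 0;
        n--;
    }
    /* Body: whole words; memcpy of a fixed size avoids aliasing
     * problems and is normally compiled to a single store. */
    while (n >= sizeof zero) {
        memcpy(p, &zero, sizeof zero);
        p += sizeof zero;
        n -= sizeof zero;
    }
    /* Tail: whatever is left after the last full word. */
    while (n > 0) {
        *p++ = 0;
        n--;
    }
}
```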

> struct foo
> {
>   int a;
>   int b;
> };

> void fillfoo(struct foo *p)
> {
>   memset(p,0,sizeof(struct foo));
> }

> Guess what? fillfoo() ended up being 4 instructions:
>
> movl 4(%esp), %eax
> movl $0, (%eax)
> movl $0, 4(%eax)
> ret

> When I changed struct foo to have an array inside, GCC used "rep stosl" to
> do the fill (since I'm using memset() GCC "knows" what I'm trying to do).

Interesting.
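
The changed struct is not shown in the quoted mail, so the following
is only a guess at what such a test might look like. The names
foo_arr and fillfoo_arr and the array length of 64 are assumptions.
Compiling it with the flags quoted in the thread and reading the .s
file shows what a given GCC version actually emits; the result varies
by version and target.

```c
#include <string.h>

/* Hypothetical variant of struct foo with an array inside. */
struct foo_arr
{
    int a;
    int b;
    int data[64];   /* length chosen arbitrarily for the sketch */
};

/* Clear the whole structure in one memset() call, so the compiler
 * sees the intent and can choose how to expand it. */
void fillfoo_arr(struct foo_arr *p)
{
    memset(p, 0, sizeof *p);
}
```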

> So don't sell the compilers short unless you check the actual generated
> output (programs here were compiled using "gcc -c -O4 -S
> -fomit-frame-pointer")

Hmm. I know that in most real-world cases it's useless to
invest the time to optimize beyond what a compiler already
does. It's just fun to still be better than the machine.

In an ideal world, we would use languages that allow us to
formulate what we want to do, and have compilers optimize by
understanding that (and Ada comes as close to that as anything
I know), instead of formulating a problem with a synthetic CPU in
mind, which is why I'd rather write assembler than a high-level
language.

Anyway...
Regards
H.

--
VCF Europa 5.0 on 1-2 May 2004 in Munich
http://www.vcfe.org/

Received on Fri Nov 14 2003 - 15:49:52 GMT
