Request from Intel's Museum from Dwight K. Elvey on 2002-10-09 (2002-October)

From: Dwight K. Elvey <dwightk.elvey_at_amd.com>
Date: Wed Oct 9 16:21:00 2002

>From: "Ross Archer" <archer_at_topnow.com>
>
>"Dwight K. Elvey" wrote:
>>
>> >From: "Ross Archer" <dogbert_at_mindless.com>
>> >
>> >Jerome H. Fine wrote:
>> >
>> >>>Jim Kearney wrote:
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >>>I just had an email exchange with someone at Intel's Museum
>> >>>(http://www.intel.com/intel/intelis/museum/index.htm)
>> >>>
>> >>>
>> >>
>> >>Jerome Fine replies:
>> >>
>> >>I am not sure why the information is so blatant in its
>> >>stupid attempt to ignore anything but Intel hardware
>> >>as far a anything that even look like a CPU chip, but
>> >>I guess it is an "Intel" museum.
>> >>
>> >>Of course, even now, Intel, in my opinion, is so far
>> >>behind from a technical point of view that is is a sad
>> >>comment just to read about the products that were
>> >>way behind, and still are, the excellence of other
>> >>products. No question that if the Pentium 4 had been
>> >>produced 10 years ago, it would have been a major
>> >>accomplishment.
>> >>
>> >Harsh! :)
>> >
>> >Guess it depends on what you mean by "far behind from a
>> >technical point of view."
>> >
>> >If you mean that x86 is an ugly legacy architecture, with
>> >not nearly enough registers, an instruction set which
>> >doesn't fit any reasonable pipeline, that's ugly to decode
>> >and not particularly orthogonal, that from purely technical
>> >reasons ought to have died a timely death in 1990,
>> >I'd have to agree.
>> >
>> >However, look at the performance. P4 is up near the
>> >top of the tree with the best RISC CPUs, which have
>> >the advantage of clean design and careful evolution.
>> >
>> >It surely takes a great deal of inspiration, creativity,
>> >and engineering talent to take something as ill-suited
>> >as the x86 architecture and get this kind of performance
>> >out of it. IMHO.
>> >
>> >In other words, making x86 fast must be a lot like
>> >getting Dumbo off the air. That ought to count as
>> >some kind of technical achievement. :)
>>
>> ---snip---
>>
>> It is all done with smoke and mirrors.
>
>Anything the results in a net faster CPU isn't, in my book,
>akin to smoke and mirrors.
>
>If anyone's guilty of "smoke and mirrors", it's probably
>Intel by making a ridiculous long (20-24 stage) pipeline
>just to allow the wayupcrankinzee of clock rates so they can
>be first CPU to X Ghz. Why not a 50 stage pipeline that hits
>8 Ghz, nevermind the hideous branch-misprediction penalties
>and exception overhead?
>
>
>> We do the same
>> here at AMD. The trick is to trade immediate execution
>> for known execution. The x86 code is translated to run
>> on a normal RISC engine.
>
>Yes, and this in and of itself must be rather tricky, no?
>X86 instructions are variable-length, far from load/store,
>have gobs of complexity in protected nonflat mode, etc.
>I'd bet a significant portion of the Athlon or P4 is devoted
>just to figuring out how to
>translate/align/schedule/dispatch
>such a mess with a RISC core under the hood. :)

It doesn't take as much as one would think but it is a hit
on speed and space. Still, the overall hit is really quite
small.

>
>> This means that the same tricks
>> on a normal RISC engine would most likely only buy about
>> a couple percent. It would only show up on the initial
>> load of the local cache. Once that is done, there is
>> really little difference.
>> Choices of pipeline depth, out of order execution, multiple
>> execution engines and such are just the fine tuning.
>> Intel, like us is just closer to the fine edge of what
>> the silicon process can do than anything tricky that
>> people like MIPS don't know about.
>
>Well, why isn't something elegant like Alpha, HP-PA, or MIPS
>at the top of the performance tree then? (Or are they and
>I'm
>just not aware of the latest new products.)
>
>My pet theory is that the higher code density of x86
>vs. mainline RISC helps utilize the memory subsystem
>more efficiently, or at least overtaxes it less often.
>The decoding for RISC is a lot simpler, but
>if the caching systems can't completely compensate for the
>higher
>memory bandwidth requirements, you're stalling more often or
>limiting
>the maximum internal CPU speed indirectly due to the
>mismatch.
>And decoding on-chip can go much faster than any sort of
>external
>memory these days.

This is why the newer processor chips are really a memory
chip with some processor attached, rather than a processor
with some memory attached. We and Intel are turning into
RAM makers. Memory bandwidth is on the increase but it
isn't keeping up with chip speed.
Still, I don't understand why many are not going to more
efficient memory optimization than apparent execution speed.
The compiler writers have a ways to go. The day is gone
when pinhole optimization buys much. Keeping the process
in on chip cache is really the important thing. There isn't
an application out there that if one removed the large data
array and image bit tables, couldn't completely fit in the
caches that are being used today. The compilers just don't
write code well enough to keep the size down. It is just
that we've gotten into the poor choice of languages and poor
connection of software writers to the actual machine code
that is run.
Just my opinion.
Dwight

>
>This isn't really a discussion for classiccmp, but I
>couldn't
>resist since I'm sure at least some folks enjoy
>speculationalism
>on such topics. :)
>
>
>>
>> On a separate subject, I was very disappointed in the
>> Intel Museum. I'd thought it might be a good place to
>> research early software or early IC's. They have vary
>> little to offer to someone looking into this level of
>> stuff. Any local library has better references on this
>> kind of stuff ( and that isn't saying much ).
>> Dwight
>
>n
>
Received on Wed Oct 09 2002 - 16:21:00 BST

This archive was generated by hypermail 2.3.0 : Fri Oct 10 2014 - 23:35:32 BST