
Why is x86 ugly? Why is it considered inferior when compared to others? [closed]

Closed 10 years ago.

I've been reading some SO archives and encountered statements against the x86 architecture.

Why do we need different CPU architecture for server & mini/mainframe & mixed-core? says "PC architecture is a mess, any OS developer would tell you that."

Is learning Assembly Language worth the effort? (archived) says "Realize that the x86 architecture is horrible at best"

Any easy way to learn x86 assembler? says "Most colleges teach assembly on something like MIPS because it's much simpler to understand, x86 assembly is really ugly"

and many more comments like

"Compared to most architectures, X86 sucks pretty badly."

"It's definitely the conventional wisdom that X86 is inferior to MIPS, SPARC, and PowerPC"

"x86 is ugly"

I tried searching but didn't find any reasons. I don't find x86 bad, probably because this is the only architecture I'm familiar with.

Can someone kindly give me reasons for considering x86 ugly/bad/inferior compared to other architectures?

I'm going with S&A on the basis of the answers so far, but I'll note in passing that CISC isn't a problem for the m68k instruction set. x86 is what it is, and you can keep it.
what is "S&A"? " CISC isn't a problem for the m68k instruction set." -- Why not?
The Motorola 68000 series chips have a highly CISC architecture, but they have a uniform, fairly orthogonal, and very easy instruction set. Why the difference from x86? I don't know. But take note that there is a big difference between complexity in the chip and complexity in the instruction set (i.e. in the interface that an assembly programmer sees).
+1 for a very interesting question.
Recent study on energy efficiency of different processors found here, with a good discussion of what drove CISC & RISC designs. extremetech.com/extreme/…

Peter Cordes

A couple of possible reasons for it:

1. x86 is a relatively old ISA (its progenitors were 8086s, after all).

2. x86 has evolved significantly several times, but hardware is required to maintain backwards compatibility with old binaries. For example, modern x86 hardware still contains support for running 16-bit code natively. Additionally, several memory-addressing models exist to allow older code to inter-operate on the same processor, such as real mode, protected mode, virtual 8086 mode, and (amd64) long mode. This can be confusing to some.

3. x86 is a CISC machine. For a long time this meant it was slower than RISC machines like MIPS or ARM, because instructions have data interdependencies and flags, making most forms of instruction-level parallelism difficult to implement. Modern implementations translate the x86 instructions into RISC-like instructions called "micro-ops" under the covers to make these kinds of optimizations practical to implement in hardware.

4. In some respects, the x86 isn't inferior, it's just different. For example, input/output is handled as memory mapping on the vast majority of architectures, but not on the x86. (NB: Modern x86 machines typically have some form of DMA support, and communicate with other hardware through memory mapping; but the ISA still has I/O instructions like IN and OUT.)

5. The x86 ISA has very few architectural registers, which can force programs to round-trip through memory more frequently than would otherwise be necessary. The extra instructions needed to do this take execution resources that could be spent on useful work, although efficient store-forwarding keeps the latency low. Modern implementations with register renaming onto a large physical register file can keep many instructions in flight, but lack of architectural registers was still a significant weakness for 32-bit x86. x86-64's increase from 8 to 16 integer and vector registers is one of the biggest factors in 64-bit code being faster than 32-bit (along with the more efficient register-call ABI), not the increased width of each register. A further increase from 16 to 32 integer registers would help some, but not as much. (AVX512 does increase to 32 vector registers, though, because floating-point code has higher latency and often needs more constants.) (see comment)

6. x86 assembly code is complicated because x86 is a complicated architecture with many features. An instruction listing for a typical MIPS machine fits on a single letter-sized piece of paper. The equivalent listing for x86 fills several pages, and the instructions just do more, so you often need a bigger explanation of what they do than a listing can provide. For example, the MOVSB instruction needs a relatively large block of C code to describe what it does:

  if (DF==0)
    *(byte*)DI++ = *(byte*)SI++;
  else
    *(byte*)DI-- = *(byte*)SI--;

That's a single instruction doing a load, a store, and two adds or subtracts (controlled by a flag input), each of which would be separate instructions on a RISC machine (a runnable C sketch of this follows the list). While the simplicity of MIPS (and similar architectures) doesn't necessarily make them superior, for teaching an introduction-to-assembler class it makes sense to start with a simpler ISA. Some assembly classes teach an ultra-simplified subset of x86 called y86, which is simplified to the point of not being useful for real use (e.g. no shift instructions), and some teach just the basic x86 instructions.

7. The x86 uses variable-length opcodes, which add hardware complexity with respect to the parsing of instructions. In the modern era this cost is becoming vanishingly small as CPUs become more and more limited by memory bandwidth than by raw computation, but many "x86 bashing" articles and attitudes come from an era when this cost was comparatively much larger. Update 2016: Anandtech has posted a discussion regarding opcode sizes under x64 and AArch64.
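To make point 6 a bit more concrete, here is a minimal C sketch that wraps the MOVSB pseudo-code above in a runnable REP MOVSB-style copy loop. The function name rep_movsb and its signature are purely illustrative, not any real API:

  #include <stdio.h>
  #include <stdint.h>
  #include <stddef.h>

  typedef uint8_t byte;

  /* Rough C model of REP MOVSB: copy 'count' bytes, walking forward or
     backward depending on the direction flag DF, as described above.
     Illustrative sketch only; names do not correspond to a real API. */
  static void rep_movsb(byte *DI, const byte *SI, size_t count, int DF)
  {
      while (count--) {
          if (DF == 0) { *DI++ = *SI++; }   /* DF clear: copy forward  */
          else         { *DI-- = *SI--; }   /* DF set:   copy backward */
      }
  }

  int main(void)
  {
      byte src[4] = {1, 2, 3, 4}, dst[4] = {0};
      rep_movsb(dst, src, sizeof src, 0);   /* forward copy */
      printf("%d %d %d %d\n", dst[0], dst[1], dst[2], dst[3]);
      return 0;
  }

On a RISC machine each iteration of that loop body would be a separate load, store, and pointer adjustment; on x86 the whole thing is a single instruction plus the REP prefix.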

EDIT: This is not supposed to be a "bash the x86!" party. I had little choice but to do some amount of bashing given the way the question's worded. But with the exception of (1), all these things were done for good reasons (see comments). Intel designers aren't stupid -- they wanted to achieve some things with their architecture, and these are some of the taxes they had to pay to make those things a reality.


It's a tradeoff. It's a strength in that the binary size might be smaller, but it's a weakness in that you need to have very complicated hardware to implement a parser for these instructions. The vast majority of instructions are the same size anyway -- most of the reason for variable length opcodes on x86 is for when they decided to add features and found they couldn't represent what they wanted in the number of bits they had to work with. The vast majority of people aren't concerned with binary size nearly as much as hardware complexity or power consumption.
@Joey Adams: Contrast the x86's variable length instructions with the ARM's Thumb Mode ( en.wikipedia.org/wiki/ARM_architecture#Thumb ). Thumb Mode results in significantly smaller object code for the ARM because the shorter instructions map directly to normal instructions. But since there is a 1:1 mapping between the larger instructions and the smaller ones, the parsing hardware is simple to implement. The x86's variable length instructions don't have these benefits because they weren't designed that way in the first place.
(6) Not every op-code needs to be used by every program, but dammit, when I need SSE3, I'm glad I have it.
@Chris Kaminski: How does that not affect the hardware? Sure, on a modern full sized computer nobody's going to care, but if I'm making something like a cell phone, I care more about power consumption than almost anything else. The variable length opcodes don't increase execution time but the decode hardware still requires power to operate.
Which is one of the things that make the x86 instruction set so ugly, since it can't decide if it's an accumulator or a register-file based architecture (though this was mostly fixed with the 386, which made the instruction set much more orthogonal, regardless of whatever the 68k fans tell you).
dthorpe

The main knock against x86 in my mind is its CISC origins - the instruction set contains a lot of implicit interdependencies. These interdependencies make it difficult to do things like instruction reordering on the chip, because the artifacts and semantics of those interdependencies must be preserved for each instruction.

For example, most x86 integer add & subtract instructions modify the flags register. After performing an add or subtract, the next operation is often to look at the flags register to check for overflow, sign bit, etc. If there's another add after that, it's very difficult to tell whether it's safe to begin execution of the 2nd add before the outcome of the 1st add is known.

On a RISC architecture, the add instruction would specify the input operands and the output register(s), and everything about the operation would take place using only those registers. This makes it much easier to decouple add operations that are near each other because there's no bloomin' flags register forcing everything to line up and execute single file.
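As a concrete illustration of the "add, then check the outcome of that add" pattern described above (not taken from the answer itself), here is a small C sketch using the GCC/Clang builtin __builtin_add_overflow; on x86 the generated code is typically an add followed by a branch that consumes the flags that very add just wrote, which is exactly the implicit dependency in question:

  #include <stdio.h>
  #include <limits.h>

  /* Add two ints, then inspect the result of that add. On x86 this usually
     compiles to an add plus a conditional jump reading the flags the add set.
     Sketch only; __builtin_add_overflow is a GCC/Clang extension and actual
     codegen varies by compiler and target. */
  int main(void)
  {
      int a = INT_MAX, b = 1, sum;

      if (__builtin_add_overflow(a, b, &sum))
          printf("overflow\n");
      else
          printf("sum = %d\n", sum);
      return 0;
  }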

The DEC Alpha AXP chip, a MIPS style RISC design, was painfully spartan in the instructions available, but the instruction set was designed to avoid inter-instruction implicit register dependencies. There was no hardware-defined stack register. There was no hardware-defined flags register. Even the instruction pointer was OS defined - if you wanted to return to the caller, you had to work out how the caller was going to let you know what address to return to. This was usually defined by the OS calling convention. On the x86, though, it's defined by the chip hardware.

Anyway, over 3 or 4 generations of Alpha AXP chip designs, the hardware went from being a literal implementation of the spartan instruction set with 32 int registers and 32 float registers to a massively out of order execution engine with 80 internal registers, register renaming, result forwarding (where the result of a previous instruction is forwarded to a later instruction that is dependent on the value) and all sorts of wild and crazy performance boosters. And with all of those bells and whistles, the AXP chip die was still considerably smaller than the comparable Pentium chip die of that time, and the AXP was a hell of a lot faster.

You don't see those kinds of bursts of performance boosting things in the x86 family tree largely because the x86 instruction set's complexity makes many kinds of execution optimizations prohibitively expensive if not impossible. Intel's stroke of genius was in giving up on implementing the x86 instruction set in hardware anymore - all modern x86 chips are actually RISC cores that to a certain degree interpret the x86 instructions, translating them into internal microcode which preserves all the semantics of the original x86 instruction, but allows for a little bit of that RISC out-of-order and other optimizations over the microcode.

I've written a lot of x86 assembler and can fully appreciate the convenience of its CISC roots. But I didn't fully appreciate just how complicated x86 was until I spent some time writing Alpha AXP assembler. I was gobsmacked by AXP's simplicity and uniformity. The differences are enormous, and profound.


I'll listen to no bashing of CISC per se unless and until you can explain m68k.
I'm not familiar with the m68k, so I can't critique it.
I don't think this answer is bad enough to downvote, but I do think the whole "RISC is smaller and faster than CISC" argument isn't really relevant in the modern era. Sure, the AXP might have been a hell of a lot faster for its time, but the fact of the matter is that modern RISCs and modern CISCs are about the same when it comes to performance. As I said in my answer, the slight power penalty for x86 decode is a reason not to use x86 for something like a mobile phone, but that's little argument for a full sized desktop or notebook.
@Billy: size is more than just code size or instruction size. Intel pays quite a penalty in chip surface area to implement the hardware logic for all those special instructions, RISC microcode core under the hood or not. Size of the die directly impacts cost to manufacture, so it's still a valid concern with modern system designs.
There was an article by arstechnica's Jon Stokes that said that the number of transistors used for x86-RISC translation has remained mostly constant, which means that its relative size compared to the total number of transistors in the die has shrunk: arstechnica.com/old/content/2004/07/pentium-1.ars/2
staticsan

The x86 architecture dates from the design of the 8008 microprocessor and relatives. These CPUs were designed in a time when memory was slow and if you could do it on the CPU die, it was often a lot faster. However, CPU die-space was also expensive. These two reasons are why there are only a small number of registers that tend to have special purposes, and a complicated instruction set with all sorts of gotchas and limitations.

Other processors from the same era (e.g. the 6502 family) also have similar limitations and quirks. Interestingly, both the 8008 series and the 6502 series were intended as embedded controllers. Even back then, embedded controllers were expected to be programmed in assembler and in many ways catered to the assembly programmer rather than the compiler writer. (Look at the VAX chip for what happens when you cater to the compiler writer.) The designers didn't expect them to become general purpose computing platforms; that's what things like the predecessors of the POWER architecture were for. The Home Computer revolution changed that, of course.


+1 for the only answer here from someone who actually seems to have historical background on the issue.
Memory has always been slow. It is possibly (relatively speaking) slower today than it was when I began with Z80s and CP/M in 1982. Extinction is not the only path of evolution, because with extinction that particular evolutionary direction stops. I would say the x86 has adapted well in its 28-year (so far) existence.
Memory speeds briefly hit near parity with CPUs around the time of the 8086. The 9900 from Texas Instruments has a design that only works because this happened. But then the CPU raced ahead again and has stayed there. Only now, there are caches to help manage this.
@Olof Forshell: It was assembler compatible in that 8080 assembly code could translate into 8086 code. From that point of view, it was 8080 plus extensions, much like you could view 8080 as 8008 plus extensions.
@Olof Forshell: Except that the 8086 was designed for that to happen. It was an extension of the 8080, and most (possibly all) 8080 instructions mapped one-to-one, with obviously similar semantics. That isn't true of the IBM 360 architecture, no matter which way you want to push it.
Olof Forshell

I have a few additional aspects here:

Consider the operation "a=b/c". x86 would implement this as:

  mov eax,b
  xor edx,edx
  div dword ptr c
  mov a,eax

As an additional bonus of the div instruction, edx will contain the remainder.

A RISC processor would require first loading the addresses of b and c, loading b and c from memory into registers, doing the division, loading the address of a, and then storing the result. In dst,src syntax:

  mov r5,addr b
  mov r5,[r5]
  mov r6,addr c
  mov r6,[r6]
  div r7,r5,r6
  mov r5,addr a
  mov [r5],r7

Here there typically won't be a remainder.
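As a side note on the remainder point: in C, writing both / and % on the same operands usually lets an x86 compiler reuse the single div, since eax and edx hold the quotient and remainder together, whereas a typical RISC ISA needs a separate remainder computation (or, as on classic MIPS, reads HI/LO after one divide). A small illustrative sketch; actual codegen is, of course, compiler-dependent:

  #include <stdio.h>

  int main(void)
  {
      int b = 17, c = 5;
      int q = b / c;   /* quotient  - lands in eax after the div       */
      int r = b % c;   /* remainder - lands in edx from the same div   */
      printf("q=%d r=%d\n", q, r);   /* prints q=3 r=2 */
      return 0;
  }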

If any variables are to be loaded through pointers, both sequences may become longer, though this is less likely for the RISC because it may have one or more pointers already loaded in another register. x86 has fewer registers, so the likelihood of the pointer already being in one of them is smaller.

Pros and cons:

The RISC instructions may be mixed with surrounding code to improve instruction scheduling; this is less of a possibility with x86, which instead does this work (more or less well depending on the sequence) inside the CPU itself.

The RISC sequence above will typically be 28 bytes long (7 instructions of 32-bit/4-byte width each) on a 32-bit architecture. This will cause the off-chip memory to work harder when fetching the instructions (seven fetches). The denser x86 sequence contains fewer instructions, and though their widths vary you're probably looking at an average of 4 bytes/instruction there too. Even if you have instruction caches to speed this up, seven fetches means that you will have a deficit of three elsewhere to make up for compared to the x86.

The x86 architecture, with fewer registers to save/restore, will probably do thread switches and handle interrupts faster than RISC. More registers to save and restore require more temporary RAM stack space to do interrupts and more permanent stack space to store thread states. These aspects should make x86 a better candidate for running a pure RTOS.

On a more personal note, I find it more difficult to write RISC assembly than x86. I solve this by writing the RISC routine in C, compiling and modifying the generated code. This is more efficient from a code production standpoint and probably less efficient from an execution standpoint. All those 32 registers to keep track of. With x86 it is the other way around: 6-8 registers with "real" names make the problem more manageable and instill more confidence that the code produced will work as expected.

Ugly? That's in the eye of the beholder. I prefer "different."


a, b and c in my examples should be viewed as memory-based variables and not to immediate values.
... "dword ptr" is used to specifiy the size of a variable whose size is not known if, for instance, it is simply declared as external or if you've been lazy.
That isn't the first time I heard the suggestion to write it in C first, and then distill it into assembler. That definitely helps
In the early days all processors were RISC. CISC came about as a mitigation strategy for ferrite-core memory systems that were VERY slow; thus CISC, with fewer, more powerful instructions, put less stress on the memory subsystem and made better use of bandwidth. Likewise, registers were originally thought of as on-chip, in-CPU memory locations for doing accumulations. The last time I seriously benchmarked a RISC machine was 1993 - SPARC and HP PRISM. SPARC was horrible across the board. PRISM was up to 20x as fast as a 486 on add/sub/mul but sucked on transcendentals. CISC is better.
@OlofForshell You say there typically won't be a remainder, but the wiki says that MIPS has it: en.wikipedia.org/wiki/MIPS_instruction_set#Integer
R.. GitHub STOP HELPING ICE

I think this question has a false assumption. It's mainly just RISC-obsessed academics who call x86 ugly. In reality, the x86 ISA can do in a single instruction operations which would take 5-6 instructions on RISC ISAs. RISC fans may counter that modern x86 CPUs break these "complex" instructions down into microops; however:

1. In many cases that's only partially true or not true at all. The most useful "complex" instructions in x86 are things like mov %eax, 0x1c(%esp,%edi,4), i.e. addressing modes, and these are not broken down (see the sketch after this list).

2. What's often more important on modern machines is not the number of cycles spent (because most tasks are not cpu-bound) but the instruction cache impact of code. 5-6 fixed-size (usually 32-bit) instructions will impact the cache a lot more than one complex instruction that's rarely more than 5 bytes.
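Here is a small C sketch of the address arithmetic that a single scaled-index store such as mov %eax, 0x1c(%esp,%edi,4) folds into its memory operand; the register-named variables are only stand-ins for illustration:

  #include <stdio.h>
  #include <stdint.h>

  /* base + index*scale + displacement, spelled out in C. On x86 this whole
     computation plus the store is one instruction; on a typical RISC it is
     a shift or multiply, one or two adds, and a store. Illustrative only. */
  int main(void)
  {
      uint32_t frame[16] = {0};            /* stand-in for a stack area */
      uintptr_t esp = (uintptr_t)frame;    /* "base register"           */
      uintptr_t edi = 3;                   /* "index register"          */
      uint32_t  eax = 42;                  /* value being stored        */

      /* mov %eax, 0x1c(%esp,%edi,4)  ==>  store at esp + edi*4 + 0x1c */
      *(uint32_t *)(esp + edi * 4 + 0x1c) = eax;

      printf("%u\n", frame[0x1c / 4 + 3]); /* prints 42 */
      return 0;
  }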

x86 really absorbed all the good aspects of RISC about 10-15 years ago, and the remaining qualities of RISC (actually the defining one - the minimal instruction set) are harmful and undesirable.

Aside from the cost and complexity of manufacturing CPUs and their energy requirements, x86 is the best ISA. Anyone who tells you otherwise is letting ideology or agenda get in the way of their reasoning.

On the other hand, if you are targeting embedded devices where the cost of the CPU counts, or embedded/mobile devices where energy consumption is a top concern, ARM or MIPS probably makes more sense. Keep in mind, though, that you'll still have to deal with the extra RAM and binary size needed to handle code that's easily 3-4 times larger, and you won't be able to get near the performance. Whether this matters depends a lot on what you'll be running on it.


where energy consumption is a top concern, ARM or MIPS probably makes more sense... so, if there is at least one aspect where ARM or MIPS make more sense, doesn't it make x86 not necessarily the best ISA?
That's why I qualified "the best" with "aside from the cost...and their energy requirements".
I think Intel's throttling down the CPU speed, and smaller die sizes have largely eliminated the power differential. The new Celeron dual 64-bit CPU with 64k L1 and 1MB L2 caches is a 7.5 watt chip. It's my "Starbucks" hangout machine, and the battery life is ridiculously long and will run rings around a P6 machine. As a guy doing mostly floating point computations I gave up on RISC a long time ago. It just crawls. SPARC in particular was atrociously glacial. The perfect example of why RISC sucks was the Intel i860 CPU. Intel never went THERE again.
@RocketRoy: 7.5 watt isn't really acceptable for a device that's powered 24/7 (and not performing useful computations the whole time) or running off a 3.7v/2000mAh battery.
@RocketRoy "Intel i860 CPU. Intel never went THERE again." After a little research, the i860 sounds a lot like Itanium: VLIW, compiler-ordered instruction parallelism....
Peter Cordes

x86 assembler language isn't so bad. It's when you get to the machine code that it starts to get really ugly. Instruction encodings, addressing modes, etc are much more complicated than the ones for most RISC CPUs. And there's extra fun built in for backward compatibility purposes -- stuff that only kicks in when the processor is in a certain state.

In 16-bit modes, for example, addressing can seem downright bizarre; there's an addressing mode for [BX+SI], but not one for [AX+BX]. Things like that tend to complicate register usage, since you need to ensure your value's in a register that you can use as you need to.

(Fortunately, 32-bit mode is much saner (though still a bit weird itself at times -- segmentation for example), and 16-bit x86 code is largely irrelevant anymore outside of boot loaders and some embedded environments.)

There's also the leftovers from the olden days, when Intel was trying to make x86 the ultimate processor. Instructions a couple of bytes long that performed tasks no one actually does any more, because they were frankly too freaking slow or complicated. The ENTER and LOOP instructions, for two examples -- note that the C stack frame code is like "push ebp; mov ebp, esp" and not "enter" for most compilers.


I believe the "enter" versus "push/mov" issue arose because on some processors, "push/mov" is faster. On some processors, "enter" is faster. C’est la vie.
When I was forced onto an x86-based machine and started to take a look at it (having an m68k background), I found asm programming frustrating... as if I'd learned programming with a language like C and then been forced to get in touch with asm... you "feel" you lose power of expression, ease, clarity, "coherence", "intuitiveness". I am sure that if I had started asm programming with x86, I would have thought it's not so bad... maybe... I also did MMIX and MIPS, and their "asm lang" is far better than x86 (if this is the right PoV for the question, but maybe it is not).
The addressing mode problem was rectified in the 80386. Only 16 bit code has limited addressing modes, 32 bit code is much better. You can get the 32 bit addressing modes in 16 bit code using a special prefix and vice versa.
@FUZxxl: Yeah... I probably should have mentioned that the ugliness is mostly limited to 16-bit code. Fixed (I think). :)
The perceived inelegance mostly comes from the misconception that the registers of an 8086 are general purpose registers; that's incorrect. Each of them has a special purpose and if you don't stick to their purposes, you are going to have a bad time.
gatoatigrado

I'm not an expert, but it seems that many of the features people dislike may be the very reasons it performs well. Several years ago, having registers (instead of a stack), register frames, etc. were seen as nice solutions for making the architecture seem simpler to humans. However, nowadays what matters is cache performance, and x86's variable-length words allow it to store more instructions in cache. The "instruction decode", which I believe opponents once pointed out took up half the chip, is not nearly so much that way anymore.

I think parallelism is one of the most important factors nowadays -- at least for algorithms that already run fast enough to be usable. Expressing high parallelism in software allows the hardware to amortize (or often completely hide) memory latencies. Of course, the farther-reaching architectural future is probably in something like quantum computing.

I have heard from nVidia that one of Intel's mistakes was that they kept the binary formats close to the hardware. CUDA's PTX does some fast register use calculations (graph coloring), so nVidia can use a register machine instead of a stack machine, but still have an upgrade path that doesn't break all old software.


RISC was not designed with human developers in mind. One of the ideas behind RISC was to offload some of the complexity of the chip onto whoever wrote the assembly, ideally the compiler. More registers meant less memory usage and fewer dependencies between instructions, allowing deeper pipelines and higher performance. Note that x86-64 has twice as many general registers as x86, and this alone is responsible for significant performance gains. And instructions on most x86 chips are decoded before they are cached, not after (so size doesn't matter here).
@Dietrich Epp: That's not entirely true. The x86-64 does have more registers visible in the ISA, but modern x86 implementations usually have a RISC style register file which is mapped to the ISA's registers on demand to speed up execution.
"I have heard from nVidia that one of Intel's mistakes was that they kept the binary formats close to the hardware." -- I didn't get this and the CUDA's PTX part.
@Dietrech Epp: "And instructions on most x86 chips are decoded before they are cached, not after" That's not true. They are cached before they are decoded. I believe the Pentium 4 had an additional trace cache that cached after decode, but that's been discontinued.
that is not true, the newest "sandy bridge" processors use a kind of a trace cache (like that for the pentium 4, oh that old boy :D ), so technologies go away and come back...
dan04

Besides the reasons people have already mentioned:

x86-16 had a rather strange memory addressing scheme which allowed a single memory location to be addressed in up to 4096 different ways, limited RAM to 1 MB, and forced programmers to deal with two different sizes of pointers. Fortunately, the move to 32-bit made this feature unnecessary, but x86 chips still carry the cruft of segment registers.
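For a sense of where the "up to 4096 different ways" figure comes from: in real mode the physical address is segment*16 + offset, so segments overlap every 16 bytes. This small C program (illustrative only, ignoring the A20 wrap-around) counts the aliases for one arbitrary address:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      uint32_t phys = 0x12345;   /* any address comfortably above 64 KB */
      int aliases = 0;

      for (uint32_t seg = 0; seg <= 0xFFFF; seg++) {
          uint32_t base = seg << 4;                 /* segment * 16 */
          if (base <= phys && phys - base <= 0xFFFF)
              aliases++;                            /* offset fits in 16 bits */
      }
      printf("0x%05X has %d segment:offset aliases\n", phys, aliases);
      return 0;                                     /* prints 4096 */
  }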

While not a fault of x86 per se, x86 calling conventions weren't standardized the way MIPS's were (mostly because MS-DOS didn't come with any compilers), leaving us with the mess of __cdecl, __stdcall, __fastcall, etc.
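A tiny sketch of that calling-convention zoo; __cdecl, __stdcall and __fastcall are MSVC/MinGW keywords for 32-bit x86, so this only builds as written on those toolchains, and the function names are made up for illustration:

  #include <stdio.h>

  int __cdecl    add_cdecl(int a, int b)    { return a + b; } /* caller cleans the stack; varargs possible */
  int __stdcall  add_stdcall(int a, int b)  { return a + b; } /* callee cleans the stack (Win32 API style)  */
  int __fastcall add_fastcall(int a, int b) { return a + b; } /* first two args in ECX/EDX, callee cleans   */

  int main(void)
  {
      printf("%d %d %d\n", add_cdecl(1, 2), add_stdcall(3, 4), add_fastcall(5, 6));
      return 0;
  }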


Hmm.. when I think of x86 competitors, I don't think of MIPS. ARM or PowerPC maybe....
@Billy: x86 has been around near forever. At one time MIPS was an x86 competitor. As I remember x86 had its work cut out to get to a level where it was competitive with MIPS. (Back when MIPS and SPARC were fighting it out in the workstation arena.)
@Shannon Severance: Just because something once was does not mean something that is.
@supercat: what people in the era of the flat x86-32 memory model tend to forget is that 16 bits means 64k of memory (anyone who bothers doing the math will understand that magic isn't possible, that the 8086 wasn't a nasty punishment for unsuspecting programmers). There are few ways to get around 64k but the 8086 solution was a good compromise.
@OlofForshell: I think many people bemoaned the fact that the 8086 wasn't as nice as the 68000 (which had a 16MB linear addressing space and a clear path to 4 gigs). Certainly going to a 32-bit processor will make it easier to access more than 64K, but the 8086 is a 16-bit architecture which was designed to be a step up from the 8-bit 8080. I see no reason Intel should have leapt directly from an 8-bit to a 32-bit one.
Bernd Jendrissek

I think you'll get to part of the answer if you ever try to write a compiler that targets x86, or if you write an x86 machine emulator, or even if you try to implement the ISA in a hardware design.

Although I understand the "x86 is ugly!" arguments, I still think it's more fun writing x86 assembly than MIPS (for example) - the latter is just plain tedious. It was always meant to be nice to compilers rather than to humans. I'm not sure a chip could be more hostile to compiler writers if it tried...

The ugliest part for me is the way (real-mode) segmentation works - that any physical address has 4096 segment:offset aliases. When last did you need that? Things would have been so much simpler if the segment part were strictly higher-order bits of a 32-bit address.


m68k is a lot more fun, and far nicer to humans than x86 (which can't seem so "human" to many m68k programmers), if the right PoV is how humans can write code in those assembly languages.
The segment:offset addressing was an attempt to stay compatible to some extent with the CP/M - world. One of the worst decisions ever.
@Turing Complete: segment:offset was NOT primarily an attempt to stay compatible with the CP/M world. What it was was a very successful attempt to allow a 16 bit processor to address more than 64 KBytes by placing code, data, stack and other memory areas in different segments.
In reality placing data and stack in different segments was utterly useless for C; it was only usable for asm. In C, a pointer can point to data with static, automatic, or dynamically allocated storage duration, so there's no way to elide the segment. Maybe it was useful for Pascal or Fortran or something, but not for C, which was already the dominant language at the time...
@Bernd: The reason fs/gs were chosen for thread-local storage is not that segment registers are good for this. It's just that x86 is seriously starved for registers, and the segment registers were unused. A general-purpose register pointing to the thread structure would have worked just as well, and in fact many RISC systems with more registers use one as a thread pointer.
Turing Complete

1. x86 has a very, very limited set of general purpose registers

2. it promotes a very inefficient style of development on the lowest level (CISC hell) instead of an efficient load / store methodology

3. Intel made the horrifying decision to introduce the plainly stupid segment / offset memory addressing model to stay compatible with (at this time already!) outdated technology

4. At a time when everyone was going 32 bit, the x86 held back the mainstream PC world by being a meager 16 bit CPU (most of them - the 8088 - even only with 8 bit external data paths, which is even scarier!)

For me (and I'm a DOS veteran who has seen each and every generation of PCs from a developer's perspective!) point 3 was the worst.

Imagine the following situation we had in the early 90s (mainstream!):

a) An operating system that had insane limitations for legacy reasons (640kB of easily accessible RAM) - DOS

b) An operating system extension (Windows) that could do more in terms of RAM, but was limited when it came to stuff like games, etc... and was not the most stable thing on Earth (luckily this changed later, but I'm talking about the early 90s here)

c) Most software was still DOS and we had to create boot disks often for special software, because there was this EMM386.exe that some programs liked, others hated (especially gamers - and I was an AVID gamer at this time - know what I'm talking about here)

d) We were limited to MCGA 320x200x8 bits (ok, there was a bit more with special tricks, 360x480x8 was possible, but only without runtime library support), everything else was messy and horrible ("VESA" - lol)

e) But in terms of hardware we had 32 bit machines with quite a few megabytes of RAM and VGA cards with support of up to 1024x768

Reason for this bad situation?

A simple design decision by Intel. Machine-instruction-level (NOT binary level!) compatibility with something that was already dying, I think it was the 8085. The other, seemingly unrelated problems (graphics modes, etc...) were related for technical reasons and because of the very narrow-minded architecture the x86 platform brought with it.

Today, the situation is different, but ask any assembler developer or people who build compiler backends for the x86. The insanely low number of general purpose registers is nothing but a horrible performance killer.


The only major problems with the 8086 segmented architecture was that there was only one non-dedicated segment register (ES), and that programming languages were not designed to work with it effectively. The style of scaled addressing it uses would work very well in an object-oriented language which does not expect objects to be able to start at arbitrary addresses (if one aligns objects on paragraph boundaries, object references will only need to be two bytes rather than four). If one compares early Macintosh code to PC code, the 8086 actually looks pretty good compared to 68000.
@supercat: actually, the es register WAS dedicated to something, namely to those string instructions that required storing (movs, stos) or scanning (cmps and scas). Given 64KiB addressing from every segment register, es also provided the "missing link" to memory other than code, data and stack memory (cs, ds, ss). The segment registers provided a sort of memory protection scheme in that you could not address outside the registers' 64KiB memory blocks. What better solution do you propose given that the x86 was a 16-bit architecture and the lithography constraints of the day?
@OlofForshell: ES was used for string instructions, but could be used as an uncommitted register for code not using them. A way to ease the seg-reg bottleneck without requiring too much opcode space would be to have an "rseg" prefix which would specify that for the following r/m-format instruction the "r" field would select from CS/SS/DS/ES/FS/GS/??/?? instead of AX/BX/CX/DX/SI/DI/SP/BP, and to have prefixes for FS/GS and instructions for LFS and LGS (like LDS and LES). I don't know how the micro-architecture for the 8086 was laid out, but I would think something like that could have worked.
@supercat: as I wrote, "register es also provide the missing link to memory other than ..." Fs and gs didn't arrive until the 386 as I recall.
@OlofForshell: They didn't, which made the 80286 architecture even worse than the 8086 architecture in most regards. My point was that adding a couple more segment registers (or even one, for that matter) would have made the 8086 architecture a lot more useful, and the instruction set could have been cleaner and more useful if segment registers could be accessed much like the other ones.