
What is the purpose of a stack? Why do we need it?

So I am learning MSIL right now in order to debug my C# .NET applications.

I've always wondered: what is the purpose of the stack?

Just to put my question in context: why is there a transfer from memory to stack, or "loading"? On the other hand, why is there a transfer from stack to memory, or "storing"? Why not just have everything placed in memory?

Is it because it's faster?

Is it because it's RAM based?

For efficiency?

I'm trying to grasp this to help me understand CIL codes much more deeply.

The stack is one part of the memory, just like the heap is another part of the memory.
@CodeInChaos are you talking about value types vs. reference types? Or is that the same in terms of IL codes? I know the stack is faster and more efficient than the heap, but that's in the value/reference type world, and I don't know whether the same applies here.
@CodeInChaos - I think the stack that Jan's referencing is the stack machine that IL is written against, as opposed to the region of memory which accepts stack frames during function calls. They're two different stacks, and after JIT, the IL stack doesn't exist (on x86, anyway)
How will MSIL knowledge help you debug .NET applications?
On modern machines, the caching behavior of code is a performance maker-and-breaker. Memory is everywhere; the stack is, usually, just here. That is assuming the stack is a real thing, and not just a concept used in expressing the operation of some code. In implementing an MSIL-running platform, there's no requirement that the stack concept makes it to the hardware actually pushing the bits around.

Callum Watkins

UPDATE: I liked this question so much I made it the subject of my blog on November 18th 2011. Thanks for the great question!

I've always wondered: what is the purpose of the stack?

I assume you mean the evaluation stack of the MSIL language, and not the actual per-thread stack at runtime.

Why is there a transfer from memory to stack, or "loading"? On the other hand, why is there a transfer from stack to memory, or "storing"? Why not just have everything placed in memory?

MSIL is a "virtual machine" language. Compilers like the C# compiler generate CIL, and then at runtime another compiler called the JIT (Just In Time) compiler turns the IL into actual machine code that can execute.

So first let's answer the question "why have MSIL at all?" Why not just have the C# compiler write out machine code?

Because it is cheaper to do it this way. Suppose we didn't do it that way; suppose each language has to have its own machine code generator. You have twenty different languages: C#, JScript .NET, Visual Basic, IronPython, F#... And suppose you have ten different processors. How many code generators do you have to write? 20 x 10 = 200 code generators. That's a lot of work. Now suppose you want to add a new processor. You have to write the code generator for it twenty times, one for each language.

Furthermore, it is difficult and dangerous work. Writing efficient code generators for chips that you are not an expert on is a hard job! Compiler designers are experts on the semantic analysis of their language, not on efficient register allocation of new chip sets.

Now suppose we do it the CIL way. How many CIL generators do you have to write? One per language. How many JIT compilers do you have to write? One per processor. Total: 20 + 10 = 30 code generators. Moreover, the language-to-CIL generator is easy to write because CIL is a simple language, and the CIL-to-machine-code generator is also easy to write because CIL is a simple language. We get rid of all of the intricacies of C# and VB and whatnot and "lower" everything to a simple language that is easy to write a jitter for.

Having an intermediate language lowers the cost of producing a new language compiler dramatically. It also lowers the cost of supporting a new chip dramatically. You want to support a new chip, you find some experts on that chip and have them write a CIL jitter and you're done; you then support all those languages on your chip.

OK, so we've established why we have MSIL: because having an intermediate language lowers costs. Why then is the language a "stack machine"?

Because stack machines are conceptually very simple for language compiler writers to deal with. Stacks are a simple, easily understood mechanism for describing computations. Stack machines are also conceptually very easy for JIT compiler writers to deal with. Using a stack is a simplifying abstraction, and therefore again, it lowers our costs.

You ask "why have a stack at all?" Why not just do everything directly out of memory? Well, let's think about that. Suppose you want to generate CIL code for:

int x = A() + B() + C() + 10;

Suppose we have the convention that "add", "call", "store" and so on always take their arguments off the stack and put their result (if there is one) on the stack. To generate CIL code for this C# we just say something like:

load the address of x // The stack now contains address of x
call A()              // The stack contains address of x and result of A()
call B()              // Address of x, result of A(), result of B()
add                   // Address of x, result of A() + B()
call C()              // Address of x, result of A() + B(), result of C()
add                   // Address of x, result of A() + B() + C()
load 10               // Address of x, result of A() + B() + C(), 10
add                   // Address of x, result of A() + B() + C() + 10
store in address      // The result is now stored in x, and the stack is empty.
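
For comparison, here is a hedged sketch of roughly the CIL the C# compiler emits for that line, assuming A, B and C are static int-returning methods and x is the first local variable. Real output varies by compiler version; the call targets would carry full type names, and for a local the compiler stores with stloc rather than through an address:

call int32 A()    // The stack contains the result of A()
call int32 B()    // Result of A(), result of B()
add               // Result of A() + B()
call int32 C()    // Result of A() + B(), result of C()
add               // Result of A() + B() + C()
ldc.i4.s 10       // Result of A() + B() + C(), 10
add               // Result of A() + B() + C() + 10
stloc.0           // The result is popped into x; the stack is empty.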

Now suppose we did it without a stack. We'll do it your way, where every opcode takes the addresses of its operands and the address to which it stores its result:

Allocate temporary store T1 for result of A()
Call A() with the address of T1
Allocate temporary store T2 for result of B()
Call B() with the address of T2
Allocate temporary store T3 for the result of the first addition
Add contents of T1 to T2, then store the result into the address of T3
Allocate temporary store T4 for the result of C()
Call C() with the address of T4
Allocate temporary store T5 for result of the second addition
...

You see how this goes? Our code is getting huge because we have to explicitly allocate all the temporary storage that would normally by convention just go on the stack. Worse, our opcodes themselves are all getting enormous because they all now have to take as an argument the address that they're going to write their result into, and the address of each operand. An "add" instruction that knows that it is going to take two things off the stack and put one thing on can be a single byte. An add instruction that takes two operand addresses and a result address is going to be enormous.
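
To put rough numbers on the size difference: in CIL, the stack-based add really is a single byte, 0x58 in the ECMA-335 encoding, because both operands and the destination are implicit. The three-address form below is hypothetical, invented only to illustrate the cost:

add               // stack form: 1 byte (0x58), operands implicit
add T3, T1, T2    // hypothetical three-address form: 1 opcode byte plus
                  // three 4-byte addresses = 13 bytes per instruction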

We use stack-based opcodes because stacks solve the common problem. Namely: I want to allocate some temporary storage, use it very soon and then get rid of it quickly when I'm done. By making the assumption that we have a stack at our disposal we can make the opcodes very small and the code very terse.

UPDATE: Some additional thoughts

Incidentally, this idea of drastically lowering costs by (1) specifying a virtual machine, (2) writing compilers that target the VM language, and (3) writing implementations of the VM on a variety of hardware, is not a new idea at all. It did not originate with MSIL, LLVM, Java bytecode, or any other modern infrastructure. The earliest implementation of this strategy I'm aware of is the pcode machine from 1966.

The first I personally heard of this concept was when I learned how the Infocom implementors managed to get Zork running on so many different machines so well. They specified a virtual machine called the Z-machine and then made Z-machine emulators for all the hardware they wanted to run their games on. This had the added enormous benefit that they could implement virtual memory management on primitive 8-bit systems; a game could be larger than would fit into memory because they could just page the code in from disk when they needed it and discard it when they needed to load new code.


WOW. That is just EXACTLY what I was looking for. The best way to get an answer is to get one from the principal developer himself. Thanks for the time, and I'm sure this will help everyone who wonders about the intricacies of the compiler and MSIL. Thanks, Eric.
That was a great answer. Reminds me why I read your blog even though I'm a Java guy. ;-)
@JanCarloViray: You are very welcome! I note that I am a Principal Developer, not the principal developer. There are several people on this team with that job title and I am not even the most senior of them.
@Eric: If/when you ever stop loving coding, you should consider teaching programmers. Besides the fun, you could be making a killing without the pressures of business. You've got awesome flair in that area (and wonderful patience, I might add). I say that as an ex-university lecturer.
About 4 paragraphs in I was saying to my self "This sounds like Eric", by the 5th or 6th I'd graduated to "Yep, definitely Eric" :) Another truly & epically comprehensive answer.
James Ko

Keep in mind that when you're talking about MSIL, you're talking about instructions for a virtual machine. The VM used in .NET is a stack-based virtual machine, as opposed to a register-based VM; the Dalvik VM used in Android operating systems is an example of the latter.

The stack in the VM is virtual; it is up to the interpreter or the just-in-time compiler to translate the VM instructions into actual code that runs on the processor. In the case of .NET that is almost always a jitter; the MSIL instruction set was designed to be jitted from the get-go. Java bytecode, by contrast, has distinct instructions for operations on specific data types, which makes it better suited to being interpreted. An MSIL interpreter does exist, though: it is used in the .NET Micro Framework, which runs on processors with resources so limited that they can't afford the RAM required to store machine code.
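
A hedged illustration of that difference: Java bytecode spells the operand type out in each instruction, while CIL has a single polymorphic add whose operand types the jitter infers from whatever is on the evaluation stack:

// Java bytecode: type-specific instructions (easy to interpret directly)
//   iadd   adds two ints
//   ladd   adds two longs
//   fadd   adds two floats
// CIL: one polymorphic instruction; the jitter infers the types
ldloc.0    // push the first local
ldloc.1    // push the second local
add        // int32, int64, float64... determined by what is on the stack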

The actual machine code model is mixed, having both a stack and registers. One of the big jobs of the JIT code optimizer is to come up with ways to store variables that are kept on the stack in registers, thus greatly improving execution speed. A Dalvik jitter has the opposite problem.

The machine stack is otherwise a very basic storage facility that has been around in processor designs for a very long time. It has very good locality of reference, an important property on modern CPUs that chew through data a lot faster than RAM can supply it, and it supports recursion. Language design is heavily influenced by having a stack, visible in the support for local variables and for scope limited to the method body. A significant problem with the stack is the one that this site is named for.
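
That problem is easy to demonstrate: every call pushes a new frame onto the per-thread stack, so recursion with no base case exhausts it. A minimal sketch in CIL (the method name is made up, and real ilasm needs a bit more scaffolding around it):

.method static void Overflow() cil managed
{
    call void Overflow()   // no base case: each call pushes a new stack frame
    ret                    // never reached; the runtime eventually fails
}                          // with a StackOverflowException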


+1 for a very detailed explanation, and +100 (if I could) for the extra DETAILED comparison to other systems and languages :)
Why is Dalvik a register machine? Since it's primarily targeted at ARM processors. Now, x86 has the same number of registers, but being a CISC, only 4 of them are really usable for storing locals because the rest are implicitly used in common instructions. ARM architectures, on the other hand, have a lot more registers that can be used to store locals, hence they facilitate a register-based execution model.
@JohannesRudolph That hasn't been true for almost two decades now. Just because most C++ compilers still target the 90s x86 instruction set doesn't mean that x86 itself is deficient. Haswell has 168 general-purpose integer registers and 168 GP AVX registers, for example - far more than any ARM CPU I know of. You can use all of those from (modern) x86 assembly any way you want. Blame compiler writers, not the architecture/CPU. In fact, it's one of the reasons why intermediate compilation is so attractive - one binary, best code for a given CPU; no mucking around with a 90s-era architecture.
@JohannesRudolph The .NET JIT compiler actually uses registers quite heavily; the stack is mostly an abstraction of the IL virtual machine, and the code that actually runs on your CPU is very different. Method calls may pass arguments in registers, locals may be lifted to registers... The main benefit of the stack in machine code is the isolation it gives to subroutine calls - if you put a local in a register, a function call may make you lose that value, and you can't really tell.
@RahulAgarwal The generated machine code may or may not use the stack for any given local or intermediate value. In IL, every argument and local is on the stack - but in the machine code, this is not true (it's allowed, but not required). Some things are useful on the stack, and they're put on the stack. Some things are useful on the heap, and they're put on the heap. Some things aren't necessary at all, or only need a few moments in a register. Calls can be eliminated entirely (inlined), or their arguments may be passed in registers. The JIT has a lot of freedom.
Peter Mortensen

There is a very interesting and detailed Wikipedia article on this, Advantages of stack machine instruction sets. I would need to quote it entirely, so it's easier to simply link to it. I'll just quote the subheadings:

Very compact object code

Simple compilers / simple interpreters

Minimal processor state


-1 @xanatos Could you try and summarize the headings you have taken?
@chibacity If I wanted to summarize them, I would have done an answer. I was trying to salvage a very good link.
@xanatos I understand your goals, but sharing a link to such a large wikipedia article is not a great answer. It's not hard to find by just googling. On the other hand, Hans has a nice answer.
@chibacity The OP was probably lazy in not searching first. The answerer here gave a good link (without describing it). Two evils do one good :-) And I'll upvote Hans.
To the answerer and @xanatos: +1 for a GREAT link. I was waiting for someone to fully summarize it into a knowledge-packed answer. If Hans hadn't given an answer, I would have accepted yours; it's just that it was only a link, which wouldn't have been fair to Hans, who put good effort into his answer. :)
skyman

To add a little more to the stack question: the stack concept derives from CPU design, where the machine code in the arithmetic logic unit (ALU) operates on operands that are located on the stack. For example, a multiply operation may take the two top operands from the stack, multiply them, and place the result back on the stack. Machine language typically has two basic operations for adding and removing operands from the stack: PUSH and POP. In many CPUs, DSPs (digital signal processors), and machine controllers (such as the one controlling a washing machine), the stack is located on the chip itself. This gives faster access to the ALU and consolidates the required functionality into a single chip.
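
The same shape is visible in CIL's evaluation stack. A minimal sketch of the multiply example just described:

ldc.i4 6    // PUSH the constant 6
ldc.i4 7    // PUSH the constant 7
mul         // POP both operands, multiply, PUSH the result (42)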


Azodious

If the stack/heap concept is not followed, and data is loaded from and stored to arbitrary memory locations, the result is very unstructured and unmanaged.

These concepts are used to store data in a predefined structure to improve performance and memory usage, and hence they are called data structures.


Basile Starynkevitch

One can have a system working without stacks by using continuation-passing style. Call frames then become continuations allocated in the garbage-collected heap (the garbage collector itself would need some stack).

See Andrew Appel's old writings: Compiling with Continuations and Garbage Collection can be faster than Stack Allocation

(He might be a little bit wrong today because of cache issues)


MicroservicesOnDDD

I looked for "interrupt" and nobody included that as an advantage. For each device that interrupts a microcontroller or other processor, there are usually registers that are pushed onto a stack, an interrupt service routine is called, and when it is done, the registers are popped back off of the stack, and put back where they were. Then the instruction pointer is restored, and normal activity picks up where it left off, almost as if the interrupt never happened. With the stack, you can actually have several devices (theoretically) interrupt each other, and it all just works -- because of the stack.

There is also a family of stack-based languages called concatenative languages. They are all (I believe) functional languages, because the stack is an implicit parameter passed in, and the changed stack is an implicit return from each function. Both Forth and Factor (which is excellent) are examples, along with others. Factor has been used similarly to Lua, for scripting games, and was written by Slava Pestov, a genius currently working at Apple. I have watched his Google TechTalk on YouTube a few times. He talks about Boa constructors, but I'm not sure what he means ;-).

I really think that some of the current VMs, like the JVM, Microsoft's CIL, and even the one I saw that was written for Lua, should be written in some of these stack-based languages, to make them portable to even more platforms. I think these concatenative languages are somehow missing their calling as VM construction kits and portability platforms. There is even pForth, a "portable" Forth written in ANSI C, that could be used for even more universal portability. Has anybody tried to compile it using Emscripten or WebAssembly?

With stack-based languages, there is a style of code called zero-point, because you can just list the functions to be called without passing any parameters at all (at times). If the functions fit together perfectly, you would have nothing but a list of all of the zero-point functions, and that would be your application (theoretically). If you delve into either Forth or Factor, you'll see what I'm talking about.

At Easy Forth, a nice online tutorial written in JavaScript, here is a small sample (note the "sq sq sq sq" as an example of zero-point calling style):

: sq dup * ;  ok
2 sq . 4  ok
: ^4 sq sq ;  ok
2 ^4 . 16  ok
: ^16 sq sq sq sq ;  ok
2 ^16 . 65536  ok

Also, if you look at the Easy Forth web page source, you'll see at the bottom that it is very modular, written in about 8 JavaScript files.

I have spent a lot of money on just about every Forth book I could get my hands on in an attempt to assimilate Forth, but am only now beginning to grok it. I want to give a heads-up to those who come after: if you really want to get it (I found this out too late), get the book on FigForth and implement that. The commercial Forths are all too complicated, and the greatest thing about Forth is that it is possible to comprehend the whole system, from top to bottom. Somehow, Forth implements a whole development environment on a new processor, and though the need for that seems to have passed with C on everything, it is still useful as a rite of passage to write a Forth from scratch. So, if you choose to do this, try the FigForth book -- it is several Forths implemented simultaneously on a variety of processors. A kind of Rosetta Stone of Forths.

Why do we need a stack -- efficiency, optimization, zero-point, saving registers upon interrupt, and for recursive algorithms it's "the right shape."