What is the function of the push / pop instructions used on registers in x86 assembly?

assembly x86 terminology stack-memory stack-pointer

When reading about assembler I often come across people writing that they push a certain register of the processor and pop it again later to restore its previous state.

How can you push a register? Where is it pushed on? Why is this needed?

Does this boil down to a single processor instruction or is it more complex?

Warning: all the current answers are given in Intel's assembly syntax; push and pop in AT&T syntax, for example, use a suffix such as b, w, l, or q to denote the size of the memory being manipulated. For example: pushl %eax and popl %eax

@hawken On most assemblers able to swallow AT&T syntax (notably gas) the size suffix can be omitted if the operand size can be deduced from the operands. This is the case for the examples you have given, as %eax is always 32 bits in size.

Linus Kleen

Pushing a value (not necessarily one stored in a register) means writing it to the stack.

Popping means restoring whatever is on top of the stack into a register. Those are basic instructions:

push 0xdeadbeef      ; push a value to the stack
pop eax              ; eax is now 0xdeadbeef

; swap contents of registers
push eax
mov eax, ebx
pop ebx
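The last-in, first-out behaviour of the snippet above can be sketched with a toy Python model. This is not real assembly: `ToyCPU`, its `regs` dictionary and its `stack` list are names invented for this illustration, and the model ignores memory addresses and operand sizes entirely.

```python
class ToyCPU:
    """Hypothetical model: a dict plays the register file, a list plays the stack."""

    def __init__(self, **regs):
        self.regs = dict(regs)
        self.stack = []

    def push(self, src):
        # `src` is either a register name (str) or an immediate value (int).
        value = self.regs[src] if isinstance(src, str) else src
        self.stack.append(value)

    def pop(self, dst):
        # Remove the most recently pushed value and place it in register `dst`.
        self.regs[dst] = self.stack.pop()

    def mov(self, dst, src):
        # Copy one register into another.
        self.regs[dst] = self.regs[src]


cpu = ToyCPU(eax=0, ebx=2)

cpu.push(0xDEADBEEF)   # push 0xdeadbeef
cpu.pop("eax")         # pop eax   -> eax is now 0xdeadbeef

cpu.push("eax")        # push eax  (saves 0xdeadbeef)
cpu.mov("eax", "ebx")  # mov eax, ebx
cpu.pop("ebx")         # pop ebx   (old eax lands in ebx)

print({name: hex(value) for name, value in cpu.regs.items()})
```

After the swap, `eax` holds 2, `ebx` holds 0xdeadbeef, and the model's stack is empty again, which is the property that makes push/pop pairs safe to nest.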

The explicit operand for push and pop is r/m, not just register, so you can push dword [esi]. Or even pop dword [esp] to load and then store the same value back to the same address. (github.com/HJLebbink/asm-dude/wiki/POP). I only mention this because you say "not necessarily a register".

You can also pop into an area of memory: pop dword [0xdeadbeef] (NASM needs the explicit size, since it cannot infer it from a bare memory operand).

Hi there, what is the difference between push/pop and pushq/popq? I'm on macos/intel

pushq pushes a qword (64 bits) onto stack whereas push has to infer the size from its operands. (stackoverflow.com/a/48374826/12357035)

It's only useful to push imm/pop reg for small values that fit in an 8-bit immediate. Like push 1 (2 bytes) / pop eax (1 byte) for 3 bytes total, vs. mov eax, 1 (5 bytes total, with 3 zero bytes in the imm32 so it's also a problem for shellcode). See Tips for golfing in x86/x64 machine code. Also, swapping registers that way is insane vs. xchg eax, ebx (1 byte, 3 uops on modern Intel CPUs but none of them are memory access. And only 2 uops on modern AMD).

Nate Eldredge

Here is how you push a register. I assume we are talking about x86.

push ebx
push eax

It is pushed onto the stack. The value of the ESP register is decremented by the size of the pushed value, as the stack grows downwards on x86 systems.

It is needed to preserve the values. The general usage is

push eax           ;   preserve the value of eax
call some_method   ;   some method is called which will put return value in eax
mov  edx, eax      ;    move the return value to edx
pop  eax           ;    restore original eax

A push is a single instruction in x86, which does two things internally:

Decrement the ESP register by the size of the pushed value.

Store the pushed value at the address now held in the ESP register.
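As a rough illustration of those two steps, here is a Python sketch that models 32-bit pushes and pops on a flat byte array. The class name `Stack32`, the 64-byte memory size and the example values are assumptions made for this sketch; a real CPU performs both steps as part of one instruction.

```python
import struct


class Stack32:
    """Toy 32-bit stack: `mem` is a flat byte array, `esp` is an index into it."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size  # empty stack: esp points just past the highest address

    def push(self, value):
        self.esp -= 4  # step 1: decrement ESP by the operand size
        # step 2: store the value (little-endian, as on x86) at the new ESP
        struct.pack_into("<I", self.mem, self.esp, value & 0xFFFFFFFF)

    def pop(self):
        # pop reverses push: load from ESP first, then increment ESP
        (value,) = struct.unpack_from("<I", self.mem, self.esp)
        self.esp += 4
        return value


s = Stack32()
s.push(0x11111111)      # models: push ebx
s.push(0x22222222)      # models: push eax
print(s.esp)            # 56: two 4-byte slots below the initial 64
print(hex(s.pop()))     # 0x22222222: the last value pushed comes out first
print(hex(s.pop()))     # 0x11111111
```

Note that popping does not erase anything in `mem`; it only moves `esp`, so the old bytes stay below the stack pointer until a later push overwrites them.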

Ciro Santilli Путлер Капут 六四事

Where is it pushed on?

esp - 4. More precisely:

4 is subtracted from esp

the value is stored at the address esp now points to

pop reverses this.

The System V ABI tells Linux to make rsp point to a sensible stack location when the program starts running (see: What is default register state when program launches (asm, linux)?), and that is the stack you should usually use.

How can you push a register?

Minimal GNU GAS example:

.data
    /* .long takes 4 bytes each. */
    val1:
        /* Store bytes 0x 01 00 00 00 here. */
        .long 1
    val2:
        /* 0x 02 00 00 00 */
        .long 2
.text
    /* Make esp point to the address of val2.
     * Unusual, but totally possible. */
    mov $val2, %esp

    /* eax = 3 */
    mov $3, %eax

    push %eax
    /*
    Outcome:
    - esp == &val1
    - val1 == 3
    esp was changed to point to val1,
    and then val1 was modified.
    */

    pop %ebx
    /*
    Outcome:
    - esp == &val2
    - ebx == 3
    Inverses push: ebx gets the value of val1 (first)
    and then esp is increased back to point to val2.
    */

The above on GitHub with runnable assertions.

Why is this needed?

It is true that those instructions could be easily implemented via mov, add and sub.

The reason they exist is that those combinations of instructions are so frequent that Intel decided to provide them for us.

The reason why those combinations are so frequent, is that they make it easy to save and restore the values of registers to memory temporarily so they don't get overwritten.

To understand the problem, try compiling some C code by hand.

A major difficulty, is to decide where each variable will be stored.

Ideally, all variables would fit into registers, which is the fastest memory to access (currently about 100x faster than RAM).

But of course, we can easily have more variables than registers, especially for the arguments of nested functions, so the only solution is to write to memory.

We could write to any memory address, but since the local variables and arguments of function calls and returns fit into a nice stack pattern, which prevents memory fragmentation, that is the best way to deal with it. Compare that with the insanity of writing a heap allocator.

Then we let compilers optimize the register allocation for us, since that is NP complete, and one of the hardest parts of writing a compiler. This problem is called register allocation, and it is isomorphic to graph coloring.

When the compiler's allocator is forced to store things in memory instead of just registers, that is known as a spill.
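To give a feel for the graph-coloring view and for what a spill is, here is a deliberately naive greedy sketch in Python. The function name `allocate`, the interference graph and the register list are all made up for this example; production compilers use considerably more elaborate heuristics, and a greedy pass like this one is not guaranteed to find the best assignment.

```python
def allocate(interference, registers):
    """Greedy coloring: map each variable to a register name or to 'spill'.

    `interference` maps a variable to the set of variables that are live at
    the same time, and therefore cannot share a register with it.
    """
    assignment = {}
    # Visit the most constrained variables first; ties are broken by name.
    for var in sorted(interference, key=lambda v: (-len(interference[v]), v)):
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in registers if r not in taken]
        # No free register left: this variable has to live in memory.
        assignment[var] = free[0] if free else "spill"
    return assignment


# Four variables that are all live at once, but only three registers.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
result = allocate(graph, ["eax", "ebx", "ecx"])
print(result)
```

Here `a`, `b` and `c` get `eax`, `ebx` and `ecx`, and `d` is marked as a spill, meaning it would have to live in stack memory rather than in a register.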

Does this boil down to a single processor instruction or is it more complex?

All we know for sure is that Intel documents a push and a pop instruction, so they are one instruction in that sense.

Internally, it could be expanded to multiple micro-operations (uops), one to modify esp and one to do the memory IO, and take multiple cycles.

But it is also possible that a single push is faster than an equivalent combination of other instructions, since it is more specific.

This is mostly un(der)documented:

Peter Cordes mentions that techniques described at http://agner.org/optimize/microarchitecture.pdf suggest that push and pop take one single micro operation.

Johan mentions that since the Pentium M Intel uses a "stack engine", which stores precomputed esp+regsize and esp-regsize values, allowing push and pop to execute in a single uop. Also mentioned at: https://en.wikipedia.org/wiki/Stack_register

What is Intel microcode?

https://security.stackexchange.com/questions/29730/processor-microcode-manipulation-to-change-opcodes

How many CPU cycles are needed for each assembly instruction?

You don't need to guess about how push/pop decode into uops. Thanks to performance counters, experimental testing is possible, and Agner Fog has done it and published instruction tables. Pentium-M and later CPUs have single-uop push/pop thanks to the stack engine (see Agner's microarch pdf). This includes recent AMD CPUs, thanks to the Intel/AMD patent-sharing agreement.

@PeterCordes awesome! So the performance counters are documented by Intel to count micro-operations?

Also, local variables spilled from regs will typically still be hot in L1 cache if any of them are actually being used. But reading from a register is effectively free, zero latency. So it's infinitely faster than L1 cache, depending on how you want to define terms. For read-only locals spilled to the stack, the main cost is just extra load uops (sometimes memory operands, sometimes with separate mov loads). For spilled non-const variables, the store-forwarding round trips are a lot of extra latency (an extra ~5c vs. forwarding directly, and the store instructions aren't cheap).

Yeah, there are counters for total uops at a few different pipeline stages (issue/execute/retire), so you can count fused-domain or unfused-domain. See this answer for example. If I was rewriting that answer now, I'd use the ocperf.py wrapper script to get easy symbolic names for the counters.

gowrath

Pushing and popping registers are behind the scenes equivalent to this:

push reg   <= same as =>      sub  $8,%rsp        # subtract 8 from rsp
                              mov  reg,(%rsp)     # store, using rsp as the address

pop  reg    <= same as=>      mov  (%rsp),reg     # load, using rsp as the address
                              add  $8,%rsp        # add 8 to the rsp

Note this is x86-64 AT&T syntax.

Used as a pair, this lets you save a register on the stack and restore it later. There are other uses, too.

Yes, those sequences correctly emulate push/pop (except that push/pop don't affect flags).

You'd better use lea rsp, [rsp±8] instead of add/sub to better emulate the effect of push/pop on flags.

GJ.

Almost all CPUs use a stack. The program stack is a LIFO structure with hardware-supported management.

The stack is a region of program (RAM) memory, normally allocated at the top of the address space, and it grows in the opposite direction to the heap (on a PUSH instruction the stack pointer is decreased). The standard term for inserting onto the stack is PUSH, and for removing from the stack it is POP.

The stack is managed via a dedicated CPU register, called the stack pointer. When the CPU performs a POP or PUSH, it loads or stores a register or constant from or to stack memory, and the stack pointer is automatically increased or decreased according to the number of words popped from or pushed onto the stack.

Via assembler instructions we can store on the stack:

CPU registers and also constants.

Return addresses for functions or procedures.

Function/procedure in/out variables.

Function/procedure local variables.
