For me, it just seems like a funky MOV. What's its purpose and when should I use it?
As others have pointed out, LEA (load effective address) is often used as a "trick" to do certain computations, but that's not its primary purpose. The x86 instruction set was designed to support high-level languages like Pascal and C, where arrays—especially arrays of ints or small structs—are common. Consider, for example, a struct representing (x, y) coordinates:
struct Point
{
    int xcoord;
    int ycoord;
};
Now imagine a statement like:
int y = points[i].ycoord;
where points[] is an array of Point. Assuming the base of the array is already in EBX, and variable i is in EAX, and xcoord and ycoord are each 32 bits (so ycoord is at offset 4 bytes in the struct), this statement can be compiled to:
MOV EDX, [EBX + 8*EAX + 4] ; right side is "effective address"
which will land y in EDX. The scale factor of 8 is because each Point is 8 bytes in size. Now consider the same expression used with the "address of" operator &:
int *p = &points[i].ycoord;
In this case, you don't want the value of ycoord, but its address. That's where LEA (load effective address) comes in. Instead of a MOV, the compiler can generate
LEA ESI, [EBX + 8*EAX + 4]
which will load the address in ESI.
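To make the address arithmetic concrete, here is a minimal C sketch of what the LEA in this answer computes. The function names lea_ycoord and address_of_ycoord are made up for illustration, and int32_t is used instead of int so that the 8-byte struct size and the offset of 4 assumed above actually hold.

```c
#include <stddef.h>
#include <stdint.h>

struct Point {
    int32_t xcoord;
    int32_t ycoord;
};

/* What LEA ESI, [EBX + 8*EAX + 4] computes, as plain integer arithmetic:
   base + sizeof(struct Point) * i + offset of ycoord. No memory is read. */
uintptr_t lea_ycoord(uintptr_t base, uintptr_t i)
{
    return base + sizeof(struct Point) * i + offsetof(struct Point, ycoord);
}

/* What the C expression &points[i].ycoord evaluates to. */
uintptr_t address_of_ycoord(struct Point *points, size_t i)
{
    return (uintptr_t)&points[i].ycoord;
}
```

For any array of Point, both functions should return the same number, which is the point: the address-of expression is just the effective address, never dereferenced.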
From the "Zen of Assembly" by Abrash:
LEA, the only instruction that performs memory addressing calculations but doesn't actually address memory. LEA accepts a standard memory addressing operand, but does nothing more than store the calculated memory offset in the specified register, which may be any general purpose register. What does that give us? Two things that ADD doesn't provide: the ability to perform addition with either two or three operands, and the ability to store the result in any register; not just one of the source operands.
And LEA does not alter the flags.
Examples
LEA EAX, [ EAX + EBX + 1234567 ] calculates EAX + EBX + 1234567 (that's three operands)
LEA EAX, [ EBX + ECX ] calculates EBX + ECX without overwriting either with the result.
multiplication by constant (by two, three, five or nine), if you use it like LEA EAX, [ EBX + N * EBX ] (N can be 1,2,4,8).
Another use case is handy in loops: the difference between LEA EAX, [ EAX + 1 ] and INC EAX is that the latter changes EFLAGS but the former does not; this preserves CMP state.
LEA can be used for... (see "LEA (load effective address) is often used as a "trick" to do certain computations" in IJ Kennedy's popular answer above)
LEA EAX, [EBX + ECX]
EAX will contain 8. And after LEA EAX, [EBX + ECX + 2]
EAX will contain 10.
Another important feature of the LEA instruction is that it does not alter the condition codes such as CF and ZF, while computing the address by arithmetic instructions like ADD or MUL does. This feature decreases the level of dependency among instructions and thus makes room for further optimization by the compiler or hardware scheduler.
lea is sometimes useful for the compiler (or human coder) to do math without clobbering a flag result. But lea isn't faster than add. Most x86 instructions write flags. High-performance x86 implementations have to rename EFLAGS or otherwise avoid the write-after-write hazard for normal code to run fast, so instructions that avoid flag writes aren't better because of that. (partial flag stuff can create issues, see INC instruction vs ADD 1: Does it matter?)
Despite all the explanations, LEA is an arithmetic operation:
LEA Rt, [Rs1+a*Rs2+b] => Rt = Rs1 + a*Rs2 + b
It's just that its name is extremely stupid for a shift+add operation. The reason for that was already explained in the top rated answers (i.e. it was designed to directly map high level memory references).
LEA on the AGUs but on the ordinary integer ALUs. One has to read the CPU specs very closely these days to find out "where stuff runs" ...
LEA gives you the address which arises from any memory-related addressing mode. It is not a shift and add operation.
Maybe just another thing about LEA instruction. You can also use LEA for fast multiplying registers by 3, 5 or 9.
LEA EAX, [EAX * 2 + EAX] ;EAX = EAX * 3
LEA EAX, [EAX * 4 + EAX] ;EAX = EAX * 5
LEA EAX, [EAX * 8 + EAX] ;EAX = EAX * 9
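As a rough sketch of why these three work, the base + index*scale form can be modelled in C. The names lea32, times3, times5 and times9 are invented for this illustration; the model assumes 32-bit registers, so results wrap modulo 2^32 just as EAX would.

```c
#include <stdint.h>

/* Models LEA dst, [base + index*scale]. The hardware only allows
   scale = 1, 2, 4 or 8; this sketch does not enforce that. */
uint32_t lea32(uint32_t base, uint32_t index, uint32_t scale)
{
    return base + index * scale;   /* unsigned arithmetic wraps modulo 2^32 */
}

uint32_t times3(uint32_t x) { return lea32(x, x, 2); }  /* LEA EAX, [EAX*2 + EAX] */
uint32_t times5(uint32_t x) { return lea32(x, x, 4); }  /* LEA EAX, [EAX*4 + EAX] */
uint32_t times9(uint32_t x) { return lea32(x, x, 8); }  /* LEA EAX, [EAX*8 + EAX] */
```

Using the same register as both base and index is what turns a scale of 2, 4 or 8 into a multiplier of 3, 5 or 9.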
LEA EAX, [EAX*3] ?
shl instruction for multiplying registers by 2,4,8,16... it is faster and shorter. But for multiplying by numbers that are not a power of 2 we normally use the mul instruction, which is more pretentious and slower.
lea eax,[eax*3] would translate to equivalent of lea eax,[eax+eax*2].
lea is an abbreviation of "load effective address". It loads the address of the location referenced by the source operand into the destination operand. For instance, you could use it to:
lea ebx, [ebx+eax*8]
to move the ebx pointer eax items further (in a 64-bit/element array) with a single instruction. Basically, you benefit from complex addressing modes supported by x86 architecture to manipulate pointers efficiently.
The biggest reason that you use LEA over a MOV is if you need to perform arithmetic on the registers that you are using to calculate the address. Effectively, you can perform what amounts to pointer arithmetic on several of the registers in combination for "free."
What's really confusing about it is that you typically write an LEA just like a MOV but you aren't actually dereferencing the memory. In other words:
MOV EAX, [ESP+4]
This will move the content of what ESP+4 points to into EAX.
LEA EAX, [EBX*8]
This will move the effective address EBX * 8 into EAX, not what is found in that location. As you can see, it is also possible to multiply by a scale factor (2, 4 or 8) as part of the address calculation, which a plain register-to-register MOV cannot do.
LEA
does.
The 8086 has a large family of instructions that accept a register operand and an effective address, perform some computations to compute the offset part of that effective address, and perform some operation involving the register and the memory referred to by the computed address. It was fairly simple to have one of the instructions in that family behave as above except for skipping that actual memory operation. Thus, the instructions:
mov ax,[bx+si+5]
lea ax,[bx+si+5]
were implemented almost identically internally. The difference is a skipped step. Both instructions work something like:
temp = fetched immediate operand (5)
temp += bx
temp += si
address_out = temp (skipped for LEA)
trigger 16-bit read (skipped for LEA)
temp = data_in (skipped for LEA)
ax = temp
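The step list above can be sketched in C to show how little separates the two instructions. This is only a toy model under stated assumptions: exec_ax_bx_si and enum op are hypothetical names, mem stands in for a flat 64 KiB segment, and real 8086 microcode is of course not written this way.

```c
#include <stdint.h>

enum op { OP_MOV, OP_LEA };

/* Runs "op ax, [bx+si+disp]" following the step list above and returns
   the new value of ax. mem must point at 65536 bytes. */
uint16_t exec_ax_bx_si(enum op op, uint16_t bx, uint16_t si, uint16_t disp,
                       const uint8_t *mem)
{
    uint16_t temp = disp;                 /* temp = fetched immediate operand */
    temp = (uint16_t)(temp + bx);         /* temp += bx */
    temp = (uint16_t)(temp + si);         /* temp += si */
    if (op == OP_MOV) {                   /* the steps that LEA skips */
        uint16_t address_out = temp;
        uint16_t lo = mem[address_out];
        uint16_t hi = mem[(uint16_t)(address_out + 1)];
        temp = (uint16_t)(lo | (hi << 8)); /* little-endian 16-bit read */
    }
    return temp;                          /* ax = temp */
}
```

With bx = 0x0100, si = 0x0020 and a displacement of 5, the LEA path returns 0x0125 itself, while the MOV path returns whatever 16-bit word is stored at 0x0125.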
As for why Intel thought this instruction was worth including, I'm not exactly sure, but the fact that it was cheap to implement would have been a big factor. Another factor would have been the fact that Intel's assembler allowed symbols to be defined relative to the BP register. If fnord was defined as a BP-relative symbol (e.g. BP+8), one could say:
mov ax,fnord ; Equivalent to "mov ax,[BP+8]"
If one wanted to use something like stosw to store data to a BP-relative address, being able to say
mov ax,0 ; Data to store
mov cx,16 ; Number of words
lea di,fnord
rep movs fnord ; Address is ignored EXCEPT to note that it's an SS-relative word ptr
was more convenient than:
mov ax,0 ; Data to store
mov cx,16 ; Number of words
mov di,bp
add di,offset fnord (i.e. 8)
rep movs fnord ; Address is ignored EXCEPT to note that it's an SS-relative word ptr
Note that forgetting the word "offset" would cause the contents of location [BP+8], rather than the value 8, to be added to DI. Oops.
The LEA (Load Effective Address) instruction is a way of obtaining the address which arises from any of the Intel processor's memory addressing modes.
That is to say, if we have a data move like this:
MOV EAX, <MEM-OPERAND>
it moves the contents of the designated memory location into the target register.
If we replace the MOV by LEA, then the address of the memory location is calculated in exactly the same way by the <MEM-OPERAND> addressing expression. But instead of the contents of the memory location, we get the location itself into the destination.
LEA is not a specific arithmetic instruction; it is a way of intercepting the effective address arising from any one of the processor's memory addressing modes.
For instance, we can use LEA on just a simple direct address. No arithmetic is involved at all:
MOV EAX, GLOBALVAR ; fetch the value of GLOBALVAR into EAX
LEA EAX, GLOBALVAR ; fetch the address of GLOBALVAR into EAX.
This is valid; we can test it at the Linux prompt:
$ as
LEA 0, %eax
$ objdump -d a.out
a.out: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <.text>:
0: 8d 04 25 00 00 00 00 lea 0x0,%eax
Here, there is no addition of a scaled value, and no offset. Zero is moved into EAX. We could do that using MOV with an immediate operand also.
This is the reason why people who think that the brackets in LEA are superfluous are severely mistaken; the brackets are not LEA syntax but are part of the addressing mode.
LEA is real at the hardware level. The generated instruction encodes the actual addressing mode and the processor carries it out to the point of calculating the address. Then it moves that address to the destination instead of generating a memory reference. (Since the address calculation of an addressing mode in any other instruction has no effect on CPU flags, LEA has no effect on CPU flags.)
Contrast with loading the value from address zero:
$ as
movl 0, %eax
$ objdump -d a.out | grep mov
0: 8b 04 25 00 00 00 00 mov 0x0,%eax
It's a very similar encoding, see? Just the 8d of LEA has changed to 8b.
Of course, this LEA encoding is longer than moving an immediate zero into EAX:
$ as
movl $0, %eax
$ objdump -d a.out | grep mov
0: b8 00 00 00 00 mov $0x0,%eax
There is no reason for LEA to exclude this possibility though just because there is a shorter alternative; it's just combining in an orthogonal way with the available addressing modes.
As the existing answers mentioned, LEA has the advantages of performing memory addressing arithmetic without accessing memory, and of saving the arithmetic result to a different register, unlike the simple two-operand form of the add instruction. The real underlying performance benefit is that modern processors have a separate LEA ALU unit and port for effective address generation (for LEA and for other memory reference addresses), which means the arithmetic operation in LEA and other normal arithmetic operations in the ALU can be done in parallel in one core.
Check this article of Haswell architecture for some details about LEA unit: http://www.realworldtech.com/haswell-cpu/4/
Another important point which is not mentioned in other answers is that the LEA REG, [MemoryAddress] instruction is PIC (position independent code): it encodes a PC-relative address in the instruction to reference MemoryAddress. This is different from MOV REG, MemoryAddress, which encodes a relative virtual address and requires relocating/patching in modern operating systems (where ASLR is a common feature). So LEA can be used to convert such non-PIC code to PIC.
lea on one or more of the same ALUs that execute other arithmetic instructions (but generally fewer of them than other arithmetic). For instance, the Haswell CPU mentioned can execute add or sub or most other basic arithmetic operations on four different ALUs, but can only execute lea on one (complex lea) or two (simple lea). More importantly, those two lea-capable ALUs are simply two of the four that can execute other instructions, so there is no parallelism benefit as claimed.
The LEA instruction can be used to avoid time consuming calculations of effective addresses by the CPU. If an address is used repeatedly it is more effective to store it in a register instead of calculating the effective address every time it is used.
[esi] is rarely cheaper than say [esi + 4200] and is only rarely cheaper than [esi + ecx*8 + 4200].
[esi] isn't cheaper than [esi + ecx*8 + 4200]. But why bother comparing? They are not equivalent. If you want the former to designate the same memory location as the latter, you need additional instructions: you have to add to esi the value of ecx multiplied by 8. Uh oh, multiplication is going to clobber your CPU flags! Then you have to add the 4200. These additional instructions add to the code size (taking up space in the instruction cache, cycles to fetch).
[esi + 4200] repeatedly in a sequence of instructions, then it is better to first load the effective address into a register and use that. For example, rather than writing add eax, [esi + 4200]; add ebx, [esi + 4200]; add ecx, [esi + 4200], you should prefer lea edi, [esi + 4200]; add eax, [edi]; add ebx, [edi]; add ecx, [edi], which is rarely faster. At least that's the plain interpretation of this answer.
[esi] and [esi + 4200] (or [esi + ecx*8 + 4200]) is that this is the simplification the OP is proposing (as I understand it): that N instructions with the same complex address are transformed into N instructions with simple (one reg) addressing, plus one lea, since complex addressing is "time consuming". In fact, it is slower even on modern x86, but only latency-wise which seems unlikely to matter for consecutive instructions with the same address.
lea so it increases pressure in that case. In general, storing intermediates is a cause of register pressure, not a solution to it - but I think in most situations it is a wash. @Kaz
Many of the answers here are already complete, so I'd just like to add one more code example showing how the lea and mov instructions behave differently when they are given the same operand expression.
To make a long story short, both the lea and mov instructions can take a source operand enclosed in parentheses. The expression in the () is calculated in the same way for both; however, the two instructions interpret the calculated value differently.
Whether the expression is used with lea or mov, the src value is calculated as below.
D ( Rb, Ri, S ) => (Reg[Rb]+S*Reg[Ri]+ D)
However, when it is used with the mov instruction, it tries to access the value pointed to by the address generated by the above expression and store it to the destination.
In contrast, when the lea instruction is executed with the above expression, it loads the generated value as it is into the destination.
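Before the real inline-asm demonstration, the same idea can be sketched in portable C. This is only a model: effective_address, lea_model and mov_model are hypothetical names, and "memory" is a caller-supplied byte array so that an out-of-range address is reported with a return value instead of a segmentation fault.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* D(Rb, Ri, S) => Reg[Rb] + S*Reg[Ri] + D, evaluated in 64 bits. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, uint64_t s)
{
    return rb + s * ri + (uint64_t)d;
}

/* lea: the computed value itself is the result. */
uint64_t lea_model(int64_t d, uint64_t rb, uint64_t ri, uint64_t s)
{
    return effective_address(d, rb, ri, s);
}

/* mov: the computed value is used as an address and a 32-bit value is
   loaded from it. Returns 1 on success, 0 if the access would fall
   outside mem (standing in for the fault a real mov would raise). */
int mov_model(int64_t d, uint64_t rb, uint64_t ri, uint64_t s,
              const uint8_t *mem, size_t mem_size, uint32_t *out)
{
    uint64_t addr = effective_address(d, rb, ri, s);
    if (addr > mem_size || mem_size - addr < sizeof *out)
        return 0;
    memcpy(out, mem + addr, sizeof *out);
    return 1;
}
```

With D = 4, Rb = 2, Ri = 1 and S = 8, both models compute 14; lea_model returns 14, while mov_model tries to read four bytes starting at offset 14.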
The code below executes the lea instruction and the mov instruction with the same operands. To expose the difference, I added a user-level signal handler that catches the segmentation fault caused by the mov instruction accessing an invalid address.
Example code
#define _GNU_SOURCE 1 /* To pick up REG_RIP */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#include <signal.h>

uint32_t
register_handler (uint32_t event, void (*handler)(int, siginfo_t*, void*))
{
    uint32_t ret = 0;
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = handler;
    act.sa_flags = SA_SIGINFO;
    ret = sigaction(event, &act, NULL);
    return ret;
}

void
segfault_handler (int signum, siginfo_t *info, void *priv)
{
    ucontext_t *context = (ucontext_t *)(priv);
    uint64_t rip = (uint64_t)(context->uc_mcontext.gregs[REG_RIP]);
    uint64_t faulty_addr = (uint64_t)(info->si_addr);
    printf("inst at 0x%lx tries to access memory at %ld, but failed\n",
           rip,faulty_addr);
    exit(1);
}

int
main(void)
{
    int result_of_lea = 0;
    register_handler(SIGSEGV, segfault_handler);
    //initialize registers %eax = 1, %ebx = 2
    // the compiler will emit something like
    // mov $1, %eax
    // mov $2, %ebx
    // because of the input operands
    asm("lea 4(%%rbx, %%rax, 8), %%edx \t\n"
        :"=d" (result_of_lea) // output in EDX
        : "a"(1), "b"(2) // inputs in EAX and EBX
        : // no clobbers
        );
    //lea 4(rbx, rax, 8),%edx == lea (rbx + 8*rax + 4),%edx == lea(14),%edx
    printf("Result of lea instruction: %d\n", result_of_lea);
    asm volatile ("mov 4(%%rbx, %%rax, 8), %%edx"
        :
        : "a"(1), "b"(2)
        : "edx" // if it didn't segfault, it would write EDX
        );
}
Execution result
Result of lea instruction: 14
inst at 0x4007b5 tries to access memory at 14, but failed
=d to tell the compiler the result is in EDX, saving a mov. You also left out an early-clobber declaration on the output. This does demonstrate what you're trying to demonstrate, but is also a misleading bad example of inline asm that will break if used in other contexts. That's a Bad Thing for a stack overflow answer.
%% on all those register names in Extended asm, then use input constraints. like asm("lea 4(%%ebx, %%eax, 8), %%edx" : "=d"(result_of_lea) : "a"(1), "b"(2));. Letting the compiler init registers means you don't have to declare clobbers, either. You're overcomplicating things by xor-zeroing before mov-immediate overwrites the whole register, too.
mov 4(%ebx, %eax, 8), %edx is invalid? Anyway, yes, for mov it would make sense to write "a"(1ULL) to tell the compiler you have a 64-bit value, and thus it needs to make sure it's extended to fill the whole register. In practice it will still use mov $1, %eax, because writing EAX zero-extends into RAX, unless you have a weird situation of surrounding code where the compiler knew that RAX = 0xff00000001 or something. For lea, you're still using 32-bit operand-size, so any stray high bits in input registers have no effect on the 32-bit result.
Here is an example.
#include <assert.h>

// compute parity of permutation from lexicographic index
int parity (int p)
{
    assert (p >= 0);
    int r = p, k = 1, d = 2;
    while (p >= k) {
        p /= d;
        d += (k << 2) + 6; // only one lea instruction
        k += 2;
        r ^= p;
    }
    return r & 1;
}
With -O (optimize) as compiler option, gcc will find the lea instruction for the indicated code line.
LEA : just an "arithmetic" instruction..
MOV transfers data between operands but lea is just calculating
mov eax, offset GLOBALVAR instead. You can use LEA, but it's slightly larger code-size than mov r32, imm32 and runs on fewer ports, because it still goes through the address-calculation process. lea reg, symbol is only useful in 64-bit for a RIP-relative LEA, when you need PIC and/or addresses outside the low 32 bits. In 32 or 16-bit code, there is zero advantage. LEA is an arithmetic instruction that exposes the ability of the CPU to decode / compute addressing modes.
imul eax, edx, 1 doesn't calculate: it just copies edx to eax. But actually it runs your data through the multiplier with 3 cycle latency. Or that rorx eax, edx, 0 just copies (rotate by zero).
LEA vs MOV (reply to the original question)
LEA is not a funky MOV. When you use MOV, it calculates the address and accesses the memory. LEA just calculates the address; it doesn't actually access memory. This is the difference.
In 8086 and later, LEA just sets a destination register to the sum of up to two source registers and an immediate value. For example, lea bp, [bx+si+3] sets the bp register to the sum of bx plus si plus 3. You cannot achieve this calculation and save the result to a register with MOV.
The 80386 processor introduced a series of scaling modes, in which the index register value can be multiplied by a valid scaling factor to obtain the displacement. The valid scale factors are 1, 2, 4, and 8. Therefore, you can use instructions like lea ebp, [ebx+esi*8+3].
LDS & LES (optional further reading)
In contrast to LEA, there are the instructions LDS and LES, which load values from memory into a pair of registers: one segment register (DS or ES) and one general register. There are also versions for the other segment registers: LFS, LGS and LSS for the FS, GS and SS segment registers, respectively (introduced in 80386).
So, these instructions load "far" pointer - a pointer consisting of a 16-bit segment selector and a 16-bit (or a 32-bit, depending on the mode) offset, so the total far pointer size was 32-bit in 16-bit mode and 48-bit in 32-bit mode.
These are handy instructions for 16-bit mode, be it 16-bit real mode or 16-bit protected mode.
Under 32-bit mode, there is no need for these instructions since OSes set all segment bases to zero (flat memory model), so there is no need to load segment registers. We just use 32-bit pointers, not 48.
Under 64-bit mode, these instructions are not implemented. Their opcodes raise an invalid-opcode exception. Since Intel's introduction of VEX ("vector extensions", used by AVX), Intel took the opcodes of LDS and LES and started using them for VEX prefixes. As Peter Cordes pointed out, that is why only x/ymm0..7 are accessible in 32-bit mode (quote): "the VEX prefixes were carefully designed to only overlap with invalid encodings of LDS and LES in 32-bit mode, where R̅ X̅ B̅ are all 1. That's why some of the bits are inverted in VEX prefixes".
[bx*2+si+3] isn't a valid 16-bit addressing mode. 16-bit doesn't allow any scale factors. lea bp, [ebx*2 + esi + 3] would be legal, though, in 16-bit mode on a 386 or later. (Normally you write the base first and then the scaled-index, but assemblers would accept that.)
All normal "calculating" instructions like addition, multiplication and exclusive or set the status flags like zero and sign. If you use a complicated address, AX xor:= mem[0x333 +BX + 8*CX], the flags are set according to the xor operation.
Now you may want to use the address multiple times. Loading such an address into a register is never intended to set status flags, and luckily it doesn't. The phrase "load effective address" makes the programmer aware of that. That is where the weird expression comes from.
It is clear that once the processor is capable of using the complicated address to process its content, it is capable of calculating it for other purposes. Indeed it can be used to perform a transformation x <- 3*x+1 in one instruction. This is a general rule in assembly programming: use the instructions however it rocks your boat. The only thing that counts is whether the particular transformation embodied by the instruction is useful for you.
Bottom line
MOV, X| T| AX'| R| BX|
and
LEA, AX'| [BX]
have the same effect on AX but not on the status flags. (This is ciasdis notation.)
call lbl
lbl: pop rax
technically "working" as a way to get the value of rip, but you'll make branch prediction very unhappy. Use the instructions however you want, but don't be surprised if you do something tricky and it has consequences you didn't foresee.
Forgive me if someone already mentioned, but in case anyone's wondering about the bad old days of x86 when memory segmentation was still relevant: you will always get the same results from these two instructions:
LEA AX, DS:[0x1234]
and
LEA AX, CS:[0x1234]
The "effective address" is just the offset part of the seg:off logical address. In this case, 0x1234.
LEA does not add the segment base. That would defeat one of the original use-cases, for doing address math to get a pointer (offset) you could actually dereference. Such as lea bx, [array + si]. If that added DS base to give a linear address, a later mov ax, [bx] would add the DS base again.
Also, the 20-bit result would often not fit in a 16-bit register.
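A small C sketch may help separate the two notions of "address" here. The function names are hypothetical, and the model assumes plain 8086 real mode, where the linear address is segment * 16 + offset on a 20-bit address bus.

```c
#include <stdint.h>

/* Real-mode 8086 linear address: segment * 16 + offset, truncated to
   the 20 address lines of the original chip. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Models LEA AX, seg:[offset]: the segment override plays no part,
   and the result is just the 16-bit offset (the effective address). */
uint16_t lea_with_segment(uint16_t segment, uint16_t offset)
{
    (void)segment;   /* deliberately ignored, as LEA ignores it */
    return offset;
}
```

For two different segment values and the offset 0x1234, lea_with_segment gives 0x1234 both times, while linear_address gives two different values that do not fit in 16 bits.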
See https://www.stevemorse.org/8086/index.html - the architect of 8086 wrote a book about the instruction set, and it's now free on his web site. The section on LEA mentions some of his design intent.
seg:off pair. LEA isn't affected by the segment base; both those instructions will (inefficiently) put 0x1234 into AX. x86 unfortunately doesn't have an easy way to calculate a full linear address (effective + segment base) into a register or register-pair.
mov instruction and leave off the brackets? MOV EDX, EBX + 8*EAX + 4
MOV with an indirect source, except it only does the indirection and not the MOV. It doesn't actually read from the computed address, just computes it.