
What is the purpose of XORing a register with itself? [duplicate]

This question already has an answer here: What is the best way to set a register to zero in x86 assembly: xor, mov or and? (1 answer) Closed 6 years ago.

xor eax, eax will always set eax to zero, right? So, why does MSVC++ sometimes put it in my executable's code? Is it more efficient than mov eax, 0?

012B1002  in          al,dx 
012B1003  push        ecx  
    int i = 5;
012B1004  mov         dword ptr [i],5 
    return 0;
012B100B  xor         eax,eax 

Also, what does it mean to do in al, dx?

It's very unlikely that the MSVC++ compiler actually emits an "in" instruction. You're probably disassembling at a wrong address / wrong alignment.
I'm just using the disassembler when in debug mode.
Yes, the real instructions start a few bytes earlier. There is no C equivalent of the "in" instruction, and reading from a 16-bit I/O port and overwriting the result a few instructions later is a very unlikely generated instruction sequence.
A very very similar question: stackoverflow.com/questions/1135679/…
An interesting tips-and-tricks document from the past that recently resurfaced is "86fun.doc" from the MS WinWord 1.1 Source (computerhistory.org/_static/atchm/…). The file is located in 'OpusEtAl\cashmere\doc' and describes "best/fast practices" of assembler programming, also mentioning the xor bx,bx practice.
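
The misalignment explanation in the comments above can be checked with a small sketch. It assumes (the OP's listing does not show this) that the function starts at 012B1000 with the usual frame-pointer prologue push ebp / mov ebp, esp, where mov ebp, esp is encoded as 8B EC. The decoder below is a hypothetical toy that knows only the four encodings needed here, not a real disassembler.

```python
# Toy decoder for just the four encodings needed here; not a real disassembler.
# The two-byte pattern is listed first so it wins when decoding is aligned.
ENCODINGS = [
    (bytes([0x8B, 0xEC]), "mov ebp, esp"),
    (bytes([0x55]), "push ebp"),
    (bytes([0x51]), "push ecx"),
    (bytes([0xEC]), "in al, dx"),
]


def decode(code, start):
    """Decode `code` linearly, beginning at byte offset `start`."""
    out = []
    pos = start
    while pos < len(code):
        for pattern, text in ENCODINGS:
            if code.startswith(pattern, pos):
                out.append(text)
                pos += len(pattern)
                break
        else:
            raise ValueError(f"unknown byte {code[pos]:#04x} at offset {pos}")
    return out


# Assumed bytes at 012B1000: push ebp / mov ebp, esp / push ecx
PROLOGUE = bytes([0x55, 0x8B, 0xEC, 0x51])

print(decode(PROLOGUE, 0))  # aligned: starts at the first instruction
print(decode(PROLOGUE, 2))  # misaligned: starts inside mov ebp, esp
```

Under that assumption, starting two bytes in leaves the lone EC byte, which decodes as in al, dx, followed by push ecx, matching the first two lines of the OP's listing.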

Gunther Piez

Yes, it is more efficient.

The encoding is shorter than that of mov eax, 0 (2 bytes instead of 5), and the processor recognizes the special case and treats it as a mov eax, 0 without a false read dependency on eax, so the execution time is the same.


"processor recognizes the special case and treats it as a "mov eax,0" without a false read dependency on eax, so the execution time is the same". The processor actually does even better: it just executes a register rename internally, and doesn't even do anything at all with eax.
Actually, in the big picture it's faster. There are fewer bytes that have to be fetched from RAM.
Doing xor eax, eax also avoids generating null bytes in the machine code ;)
@Gunslinger_ Writing shellcode 101 :)
On modern architectures xor will be faster because the register is set to zero at the rename stage without using any execution unit: stackoverflow.com/a/18027854/995714
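
The size difference discussed above can be seen by writing out the machine code. A minimal sketch: 31 C0 is one of the two valid encodings of xor eax, eax (33 C0 is the other), and B8 followed by a 32-bit immediate is mov eax, imm32. The helper name describe is just for illustration.

```python
XOR_EAX_EAX = bytes([0x31, 0xC0])                  # xor eax, eax
MOV_EAX_0 = bytes([0xB8, 0x00, 0x00, 0x00, 0x00])  # mov eax, 0 (opcode + imm32)


def describe(name, encoding):
    """Summarize an instruction's length and how many zero bytes it contains."""
    return f"{name}: {len(encoding)} bytes, {encoding.count(0)} zero bytes"


print(describe("xor eax, eax", XOR_EAX_EAX))
print(describe("mov eax, 0", MOV_EAX_0))
```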
GDP2

Also to avoid 0s in the compiled code, as is done in shellcode for exploiting buffer overflows, etc. Why avoid the 0? Well, 0 marks the end of a string in C/C++, and the shellcode would be truncated if the means of exploitation is a string-processing function or the like.
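
The truncation effect can be sketched as follows. The function c_string_copy is a hypothetical stand-in that models a strcpy-style copy (stop at the first NUL byte); it is not a real library call, and the two byte sequences are just "set eax to zero, then ret" written both ways.

```python
def c_string_copy(src: bytes) -> bytes:
    """Model a strcpy-style copy: everything up to the first NUL byte."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


RET = bytes([0xC3])                                      # ret
with_mov = bytes([0xB8, 0x00, 0x00, 0x00, 0x00]) + RET   # mov eax, 0 / ret
with_xor = bytes([0x31, 0xC0]) + RET                     # xor eax, eax / ret

print(c_string_copy(with_mov).hex())  # only the B8 opcode byte survives
print(c_string_copy(with_xor).hex())  # all three bytes survive
```

In this model the mov version is cut off after its first byte, while the xor version is copied whole.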

Btw, I'm referring to the original question, "Any reason to do a “xor eax, eax”?", not to what the MSVC++ compiler does.

Since there's some debate in the comments about how this is pertinent in the real world, see this article and this section on Wikipedia.


This sounds like nonsense to me. There are bound to be zero bytes somewhere in your code, so I don't see how one more would make much difference. Anyway, who cares if you can trick a program into reading code as data. The real problem is executing data as code.
Who cares? Hackers do, and apparently most of the computer security related industry. Please educate yourself before voting down on something. You can find more references in The Art of Exploitation, Chapter 0x2a0 (books.google.es/…), as well as sample shellcode that doesn't contain 0s.
I don't know why this gets downvoted so many times, wtf. Down voters, please educate yourself about this MOST BASIC TECHNIQUE/KNOWLEDGE in shellcodes before downvoting.
@kizzx2 probably because no one here has explained how a string was being parsed in the .text segment. I also can't see how terminating a string somehow allows someone to move the .data segment to mirror the .text segment to modify anything in the first place. Please be more specific than "MOST BASICIST TECHNIQUE"
@kizzx2 could you please give an explanation as to how having null bytes in your program's instruction segment makes it more easily exploited. Having null bytes only affects string parsing, as far as I know, nothing parses the instruction segment of a program. Please explain, not quote some irrelevance about using msvc++ or not
Konrad Rudolph

xor eax, eax is a faster way of setting eax to zero. This is happening because you're returning zero.

The in instruction does I/O port access. Here it reads a byte of data from the port specified in dx and stores it in al. It's not clear why it appears here. Here's a reference that seems to explain it in detail.


"The in instruction is doing stuff with I/O ports". But in this case, it is probably an "artifact" caused by the debugger starting disassembly in the middle of an instruction.
I agree. But still, that's what it does.
@Abel Why did you rename all the mnemonics and register names? That’s unconventional to say the least. As you can see in OP’s code, most modern assemblers and disassemblers use all-lowercase spelling.
@Konrad I stand corrected. My asm books, including the processor references of Intel (all > 5yrs old), (EDIT: and apparently Wikipedia), use uppercase only, wasn't aware this convention was changed.
@abelmar also uppercase is not allowed in AT&T syntax if I remember correctly
Quonux

Another reason to use XOR reg, reg or XORPS reg, reg is to break dependency chains. This allows the CPU to execute the instructions in parallel more efficiently (even if it adds some instruction throughput pressure).


That gives it an advantage over AND reg, 0, but not over MOV reg, 0. Dep-chain breaking is a special case for xor, but always the case for mov. It doesn't get mentioned, leading to occasional confusion from people thinking that mov has a false dependency on the old value being overwritten. But of course it doesn't.
Any reference for this? I don't know the list of dep breakers off the top of my head.
Everything that overwrites the destination without depending on it breaks the dep chain. e.g. every 3-operand op dest, src1, src2 instruction (e.g. VPSHUFB dest, src1, src2 or lea eax, [rbx + 2*rdx]) breaks the dep chain on the old value of dest. It's only notable when there's a false dependency on the old value: like mov ax, bx, which (on AMD/Silvermont/P4, but not P6/SnB) has a false dep on the old value of eax, even if you never read eax. On Intel, the big notable one is that popcnt/lzcnt/tzcnt have a false dep on their output
Of course, mov ax, bx / mov [mem], eax has a dependency on the previous value of eax, but it's not a false dependency. You're actually using those bits, so it's a true dependency.
@LewisKelsey: See my answer on the linked duplicate (What is the best way to set a register to zero in x86 assembly: xor, mov or and?) for the full details, including the early P6-family stuff. There are a few AMD CPUs like IIRC Bulldozer family where mov reg,imm can run on AGU ports as well as ALU where mov could have a back-end throughput advantage over xor zeroing for some surrounding code. But compilers always just use xor-zeroing when tuning for anything, and IMO that's the correct decision.
weiji

The XOR operation is indeed very fast. If the result is to set a register to zero, the compiler will often do it the fastest way it knows. A bit operation like XOR might take only one CPU cycle, whereas a copy (from one register to another) can take a small handful.

Often compiler writers will even have different behaviors given different target CPU architectures.


James Foote

From the OP: any reason to do "xor eax,eax"?

    return 0;
012B100B  xor         eax,eax
          ret                    <-- OP doesn't show this

XOR EAX,EAX simply zeroes the EAX register. It executes faster than a MOV EAX,$0 and doesn't need to fetch immediate data of 0 to load into eax.

It's very obvious that this is the "return 0" that MSVC is optimizing. EAX is the register used to return a value from a function in MSVC.


This answer doesn't add anything that isn't in the other answers. And yes, all the x86 ABIs use eax / rax as the register for return values. Also, immediate data doesn't have to be fetched, other than as a pre-requisite for instruction decoding. xor is shorter than mov, and even leaves more spare space in the uop cache line it's in, but neither of those effects are well described as "not having to fetch".
rio22

xor is often used to encrypt code, for example:

      mov eax,[ecx+ValueHere]
      xor eax,[ecx+ValueHere]
      mov [ebx+ValueHere],esi
      xor esi,[esp+ValueHere]
      pop edi
      mov [ebx+ValueHere],esi

The XOR instruction combines two values using logical exclusive OR (remember that OR is the inclusive OR). To understand XOR better, consider these two binary values:

      1001010110
      0101001101

If you XOR them, the result is 1100011011. When two bits on top of each other are equal, the resulting bit is 0; otherwise the resulting bit is 1. You can use calc.exe to calculate XOR.
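
As a quick check of the bitwise rule, here is a small sketch using the two values above. The helper name bits is just for illustration; it prints results as 10-digit binary strings. The last line ties this back to the question: XORing any value with itself gives 0, because every bit is compared with an equal bit.

```python
a = 0b1001010110
b = 0b0101001101


def bits(value, width=10):
    """Format an integer as a zero-padded binary string."""
    return format(value, f"0{width}b")


print(bits(a ^ b))  # exclusive OR: 1 where the bits differ
print(bits(a | b))  # inclusive OR: 1 where either bit is set
print(bits(a ^ a))  # a value XORed with itself is always zero
```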


Sure but that's not what the question is about, it's about "xor with itself"