- > In my 6502 hacking days, the presence of an exclusive OR was a sure-fire indicator you’d either found the encryption part of the code, or some kind of sprite routine.
Yeah, sadly the 6502 didn't allow you to do EOR A, while the Z80 did allow XOR A. If I remember correctly, XOR A was AF and LD A, 0 was 3E 01[1]. So it saved a whole byte! And I think the XOR was 3 clock cycles faster than the LD. So the instruction took up less space and ran faster.
I have a very distinct memory in my first job (writing x86 assembly) of the CEO walking up behind my desk and pointing out that I'd done MOV AX, 0 when I could have done XOR AX, AX.
[1] 3E 00
- Back in 2005 or 2006, I was working at a little startup with "DVD Jon" Johansen and we'd have Quake 3 tournaments to break up the monotony of reverse-engineering and juggling storage infrastructure. His name was always "xor eax,eax" and I always just had to laugh at the idea of getting zeroed out by someone with that name. (Which happened a lot -- I was good, but he was much better!)
- > Unlike other partial register writes, when writing to an e register like eax, the architecture zeros the top 32 bits for free.
I’m familiar with 32-bit x86 assembly from writing it 10-20 years ago. So I was aware of the benefit of xor in general, but the above quote was new to me.
I don’t have any experience with 64-bit assembly - is there a guide anywhere that teaches 64-bit specifics like the above? Something like “x64 for those who know x86”?
by wildlogic
1 subcomments
- I learned this trick writing shellcode - the shellcode has to be null byte (0x00) free, or it will terminate and not progress past the null byte, since it is the string terminator. of course, when you xor something with itself, the result is zero. the byte code generated by the instruction xor eax, eax doesn't contain null bytes, whereas mov eax, 0 does.
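To make the null-byte point concrete, here is a minimal Python sketch. It uses the standard 32-bit encodings of the two instructions (31 C0 for xor eax, eax and B8 00 00 00 00 for mov eax, 0); the helper name `is_null_free` is made up for illustration and is not from any shellcode toolkit.

```python
# Standard 32-bit x86 encodings of the two ways to zero eax.
ENCODINGS = {
    "xor eax, eax": bytes.fromhex("31c0"),
    "mov eax, 0": bytes.fromhex("b800000000"),
}


def is_null_free(code: bytes) -> bool:
    """Return True if the machine code contains no 0x00 byte (hypothetical helper)."""
    return b"\x00" not in code


for asm, code in ENCODINGS.items():
    status = "null-free" if is_null_free(code) else "contains 0x00"
    print(f"{asm:<14} {code.hex(' '):<16} {len(code)} bytes, {status}")
```

The mov form fails the check because its 32-bit immediate operand is four zero bytes, which a C-style string copy would treat as the end of the payload.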
- I remember a lot of code zeroing registers, dating back at least to the IBM PC XT days (before the 80286).
If you decode the instruction, it makes sense to use XOR:
- mov ax, 0 - needs 3 bytes in 16-bit mode (b8 00 00)
- xor ax,ax - needs 2 bytes in 16-bit mode (31 c0)
This extra byte did matter in a machine with less than 1 megabyte of memory.
In 386 processors it was also
- mov eax,0 - needs 5 bytes (b8 00 00 00 00)
- xor eax,eax - needs 2 bytes (31 c0)
Here Intel made the decision to use only 2 bytes. I bet this helps both the instruction decoder and (of course) saves more memory than the old 8086 instruction.
- It's funny how machine code is a high level language nowadays, for this example the CPU recognizes the zeroing pattern and does something quite a bit different.
- I had an Atari Portfolio, sold without software except a basic command.com. I could only copy raw ASCII bytes from the keyboard to a file. Saved as a COM file, those bytes are directly executable. So what was the simple program I made from ASCII bytes to read bytes in hexadecimal? I thought about that for a week, but finally solved it.
@grok: This is a classic and brilliant puzzle from the early days of DOS. The challenge is to write a self-displaying hex dumper (.COM file) using only the copy con command, which means every byte of the program must correspond to a standard, typeable ASCII character. This is a form of code-golfing that relies on a deep understanding of 8086 machine code.
The solution is a clever piece of self-modifying code. Since many essential 8086 opcodes (like MOV, INT, RET) don't have printable ASCII equivalents, the program must construct these instructions in memory and then execute them.
Here is a well-known 128-byte solution. I will first show you the ASCII characters you would type, and then explain how it works.
The Program to Type
You would have started by typing copy con hex.com and then carefully typed the following 128 characters (including spaces and case-sensitive characters) and finally pressed Ctrl+Z to save the file.
1 j!>1!QhH-@0X,jHj.Ph1PhD0Ph 0@h, j, h.0@h, h 0@h.H-@,jH1X,h.H.@,H1j j X,h.H.@,H1j j X.H-@H-@,jHj.Ph1PhE0Ph@0@h, j, h.0@h, h
0@h.
How It Works: A High-Level Explanation
This program is a marvel of 8086 trickery. Here is a simplified breakdown of what's happening:
etc.etc
- In modern CPUs, a lot of these are recognized as zeroing idioms and they end up doing the same thing (often a register renaming trick). Using the shortest one makes sense. If you use a really weird zeroing pattern, you can also see it as a backend uop while many of these zeroing idioms are elided by the frontend on some cores.
- Matt Godbolt also uploads to his self-titled YouTube channel: https://www.youtube.com/watch?v=eLjZ48gqbyg
by omnicognate
3 subcomments
- It happens to be the first instruction of the first snippet in the wonderful xchg rax,rax.
https://www.xorpd.net/pages/xchg_rax/snip_00.html
by charles_f
1 subcomments
- > By using a slightly more obscure instruction, we save three bytes every time we need to set a register to zero
Meanwhile, most "apps" we get nowadays contain half of npmjs neatly bundled in electron. I miss the days when default was native and devs had constraints to how big their output could be.
- > In this case, even though rax is needed to hold the full 64-bit long result, by writing to eax, we get a nice effect: Unlike other partial register writes, when writing to an e register like eax, the architecture zeros the top 32 bits for free. So xor eax, eax sets all 64 bits to zero.
I had no idea this happened. Talk about a fascinating bit of X86 trivia! Do other architectures do this too? I'd imagine so, but you never know.
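A rough way to picture the quoted x86-64 rule is the Python model below. It is only a sketch: `write_reg` is a hypothetical helper, it treats a register as a plain 64-bit integer, and it ignores the high-byte registers such as ah.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def write_reg(old: int, value: int, width: int) -> int:
    """Model writing `value` to the low `width` bits of a 64-bit register.

    Hypothetical helper: 32-bit writes zero-extend into the full register,
    while 8- and 16-bit writes leave the upper bits untouched.
    """
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & 0xFFFF_FFFF  # upper 32 bits become zero
    if width in (8, 16):
        low_mask = (1 << width) - 1
        return (old & ~low_mask & MASK64) | (value & low_mask)
    raise ValueError(f"unsupported width: {width}")


rax = 0x1122_3344_5566_7788
print(hex(write_reg(rax, 0, 32)))  # like xor eax, eax -> 0x0
print(hex(write_reg(rax, 0, 16)))  # like xor ax, ax   -> 0x1122334455660000
print(hex(write_reg(rax, 0, 8)))   # like xor al, al   -> 0x1122334455667700
```

Only the 32-bit write clears the whole register, which is why the short xor eax, eax encoding is enough to zero all of rax.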
by vanderZwan
0 subcomment
- > In my 6502 hacking days, the presence of an exclusive OR was a sure-fire indicator you’d either found the encryption part of the code, or some kind of sprite routine.
Meanwhile, people like me who got started with a Z80 instead immediately knew why, since XOR A is the smallest and fastest way to clear the accumulator and flag register. Funny how that also shows how specific this is to a particular CPU lineage or its offshoots.
by flustercan
0 subcomment
- As a longtime developer currently pursuing their first computer science degree, it makes me happy that I understood this article. Nearly makes all the trouble seem worth it.
- I'd like to learn about the earliest pronunciations of these instructions. Only because watching a video earlier, I heard "MOV" pronounced "MAUV" not "MOVE"
Not sure exactly how I could dig up pronunciations, except finding the oldest recordings
- > It gets better though! Since this is a very common operation, x86 CPUs spot this “zeroing idiom” early in the pipeline and can specifically optimise around it: the out-of-order tracking systems knows that the value of “eax” (or whichever register is being zeroed) does not depend on the previous value of eax, so it can allocate a fresh, dependency-free zero register renamer slot.
While this is probably true ("probably" because I haven't checked it myself, but it makes sense), the CPU could do the exact same thing for "mov eax, 0", couldn't it? (Does it?)
- I'm building a gameboy emulator and when I was debugging the boot ROM I noticed there was the instruction `xor A` (which xor's a with itself). I was wondering why they chose such a weird way to set A to 0. Now it makes sense -- since the boot ROM is only 256 bytes, they really needed to conserve space! Thanks for this, looking forward to the rest of the series!
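For anyone writing a similar emulator, here is a minimal Python sketch of how the instruction might be handled. It assumes the usual Game Boy encodings (AF for XOR A, 3E 00 for LD A, 0) and the zero flag in bit 7 of F; the function and constant names are hypothetical, not from any particular emulator.

```python
from typing import Tuple

FLAG_Z = 0x80  # zero flag: bit 7 of the F register on the Game Boy CPU

XOR_A = bytes([0xAF])          # XOR A: one byte
LD_A_ZERO = bytes([0x3E, 0x00])  # LD A, 0: opcode plus immediate


def xor_a(a: int) -> Tuple[int, int]:
    """Emulate XOR A and return the new (A, F) pair."""
    result = (a ^ a) & 0xFF               # any value XORed with itself is 0
    flags = FLAG_Z if result == 0 else 0  # XOR clears N, H and C
    return result, flags


a, f = xor_a(0x5A)
print(f"A={a:#04x} F={f:#04x}")  # A=0x00 F=0x80
print(f"bytes saved: {len(LD_A_ZERO) - len(XOR_A)}")  # bytes saved: 1
```

Note that, unlike the load, the XOR also rewrites the flags, so the two are not interchangeable when later code depends on F.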
- In some older IBM-built processors (channel controllers, the various iterations of the CSP), an xor of something against itself also had the effect of safely clearing a stored bad parity without triggering a parity check from reading the operand. You would see strategic clearing in this manner done by system software or firmware during error recovery or early initialization.
by ternaryoperator
0 subcomment
- The origin AFAIK stems from the mainframe days. When using BAL (the assembly language for the IBM/360 family and its descendants), xoring was faster than moving 0 to the variable. Many of the early devs who wrote assembly for PCs came from mainframe backgrounds and so the idiom was carried over.
- similarly IIRC, on (some generations of) x86 chips, NOP is sugar around `XCHG EAX, EAX` which is effectively a do-nothing operation
- It’s not just about code size or cycle count anymore; modern OoO (Out-of-Order) processors treat this idiomatically. The renamer recognizes xor reg, reg as a dependency-breaking zeroing idiom immediately, which frees up the physical register allocation faster than a mov. It's fascinating how hardware optimization has effectively leaked into the instruction set definition over time.
- What a great article! When the author mentioned "showing-off", that's what I thought at first. I mean, most of us have the "why not spend 2 hours trying to figure it out when you can read the manual for 2 minutes" kind of mind-set, which is similar to the "why not make it really complex if we can make it simple" one.
But no, it's actually a really smart idea!!
by kwertyoowiyop
1 subcomments
- In this thread, we have found all the programmers born before 1975!
- Because "sub eax,eax" looks stupid. (Both instructions clear the carry flag, by the way; the only flag difference is that "xor eax, eax" leaves AF undefined while "sub eax,eax" clears it.)
by Quitschquat
0 subcomment
- At some point I could disassemble 8086 (16 bit x86/real mode) as a kid. Byte sequences like 31 C9 or 31 C0 were a sure way to know if a loop of some kind was being initialized. Even simple compilers at the time made the mov xx, 0 → xor xx, xx optimization.
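Those byte pairs can be decoded by hand. The Python sketch below shows the idea for opcode 31 with a register-direct ModRM byte and 16-bit operands, as in 8086 real mode; `decode_xor_reg_reg` is a hypothetical name and this is nowhere near a full disassembler.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_xor_reg_reg(code: bytes) -> str:
    """Decode opcode 31 followed by a register-direct ModRM byte (mod == 0b11)."""
    opcode, modrm = code  # expects exactly two bytes
    if opcode != 0x31 or modrm >> 6 != 0b11:
        raise ValueError("not a register-to-register XOR (opcode 31)")
    src = REG16[(modrm >> 3) & 7]  # reg field: source operand
    dst = REG16[modrm & 7]         # r/m field: destination operand
    return f"xor {dst}, {src}"


print(decode_xor_reg_reg(bytes.fromhex("31c0")))  # xor ax, ax
print(decode_xor_reg_reg(bytes.fromhex("31c9")))  # xor cx, cx
```

A zeroing idiom is any ModRM byte whose reg and r/m fields match, which is why C0 and C9 stand out in a hex dump.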
by kstrauser
1 subcomments
- Why wasn't that a standard assembler macro, like ZEROAX or something? It seems to come up enough that it seems like there'd be a common shortcut for it.
(Not suggesting it should be. Maybe that's a terrible idea, but I don't know why.)
by HackerThemAll
2 subcomments
- > Interestingly, when zeroing the “extended” numbered registers (like r8), GCC still uses the d (double width, ie 32-bit) variant.
Of course. I might have some data stored in the higher dword of that register.
by flohofwoe
1 subcomments
- The actually surprising part to me is that such an important instruction uses a two byte encoding instead of one byte :)
- The page crashes after 3 seconds, 100% of the time, on the latest version of Android Chrome and works fine on Brave, fyi.
by JuniperMesos
0 subcomment
- > In this case, even though rax is needed to hold the full 64-bit long result, by writing to eax, we get a nice effect: Unlike other partial register writes, when writing to an e register like eax, the architecture zeros the top 32 bits for free. So xor eax, eax sets all 64 bits to zero.
Huh, news to me. Although the amount of x86-64 assembly programming I've personally done is extremely minimal. Frankly, this is exactly the sort of architecture-specific detail I'm happy to let an ASM-generating library know for me rather than know myself.
- Back when I did IBM 370 BAL Assembly Language, we did the same thing to clear a register to zero.
XR 15,15 XOR REGISTER 15 WITH REGISTER 15
vs L 15,=F'0' LOAD REGISTER 15 WITH 0
This was alleged to be faster on the 370 because XR operated entirely within the CPU registers, while L (Load) fetched data from memory (i.e., the constant came from program memory).
by silverfrost
0 subcomment
- Back on the Z80 'xor a' is the shortest sequence to zero A
by BiraIgnacio
0 subcomment
- Also cool that this became the top item on the HN front page
- My brain read this as "Why not ear wax?"
- I wrote a lot of `xor al,al` in my youth.
- No RSS? I want to subscribe :'(
- Because mov eax, 0 requires fetching a constant and prolongs instruction fetching/execution. XOR A was a trick I learned back in the Z80 days.
- Because, unlike RISC-V, x86 has no x0 register.
- Remnant of RISC attempt without a zero register.