0802 | x86 Assembly Crash Course
Malware Analysis | x86 Assembly Crash Course | Summary:
The room discusses various aspects of x86 assembly language programming, covering essential concepts such as opcodes and operands, general assembly instructions, arithmetic and logical instructions, conditionals, and branching instructions.
It also points out how some of these instructions show up in real-world scenarios, particularly shellcode injection.
Disclaimer: Please note that this write-up is NOT intended to replace the original room or its content, but rather serve as supplementary material for those who are stuck and need additional guidance.
Learning Objectives
- Opcodes and operands
- General assembly instructions
- Arithmetic and logical instructions
- Conditionals
- Branching instructions
1 | Introduction
Learning basic Assembly language is crucial for malware reverse engineering, because most malware samples are compiled binaries whose original C/C++ (or other high-level language) source code is not available.
Decompiling these binaries back into a high-level language is often lossy and can drop important information, so the disassembly (Assembly language) is the most reliable human-readable representation to analyze, particularly when trying to understand what a binary is doing.
2 | Opcodes and Operands
Computer code is stored on disk as binary (1s and 0s), usually viewed in hexadecimal (hex), which humans can't easily read. This binary/hex code consists of instructions (opcodes) and data (operands): opcodes tell the CPU what to do, and operands specify what is being acted upon.
When disassembling a program, opcodes are translated into human-readable assembly language instructions, like mov eax, 0x5f. These instructions typically have three parts: an instruction (mov) and two operands (e.g. eax and 0x5f). There are three types of operands:
- Immediate Operands | fixed values like 0x5f.
- Registers | like eax, where data is stored.
- Memory Operands | referenced by square brackets, e.g. [eax], indicating a memory location to operate on.
Quote: "Please note that due to endianness, the operand 0x5f is written as 5f 00 00 00, which is actually 00 00 00 5f but in little-endian notation."
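To make the byte order concrete, the small Python sketch below packs 0x5f with the standard struct module. Python is used only as a calculator here, not as anything the room provides. It also assumes the usual encoding of mov eax, imm32: the single opcode byte B8 followed by the 4-byte immediate.

```python
import struct

# 0x5f packed as a 32-bit unsigned integer in little-endian order ("<I"):
# the least significant byte comes first.
imm_le = struct.pack("<I", 0x5F)

# The same value in big-endian order (">I"), i.e. the order we write numbers in.
imm_be = struct.pack(">I", 0x5F)

# "mov eax, imm32" is encoded as the opcode byte 0xB8 followed by the immediate.
MOV_EAX_IMM32 = b"\xb8"
encoded = MOV_EAX_IMM32 + imm_le

print(imm_le.hex(" "))   # 5f 00 00 00
print(imm_be.hex(" "))   # 00 00 00 5f
print(encoded.hex(" "))  # b8 5f 00 00 00

# Unpacking the operand bytes as little-endian recovers the original value.
operand = struct.unpack("<I", encoded[1:])[0]
print(hex(operand))      # 0x5f
```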
Understanding these concepts helps reveal how code is executed at the CPU level, and is essential for malware analysis and reverse engineering.
3 | General Instructions
Instructions that Interact with Registers and Memory
- MOV | Moves data from one location (register, memory, or immediate operand) to another: mov destination, source
  - The source will NOT be changed.
  - Memory to memory data movement is NOT allowed.
  - Examples
    - mov eax, 0x5f | Move a fixed value into a register
    - mov ebx, eax | Move a value from one register to another register
    - mov eax, [0x5fc53e] | Move a value from a memory location into a register
- LEA | Loads the effective address (the address of the source, not the value stored there) into the destination: lea destination, source
  - Example | Calculate ebp+4 and move the result into eax: lea eax, [ebp+4]
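The difference between MOV with a memory operand and LEA is easy to blur. The toy model below assumes a Python dictionary standing in for memory and hypothetical register values (ebp = 0x1000, and 0xDEADBEEF stored at 0x1004). It shows that mov eax, [ebp+4] reads the value stored at ebp+4, while lea eax, [ebp+4] only computes the address itself.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical register state and a sparse "memory" keyed by address.
regs = {"ebp": 0x1000, "eax": 0}
memory = {0x1004: 0xDEADBEEF}


def effective_address(base: str, disp: int) -> int:
    """Compute base + displacement, wrapped to 32 bits like the CPU does."""
    return (regs[base] + disp) & MASK32


def mov_from_mem(dest: str, base: str, disp: int) -> None:
    """mov dest, [base+disp]: load the value stored AT the address."""
    regs[dest] = memory.get(effective_address(base, disp), 0)


def lea(dest: str, base: str, disp: int) -> None:
    """lea dest, [base+disp]: store the address itself; memory is never read."""
    regs[dest] = effective_address(base, disp)


mov_from_mem("eax", "ebp", 4)
print(hex(regs["eax"]))  # 0xdeadbeef (the contents of address 0x1004)

lea("eax", "ebp", 4)
print(hex(regs["eax"]))  # 0x1004 (just the computed address)
```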
Other Instructions
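NOP | No operation: nop
- On x86 it is the single byte 0x90, the same encoding as xchg eax, eax, so it effectively exchanges the value in eax with itself.
- Used to consume CPU cycles (e.g. as padding or while waiting), and by malware authors to fill space when redirecting execution to their shellcode.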
SHIFT | Shifts each bit of a register to the adjacent position; the last bit shifted out goes into the carry flag, and zeroes fill the vacated positions.
- Shifting Right | shr destination, count
- Shifting Left | shl destination, count
- Example | Shifting eax right by 1 bit gives the same result as dividing eax by 2 (unsigned, remainder discarded).
- Note | The carry flag holds the last bit that was shifted out of the register.
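The shift rules above can be checked with a short Python model. This is a sketch, assuming a 32-bit register and a count between 1 and 31 (the real CPU masks the count and leaves the flags untouched when it is 0, which this model ignores).

```python
MASK32 = 0xFFFFFFFF


def shr(value: int, count: int) -> tuple[int, int]:
    """Model `shr` on a 32-bit register (count assumed to be 1..31).

    Returns (result, cf). Zeroes enter from the left, and CF receives
    the last bit shifted out of the right end.
    """
    value &= MASK32
    cf = (value >> (count - 1)) & 1
    return value >> count, cf


def shl(value: int, count: int) -> tuple[int, int]:
    """Model `shl` on a 32-bit register (count assumed to be 1..31).

    Returns (result, cf). Zeroes enter from the right, and CF receives
    the last bit shifted out of the left end.
    """
    value &= MASK32
    cf = (value >> (32 - count)) & 1
    return (value << count) & MASK32, cf


result, cf = shr(11, 1)          # 11 = 0b1011
print(result, cf)                # 5 1  (11 // 2 == 5; the dropped low bit lands in CF)

result, cf = shl(0x80000001, 1)
print(hex(result), cf)           # 0x2 1  (the top bit falls out into CF)
```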
ROTATE | Rotates bits around to the other end of the register instead of dropping the overflowing bit into the carry flag and filling with zeroes (the bit that wraps around is also copied into the carry flag).
- Rotate to the Right | ror destination, count
- Rotate to the Left | rol destination, count
- Example | Rotating al (10101010) right by 1 bit moves the lowest bit (0) to the top and gives 01010101.
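A rotate can be modelled the same way. The sketch below assumes an 8-bit register such as al and a count between 1 and 7, and reproduces the example above.

```python
def ror8(value: int, count: int) -> tuple[int, int]:
    """Model `ror` on an 8-bit register such as al (count assumed to be 1..7).

    Bits leaving the right end re-enter on the left. Returns (result, cf);
    CF gets a copy of the bit that wrapped around, i.e. the new top bit.
    """
    value &= 0xFF
    result = ((value >> count) | (value << (8 - count))) & 0xFF
    return result, (result >> 7) & 1


def rol8(value: int, count: int) -> tuple[int, int]:
    """Model `rol` on an 8-bit register (count assumed to be 1..7).

    Bits leaving the left end re-enter on the right; CF gets a copy of
    the bit that wrapped around, i.e. the new bottom bit.
    """
    value &= 0xFF
    result = ((value << count) | (value >> (8 - count))) & 0xFF
    return result, result & 1


result, cf = ror8(0b10101010, 1)
print(f"{result:08b}", cf)   # 01010101 0

result, cf = rol8(0b10000001, 1)
print(f"{result:08b}", cf)   # 00000011 1
```

Unlike a shift, no bits are lost, so rotating left by the same count undoes a right rotation.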
4 | Flags
In x86 assembly language, the CPU uses flags (bits) in the EFLAGS register to indicate the outcome of arithmetic and logical operations. The most common flags are listed below:
- CF | Carry Flag | Set when a carry-out or borrow is required from the most significant bit.
- PF | Parity Flag | Set if the least significant byte has an even number of 1 bits.
- AF | Auxiliary Flag | Set for BCD arithmetic, indicating a carry-out or borrow between bits 3 and 4.
- ZF | Zero Flag | Set if the result is zero.
- SF | Sign Flag | Set if the result is negative (most significant bit = 1).
- OF | Overflow Flag | Set for signed arithmetic overflows (e.g., adding two positive numbers to get a negative result).
- DF | Direction Flag | Determines string processing direction (forward or backward).
  Quote: "If DF=0, the string is processed forward; if DF=1, the string is processed backward."
- IF | Interrupt Enable Flag | Enables maskable hardware interrupts.
These flags can be used in conditional jumps and are essential for implementing branching logic in assembly code.
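To see how an arithmetic result maps onto these flags, here is a Python sketch of a 32-bit add. It is a model under the assumption of 32-bit operands and only covers CF, PF, AF, ZF, SF and OF; for mov eax,20h / mov ebx,30h / add eax,ebx it gives 0x50 with PF=1, the same values the emulator trace in the Practice section shows.

```python
MASK32 = 0xFFFFFFFF


def parity_even(value: int) -> int:
    """PF: 1 if the low byte has an even number of set bits."""
    return int(bin(value & 0xFF).count("1") % 2 == 0)


def add32_flags(a: int, b: int) -> tuple[int, dict]:
    """Model `add a, b` on 32-bit operands and return (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "CF": int(full > MASK32),                  # unsigned carry out of bit 31
        "PF": parity_even(result),
        "AF": int(((a ^ b ^ result) >> 4) & 1),    # carry from bit 3 into bit 4
        "ZF": int(result == 0),
        "SF": (result >> 31) & 1,                  # sign bit of the result
        # OF: both inputs share a sign that differs from the result's sign.
        "OF": int(((a ^ result) & (b ^ result) & 0x80000000) != 0),
    }
    return result, flags


result, flags = add32_flags(0x20, 0x30)
print(hex(result), flags)

# Signed overflow: two positive numbers produce a "negative" result.
print(add32_flags(0x7FFFFFFF, 1))
```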
5 | Arithmetic and Logical Instructions
Arithmetic Operations
add | Addition: add destination, value
sub | Subtraction: sub destination, value
mul | Multiplication (unsigned): mul value
- multiplies value by eax and stores the 64-bit result in edx:eax
- result: lower 32 bits in eax | higher 32 bits in edx
div | Division (unsigned): div value
- divides the 64-bit value in edx:eax by value
- result (quotient): in eax
- remainder: in edx
inc | Increment a register by 1: inc eax
dec | Decrement a register by 1: dec eax
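The implicit use of edx:eax by mul and div is the part that most often trips people up. The Python sketch below models the unsigned 32-bit forms; the function names are hypothetical, and where the real CPU raises a divide error (#DE) this model raises Python exceptions instead.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax: int, value: int) -> tuple[int, int]:
    """Model `mul value`: unsigned EAX * value -> 64-bit product in EDX:EAX.

    Returns (edx, eax): the high and low 32 bits of the product.
    """
    product = (eax & MASK32) * (value & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def div32(edx: int, eax: int, value: int) -> tuple[int, int]:
    """Model `div value`: unsigned EDX:EAX / value -> (quotient, remainder).

    The quotient goes to EAX and the remainder to EDX. The CPU faults on
    division by zero or when the quotient does not fit in 32 bits.
    """
    value &= MASK32
    if value == 0:
        raise ZeroDivisionError("div by zero (#DE on x86)")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, value)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX (#DE on x86)")
    return quotient, remainder


def inc32(value: int) -> int:
    """Model `inc`: add 1 with 32-bit wrap-around (inc leaves CF unchanged)."""
    return (value + 1) & MASK32


def dec32(value: int) -> int:
    """Model `dec`: subtract 1 with 32-bit wrap-around (dec leaves CF unchanged)."""
    return (value - 1) & MASK32


edx, eax = mul32(0x20, 0x20)         # eax=20h, then `mul eax`
print(hex(edx), hex(eax))            # 0x0 0x400
edx, eax = mul32(0x80000000, 4)
print(hex(edx), hex(eax))            # 0x2 0x0  (the product spilled into edx)
quotient, remainder = div32(0, 100, 7)
print(quotient, remainder)           # 14 2
```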
Logical Instructions
and | Bitwise AND operation: and al, 0x7c
or | Bitwise OR operation: or al, 0x7c
not | Invert the bits of an operand: not al
xor | Bitwise XOR (each result bit is 1 if the corresponding input bits differ, and 0 if they are the same): xor al, 0x7c
Quote: "XORing a register with itself results in 0. Therefore, the XOR instruction is often used to zero a register, which is more optimized than a MOV instruction."
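The sketch below applies each logical instruction to a hypothetical starting value al = 0xA5 with the 0x7c operand used above. The byte sequences at the end are the usual 32-bit encodings of xor eax, eax and mov eax, 0, included to show why the quote calls xor the more optimized way to zero a register: it is shorter.

```python
al = 0xA5        # hypothetical starting value of al: 1010 0101
operand = 0x7C   # the immediate used in the examples above: 0111 1100

and_result = al & operand    # and al, 0x7c -> 0x24 (0010 0100)
or_result = al | operand     # or  al, 0x7c -> 0xfd (1111 1101)
not_result = ~al & 0xFF      # not al       -> 0x5a (0101 1010); masked since Python ints are unbounded
xor_result = al ^ operand    # xor al, 0x7c -> 0xd9 (1101 1001)
zeroed = al ^ al             # xor al, al   -> 0, whatever al held

for name, value in [("and", and_result), ("or", or_result), ("not", not_result),
                    ("xor", xor_result), ("xor self", zeroed)]:
    print(f"{name:<8} {value:08b}  0x{value:02x}")

# Typical 32-bit encodings: xor needs fewer bytes than mov to zero eax.
XOR_EAX_EAX = bytes([0x31, 0xC0])                  # xor eax, eax
MOV_EAX_0 = bytes([0xB8, 0x00, 0x00, 0x00, 0x00])  # mov eax, 0
print(len(XOR_EAX_EAX), "bytes vs", len(MOV_EAX_0), "bytes")   # 2 bytes vs 5 bytes
```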
6 | Conditionals and Branching
Conditionals
test | Performs a bitwise AND without storing the result, and sets the Zero Flag (ZF) if the result is 0: test destination, source
Quote: "often used to check if an operand has a NULL value, for example, by testing the operand against itself. This is done because it takes fewer bytes to use the test instruction than by comparing to 0."
cmp | Compares the two operands by subtracting source from destination without storing the result, and sets the Zero Flag (ZF) or the Carry Flag (CF): cmp destination, source
- works similarly to sub, but only the flags change
- equal operands -> set ZF | source > destination (unsigned) -> set CF | source < destination -> clear ZF and CF
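Both instructions only leave flags behind, which the Python sketch below models for 32-bit operands. It covers ZF, CF, SF and PF only, and the function names are hypothetical. The three register pairs are the ones from the CMP and TEST emulator exercise in the Practice section, and the model reproduces the flags recorded there.

```python
MASK32 = 0xFFFFFFFF


def parity_even(value: int) -> int:
    """PF: 1 when the low byte has an even number of 1 bits."""
    return int(bin(value & 0xFF).count("1") % 2 == 0)


def flags_after_cmp(dest: int, src: int) -> dict[str, int]:
    """`cmp dest, src`: compute dest - src and keep only the flags."""
    dest &= MASK32
    src &= MASK32
    result = (dest - src) & MASK32
    return {
        "ZF": int(result == 0),
        "CF": int(src > dest),        # unsigned borrow: source > destination
        "SF": (result >> 31) & 1,
        "PF": parity_even(result),
    }


def flags_after_test(dest: int, src: int) -> dict[str, int]:
    """`test dest, src`: compute dest & src and keep only the flags (CF cleared)."""
    result = dest & src & MASK32
    return {
        "ZF": int(result == 0),
        "CF": 0,
        "SF": (result >> 31) & 1,
        "PF": parity_even(result),
    }


for eax, ebx in [(0x10, 0x10), (0x20, 0x10), (0x20, 0x40)]:
    print(hex(eax), hex(ebx), "cmp:", flags_after_cmp(eax, ebx),
          "test:", flags_after_test(eax, ebx))
```

The NULL check from the quote is test eax, eax: flags_after_test(0, 0) sets ZF.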
Branching
Unconditional Jump
jmp | Jumps to a specified location by changing the value of the Instruction Pointer (EIP): jmp location
Conditional Jumps
- jz | Jump if ZF is set (ZF=1).
- jnz | Jump if ZF is not set (ZF=0).
- je | Jump if equal. Often used after a CMP instruction (the same instruction as jz).
- jne | Jump if not equal. Often used after a CMP instruction (the same instruction as jnz).
- jg | Jump if the destination is greater than the source operand. Performs a signed comparison and is often used after a CMP instruction.
- jl | Jump if the destination is less than the source operand. Performs a signed comparison and is often used after a CMP instruction.
- jge | Jump if greater than or equal to: jumps if the destination operand is greater than or equal to the source operand (signed).
- jle | Jump if less than or equal to: jumps if the destination operand is less than or equal to the source operand (signed).
- ja | Jump if above. Similar to jg, but performs an unsigned comparison.
- jb | Jump if below. Similar to jl, but performs an unsigned comparison.
- jae | Jump if above or equal to (unsigned).
- jbe | Jump if below or equal to (unsigned).
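The signed and unsigned jumps read different flags, so the same cmp can send them in opposite directions. The Python sketch below models the flags left by a 32-bit cmp and the flag test behind each jump; the table and function names are hypothetical. With eax = 0xFFFFFFFF (-1 when read as signed) and ebx = 1, jl is taken (signed: -1 < 1) while ja is also taken (unsigned: 0xFFFFFFFF > 1).

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(dest: int, src: int) -> dict[str, int]:
    """Flags left behind by `cmp dest, src` (dest - src, result discarded)."""
    dest &= MASK32
    src &= MASK32
    result = (dest - src) & MASK32
    return {
        "ZF": int(result == 0),
        "CF": int(src > dest),                                      # unsigned borrow
        "SF": (result >> 31) & 1,
        "OF": int(((dest ^ src) & (dest ^ result) & 0x80000000) != 0),  # signed overflow
    }


# The flag test behind each conditional jump.
JUMP_CONDITIONS = {
    "jz/je":   lambda f: f["ZF"] == 1,
    "jnz/jne": lambda f: f["ZF"] == 0,
    "jg":      lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],   # signed >
    "jge":     lambda f: f["SF"] == f["OF"],                    # signed >=
    "jl":      lambda f: f["SF"] != f["OF"],                    # signed <
    "jle":     lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],    # signed <=
    "ja":      lambda f: f["CF"] == 0 and f["ZF"] == 0,         # unsigned >
    "jae":     lambda f: f["CF"] == 0,                          # unsigned >=
    "jb":      lambda f: f["CF"] == 1,                          # unsigned <
    "jbe":     lambda f: f["CF"] == 1 or f["ZF"] == 1,          # unsigned <=
}


def taken_jumps(dest: int, src: int) -> list[str]:
    """Names of the conditional jumps that would be taken after `cmp dest, src`."""
    flags = cmp_flags(dest, src)
    return [name for name, cond in JUMP_CONDITIONS.items() if cond(flags)]


print(taken_jumps(0xFFFFFFFF, 1))   # ['jnz/jne', 'jl', 'jle', 'ja', 'jae']
```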
7 | Stack and Function calls
The Stack | PUSH
push | Pushes the source operand onto the stack: push source
- ESP is decremented (by 4 for a 32-bit operand), and the value is stored at the memory location the stack pointer (ESP) now points to, which becomes the new top of the stack
pusha | Push ALL words onto the stack: pushes the 16-bit general-purpose registers in this order: AX | CX | DX | BX | SP (its value before pusha) | BP | SI | DI
pushad | Push ALL double words onto the stack: pushes the 32-bit general-purpose registers in this order: EAX | ECX | EDX | EBX | ESP (its value before pushad) | EBP | ESI | EDI
Quote: "When we encounter these instructions, it is often a sign of someone manually injecting assembly instructions to save the state of registers, as is often the case with shellcode."
The Stack | POP
pop | Retrieves the value from the top of the stack (where ESP points) and stores it in the destination operand: pop destination
- ESP is then incremented (by 4 for a 32-bit operand)
popa | Pop ALL words
- Pops the values sequentially from the top of the stack into the 16-bit general-purpose registers
- Order: DI | SI | BP | (saved SP is skipped) | BX | DX | CX | AX
- The stack pointer moves up by 16 bytes
popad | Pop ALL double words
- Pops the values sequentially from the top of the stack into the 32-bit general-purpose registers
- Order: EDI | ESI | EBP | (saved ESP is skipped) | EBX | EDX | ECX | EAX
- The stack pointer moves up by 32 bytes
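The push/pop rules and the pushad/popad ordering can be followed with a toy Python model. It is a sketch, assuming a 32-bit stack that starts at ESP = 0x1000 (as in the emulator) and a dictionary standing in for memory; the class and function names are hypothetical. The replay at the end mirrors the Stack exercise in the Practice section.

```python
MASK32 = 0xFFFFFFFF

# Order in which pushad stores the registers (pusha uses the 16-bit names).
PUSHAD_ORDER = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


class Stack32:
    """Toy 32-bit stack that grows toward lower addresses, 4 bytes per slot."""

    def __init__(self, esp: int = 0x1000) -> None:
        self.esp = esp
        self.memory: dict[int, int] = {}

    def push(self, value: int) -> None:
        self.esp -= 4                               # ESP is decremented first...
        self.memory[self.esp] = value & MASK32      # ...then the value is stored there

    def pop(self) -> int:
        value = self.memory[self.esp]               # read the current top of the stack
        self.esp += 4                               # then ESP is incremented
        return value                                # (a real pop does not erase the slot)


def pushad(stack: Stack32, regs: dict[str, int]) -> None:
    """Push all eight registers; the ESP value saved is the one before pushad."""
    original_esp = stack.esp
    for name in PUSHAD_ORDER:
        stack.push(original_esp if name == "esp" else regs[name])


def popad(stack: Stack32, regs: dict[str, int]) -> None:
    """Pop in reverse order (edi first); the saved ESP slot is discarded."""
    for name in reversed(PUSHAD_ORDER):
        value = stack.pop()
        if name != "esp":
            regs[name] = value


s = Stack32()
for v in (0x10, 0x15, 0x20, 0x25):           # push eax, ebx, ecx, edx
    s.push(v)
print(hex(s.esp))                            # 0xff0
print([hex(s.pop()) for _ in range(4)])      # ['0x25', '0x20', '0x15', '0x10'] (LIFO)
print(hex(s.esp))                            # 0x1000
```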
The CALL Instruction
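call | Performs a function call: pushes the return address (the address of the next instruction) onto the stack and jumps to location: call location
- Depending on the calling convention, arguments are placed on the stack OR in registers
- prologue | at the start of the called function, prepares the stack frame by saving EBP and adjusting EBP and ESP
- epilogue | at the end of the called function, restores the stack for the caller function, after which ret pops the return address back into EIP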
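The sketch below follows one call through a toy Python CPU model. It assumes the common push ebp / mov ebp, esp / sub esp, N prologue and mov esp, ebp / pop ebp epilogue (real compilers and calling conventions vary), plus hypothetical addresses: a 5-byte call at 0x10 to a function at 0x40.

```python
MASK32 = 0xFFFFFFFF


class Cpu:
    """Toy CPU state: just enough to follow call/ret and a stack frame."""

    def __init__(self) -> None:
        self.eip = 0
        self.esp = 0x1000
        self.ebp = 0x1000
        self.memory: dict[int, int] = {}

    def push(self, value: int) -> None:
        self.esp -= 4
        self.memory[self.esp] = value & MASK32

    def pop(self) -> int:
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target: int, insn_len: int) -> None:
        """call target: push the address of the next instruction, then jump."""
        self.push(self.eip + insn_len)
        self.eip = target

    def ret(self) -> None:
        """ret: pop the return address back into EIP."""
        self.eip = self.pop()

    def prologue(self, locals_size: int) -> None:
        self.push(self.ebp)          # push ebp       (save the caller's frame pointer)
        self.ebp = self.esp          # mov ebp, esp   (start the new frame)
        self.esp -= locals_size      # sub esp, N     (reserve space for locals)

    def epilogue(self) -> None:
        self.esp = self.ebp          # mov esp, ebp   (discard locals)
        self.ebp = self.pop()        # pop ebp        (restore the caller's frame pointer)


cpu = Cpu()
cpu.eip = 0x10                       # hypothetical address of a 5-byte call
cpu.call(0x40, 5)                    # hypothetical function at 0x40
print(hex(cpu.eip), hex(cpu.esp))    # 0x40 0xffc
cpu.prologue(8)
print(hex(cpu.ebp), hex(cpu.esp))    # 0xff8 0xff0
cpu.epilogue()
cpu.ret()
print(hex(cpu.eip), hex(cpu.esp))    # 0x15 0x1000
```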
8 | Practice Time
Play around with the provided Assembly Emulator for a bit. It's very well done and makes the concepts much easier to understand.
Arithmetic Code
mov eax,20h ; EIP=0x00000001 | EAX=0x00000020
mov ebx,30h ; EIP=0x00000002 | EBX=0x00000030
add eax,ebx ; EIP=0x00000003 | EAX=0x00000050 | PF=1 | EBX=0x00000030 (operand remains unchanged)
nop ; EIP=0x00000004 | PF=1 (parity flag remains set)
nop ; EIP=0x00000005
sub eax,ebx ; EIP=0x00000006 | EAX=0x00000020 | PF=0 | EBX remains unchanged
inc ebx ; EIP=0x00000007 | EBX=0x00000031
dec ebx ; EIP=0x00000008 | EBX=0x00000030 | PF=1 | AF=1
mul eax ; EIP=0x00000009 | EAX=0x00000400 | EDX=0x00000000 | PF=1 | AF=1
MOV Instructions
mov eax,10h ; EIP=0x00000001 | EAX=0x00000010
mov ebx,32h ; EIP=0x00000002 | EBX=0x00000032
mov ecx,eax ; EIP=0x00000003 | ECX=0x00000010 | EAX remains unchanged
mov [eax],40h ; EIP=0x00000004 | Memory address [0x10]=0x00000040
add [eax],30h ; EIP=0x00000005 | Memory address [0x10]=0x00000070
mov [ebx],[eax] ; ERROR | Memory to memory data movement is NOT allowed.
Stack
; NOTE-1 -- stack works from the higher memory location to the lower
; NOTE-2 -- stack works in LIFO mode
mov eax,10h ; EIP=0x00000001 | EAX=0x00000010
mov ebx, 15h ; EIP=0x00000002 | EBX=0x00000015
mov ecx, 20h ; EIP=0x00000003 | ECX=0x00000020
mov edx, 25h ; EIP=0x00000004 | EDX=0x00000025
; original values | ESP=0x00001000 | EBP=0x00001000 | EAX,EBX,ECX,EDX remain UNCHANGED
push eax ; EIP=0x00000005 | ESP=0x00000ffc | Stack location [0xffc]=0x00000010
push ebx ; EIP=0x00000006 | ESP=0x00000ff8 | Stack location [0xff8]=0x00000015
push ecx ; EIP=0x00000007 | ESP=0x00000ff4 | Stack location [0xff4]=0x00000020
push edx ; EIP=0x00000008 | ESP=0x00000ff0 | Stack location [0xff0]=0x00000025
pop eax ; EIP=0x00000009 | ESP=0x00000ff4 | Stack location [0xff0] CLEARED | EAX=0x00000025
pop ebx ; EIP=0x0000000a | ESP=0x00000ff8 | Stack location [0xff4] CLEARED | EBX=0x00000020
pop ecx ; EIP=0x0000000b | ESP=0x00000ffc | Stack location [0xff8] CLEARED | ECX=0x00000015
pop edx ; EIP=0x0000000c | ESP=0x00001000 | Stack location [0xffc] CLEARED | EDX=0x00000010
CMP and TEST Instructions
mov eax,10h ; EIP=0x00000001 | EAX=0x00000010
mov ebx,10h ; EIP=0x00000002 | EBX=0x00000010
cmp eax,ebx ; EIP=0x00000003 | ZF=1 and PF=1
test eax,ebx ; EIP=0x00000004 | ZF=0 and PF=0
mov eax,20h ; EIP=0x00000005 | EAX=0x00000020
mov ebx,10h ; EIP=0x00000006 | EBX=0x00000010
cmp eax,ebx ; EIP=0x00000007 | source < destination -> clear ZF and CF
test eax,ebx ; EIP=0x00000008 | bitwise AND and set ZF if 0 --> ZF=1 and PF=1
mov eax,20h ; EIP=0x00000009 | EAX=0x00000020 | ZF=1 and PF=1 (remain unchanged)
mov ebx,40h ; EIP=0x0000000a | EBX=0x00000040
cmp eax,ebx ; EIP=0x0000000b | source > destination -> set CF | CF=1 and SF=1 (most significant bit = 1)
test eax,ebx ; EIP=0x0000000c | ZF=1 and PF=1 | CF, SF are cleared
LEA Instruction
mov eax,20h ; EIP=0x00000001 | EAX=0x00000020
mov ebx,30h ; EIP=0x00000002 | EBX=0x00000030
add eax,ebx ; EIP=0x00000003 | EAX=0x00000050 | PF=1
nop ; EIP=0x00000004 | PF remains set
mov [eax],ebx ; EIP=0x00000005 | Memory location [0x50]=0x00000030
add ebx,15h ; EIP=0x00000006 | EBX=0x00000045 | Clear PF
mov ecx,6 ; EIP=0x00000007 | ECX=0x00000006
mov [ebx+ecx],eax ; EIP=0x00000008 | Memory locations [0x4b]=0x50000000 and [0x4c]=0x00000000 | (0x45+0x06=0x4b)
lea eax,[ebx+ecx] ; EIP=0x00000009 | EAX=0x0000004b
push eax ; EIP=0x0000000a | ESP=0x00000ffc and Stack location [0xffc]=0x0000004b
push ebx ; EIP=0x0000000b | ESP=0x00000ff8 and Stack location [0xff8]=0x00000045
pop ecx ; EIP=0x0000000c | ESP=0x00000ffc and Stack location [0xff8] cleared | ECX=0x00000045