THM | x86 Assembly Crash Course

Malware Analysis | x86 Assembly Crash Course | Summary:
The room discusses various aspects of x86 assembly language programming, covering essential concepts such as opcodes and operands, general assembly instructions, arithmetic and logical instructions, conditionals, and branching instructions.
It also includes some warnings about the use of these instructions in real-world scenarios, particularly related to shellcode injection.
Disclaimer: Please note that this write-up is NOT intended to replace the original room or its content, but rather serve as supplementary material for those who are stuck and need additional guidance.
Learning Objectives
- Opcodes and operands
- General assembly instructions
- Arithmetic and logical instructions
- Conditionals
- Branching instructions
1 | Introduction
Learning basic Assembly language is crucial for malware reverse engineering, as most malware samples are compiled binaries whose original C/C++ (or other language) source code cannot be viewed directly.
Decompiling these binaries often loses important information, so the disassembled Assembly code is the most reliable human-readable representation to analyze, particularly when trying to understand what a binary is doing.
Prerequisites
- x86 Architecture Overview
Question 1: I have completed the prerequisite room.
No answer needed
2 | Opcodes and Operands
Computer code is stored on disk as binary (1s and 0s), usually displayed in hexadecimal (hex) format, which humans can't easily understand. Binary/hex code consists of instructions (opcodes) and data (operands), where opcodes tell the CPU what to do and operands specify what's being acted upon.
When disassembling a program, opcodes are translated into human-readable assembly language instructions, like mov eax, 0x5f. These instructions typically have three parts: an instruction (mov), and two operands (e.g. eax and 0x5f). There are three types of operands:
- Immediate Operands | fixed values like 0x5f.
- Registers | like eax, where data is stored.
- Memory Operands | referenced by square brackets, e.g. [eax], indicating a memory location to operate on.
Quote: "Please note that due to endianness, the operand 0x5f is written as 5f 00 00 00, which is actually 00 00 00 5f but in little-endian notation."
Understanding these concepts helps reveal how code is executed at the CPU level, and is essential for malware analysis and reverse engineering.
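The endianness note above can be checked with a short Python sketch. It only models how a 32-bit immediate such as 0x5f is laid out in bytes; the helper name `encode_imm32` is made up for this illustration and is not part of the room.

```python
import struct

def encode_imm32(value):
    """Return the 4 bytes of a 32-bit immediate in little-endian order."""
    return struct.pack("<I", value)

def decode_imm32(raw):
    """Turn 4 little-endian bytes back into the integer they represent."""
    return struct.unpack("<I", raw)[0]

imm_bytes = encode_imm32(0x5F)
# Least significant byte comes first, so 0x0000005f is stored as 5f 00 00 00.
print(" ".join(f"{b:02x}" for b in imm_bytes))  # 5f 00 00 00
print(hex(decode_imm32(imm_bytes)))             # 0x5f
```

Reading the bytes back with the same `<I` format recovers the original value, which is what a disassembler does when it shows the operand as 0x5f.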
Question 1: What are the hex codes that denote the assembly operations called?
Opcodes
Question 2: Which type of operand is denoted by square brackets?
memory operand
3 | General Instructions
Instructions that Interact with Registers and Memory
- MOV | Moves data from one location (register, memory, or immediate operand) to another.
  - mov destination, source
  - The source will NOT be changed.
  - Memory to memory data movement is NOT allowed.
  - Examples
    - mov eax, 0x5f | Move fixed value to register
    - mov ebx, eax | Move value from register to another register
    - mov eax, [0x5fc53e] | Move value from memory location to register
- LEA | Loads effective address (address of source into destination).
  - lea destination, source
  - Example | Calculate ebp+4 and move result to eax: lea eax, [ebp+4]
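The difference between MOV with a memory operand and LEA is easy to mix up, so here is a minimal Python sketch of the two. Registers and memory are modelled as plain dictionaries, and the addresses and stored value are hypothetical, chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF

def mov_from_memory(regs, memory, dest, base, offset):
    """mov dest, [base+offset]: read the value stored AT the computed address."""
    regs[dest] = memory[(regs[base] + offset) & MASK32]

def lea(regs, dest, base, offset):
    """lea dest, [base+offset]: store the computed address itself, no memory read."""
    regs[dest] = (regs[base] + offset) & MASK32

regs = {"eax": 0, "ebp": 0x1000}   # hypothetical register state
memory = {0x1004: 0xDEADBEEF}      # hypothetical value stored at ebp+4

mov_from_memory(regs, memory, "eax", "ebp", 4)
print(hex(regs["eax"]))   # 0xdeadbeef (the contents of the memory location)

lea(regs, "eax", "ebp", 4)
print(hex(regs["eax"]))   # 0x1004 (only the address arithmetic)
```

In both cases the source register ebp is left unchanged, matching the rule that the source is not modified.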
 
Other Instructions
- NOP | No operation (exchanges the value in eax with itself).
  - nop
  - Used for consuming CPU cycles or redirecting execution.
- SHIFT | Shift each register bit to the adjacent bit; the overflowing bit moves into the carry flag and the vacated bits are filled with zeroes.
  - Shifting Right | shr destination, count
  - Shifting Left | shl destination, count
  - Example | Shift eax right by 1 bit and get the same result as dividing eax by 2.
  - Note | The carry flag holds the last bit that was shifted out of the register.
- ROTATE | Rotate bits back to the other end of the register instead of moving the overflowing bit into the carry flag or adding zeroes.
  - Rotate to the Right | ror destination, count
  - Rotate to the Left | rol destination, count
  - Example | Rotate al (10101010) right by 1 bit: the lowest bit wraps around to the highest position, giving 01010101.
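The shift and rotate behaviour described above can be sketched in Python. This is a simplified model rather than an exact emulation: the shift helpers assume a count between 1 and bits-1, and the function names simply mirror the mnemonics.

```python
def shr(value, count, bits=32):
    """Logical shift right. Returns (result, carry); assumes 1 <= count < bits."""
    mask = (1 << bits) - 1
    value &= mask
    carry = (value >> (count - 1)) & 1        # last bit pushed out on the right
    return value >> count, carry

def shl(value, count, bits=32):
    """Shift left. Returns (result, carry); assumes 1 <= count < bits."""
    mask = (1 << bits) - 1
    value &= mask
    carry = (value >> (bits - count)) & 1     # last bit pushed out on the left
    return (value << count) & mask, carry

def ror(value, count, bits=32):
    """Rotate right: bits leaving on the right re-enter on the left."""
    mask = (1 << bits) - 1
    value &= mask
    count %= bits
    return ((value >> count) | (value << (bits - count))) & mask

def rol(value, count, bits=32):
    """Rotate left: bits leaving on the left re-enter on the right."""
    mask = (1 << bits) - 1
    value &= mask
    count %= bits
    return ((value << count) | (value >> (bits - count))) & mask

print(shr(8, 1))                                   # (4, 0)  same as 8 // 2
print(shr(5, 1))                                   # (2, 1)  the lost bit lands in the carry
print(format(ror(0b10101010, 1, bits=8), "08b"))   # 01010101
print(format(rol(0b10000001, 1, bits=8), "08b"))   # 00000011
```

The second `shr` call shows why the carry flag matters: 5 shifted right gives 2, and the carry records the 1 that fell off.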
Question 1: In mov eax, ebx, which register is the destination operand?
eax
Question 2: What instruction performs no action?
nop
4 | Flags
In x86 assembly language, the CPU uses flags (bits) in the EFLAGS register to indicate the outcome of arithmetic and logical operations. The most common flags are listed below:
- CF| Carry Flag | Set when a carry-out or borrow is required from the most significant bit.
- PF| Parity Flag | Set if the least significant byte has an even number of 1 bits.
- AF| Auxiliary Flag | Set for BCD arithmetic, indicating a carry-out or borrow between bits 3 and 4.
- ZF| Zero Flag | Set if the result is zero.
- SF| Sign Flag | Set if the result is negative (most significant bit = 1).
- OF| Overflow Flag | Set for signed arithmetic overflows (e.g., adding two positive numbers to get a negative result).
- DF | Direction Flag | Determines string processing direction (forward or backward).
  - Quote: "If DF=0, the string is processed forward; if DF=1, the string is processed backward."
- IF | Interrupt Enable Flag | Enables maskable hardware interrupts.
These flags can be used in conditional jumps and are essential for implementing branching logic in assembly code.
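To make the flag definitions concrete, the following Python sketch computes the status flags a 32-bit addition would produce. It is an illustrative model built from the definitions listed above, not output from the room's emulator, and `add32_flags` is a made-up helper name.

```python
MASK32 = 0xFFFFFFFF

def add32_flags(a, b):
    """Model `add a, b` on 32-bit values; return (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "CF": int(full > MASK32),                              # carry out of bit 31
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),     # even number of 1s in low byte
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),                # carry from bit 3 into bit 4
        "ZF": int(result == 0),
        "SF": result >> 31,                                    # copy of the most significant bit
        "OF": (((a ^ result) & (b ^ result)) >> 31) & 1,       # signed overflow
    }
    return result, flags

print(add32_flags(0x20, 0x30))        # result 0x50, only PF set
print(add32_flags(0xFFFFFFFF, 1))     # result 0, CF, PF, AF and ZF set
print(add32_flags(0x7FFFFFFF, 1))     # result 0x80000000, SF and OF set (plus PF and AF)
```

The last call is the case mentioned for OF: two positive signed numbers are added and the result has its sign bit set.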
Question 1: Which flag will be set if the result of the operation is zero? (Answer in abbreviation)
ZF
Question 2: Which flag will be set if the result of the operation is negative? (Answer in abbreviation)
SF
5 | Arithmetic and Logical Instructions
Arithmetic Operations
- add | Addition
  - add destination, value
- sub | Subtraction
  - sub destination, value
- mul | Multiplication
  - mul value
  - multiplies the value with eax and stores the result in edx:eax as a 64-bit value
  - result: lower 32 bits in the eax register | higher 32 bits in edx
- div | Division
  - div value
  - divides the 64-bit value in edx:eax by the value
  - result: in eax
  - remainder: in edx
- inc | Increment a register by 1.
  - inc eax
- dec | Decrement a register by 1.
  - dec eax
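The way mul and div split a 64-bit value across edx and eax is easier to see with numbers. Below is a minimal Python sketch of the unsigned 32-bit forms; the helper names `mul32` and `div32` are invented for this example, and the Python exceptions stand in for the divide error a real CPU would raise.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, operand):
    """Model `mul operand`: returns (eax, edx) holding the 64-bit product."""
    product = (eax & MASK32) * (operand & MASK32)
    return product & MASK32, product >> 32      # low half, high half

def div32(edx, eax, operand):
    """Model `div operand`: divides edx:eax, returns (quotient in eax, remainder in edx)."""
    operand &= MASK32
    if operand == 0:
        raise ZeroDivisionError("division by zero")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, operand)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in eax")
    return quotient, remainder

print([hex(v) for v in mul32(0x20, 0x20)])            # ['0x400', '0x0']
print([hex(v) for v in mul32(0xFFFFFFFF, 2)])         # ['0xfffffffe', '0x1']
print([hex(v) for v in div32(1, 0xFFFFFFFE, 2)])      # ['0xffffffff', '0x0']
print(div32(0, 7, 2))                                 # (3, 1)
```

The second and third calls are inverses of each other: the product overflows into edx, and dividing edx:eax by 2 brings back the original 0xffffffff.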
 
Logical Instructions
- and | Bitwise AND operation
  - and al, 0x7c
- or | Bitwise OR operation
  - or al, 0x7c
- not | Invert the bits of an operand
  - not al
- xor | Bitwise XOR operation: each result bit is 1 if the input bits differ, and 0 if they are the same
  - xor al, 0x7c
  - Quote: "XORing a register with itself results in 0. Therefore, the XOR instruction is often used to zero a register, which is more optimized than a MOV instruction."
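As a quick check of the logical instructions, this Python sketch applies each one to an 8-bit value the way they would act on al. The starting value 0xAA is an arbitrary choice for illustration. The two byte strings at the end are the encodings commonly emitted for `xor eax, eax` and `mov eax, 0`; treat them as an assumption about typical assembler output rather than the only possible encodings.

```python
def bitwise_demo(al, operand=0x7C):
    """Apply and/or/not/xor to an 8-bit value, mirroring the examples on al."""
    al &= 0xFF
    return {
        "and": al & operand,
        "or": al | operand,
        "not": ~al & 0xFF,        # mask keeps the result inside 8 bits
        "xor": al ^ operand,
        "xor_self": al ^ al,      # any value XORed with itself is 0
    }

results = bitwise_demo(0xAA)      # 0xAA = 10101010, hypothetical starting value
for name, value in results.items():
    print(f"{name:9s} {value:08b}")

# Commonly seen encodings (assumption: typical 32-bit assembler output).
XOR_EAX_EAX = bytes.fromhex("31c0")
MOV_EAX_0 = bytes.fromhex("b800000000")
print(len(XOR_EAX_EAX), len(MOV_EAX_0))   # 2 5
```

The size difference on the last line is one reason the xor form is preferred for zeroing a register.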
Question 1: In a subtraction operation, which flag is set if the destination is smaller than the subtracted value?
Carry Flag
Question 2: Which instruction is used to increase the value of a register?
inc
Question 3: Do the following instructions have the same result? (yea/nay)
- xor eax, eax
- mov eax, 0
yea
6 | Conditionals and Branching
Conditionals
- test | Perform a bitwise AND and set the Zero Flag (ZF) if the result is 0.
  - test destination, source
  - Quote: "often used to check if an operand has a NULL value, for example, by testing the operand against itself. This is done because it takes fewer bytes to use the test instruction than by comparing to 0."
- cmp | Compare the two operands and set the Zero Flag (ZF) or the Carry Flag (CF)
  - cmp destination, source
  - works similarly to subtract
  - equal operands -> set ZF
  - source > destination -> set CF
  - source < destination -> clear ZF and CF
 
Branching
Unconditional Jump
- jmp | Jumps to a specified location, changing the value of the Instruction Pointer.
  - jmp location
 
Conditional Jumps
- jz| Jump if the ZF is set (ZF=1).
- jnz| Jump if the ZF is not set (ZF=0).
- je| Jump if equal. Often used after a CMP instruction.
- jne| Jump if not equal. Often used after a CMP instruction.
- jg| Jump if greater. Jumps if the destination operand is greater than the source operand. Performs a signed comparison and is often used after a CMP instruction.
- jl| Jump if less. Jumps if the destination operand is less than the source operand. Performs a signed comparison and is often used after a CMP instruction.
- jge| Jump if greater than or equal to. Jumps if the destination operand is greater than or equal to the source operand (signed comparison).
- jle| Jump if less than or equal to. Jumps if the destination operand is less than or equal to the source operand (signed comparison).
- ja| Jump if above. Similar to jg, but performs an unsigned comparison.
- jb| Jump if below. Similar to jl, but performs an unsigned comparison.
- jae| Jump if above or equal to. Similar to the above instructions.
- jbe| Jump if below or equal to. Similar to the above instructions.
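The signed versus unsigned distinction between jg/jl and ja/jb only shows up when the top bit of an operand is set. The Python sketch below models the outcome of a `cmp destination, source` followed by each jump. It compares the values directly instead of decoding flags, which is a simplification, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def jg_taken(dest, src):
    """jg after `cmp dest, src`: signed greater-than."""
    return to_signed32(dest) > to_signed32(src)

def jl_taken(dest, src):
    """jl after `cmp dest, src`: signed less-than."""
    return to_signed32(dest) < to_signed32(src)

def ja_taken(dest, src):
    """ja after `cmp dest, src`: unsigned greater-than."""
    return (dest & MASK32) > (src & MASK32)

def jb_taken(dest, src):
    """jb after `cmp dest, src`: unsigned less-than."""
    return (dest & MASK32) < (src & MASK32)

dest, src = 0xFFFFFFFF, 0x00000001
print(to_signed32(dest))                            # -1
print(jg_taken(dest, src), ja_taken(dest, src))     # False True
print(jl_taken(dest, src), jb_taken(dest, src))     # True False
```

The same bit pattern 0xffffffff is -1 to jg/jl and 4294967295 to ja/jb, so the two families disagree about which branch to take.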
Question 1: Which flag is set as a result of the test instruction being zero?
Zero Flag
Question 2: Which of the below operations uses subtraction to test two values? 1 or 2?
1. cmp eax, ebx
2. test eax, ebx
 
1
Question 3: Which flag is used to identify whether a jump will be taken or not after a jz or jnz instruction?
Zero Flag
7 | Stack and Function calls
The Stack | PUSH
- push | Push the source operand onto the stack
  - push source
  - value: stored at the memory location pointed to by the stack pointer (ESP) --> new top of the stack
  - ESP is decremented
- pusha | Push ALL words onto the stack
  - AX, BX, CX, DX, SI, DI, SP, BP (16-bit general purpose registers)
- pushad | Push ALL double words onto the stack
  - EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP (32-bit general purpose registers)
 
Quote: "When we encounter these instructions, it is often a sign of someone manually injecting assembly instructions to save the state of registers, as is often the case with shellcode."
The Stack | POP
- pop | Retrieve the value from the top of the stack and store it in the destination operand
  - ESP is incremented
- popa | Pop ALL words
  - Pops the values sequentially from the top of the stack to general-purpose registers
  - Order: DI, SI, BP, BX, DX, CX, AX
  - ESP is adjusted
- popad | Pop ALL double words
  - Pops the values sequentially from the top of the stack to general-purpose registers
  - Order: EDI, ESI, EBP, EBX, EDX, ECX, EAX
  - ESP or SP is adjusted
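The push and pop mechanics can be sketched with a small Python class. It assumes 4-byte stack slots and a starting ESP of 0x1000, and it deletes a slot on pop only to make the model easier to inspect (on real hardware the old bytes simply stay in memory below ESP). The class name `Stack32` is invented for this example.

```python
MASK32 = 0xFFFFFFFF

class Stack32:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, esp=0x1000):
        self.esp = esp
        self.memory = {}

    def push(self, value):
        self.esp -= 4                          # ESP is decremented first
        self.memory[self.esp] = value & MASK32 # value becomes the new top of the stack

    def pop(self):
        value = self.memory.pop(self.esp)      # read (and, in this model, clear) the top slot
        self.esp += 4                          # ESP is incremented
        return value

stack = Stack32()
for value in (0x10, 0x15, 0x20, 0x25):
    stack.push(value)
print(hex(stack.esp))        # 0xff0

first = stack.pop()
print(hex(first), hex(stack.esp))   # 0x25 0xff4  (last in, first out)
```

Popping returns 0x25, the last value pushed, which is the LIFO behaviour the POP description relies on.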
 
The CALL Instruction
- call | Call a function to perform a specific task
  - call location
  - Depending on the calling convention
    - arguments: placed on the stack OR in the registers
  - prologue | prepares the stack by adjusting the EBP and ESP AND pushing the return address on the stack
  - epilogue | restores the stack for the caller function
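A function call touches EIP, ESP and EBP together, which is hard to picture from a list. The Python sketch below walks through one call with a typical 32-bit frame-pointer prologue and epilogue (push ebp / mov ebp, esp / sub esp, N, then mov esp, ebp / pop ebp / ret). That instruction sequence is an assumption about a common convention, not something taken from the room. All addresses are hypothetical, and in this model the return address is pushed by the call step itself.

```python
MASK32 = 0xFFFFFFFF

class Cpu:
    """Toy CPU state: just EIP, ESP, EBP and a dictionary for stack memory."""

    def __init__(self):
        self.eip = 0x401000   # hypothetical address of the call instruction
        self.esp = 0x1000
        self.ebp = 0x1000
        self.memory = {}

    def push(self, value):
        self.esp -= 4
        self.memory[self.esp] = value & MASK32

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target, instruction_length=5):
        self.push(self.eip + instruction_length)  # return address: the next instruction
        self.eip = target

    def prologue(self, local_bytes):
        self.push(self.ebp)        # push ebp     (save the caller's frame)
        self.ebp = self.esp        # mov ebp, esp (start a new frame)
        self.esp -= local_bytes    # sub esp, N   (room for local variables)

    def epilogue(self):
        self.esp = self.ebp        # mov esp, ebp (drop the locals)
        self.ebp = self.pop()      # pop ebp      (restore the caller's frame)
        self.eip = self.pop()      # ret          (jump back to the return address)

cpu = Cpu()
cpu.call(0x402000)                 # hypothetical function address
cpu.prologue(local_bytes=8)
print(hex(cpu.eip), hex(cpu.esp), hex(cpu.ebp))   # 0x402000 0xff0 0xff8
cpu.epilogue()
print(hex(cpu.eip), hex(cpu.esp), hex(cpu.ebp))   # 0x401005 0x1000 0x1000
```

After the epilogue, ESP and EBP are back to their values from before the call, and EIP points at the instruction following it.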
 
Question 1: Which instruction is used for performing a function call?
call
Question 2: Which instruction is used to push all registers to the stack?
pusha
8 | Practice Time
Play around with the provided Assembly Emulator for a bit. It is very well done and makes the material much easier to understand.
Arithmetic Code
mov eax,20h ; EIP=0x00000001 | EAX=0x00000020 
mov ebx,30h ; EIP=0x00000002 | EBX=0x00000030 
add eax,ebx ; EIP=0x00000003 | EAX=0x00000050 | PF=1 | EBX=0x00000030 (operand remains unchanged)
nop         ; EIP=0x00000004 | PF=1 (parity flag remains set)
nop         ; EIP=0x00000005
sub eax,ebx ; EIP=0x00000006 | EAX=0x00000020 | PF=0 | EBX remains unchanged
inc ebx     ; EIP=0x00000007 | EBX=0x00000031
dec ebx     ; EIP=0x00000008 | EBX=0x00000030 | PF=1 | AF=1 
mul eax     ; EIP=0x00000009 | EAX=0x00000400 | EDX=0x00000000 | PF=1 | AF=1
MOV Instructions
mov eax,10h     ; EIP=0x00000001 | EAX=0x00000010
mov ebx,32h     ; EIP=0x00000002 | EBX=0x00000032
mov ecx,eax     ; EIP=0x00000003 | ECX=0x00000010 | EAX remains unchanged
mov [eax],40h   ; EIP=0x00000004 | Memory address [0x10]=0x00000040
add [eax],30h   ; EIP=0x00000005 | Memory address [0x10]=0x00000070
mov [ebx],[eax] ; ERROR | Memory to memory data movement is NOT allowed.
Stack
; NOTE-1 -- stack works from the higher memory location to the lower
; NOTE-2 --  stack works in LIFO mode
mov eax,10h   ; EIP=0x00000001 | EAX=0x00000010
mov ebx, 15h  ; EIP=0x00000002 | EBX=0x00000015
mov ecx, 20h  ; EIP=0x00000003 | ECX=0x00000020
mov edx, 25h  ; EIP=0x00000004 | EDX=0x00000025
; original values | ESP=0x00001000 | EBP=0x00001000 | EAX,EBX,ECX,EDX remain UNCHANGED
push eax      ; EIP=0x00000005 | ESP=0x00000ffc | Stack location [0xffc]=0x00000010
push ebx      ; EIP=0x00000006 | ESP=0x00000ff8 | Stack location [0xff8]=0x00000015
push ecx      ; EIP=0x00000007 | ESP=0x00000ff4 | Stack location [0xff4]=0x00000020
push edx      ; EIP=0x00000008 | ESP=0x00000ff0 | Stack location [0xff0]=0x00000025
pop eax       ; EIP=0x00000009 | ESP=0x00000ff4 | Stack location [0xff0] CLEARED | EAX=0x00000025
pop ebx       ; EIP=0x0000000a | ESP=0x00000ff8 | Stack location [0xff4] CLEARED | EBX=0x00000020
pop ecx       ; EIP=0x0000000b | ESP=0x00000ffc | Stack location [0xff8] CLEARED | ECX=0x00000015
pop edx       ; EIP=0x0000000c | ESP=0x00001000 | Stack location [0xffc] CLEARED | EDX=0x00000010
CMP and TEST Instructions
mov eax,10h   ; EIP=0x00000001 | EAX=0x00000010
mov ebx,10h   ; EIP=0x00000002 | EBX=0x00000010
cmp eax,ebx   ; EIP=0x00000003 | ZF=1 and PF=1
test eax,ebx  ; EIP=0x00000004 | ZF=0 and PF=0
mov eax,20h   ; EIP=0x00000005 | EAX=0x00000020
mov ebx,10h   ; EIP=0x00000006 | EBX=0x00000010
cmp eax,ebx   ; EIP=0x00000007 | source < destination -> clear ZF and CF
test eax,ebx  ; EIP=0x00000008 | bitwise AND and set ZF if 0 --> ZF=1 and PF=1
mov eax,20h   ; EIP=0x00000009 | EAX=0x00000020 | ZF=1 and PF=1 (remain unchanged)
mov ebx,40h   ; EIP=0x0000000a | EBX=0x00000040
cmp eax,ebx   ; EIP=0x0000000b | source > destination -> set CF | CF=1 and SF=1 (most significant bit = 1)
test eax,ebx  ; EIP=0x0000000c | ZF=1 and PF=1 | CF, SF are cleared
LEA Instruction
mov eax,20h       ; EIP=0x00000001 | EAX=0x00000020
mov ebx,30h       ; EIP=0x00000002 | EBX=0x00000030
add eax,ebx       ; EIP=0x00000003 | EAX=0x00000050 | PF=1
nop               ; EIP=0x00000004 | PF remains set
mov [eax],ebx     ; EIP=0x00000005 | Memory location [0x50]=0x00000030
add ebx,15h       ; EIP=0x00000006 | EBX=0x00000045 | Clear PF
mov ecx,6         ; EIP=0x00000007 | ECX=0x00000006
mov [ebx+ecx],eax ; EIP=0x00000008 | Memory locations [0x4b]=0x50000000 and [0x4c]=0x00000000 | (0x45+0x06=0x4b)
lea eax,[ebx+ecx] ; EIP=0x00000009 | EAX=0x0000004b
push eax          ; EIP=0x0000000a | ESP=0x00000ffc and Stack location [0xffc]=0x0000004b
push ebx          ; EIP=0x0000000b | ESP=0x00000ff8 and Stack location [0xff8]=0x00000045
pop ecx           ; EIP=0x0000000c | ESP=0x00000ffc and Stack location [0xff8] cleared | ECX=0x00000045
Question 1: While running the MOV instructions, what is the value of [eax] after running the 4th instruction? (in hex)
0x00000040
Question 2: What error is displayed after running the 6th instruction from the MOV instruction section?
Memory to memory data movement is not allowed.
Question 3: Run the instructions from the stack section. What is the value of eax after the 9th instruction? (in hex)
0x00000025
Question 4: Run the instructions from the stack section. What is the value of edx after the 12th instruction? (in hex)
0x00000010
Question 5: Run the instructions from the stack section. After POP ecx, what is the value left at the top of the stack? (in hex)
0x00000010
Question 6: Run the cmp and test instructions. Which flags are triggered after the 3rd instruction?
- Note | Use these abbreviations in alphabetical order with no spaces: CF,PF,SF,ZF
PF,ZF
Question 7: Run the test and the cmp instructions. Which flags are triggered after the 11th instruction?
- Note | Use these abbreviations in alphabetical order with no spaces: CF,PF,SF,ZF
CF,SF
Question 8: Run the instructions from the lea section. What is the value of eax after running the 9th instruction? (in hex)
0x0000004B
Question 9: Run the instructions from the lea section. What is the final value found in the ECX register? (in hex)
0x00000045
9 | Conclusion
Question 1: Join the discussion on our social channels.
No answer needed