
0802 | x86 Assembly Crash Course

Malware Analysis | x86 Assembly Crash Course | Summary:

The room discusses various aspects of x86 assembly language programming, covering essential concepts such as opcodes and operands, general assembly instructions, arithmetic and logical instructions, conditionals, and branching instructions.

It also includes some warnings about the use of these instructions in real-world scenarios, particularly related to shellcode injection.


Disclaimer: Please note that this write-up is NOT intended to replace the original room or its content, but rather serve as supplementary material for those who are stuck and need additional guidance.

Learning Objectives

  • Opcodes and operands
  • General assembly instructions
  • Arithmetic and logical instructions
  • Conditionals
  • Branching instructions

1 | Introduction

Learning basic Assembly language is crucial for malware reverse engineering, because most malware samples are compiled binaries whose original C/C++ (or other high-level language) source code cannot be viewed directly.

Decompiling these binaries with a decompiler often loses important information. That makes the disassembly (Assembly language) the most reliable human-readable code to analyze, particularly when trying to understand what a binary is doing.

2 | Opcodes and Operands

Computer code is stored on disk as binary (1s and 0s), which is usually displayed as hexadecimal (hex). Humans can't easily read either form. This machine code consists of instructions (opcodes) and data (operands): opcodes tell the CPU what to do, and operands specify what is being acted upon.

When a program is disassembled, its opcodes are translated into human-readable assembly language instructions such as mov eax, 0x5f. These instructions typically have three parts: the instruction mnemonic (mov) and two operands (e.g. eax and 0x5f). There are three types of operands:

  1. Immediate Operands | fixed values like 0x5f.
  2. Registers | like eax, where data is stored.
  3. Memory Operands | referenced by square brackets, e.g. [eax], indicating a memory location to operate on.
info

Quote: "Please note that due to endianness, the operand 0x5f is written as 5f 00 00 00, which is actually 00 00 00 5f but in little-endian notation."
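To make the endianness note concrete, here is a small Python sketch that builds the bytes for mov eax, 0x5f. It assumes the mov eax, imm32 encoding, whose opcode byte is b8 followed by the 32-bit immediate in little-endian order. `encode_mov_eax_imm32` is a hypothetical helper used only for illustration.

```python
def encode_mov_eax_imm32(value: int) -> bytes:
    """Return the machine-code bytes for `mov eax, <value>` (opcode 0xB8 + imm32)."""
    opcode = bytes([0xB8])                 # 0xB8 = mov eax, imm32
    operand = value.to_bytes(4, "little")  # x86 stores the immediate least significant byte first
    return opcode + operand


encoded = encode_mov_eax_imm32(0x5F)
print(encoded.hex(" "))                            # b8 5f 00 00 00

# Reading the operand back: the little-endian bytes 5f 00 00 00 mean 0x0000005f
print(hex(int.from_bytes(encoded[1:], "little")))  # 0x5f
```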

Understanding these concepts helps reveal how code is executed at the CPU level, and is essential for malware analysis and reverse engineering.

3 | General Instructions

Instructions that Interact with Registers and Memory

  • MOV | Copies data from a source (register, memory, or immediate operand) to a destination (register or memory).

    • mov destination, source
    • The source will NOT be changed.
    • Memory to memory data movement is NOT allowed.
      • Examples
        • mov eax, 0x5f | Move fixed value to register
        • mov ebx, eax | Move value from register to another register
        • mov eax, [0x5fc53e] | Move value from memory location to register
  • LEA | Loads effective address (address of source into destination).

    • lea destination, source
    • Example | Calculate ebp+4 and move result to eax: lea eax, [ebp+4]
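You can see the difference between lea and mov with a memory operand by modelling memory as a Python dictionary. This toy model does not reflect how the CPU works internally. The memory contents and register values below are made up for illustration.

```python
# Toy model: memory maps addresses to 32-bit values (hypothetical contents)
memory = {0x1004: 0xDEADBEEF}
regs = {"ebp": 0x1000, "eax": 0}


def effective_address(base: str, disp: int) -> int:
    """Evaluate a [base+disp] memory operand to an address (no memory access)."""
    return (regs[base] + disp) & 0xFFFFFFFF


# lea eax, [ebp+4] -> eax receives the address itself; memory is not read
regs["eax"] = effective_address("ebp", 4)
print(hex(regs["eax"]))  # 0x1004

# mov eax, [ebp+4] -> eax receives the value stored at that address
regs["eax"] = memory[effective_address("ebp", 4)]
print(hex(regs["eax"]))  # 0xdeadbeef
```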

Other Instructions

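  • NOP | No operation. Its opcode (0x90) is the same as xchg eax, eax, which exchanges the value in eax with itself.
    • nop
      • Used to consume CPU cycles (e.g. while waiting or as padding) or, in exploits, to redirect execution (a NOP sled).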
  • SHIFT | Shift every bit of the register to the adjacent position. The last bit shifted out goes into the carry flag, and the vacated positions are filled with zeroes.
    • Shifting Right | shr destination, count
    • Shifting Left | shl destination, count
      • Example | Shifting eax right by 1 bit (shr eax, 1) gives the same result as dividing the unsigned value in eax by 2.
    • Note | The carry flag holds the last bit that was shifted out of the register.
  • ROTATE | Like a shift, but the bits that leave one end of the register wrap around to the other end instead of being dropped and replaced by zeroes.
    • Rotate to the Right | ror destination, count
    • Rotate to the Left | rol destination, count
      • Example | If al = 10101010, ror al, 1 moves the lowest bit (0) around to the top, giving 01010101.
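Below is a Python sketch of the four shift and rotate operations on a fixed-width register. It assumes 1 <= count < bits for the shifts and ignores x86 details such as count masking and the OF flag. The function names are hypothetical.

```python
def shr(value, count, bits=32):
    """Logical shift right (assumes 1 <= count < bits). Returns (result, CF)."""
    mask = (1 << bits) - 1
    cf = (value >> (count - 1)) & 1          # last bit pushed out on the right
    return (value >> count) & mask, cf       # zeroes fill in from the left


def shl(value, count, bits=32):
    """Shift left (assumes 1 <= count < bits). Returns (result, CF)."""
    mask = (1 << bits) - 1
    cf = (value >> (bits - count)) & 1       # last bit pushed out on the left
    return (value << count) & mask, cf       # zeroes fill in from the right


def ror(value, count, bits=32):
    """Rotate right: bits leaving on the right re-enter on the left."""
    mask = (1 << bits) - 1
    count %= bits
    return ((value >> count) | (value << (bits - count))) & mask


def rol(value, count, bits=32):
    """Rotate left: bits leaving on the left re-enter on the right."""
    mask = (1 << bits) - 1
    count %= bits
    return ((value << count) | (value >> (bits - count))) & mask


print(shr(0x20, 1))                               # (16, 0): 0x20 / 2 = 0x10
print(format(ror(0b10101010, 1, bits=8), "08b"))  # 01010101
```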

4 | Flags

In x86 assembly language, the CPU uses flags (bits) in the EFLAGS register to indicate the outcome of arithmetic and logical operations. The most common flags are listed below:

  • CF | Carry Flag | Set when a carry-out or borrow is required from the most significant bit.
  • PF | Parity Flag | Set if the least significant byte has an even number of 1 bits.
  • AF | Auxiliary Flag | Set for BCD arithmetic, indicating a carry-out or borrow between bits 3 and 4.
  • ZF | Zero Flag | Set if the result is zero.
  • SF | Sign Flag | Set if the result is negative (most significant bit = 1).
  • OF | Overflow Flag | Set for signed arithmetic overflows (e.g., adding two positive numbers to get a negative result).
  • DF | Direction Flag | Determines string processing direction (forward or backward).
    info

    Quote: "If DF=0, the string is processed forward; if DF=1, the string is processed backward."

  • IF | Interrupt Enable Flag | Enables maskable hardware interrupts.

These flags can be used in conditional jumps and are essential for implementing branching logic in assembly code.
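The following Python sketch computes the arithmetic flags for a 32-bit add and sub using common bit tricks. It is only an illustrative model. DF and IF are not affected by these instructions and are left out. `add_flags`, `sub_flags`, and `parity` are hypothetical names.

```python
MASK = 0xFFFFFFFF


def parity(result: int) -> int:
    """PF: 1 if the low byte of the result has an even number of 1 bits."""
    return int(bin(result & 0xFF).count("1") % 2 == 0)


def add_flags(a: int, b: int) -> dict:
    """Result and main flags of a 32-bit `add a, b` (sketch)."""
    r = (a + b) & MASK
    return {
        "result": r,
        "CF": int(a + b > MASK),                # unsigned carry out of bit 31
        "PF": parity(r),
        "AF": ((a ^ b ^ r) >> 4) & 1,           # carry from bit 3 into bit 4
        "ZF": int(r == 0),
        "SF": r >> 31,                          # copy of the sign bit
        "OF": (((a ^ r) & (b ^ r)) >> 31) & 1,  # both inputs' signs differ from the result's
    }


def sub_flags(a: int, b: int) -> dict:
    """Result and main flags of a 32-bit `sub a, b` (sketch)."""
    r = (a - b) & MASK
    return {
        "result": r,
        "CF": int(a < b),                       # unsigned borrow
        "PF": parity(r),
        "AF": ((a ^ b ^ r) >> 4) & 1,           # borrow from bit 4 into bit 3
        "ZF": int(r == 0),
        "SF": r >> 31,
        "OF": (((a ^ b) & (a ^ r)) >> 31) & 1,  # signs of a and b differ, and result sign differs from a
    }


f = add_flags(0x7FFFFFFF, 1)      # largest positive signed 32-bit value + 1
print(f["OF"], f["SF"], f["CF"])  # 1 1 0: signed overflow, "negative" result, no unsigned carry
```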

5 | Arithmetic and Logical Instructions

Arithmetic Operations

  • add | Addition
    • add destination, value
  • sub | Subtraction
    • sub destination, value
  • mul | Unsigned multiplication
    • mul value
    • multiplies eax by value and stores the 64-bit result in edx:eax
    • result: lower 32 bits in eax register | higher 32 bits in edx
  • div | Unsigned division
    • div value
    • divides the 64-bit value in edx:eax by value
    • quotient: in eax
    • remainder: in edx
  • inc | Increment a register by 1.
    • inc eax
  • dec | Decrement a register by 1.
    • dec eax
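The edx:eax pairing used by mul and div is easier to see with plain integers. Here is a Python sketch of the unsigned 32-bit forms. The function names are hypothetical. The `#DE` messages mirror the divide-error exception that the CPU raises for a zero divisor or a quotient too large for eax.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax, value):
    """Unsigned `mul value`: EDX:EAX = EAX * value. Returns (edx, eax)."""
    product = (eax & MASK32) * (value & MASK32)  # up to 64 bits
    return product >> 32, product & MASK32       # high half -> EDX, low half -> EAX


def div32(edx, eax, value):
    """Unsigned `div value`: EDX:EAX / value. Returns (eax, edx) = (quotient, remainder)."""
    if value == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    dividend = (edx << 32) | eax                 # 64-bit dividend built from EDX:EAX
    quotient, remainder = divmod(dividend, value)
    if quotient > MASK32:
        raise OverflowError("#DE: quotient does not fit in EAX")
    return quotient, remainder


edx, eax = mul32(0xFFFFFFFF, 2)
print(hex(edx), hex(eax))   # 0x1 0xfffffffe: the product needs more than 32 bits
quotient, remainder = div32(0, 100, 7)
print(quotient, remainder)  # 14 2
```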

Logical Instructions

  • and | Bitwise AND operation
    • and al, 0x7c
  • or | Bitwise OR operation
    • or al, 0x7c
  • not | Invert the bits of an operand
    • not al
  • xor | Bitwise XOR: each result bit is 1 if the input bits differ and 0 if they are the same
    • xor al, 0x7c
    info

    Quote: "XORing a register with itself results in 0. Therefore, the XOR instruction is often used to zero a register, which is more optimized than a MOV instruction."
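Here is how the logical instructions above act on an 8-bit register, shown with Python's bitwise operators. The starting value of al (0xAA) is an arbitrary choice for illustration. The 0xFF mask stands in for the fixed width of al, because Python integers are unbounded.

```python
MASK8 = 0xFF     # al is an 8-bit register

al = 0b10101010  # example value (0xAA), chosen for illustration

print(format(al & 0x7C, "08b"))    # and al, 0x7c -> 00101000
print(format(al | 0x7C, "08b"))    # or  al, 0x7c -> 11111110
print(format(~al & MASK8, "08b"))  # not al       -> 01010101
print(format(al ^ 0x7C, "08b"))    # xor al, 0x7c -> 11010110
print(al ^ al)                     # xor al, al   -> 0 (any value XORed with itself is zero)
```

On the size point in the quote, xor eax, eax encodes in 2 bytes (31 c0), while mov eax, 0 needs 5 bytes (b8 00 00 00 00).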

6 | Conditionals and Branching

Conditionals

  • test | Perform a bitwise AND of the operands without storing the result, and set the Zero Flag (ZF) if the result is 0.
    • test destination, source
    info

    Quote: "often used to check if an operand has a NULL value, for example, by testing the operand against itself. This is done because it takes fewer bytes to use the test instruction than by comparing to 0."

  • cmp | Compare the two operands and set the Zero Flag (ZF) or the Carry Flag (CF)
    • cmp destination, source
    • works like sub (destination - source), but only the flags are updated; the result is discarded
    • unsigned interpretation: equal operands -> set ZF | source > destination -> set CF | source < destination -> clear ZF and CF
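Here is a small Python model of the flags that cmp and test leave behind, limited to ZF, CF, and SF. It is a simplified sketch with hypothetical function names. The first four calls reproduce the operand pairs used in the CMP and TEST practice trace of this write-up.

```python
MASK = 0xFFFFFFFF


def flags_after_cmp(dest: int, src: int) -> dict:
    """`cmp dest, src`: subtract like `sub`, keep only the flags."""
    r = (dest - src) & MASK
    return {"ZF": int(r == 0), "CF": int(dest < src), "SF": r >> 31}


def flags_after_test(dest: int, src: int) -> dict:
    """`test dest, src`: bitwise AND, keep only the flags (CF is always cleared)."""
    r = dest & src & MASK
    return {"ZF": int(r == 0), "CF": 0, "SF": r >> 31}


print(flags_after_cmp(0x10, 0x10))   # {'ZF': 1, 'CF': 0, 'SF': 0} equal operands
print(flags_after_cmp(0x20, 0x10))   # {'ZF': 0, 'CF': 0, 'SF': 0} source < destination
print(flags_after_cmp(0x20, 0x40))   # {'ZF': 0, 'CF': 1, 'SF': 1} source > destination
print(flags_after_test(0x20, 0x10))  # {'ZF': 1, 'CF': 0, 'SF': 0} no bits in common
print(flags_after_test(0x10, 0x10))  # {'ZF': 0, 'CF': 0, 'SF': 0} non-zero, so ZF stays clear
```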

Branching

Unconditional Jump

  • jmp | Jumps to a specified location, changing the value of the Instruction Pointer.
    • jmp location

Conditional Jumps

  • jz | Jump if the ZF is set (ZF=1).
  • jnz | Jump if the ZF is not set (ZF=0).
  • je | Jump if equal (same condition as jz: ZF=1). Often used after a CMP instruction.
  • jne | Jump if not equal (same condition as jnz: ZF=0). Often used after a CMP instruction.
  • jg | Jump if the destination is greater than the source operand. Performs signed comparison and is often used after a CMP instruction.
  • jl | Jump if the destination is less than the source operand. Performs signed comparison and is often used after a CMP instruction.
  • jge | Jump if greater than or equal to. Jumps if the destination operand is greater than or equal to the source operand (signed comparison).
  • jle | Jump if less than or equal to. Jumps if the destination operand is less than or equal to the source operand (signed comparison).
  • ja | Jump if above. Similar to jg, but performs an unsigned comparison.
  • jb | Jump if below. Similar to jl, but performs an unsigned comparison.
  • jae | Jump if above or equal to. Unsigned counterpart of jge.
  • jbe | Jump if below or equal to. Unsigned counterpart of jle.
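Each conditional jump only reads flags, so the signed/unsigned difference comes from which flags it checks. This Python sketch encodes the standard conditions as small functions and applies them after a modelled cmp. The names `JUMP_TAKEN` and `cmp_flags` are hypothetical. The example compares 0xFFFFFFFF with 1, which is -1 < 1 when signed but 4294967295 > 1 when unsigned.

```python
# Standard Jcc conditions expressed over the flags
JUMP_TAKEN = {
    "jz":  lambda f: f["ZF"] == 1,                        # je tests the same condition
    "jnz": lambda f: f["ZF"] == 0,                        # jne tests the same condition
    "jg":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"], # signed >
    "jge": lambda f: f["SF"] == f["OF"],                  # signed >=
    "jl":  lambda f: f["SF"] != f["OF"],                  # signed <
    "jle": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],  # signed <=
    "ja":  lambda f: f["CF"] == 0 and f["ZF"] == 0,       # unsigned >
    "jae": lambda f: f["CF"] == 0,                        # unsigned >=
    "jb":  lambda f: f["CF"] == 1,                        # unsigned <
    "jbe": lambda f: f["CF"] == 1 or f["ZF"] == 1,        # unsigned <=
}


def cmp_flags(dest: int, src: int) -> dict:
    """Flags set by a 32-bit `cmp dest, src` (dest - src, result discarded)."""
    r = (dest - src) & 0xFFFFFFFF
    return {
        "ZF": int(r == 0),
        "CF": int(dest < src),
        "SF": r >> 31,
        "OF": (((dest ^ src) & (dest ^ r)) >> 31) & 1,
    }


# cmp eax, 1 with eax = 0xFFFFFFFF
flags = cmp_flags(0xFFFFFFFF, 1)
print(JUMP_TAKEN["jl"](flags))  # True: signed, -1 < 1
print(JUMP_TAKEN["ja"](flags))  # True: unsigned, 0xFFFFFFFF > 1
```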

7 | Stack and Function calls

The Stack | PUSH

  • push | Push the source operand onto the stack
    • push source
    • ESP is decremented (by 4 for a 32-bit operand)
    • value: stored at the new memory location pointed to by the stack pointer (ESP) --> new top of the stack
  • pusha | Push ALL words (16-bit general-purpose registers) onto the stack
    • Order: AX | CX | DX | BX | SP (original value) | BP | SI | DI
  • pushad | Push ALL double words (32-bit general-purpose registers) onto the stack
    • Order: EAX | ECX | EDX | EBX | ESP (original value) | EBP | ESI | EDI
warning

Quote: "When we encounter these instructions, it is often a sign of someone manually injecting assembly instructions to save the state of registers, as is often the case with shellcode."

The Stack | POP

  • pop | Retrieve the value from the top of the stack and store it in the destination operand
    • pop destination
    • ESP is incremented (by 4 for a 32-bit operand)
  • popa | Pop ALL words
    • Pops the values sequentially from the top of the stack into the 16-bit general-purpose registers
    • Order: DI | SI | BP | BX | DX | CX | AX (the saved SP value is skipped)
    • SP/ESP is incremented by 16
  • popad | Pop ALL double words
    • Pops the values sequentially from the top of the stack into the 32-bit general-purpose registers
    • Order: EDI | ESI | EBP | EBX | EDX | ECX | EAX (the saved ESP value is skipped)
    • ESP is incremented by 32
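The stack rules above can be captured in a toy Python model in which ESP grows downward and each slot holds one 32-bit value. The `Stack` class, `pushad`/`popad` helpers, and register dictionary are hypothetical illustrations, not an emulator. The usage lines replay the first half of the Stack practice trace in this write-up.

```python
class Stack:
    """Toy 32-bit stack: memory is a dict of address -> value; ESP grows downward."""

    def __init__(self, esp: int = 0x1000):
        self.esp = esp
        self.memory = {}

    def push(self, value: int) -> None:
        self.esp -= 4                  # ESP is decremented first...
        self.memory[self.esp] = value  # ...then the value is written at the new top

    def pop(self) -> int:
        value = self.memory[self.esp]  # read the current top of the stack
        self.esp += 4                  # the slot is no longer part of the stack
        return value


REG_ORDER = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]  # pushad order


def pushad(stack: Stack, regs: dict) -> None:
    original_esp = stack.esp  # ESP as it was before the first push
    for name in REG_ORDER:
        stack.push(original_esp if name == "esp" else regs[name])


def popad(stack: Stack, regs: dict) -> None:
    for name in reversed(REG_ORDER):  # edi, esi, ebp, esp, ebx, edx, ecx, eax
        value = stack.pop()
        if name != "esp":             # the saved ESP value is discarded
            regs[name] = value


s = Stack(0x1000)
for v in (0x10, 0x15, 0x20, 0x25):  # push eax, ebx, ecx, edx
    s.push(v)
print(hex(s.esp))    # 0xff0
print(hex(s.pop()))  # 0x25: last in, first out
```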

The CALL Instruction

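  • call | Call a function (subroutine) that performs a specific task
    • call location
    • pushes the return address (the address of the next instruction) onto the stack, then jumps to location
    • Depending on the calling convention
      • arguments: placed on the stack OR in the registers
    • prologue | code at the start of the called function that sets up its stack frame, typically push ebp, mov ebp, esp, and sub esp, N to reserve space for local variables
    • epilogue | code at the end of the function that restores the stack for the caller (typically mov esp, ebp and pop ebp), followed by ret, which pops the return address into EIP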
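As a rough sketch, this Python model traces call, a common EBP-based prologue and epilogue, and ret on a toy stack. Several things here are assumptions for illustration: one argument is passed on the stack (cdecl-style), and the addresses (0x400 for the function, 0x105 for the return address) and the `Cpu` class are invented.

```python
class Cpu:
    """Toy model of call/ret with a typical EBP-based prologue/epilogue (sketch only)."""

    def __init__(self):
        self.eip, self.esp, self.ebp = 0, 0x1000, 0x1000
        self.memory = {}

    def push(self, value):
        self.esp -= 4
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target, return_address):
        self.push(return_address)  # call pushes the address of the next instruction...
        self.eip = target          # ...then jumps to the function

    def prologue(self, locals_size):
        self.push(self.ebp)        # push ebp
        self.ebp = self.esp        # mov ebp, esp
        self.esp -= locals_size    # sub esp, locals_size

    def epilogue_and_ret(self):
        self.esp = self.ebp        # mov esp, ebp (discard locals)
        self.ebp = self.pop()      # pop ebp (restore the caller's frame)
        self.eip = self.pop()      # ret: pop the return address into EIP


cpu = Cpu()
cpu.push(0x5)                                # caller pushes one argument (assumed convention)
cpu.call(target=0x400, return_address=0x105)
cpu.prologue(locals_size=8)
# inside the function: [ebp+4] = return address, [ebp+8] = first argument
print(hex(cpu.memory[cpu.ebp + 4]), hex(cpu.memory[cpu.ebp + 8]))  # 0x105 0x5
cpu.epilogue_and_ret()
print(hex(cpu.eip), hex(cpu.esp))  # 0x105 0xffc
```

ESP ends at 0xffc rather than 0x1000 because, under the assumed cdecl-style convention, the caller removes the argument itself after the call returns (e.g. with add esp, 4).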

8 | Practice Time

Play around with the provided Assembly Emulator for a bit. It is very well done and makes the concepts much easier to understand.

Arithmetic Code

mov eax,20h ; EIP=0x00000001 | EAX=0x00000020 
mov ebx,30h ; EIP=0x00000002 | EBX=0x00000030
add eax,ebx ; EIP=0x00000003 | EAX=0x00000050 | PF=1 | EBX=0x00000030 (operand remains unchanged)
nop ; EIP=0x00000004 | PF=1 (parity flag remains set)
nop ; EIP=0x00000005
sub eax,ebx ; EIP=0x00000006 | EAX=0x00000020 | PF=0 | EBX remains unchanged
inc ebx ; EIP=0x00000007 | EBX=0x00000031
dec ebx ; EIP=0x00000008 | EBX=0x00000030 | PF=1 | AF=1
mul eax ; EIP=0x00000009 | EAX=0x00000400 | EDX=0x00000000 | PF=1 | AF=1
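To cross-check the register values in the trace above, you can replay the same sequence with 32-bit Python integer arithmetic. Flags are left out. This is a hand-written replay, not the room's emulator.

```python
MASK = 0xFFFFFFFF

eax = 0x20                       # mov eax,20h
ebx = 0x30                       # mov ebx,30h
eax = (eax + ebx) & MASK         # add eax,ebx -> 0x50
                                 # nop, nop: nothing changes
eax = (eax - ebx) & MASK         # sub eax,ebx -> 0x20
ebx = (ebx + 1) & MASK           # inc ebx -> 0x31
ebx = (ebx - 1) & MASK           # dec ebx -> back to 0x30
product = eax * eax              # mul eax: EDX:EAX = EAX * EAX
edx, eax = product >> 32, product & MASK
print(hex(eax), hex(ebx), hex(edx))  # 0x400 0x30 0x0
```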

MOV Instructions

mov eax,10h     ; EIP=0x00000001 | EAX=0x00000010
mov ebx,32h ; EIP=0x00000002 | EBX=0x00000032
mov ecx,eax ; EIP=0x00000003 | ECX=0x00000010 | EAX remains unchanged
mov [eax],40h ; EIP=0x00000004 | Memory address [0x10]=0x00000040
add [eax],30h ; EIP=0x00000005 | Memory address [0x10]=0x00000070
mov [ebx],[eax] ; ERROR | Memory to memory data movement is NOT allowed.

Stack

; NOTE-1 -- stack works from the higher memory location to the lower
; NOTE-2 -- stack works in LIFO mode
mov eax,10h ; EIP=0x00000001 | EAX=0x00000010
mov ebx, 15h ; EIP=0x00000002 | EBX=0x00000015
mov ecx, 20h ; EIP=0x00000003 | ECX=0x00000020
mov edx, 25h ; EIP=0x00000004 | EDX=0x00000025

; original values | ESP=0x00001000 | EBP=0x00001000 | EAX,EBX,ECX,EDX remain UNCHANGED
push eax ; EIP=0x00000005 | ESP=0x00000ffc | Stack location [0xffc]=0x00000010
push ebx ; EIP=0x00000006 | ESP=0x00000ff8 | Stack location [0xff8]=0x00000015
push ecx ; EIP=0x00000007 | ESP=0x00000ff4 | Stack location [0xff4]=0x00000020
push edx ; EIP=0x00000008 | ESP=0x00000ff0 | Stack location [0xff0]=0x00000025

pop eax ; EIP=0x00000009 | ESP=0x00000ff4 | Stack location [0xff0] CLEARED | EAX=0x00000025
pop ebx ; EIP=0x0000000a | ESP=0x00000ff8 | Stack location [0xff4] CLEARED | EBX=0x00000020
pop ecx ; EIP=0x0000000b | ESP=0x00000ffc | Stack location [0xff8] CLEARED | ECX=0x00000015
pop edx ; EIP=0x0000000c | ESP=0x00001000 | Stack location [0xffc] CLEARED | EDX=0x00000010

CMP and TEST Instructions

mov eax,10h   ; EIP=0x00000001 | EAX=0x00000010
mov ebx,10h ; EIP=0x00000002 | EBX=0x00000010
cmp eax,ebx ; EIP=0x00000003 | ZF=1 and PF=1
test eax,ebx ; EIP=0x00000004 | ZF=0 and PF=0

mov eax,20h ; EIP=0x00000005 | EAX=0x00000020
mov ebx,10h ; EIP=0x00000006 | EBX=0x00000010
cmp eax,ebx ; EIP=0x00000007 | source < destination -> clear ZF and CF
test eax,ebx ; EIP=0x00000008 | bitwise AND and set ZF if 0 --> ZF=1 and PF=1

mov eax,20h ; EIP=0x00000009 | EAX=0x00000020 | ZF=1 and PF=1 (remain unchanged)
mov ebx,40h ; EIP=0x0000000a | EBX=0x00000040
cmp eax,ebx ; EIP=0x0000000b | source > destination -> set CF | CF=1 and SF=1 (most significant bit = 1)
test eax,ebx ; EIP=0x0000000c | ZF=1 and PF=1 | CF, SF are cleared

LEA Instruction

mov eax,20h       ; EIP=0x00000001 | EAX=0x00000020
mov ebx,30h ; EIP=0x00000002 | EBX=0x00000030
add eax,ebx ; EIP=0x00000003 | EAX=0x00000050 | PF=1
nop ; EIP=0x00000004 | PF remains set
mov [eax],ebx ; EIP=0x00000005 | Memory location [0x50]=0x00000030
add ebx,15h ; EIP=0x00000006 | EBX=0x00000045 | Clear PF
mov ecx,6 ; EIP=0x00000007 | ECX=0x00000006
mov [ebx+ecx],eax ; EIP=0x00000008 | Memory locations [0x4b]=0x50000000 and [0x4c]=0x00000000 | (0x45+0x06=0x4b)
lea eax,[ebx+ecx] ; EIP=0x00000009 | EAX=0x0000004b
push eax ; EIP=0x0000000a | ESP=0x00000ffc and Stack location [0xffc]=0x0000004b
push ebx ; EIP=0x0000000b | ESP=0x00000ff8 and Stack location [0xff8]=0x00000045
pop ecx ; EIP=0x0000000c | ESP=0x00000ffc and Stack location [0xff8] cleared | ECX=0x00000045
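The memory writes in the LEA trace are easier to read at the byte level. The following Python sketch replays the register and memory steps with a small zero-filled byte array as a stand-in for memory. `store32`, `load32`, and the 256-byte size are assumptions for illustration. It shows that mov [ebx+ecx],eax stores 0x50 as the little-endian bytes 50 00 00 00 starting at address 0x4b, which may help interpret how the emulator displays the memory around that address.

```python
memory = bytearray(0x100)  # toy byte-addressable memory, zero-filled


def store32(addr, value):
    """Write a 32-bit value at addr in little-endian byte order."""
    memory[addr:addr + 4] = value.to_bytes(4, "little")


def load32(addr):
    """Read a 32-bit little-endian value starting at addr."""
    return int.from_bytes(memory[addr:addr + 4], "little")


eax, ebx = 0x20, 0x30
eax += ebx               # add eax,ebx -> 0x50
store32(eax, ebx)        # mov [eax],ebx -> [0x50] = 0x30
ebx += 0x15              # add ebx,15h -> 0x45
ecx = 6                  # mov ecx,6
store32(ebx + ecx, eax)  # mov [ebx+ecx],eax -> 4 bytes written from 0x4b
eax = ebx + ecx          # lea eax,[ebx+ecx] -> address only, no memory read

print(hex(eax))                    # 0x4b
print(memory[0x4B:0x4F].hex(" "))  # 50 00 00 00
print(hex(load32(0x50)))           # 0x30
```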