Reverse Engineering Binaries is a critical set of techniques enabling attackers to extract sensitive information from, or inject code into, both local and remote executables.
Binary exploitation techniques are very popular in CTF (Capture The Flag) competitions and much less common in bug bounty programs, but they can be applied to enable much more complex attack chains.
Mastering reverse engineering is crucial for cyber defense, for uncovering vulnerabilities, and for developing more secure software systems.
MEMORY LAYOUT OVERVIEW
Memory
- Stack
- Contains local variables, function parameters, and control flow information.
- Grows from higher to lower memory addresses.
- Heap
- Dynamically allocated memory (e.g. malloc).
- Grows from lower to higher memory addresses.
- Data
- Static and global variables.
- Code
- Executable instructions of the program.
Concepts
- Endianness
- Big-Endian
- Most significant byte (MSB) stored first.
- Little-Endian
- Least significant byte (LSB) stored first.
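The two byte orders above can be seen by packing the same 32-bit value both ways. This is a minimal sketch using Python's standard struct module; the value 0xDEADBEEF is only an illustration.

```python
import struct

value = 0xDEADBEEF

# "<" selects little-endian, ">" selects big-endian; "I" is an unsigned 32-bit integer.
little = struct.pack("<I", value)
big = struct.pack(">I", value)

print(little.hex())  # least significant byte first
print(big.hex())     # most significant byte first

# int.from_bytes reverses the operation when told which byte order was used.
assert int.from_bytes(little, "little") == value
assert int.from_bytes(big, "big") == value
```

x86 and x86-64 are little-endian, which is why addresses placed in a payload appear with their bytes reversed.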
Functions
- Prologs
- Prepare the function for execution, including setting up the stack frame.
- Shortcut: enter
- Epilogs
- Clean up the stack and return from the function.
- Shortcut: leave
- Canary (aka Stack Cookie/Guard)
- Detects stack buffer overflows by checking, before the function returns, whether the canary value placed between the local variables and the return address has been altered. This helps prevent an overwritten return address from hijacking the control flow.
- ROP (Return Oriented Programming)
- Exploit vulnerabilities by chaining together small snippets of code (Gadgets) found in the existing binary.
- Gadgets
- Small sequences of instructions ending in a ret, used to perform arbitrary operations.
- GOT (Global Offset Table)
- A table of addresses for global variables and functions that the dynamic linker fills in at runtime, so that position-independent code can reach symbols living in shared libraries.
- Lazy Binding
- Resolve function addresses on their first call with the usage of the PLT, rather than at program startup.
- PLT (Procedure Linkage Table)
- Facilitate dynamic linking by redirecting function calls to the GOT (with the help of Dynamic Linker/Loader).
- ASLR (Address space layout randomization)
- Makes memory-corruption attacks such as buffer overflows harder by randomizing, on each run, the memory addresses used by system and program components (stack, heap, shared libraries).
- Movaps
- This x86 SSE instruction requires its memory operand to be 16-byte aligned; otherwise it raises an exception and crashes the program. It is a common pitfall during exploitation, because a ROP chain can leave the stack misaligned when it reaches a function that uses movaps.
- The workaround is to add an extra ret right before the function call, which moves the stack pointer by one more word (8 bytes on 64-bit).
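The ROP and movaps notes above can be sketched as a 64-bit chain that loads rdi and then calls a function, with an optional extra ret for alignment. Every address and the offset below are hypothetical placeholders; real values come from the target binary (for example from ropper, nm or objdump).

```python
import struct

def p64(value: int) -> bytes:
    """Pack an integer as an 8-byte little-endian address."""
    return struct.pack("<Q", value)

# Hypothetical values, for illustration only.
OFFSET_TO_RET = 40        # bytes needed to reach the saved return address
POP_RDI_RET = 0x401233    # gadget: pop rdi; ret
RET = 0x40101A            # gadget: ret (only used to realign the stack)
BIN_SH = 0x404050         # address of a "/bin/sh" string
SYSTEM_PLT = 0x401040     # PLT entry of system

def build_chain(align: bool = True) -> bytes:
    chain = b"A" * OFFSET_TO_RET
    chain += p64(POP_RDI_RET) + p64(BIN_SH)  # first argument goes in rdi
    if align:
        chain += p64(RET)                    # extra ret shifts rsp by 8 bytes
    chain += p64(SYSTEM_PLT)
    return chain

payload = build_chain()
print(len(payload), payload[OFFSET_TO_RET:].hex())
```

If the target crashes inside the called function on a movaps instruction, toggling the align flag is usually the first thing to try.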
32-bit vs. 64-bit Architecture (x86)
- 32-bits
- Parameter Passing: Via stack.
- Calling Convention: Arguments pushed onto the stack in reverse order (right to left).
- System Calls:
- Instruction: interrupt 128 (int 0x80).
- Arguments: register eax for the system call number (0x0b = execve) followed by ebx, ecx, edx, esi, edi, and ebp accordingly.
- Address Space: Uses the entire 32-bit address space (4 GiB).
- Words:
- word (16 bits)
- dword (32 bits)
- 64 bits
- Parameter Passing: Via registers (RDI, RSI, RDX, RCX, R8, R9) and then the stack.
- Calling Convention: Register order as mentioned above, followed by stack.
- System Calls:
- Instruction: syscall
- Arguments: register rax for the system call number (0x3b = execve) followed by rdi, rsi, rdx, r10, r8, r9 accordingly.
- Address Space:
- Low canonical addresses: 0x0000000000000000 to 0x00007FFFFFFFFFFF (User space)
- High canonical addresses: 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF (Kernel space)
- Only the lower 48 bits of a virtual address are meaningful; the upper 16 bits must be copies of bit 47.
- Words:
- word (16 bits)
- dword (32 bits)
- qword (64 bits)
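The canonical address ranges listed above can be checked with a few lines of Python. This is a sketch assuming 48-bit virtual addresses, as described in this section; the function names are made up for the example.

```python
def is_canonical(addr: int) -> bool:
    """True when bits 63..47 of a 64-bit address are all equal."""
    top = addr >> 47              # bit 47 plus the 16 bits above it
    return top == 0 or top == 0x1FFFF

def region(addr: int) -> str:
    if not is_canonical(addr):
        return "non-canonical"
    return "user space" if addr >> 47 == 0 else "kernel space"

for sample in (0x00007FFFFFFFFFFF, 0xFFFF800000000000, 0x4141414141414141):
    print(hex(sample), region(sample))
```

This is why a 64-bit overflow filled with "A" characters typically faults on the ret itself rather than showing 0x4141414141414141 in rip: the address is not canonical, so it is never loaded.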
BASIC ASSEMBLY
- Registers and their Operands
- eip (32 bits) / rip (64 bits)
- Holds the address of the next instruction to be executed.
- ebp (32 bits) / rbp (64 bits)
- Points to the “base” of the current stack frame and is used to address locals and parameters at fixed offsets. After a standard prolog, RBP itself does not hold the return address; it points at the saved RBP, and the return address is stored right above it (at rbp+8 in 64-bit, ebp+4 in 32-bit).
- esp (32 bits) / rsp (64 bits)
- Points to the “top” of the stack (the lowest memory address in use). It works in conjunction with push and pop.
- Constructs
- mov
- Copies data from the source operand into the destination operand (destination first, then source, in Intel syntax). A memory operand is dereferenced, so the value stored at that address is what gets copied.
- lea
- Computes the address of the source operand and writes it into the destination (in the same order). The pointer value itself is written; nothing is dereferenced.
- ret
- Return. Pops the address from the top of the stack into the instruction pointer, causing execution to return to that address.
- push
- Pushes a value onto the stack, decrementing the stack pointer.
- pop
- Pops a value from the stack, incrementing the stack pointer.
- call
- Calls a function by pushing the return address onto the stack and then jumping to the function’s address.
- cmp
- Basically the test behind an IF statement. It compares by subtracting the second value (right) from the first (left), discarding the result and only setting the flags that a following conditional jump checks.
- Operators
- or, and, and xor
- Perform bitwise OR, AND, and XOR operations respectively.
- mul
- Multiplies two operands.
- div
- Divides one operand by another.
- shl and shr
- Shifts bits to left and right respectively.
- rol and ror
- Rotate bits left and right respectively; bits shifted out of one end re-enter at the other.
- nop
- No Operation. It does nothing and is often used for timing or alignment purposes.
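The push, pop, call and ret descriptions above can be modelled with a toy stack. This is a sketch, not an emulator: the class name, the starting addresses and the 5-byte call length are assumptions made for the example.

```python
class ToyCPU:
    """Minimal model of rsp, rip and the stack in 64-bit mode."""

    WORD = 8  # bytes per stack slot

    def __init__(self, rsp: int = 0x7FFFFFFFE000, rip: int = 0x401000):
        self.rsp = rsp
        self.rip = rip
        self.memory = {}  # address -> 64-bit value

    def push(self, value: int) -> None:
        self.rsp -= self.WORD           # the stack grows toward lower addresses
        self.memory[self.rsp] = value

    def pop(self) -> int:
        value = self.memory[self.rsp]
        self.rsp += self.WORD
        return value

    def call(self, target: int, insn_len: int = 5) -> None:
        self.push(self.rip + insn_len)  # return address = instruction after the call
        self.rip = target

    def ret(self) -> None:
        self.rip = self.pop()

cpu = ToyCPU()
cpu.call(0x401200)                  # return address 0x401005 is pushed
cpu.memory[cpu.rsp] = 0xDEADBEEF    # simulate an overflow overwriting it
cpu.ret()
print(hex(cpu.rip))
```

The last three lines show why the saved return address is the classic target: ret blindly loads whatever value sits on top of the stack into the instruction pointer.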
INFORMATION GATHERING
file fileName
- ELF
- In short: Linux executable.
- 80386/i386 or x86-64
- 32-bit or 64-bit architecture respectively.
- dynamically linked
- It uses system libraries. It does not contain the libraries in the binary.
- not stripped
- Makes it easier to analyze because the binary still contains its symbol table, so function names show up after disassembly.
- LSB
- Least Significant Byte first = Little-endian (the bytes look to be in reverse order).
See also a list of file signatures that file uses to identify the file type at [Link].
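Most of what file reports for an ELF comes from the first bytes of the header. The following sketch decodes the class, byte order and machine fields; the function name is hypothetical, the machine table is deliberately incomplete, and the sample header is synthetic.

```python
import struct

def describe_elf(header: bytes) -> dict:
    """Decode the ELF header fields behind the 32/64-bit, LSB/MSB and machine labels."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    byte_order = "<" if ei_data == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit headers.
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    machines = {3: "Intel 80386", 8: "MIPS", 40: "ARM", 62: "x86-64"}
    return {
        "bits": {1: 32, 2: 64}.get(ei_class),
        "endianness": {1: "LSB", 2: "MSB"}.get(ei_data),
        "machine": machines.get(e_machine, f"unknown ({e_machine})"),
    }

# Synthetic 20-byte header: 64-bit, little-endian, e_type=3, e_machine=62.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 62)
print(describe_elf(sample))
```

With a real binary, the same function can be fed the first 20 bytes read from the file opened in binary mode.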
checksec --file=fileName
- RELRO
- “Relocation Read-Only” can be one of the following levels:
- “Full RELRO” – Does not allow overwriting the GOT because all symbols are resolved before the application starts and the table is then marked as read-only.
- “Partial RELRO” – Allows overwriting the GOT entries used by the PLT because they stay read+write to be resolved at runtime.
- “No RELRO” – No protection at all, but it is uncommon to find in modern systems.
- Stack Canary
- When disabled it says “No canary found”, meaning there is no protection against a buffer overflow reaching the return address.
- NX
- Aka No-eXecute bit; when enabled it does not allow executing shell-code from the Stack or Heap.
- PIC
- Aka Position-Independent Code; when enabled it allows shared code (libraries) to be loaded at any memory address because it only uses relative addressing.
- PIE
- Aka Position-Independent Executable; when disabled it says “No PIE”, meaning the program is loaded at the same memory addresses each time it is executed.
- RUNPATH
- At runtime, the dynamic linker searches for the required shared libraries in the specified path before using the standard locations such as /lib and /usr/lib.
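Some of the checks above can be approximated by reading the ELF program headers directly. This is a partial sketch for little-endian 64-bit ELF files only, with a made-up function name: it does not look for canaries or RUNPATH, does not tell Partial from Full RELRO (that needs the dynamic section), and treats any ET_DYN file as PIE even though shared libraries use the same type.

```python
import struct

ET_DYN = 3
PT_GNU_STACK = 0x6474E551
PT_GNU_RELRO = 0x6474E552
PF_X = 0x1

def quick_checksec(data: bytes) -> dict:
    """Report PIE, NX and RELRO presence for a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_type,) = struct.unpack_from("<H", data, 16)
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)

    result = {"pie": e_type == ET_DYN, "nx": False, "relro": False}
    for i in range(e_phnum):
        # In ELF64 program headers, p_type and p_flags are the first two 32-bit fields.
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            result["nx"] = not (p_flags & PF_X)  # stack mapped without the execute flag
        elif p_type == PT_GNU_RELRO:
            result["relro"] = True               # Partial or Full, not distinguished here
    return result

# Usage with a real file (the path is a placeholder):
# with open("fileName", "rb") as fh:
#     print(quick_checksec(fh.read()))
```

For real work checksec remains the better choice; the sketch only shows where the answers come from.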
strings fileName
strings fileName -t x
- Outputs every sequence of printable characters found in the file to the terminal; -t x prefixes each one with its offset in hexadecimal.
- Useful for low hanging fruit information disclosure.
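What strings does can be approximated in a few lines. This sketch looks for runs of at least four printable ASCII characters and reports their offsets, roughly like the -t x option; the function name and the sample bytes are made up.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

blob = b"\x00\x01/bin/sh\x00\xff\xfeflag{demo}\x00ab\x00"
for offset, text in extract_strings(blob):
    print(f"{offset:x} {text}")
```

The two-character run "ab" is skipped because it is shorter than the minimum length, which mirrors the default behaviour of the real tool.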
ldd fileName
- Lists the shared libraries that a “dynamically linked” binary depends on.
readelf -l fileName
readelf -a fileName
- Extracts information about the headers, sections, symbols, and other aspects of an ELF file.
nm fileName
- Lists symbols from object files (executable).
objdump -d fileName > decompiled.asm
- Used to disassemble an executable.
- Alternatively use the GUI software called Ghidra [Link].
- On Kali, it can be installed with sudo apt install -y ghidra
ropper --file=fileName --search "pop rdi"
- Find all Gadgets present in the binary for use in ROP exploits [Link].
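At its simplest, a gadget search is a byte search. The sketch below looks for the raw encoding of pop rdi; ret (5f c3) in a buffer; the base address and the sample bytes are hypothetical, and real tools also parse the ELF sections and disassemble backwards from each ret.

```python
def find_gadget(code: bytes, gadget: bytes, base: int = 0):
    """Return every address at which the raw gadget bytes occur."""
    hits, start = [], 0
    while (index := code.find(gadget, start)) != -1:
        hits.append(base + index)
        start = index + 1
    return hits

POP_RDI_RET = b"\x5f\xc3"  # pop rdi; ret

# nop; pop r15 (41 5f); ret; nop; pop rdi; ret
code = b"\x90\x41\x5f\xc3\x90\x5f\xc3"
print([hex(addr) for addr in find_gadget(code, POP_RDI_RET, base=0x401000)])
```

The first hit lands in the middle of pop r15; ret, which illustrates why gadgets can exist even when the compiler never emitted that instruction on purpose.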
sudo -H python3 -m pip install ROPgadget
ROPgadget --binary fileName
- Locates the memory address of a register with Radare2 [Link].
r2 fileName
DEBUGGING TOOLS
ltrace ./fileName
- Intercepts and shows library calls as the program runs.
- E.g. gets, puts, or printf from libc.
strace ./fileName
- Intercepts and shows system calls as the program runs.
- E.g. open, read, write, close, fork, execve, exit, or mmap from the Kernel.
gdb ./fileName
- A very powerful debugger that deserves a cheat-sheet of its own.
- A much better way of navigating through gdb is by using the Python-based plug-in pwndbg, started here with gdb-pwndbg [Link].
gdb-pwndbg fileName
- file fileName
- Loads the binary to be executed.
- info functions
- Lists the functions in the binary
- info registers
- Shows the content of all registers
- info proc mappings
- Displays memory mappings and details such as the starting address, ending address, permissions (rwx).
- disassemble main
- Disassembles a function.
- info stack
- Shows the stack of a running or crashed execution.
- backtrace
- Similar to info stack.
- break main
- Adds a break-point to a function.
- break *0xffffffff
- Adds a break-point to a memory position.
- delete break
- Removes break-points.
- x/s 0xffffffff
- Shows the memory at the given address as a string.
- cyclic 100
- Creates a series of 100 characters for overflowing tests.
- Similar online tool [Link].
- cyclic -l XXXX
- Locate and calculate the offset to the segment of characters provided.
- run
- Executes the binary.
- If using break-points: n will baby-step to the “next” line (ni for the next instruction) and c will “continue” to the next break-point, or to the end of the program if there are none.
- run < payload
- Loads a payload file into the standard input.
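The cyclic commands above rely on a de Bruijn sequence, in which every 4-character window appears only once, so any window identifies its own offset. The following is a standalone sketch of that idea using the textbook algorithm; it is not the pwndbg or pwntools implementation, although it assumes the same lowercase alphabet and window size of 4.

```python
import string
from functools import lru_cache

@lru_cache(maxsize=None)
def de_bruijn(alphabet: str, n: int) -> str:
    """Build a de Bruijn sequence in which every n-character window is unique."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)

def cyclic(length: int, n: int = 4, alphabet: str = string.ascii_lowercase) -> str:
    return de_bruijn(alphabet, n)[:length]

def cyclic_find(chunk: str, n: int = 4, alphabet: str = string.ascii_lowercase) -> int:
    return de_bruijn(alphabet, n).find(chunk)

print(cyclic(20))
print(cyclic_find("laaa"))
```

Register values are shown as numbers, so the bytes have to be reversed first on little-endian targets: a crashed eip of 0x6161616c corresponds to the bytes "laaa" in memory.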
See this great tutorial for installing the tools Pwndbg + GEF + Peda at once [Link].
CRAFTING PAYLOADS MANUALLY
python2 -c 'print "A"*4 + "BBBB" + "\xef\xbe\xad\xde"' > payload
- For 32-bit, it will create: AAAABBBB followed by the little-endian bytes of the address 0xdeadbeef.
python2 -c 'print "A"*10 + "\xef\xbe\xad\xde\x00\x00\x00\x00"' > payload
- For 64-bit, it will create: AAAAAAAAAA followed by the little-endian bytes of the address 0x00000000deadbeef.
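The same payloads can be built in Python 3 with the standard struct module. This is a sketch; the helper name is made up, and the padding lengths are simply the ones used in the commands above.

```python
import struct

# 32-bit: padding followed by the little-endian return address.
payload32 = b"A" * 4 + b"BBBB" + struct.pack("<I", 0xDEADBEEF)

# 64-bit: the address is packed as 8 bytes, so the upper bytes are zero.
payload64 = b"A" * 10 + struct.pack("<Q", 0xDEADBEEF)

def write_payload(path: str, data: bytes) -> None:
    """Write raw bytes; binary mode avoids any text encoding of bytes above 0x7f."""
    with open(path, "wb") as fh:
        fh.write(data + b"\n")  # python2's print also appends a newline

print(payload32.hex())
print(payload64.hex())
```

Using print with a text string in Python 3 would encode bytes such as 0xef as multi-byte UTF-8 and corrupt the address, which is why the bytes are written in binary mode instead.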
shellcraft -l
- Lists the available payloads (assembly instructions) for getting code execution (or a “shell”) into the application runtime.
shellcraft i386.linux.sh
- Outputs the desired payload in HEX.
shellcraft i386.linux.sh -f a
- Outputs the payload in assembly.
See also msfvenom for more payloads at [Link].
BONUS
upx -h
upx -d fileName
- Tool for compressing and expanding executable files.
hexedit fileName
- View and edit files in hexadecimal or in ASCII.
Pwndbg + GEF + Peda
There is a tutorial for setting up these 3 tools at once [Link]. Highly recommended to have it ready in a Kali base image for Pentesting or CTF.
cd ~ && git clone https://github.com/apogiatzis/gdb-peda-pwndbg-gef.git
cd ~/gdb-peda-pwndbg-gef
./install.sh
./update.sh
IDA Free
It is not open source, but this free tool, originally made for Windows and now also offering Linux and macOS builds, is probably the best visual decompiler [Link].
wget https://out7.hex-rays.com/files/idafree84_linux.run
chmod +x idafree84_linux.run
./idafree84_linux.run
Radare2 Cutter
A graphic user interface version of the well known reverse engineering framework [Link].
sudo apt install radare2-cutter -y
Complete summary of the binary on the landing Dashboard:
Disassemble with Graphics:
Two most popular decompilers out of the box:
Unarguably, it has an impressively slick interface.
Binary Ninja
It is not open source but offers a free version that can run on Windows, macOS, and Linux [Link].
ROP Emporium
A must-know free source of binary exploitation challenges [Link]. It also has a walk-through for building the knowledge base necessary to complete the challenges.
Pwnables
It started as a private CTF but became an OpenToAll CTF [Link]. It currently contains more than 50 challenges that progressively increase in difficulty.
Libc Database Search
To find out which functions are available in a specific library, the runtime address of a linked function (for example, one leaked from the target) might reveal which version of the library is being used. This database contains a search feature and also provides the libraries themselves, so an exploit can be tested locally before executing it on a remote target.
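Once the library version is known, the arithmetic is short: subtract the symbol's offset from its leaked address to get the load base, then add any other offset. All numbers below are hypothetical placeholders, not values from a real libc.

```python
PAGE_MASK = 0xFFF

def libc_base(leaked_addr: int, symbol_offset: int) -> int:
    """Base address of the library given a leaked symbol address and its offset."""
    base = leaked_addr - symbol_offset
    if base & PAGE_MASK:
        raise ValueError("base is not page-aligned; the library version is probably wrong")
    return base

# Hypothetical leak and offsets, for illustration only.
LEAKED_PUTS = 0x7F3A1C2805A0
OFFSETS = {"puts": 0x805A0, "system": 0x4F550}

base = libc_base(LEAKED_PUTS, OFFSETS["puts"])
print(hex(base), hex(base + OFFSETS["system"]))
```

Libraries are mapped at page granularity, so the low 12 bits of a leaked address match the low 12 bits of the symbol's offset even with ASLR enabled; that is the property a lookup by leaked address relies on.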
More Vulnerabilities to Consider
- Integer Overflow/Underflow
- It occurs when an arithmetic operation produces a value outside the range its fixed-width binary representation can hold, so the result wraps around.
- Format String Vulnerability
- It is specific to languages like C and its printf family of functions (the f character stands for Format). If user-controlled input reaches the format string without being sanitised, it can lead to arbitrary memory reads or writes.
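The wrap-around described for integer overflows can be reproduced by masking Python's unbounded integers to 32 bits. This is a sketch; the function names and the 16-byte header in alloc_size are assumptions made for the example.

```python
import ctypes

def add_u32(a: int, b: int) -> int:
    """Add two values the way a 32-bit unsigned register would."""
    return (a + b) & 0xFFFFFFFF

def to_i32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed integer."""
    return ctypes.c_int32(value).value

def alloc_size(user_len: int) -> int:
    """Hypothetical size computation: user-supplied length plus a 16-byte header."""
    return add_u32(user_len, 16)

print(add_u32(0xFFFFFFFF, 1))   # unsigned wrap to zero
print(to_i32(0x7FFFFFFF + 1))   # signed wrap to the most negative value
print(alloc_size(0xFFFFFFF8))   # a huge length yields a tiny allocation
```

The last call shows the typical bug pattern: the computed size passes a sanity check because it wrapped, while the original length is still used for the copy.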
Fast Pace Reverse Engineering for CTFs
- PatchELF [Link]
- It modifies ELFs and libraries to add, remove, shrink, and alter paths, dependencies…
- PwnInit [Link]
- Automates the setup for binary exploitation challenges: it sets the binary to executable, downloads the linker and debug symbols, unstrips the libc, patches the binary to change the RPATH, and fills in a template for the pwntools solve script.
BONUS OF THE BONUS
Running and debugging x86-32, ARM and MIPS applications on x86-64:
sudo apt update
sudo apt-get install qemu-user -y
sudo apt install libc6-i386 gdb-multiarch -y
sudo apt install libc6-armel-cross -y
sudo mkdir /etc/qemu-binfmt -p
sudo ln -s /usr/arm-linux-gnueabi /etc/qemu-binfmt/arm
qemu-arm-static ./armv5.bin
qemu-arm ./armv5.bin
sudo apt install libc6-mipsel-cross -y
sudo mkdir /etc/qemu-binfmt -p
sudo ln -s /usr/mipsel-linux-gnu /etc/qemu-binfmt/mipsel
qemu-mipsel-static ./armv5.bin
qemu-mipsel ./armv5.bin