Reverse engineering binaries is a set of techniques that lets an attacker extract sensitive information from, or execute code in, a local or remote executable.
BASIC ASSEMBLY KNOWLEDGE
Overview
- Memory Layout
- stack
- Holds local variables and function call frames (including saved return addresses), which is how it controls the program’s flow.
- It grows from a higher address to a lower address.
- heap
- Dynamically allocated memory (malloc).
- It grows from a lower address to a higher address.
- data
- Static or global values.
- code
- Set of instructions to be executed.
- Concepts
- Endianness
- Big-Endian
- Most significant bytes are stored first.
- Little-Endian
- Least significant bytes are stored first.
- Functions
- prologs
- A sequence of instructions at the start of a function that sets up its stack frame (saves the caller's base pointer and reserves space for locals) so that the function can later return cleanly.
- Shortcut: enter
- epilogs
- A sequence of instructions at the end of a function that cleans up its stack frame and returns to the caller.
- Shortcut: leave
- Canary (aka Stack Cookie or Stack Guard)
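- A value placed on the stack in front of the saved return address and checked when the function is done; if the canary value has changed, the memory was tampered with and the program aborts.
- When a canary is present, a linear overflow cannot overwrite the return address without also changing the canary, so taking control of the program’s execution flow becomes much harder (unless the canary value is leaked).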
- Return Oriented Programming (ROP)
- The idea of chaining together small snippets of existing assembly that end in ret ("gadgets"), using control of the stack gained via a buffer overflow, to take control of the program's execution.
- Procedure Linkage Table (PLT)
- A table of small stubs through which a program calls functions in shared libraries and other modules that are linked at run-time.
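The two byte orders listed under Endianness can be illustrated with a short sketch using Python's standard struct module. The value 0xdeadbeef is only an example.

```python
import struct

value = 0xdeadbeef  # example value only

# "<" selects little-endian, ">" selects big-endian;
# "I" is an unsigned 32-bit integer.
little = struct.pack("<I", value)
big = struct.pack(">I", value)

print(little.hex())  # efbeadde: least significant byte first
print(big.hex())     # deadbeef: most significant byte first

# int.from_bytes reverses the operation for either byte order.
assert int.from_bytes(little, "little") == value
assert int.from_bytes(big, "big") == value
```

x86 and x86-64 are little-endian, which is why addresses appear byte-reversed in payloads and memory dumps.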
Basics
- Registers and their Operands
- eip (32 bits) / rip (64 bits)
- The instruction pointer: holds the address of the next instruction to be executed.
- ebp (32 bits) / rbp (64 bits)
- Points to the “base” of the current function's stack frame; locals and arguments are addressed relative to it.
- esp (32 bits) / rsp (64 bits)
- Is a pointer to the “top” of the stack (the lowest memory address in use). It works in conjunction with push and pop.
- Constructs
- mov
- Copies data into the “destination” from the “source” (operands in this order in Intel syntax). A memory operand such as [rax] is dereferenced.
- lea
- Writes the computed address of the “source” into the “destination” (in this order). It stores the pointer value itself; nothing is dereferenced.
- rop
- Return-oriented Programming (a technique rather than an instruction; see Concepts above).
- ret
- Returns from a function. It pops the return address (pointer) from the stack and jumps to it.
- push
- Adds value into the Stack.
- pop
- Retrieves values from the Stack.
- call
- Calls a function.
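- cmp
- Basically the test of an IF statement. It compares two values by subtracting the second (right) from the first (left); the result is discarded and only the flags are set, which a following conditional jump then reads.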
- Operators
- or, and, and xor
- OR, AND, and XOR respectively.
- mul
- Multiply values.
- div
- Divides values.
- shl and shr
- Shifts bits to left and right respectively.
- rol and ror
- Rotate bits left and right respectively; bits shifted out of one end re-enter at the other.
- nop
- Does nothing.
- 32 vs 64 bits
- 32 bits
- Parameters to functions are passed via the stack.
- Function arguments are pushed onto the stack in reverse order (right to left).
- 64 bits
- Parameters to functions are passed via Registers.
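- Calling convention on Linux (System V AMD64): integer and pointer parameters go, in order, in RDI, RSI, RDX, RCX, R8, R9; floating-point parameters go in [XYZ]MM0–7; any further parameters are passed on the stack.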
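The 32-bit convention above (arguments pushed right to left onto a stack that grows towards lower addresses) can be modelled with a toy simulation. This is only a sketch: the Stack32 class, the call_cdecl helper, and every address are hypothetical names and values chosen for illustration.

```python
class Stack32:
    """Toy model of a 32-bit stack: maps addresses to 4-byte values."""

    def __init__(self, esp=0xFFFFD000):  # hypothetical starting esp
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4  # the stack grows towards lower addresses
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem.pop(self.esp)
        self.esp += 4
        return value


def call_cdecl(stack, return_address, *args):
    """Mimic what a 32-bit caller does before control reaches the callee."""
    for arg in reversed(args):  # right to left
        stack.push(arg)
    stack.push(return_address)  # pushed implicitly by the call instruction


stack = Stack32()
call_cdecl(stack, 0x08049000, 1, 2, 3)  # hypothetical return address

# The return address sits at esp; the first argument is right above it.
for address in sorted(stack.mem):
    print(hex(address), hex(stack.mem[address]))
```

Because the first argument ends up closest to the return address, an overflow that controls the return address on a 32-bit target can also place fake arguments directly after it.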
INFORMATION GATHERING
file fileName
- ELF
- In short: Linux executable.
- 80386 or x86-64
- 32-bit or 64-bit architecture respectively.
- dynamically linked
- It uses system libraries. It does not contain the libraries in the binary.
- not stripped
- Makes it easier to analyze because symbol names (such as function names) remain visible after disassembly.
- LSB
- Least Significant Byte first = Little-endian (the bytes look to be in reverse order).
See also a list of file signatures that file uses to identify the file type at [Link].
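Most of the fields that file reports come straight from the first bytes of the ELF header. The sketch below decodes a few of them by hand; the describe_elf helper is a hypothetical name, and the lookup tables only cover the values discussed here.

```python
import struct

# Only the values discussed above are listed; others show as "unknown".
EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "LSB", 2: "MSB"}
E_TYPE = {2: "executable", 3: "shared object / PIE executable"}
E_MACHINE = {3: "Intel 80386", 62: "x86-64"}


def describe_elf(header: bytes) -> str:
    """Summarise the first 20 bytes of an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = EI_CLASS.get(header[4], "unknown")
    order = EI_DATA.get(header[5], "unknown")
    # e_type and e_machine follow the 16-byte e_ident array and use
    # the byte order that e_ident itself declares.
    fmt = ">HH" if header[5] == 2 else "<HH"
    e_type, e_machine = struct.unpack_from(fmt, header, 16)
    kind = E_TYPE.get(e_type, "unknown")
    machine = E_MACHINE.get(e_machine, "unknown")
    return f"ELF {bits} {order} {kind}, {machine}"


# A hand-built header for demonstration. With a real binary:
#   describe_elf(open("fileName", "rb").read(20))
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 62)
print(describe_elf(sample))
```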
checksec --file=fileName
- RELRO
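- Aka Relocation Read-Only, when enabled it marks relocation data such as the GOT as read-only after loading (the whole GOT with Full RELRO), so its entries cannot be overwritten.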
- Stack Canary
- When disabled says “No canary found”. It means there is no protection against a buffer overflow reaching the return address.
- NX
- Aka No-eXecute bit, when enabled it does not allow executing shell-code from the Stack or Heap.
- PIC
- Aka Position-independent Code, when enabled it allows shared code (libraries) to be loaded at any memory address because it only uses relative addressing.
- PIE
- Aka Position-independent Executable, when disabled says “No PIE”. It means the program is loaded at the same memory address each time it is executed.
- RUNPATH
- At runtime, the dynamic linker searches for its required shared libraries in the specified path before using the standard locations such as /lib and /usr/lib.
strings fileName
- It will output every sequence of printable characters found in the file (at least 4 characters long by default) to the terminal.
- Useful for low hanging fruit information disclosure.
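What strings does can be approximated in a few lines. The sketch below is a simplification rather than a reimplementation of the real tool: it only looks for runs of printable ASCII (plus tab), and printable_strings is a hypothetical helper name.

```python
import re


def printable_strings(data: bytes, min_length: int = 4):
    """Return runs of printable ASCII at least min_length bytes long."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up bytes for demonstration. With a real binary:
#   printable_strings(open("fileName", "rb").read())
blob = b"\x00\x01/bin/sh\x00ab\x00flag{demo}\xff"
print(printable_strings(blob))
```

Short runs such as "ab" are dropped because they fall below the minimum length, which is how most binary noise is filtered out.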
ldd fileName
- Lists the shared libraries that a “dynamically linked” binary depends on.
readelf -l fileName
readelf -a fileName
- Extracts information about the headers, sections, symbols, and other aspects of an ELF file.
nm fileName
- Lists symbols from object files (executable).
objdump -d fileName > decompiled.asm
- Used to disassemble an executable.
- Alternatively use the GUI software called Ghidra [Link].
- On Kali, it can be installed with sudo apt install -y ghidra
ropper --file=fileName --search "pop rdi"
- Searches the binary for ROP gadgets matching the given instruction (here pop rdi) and prints their memory addresses.
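A pop rdi gadget matters on 64-bit targets because the first function argument travels in RDI. The sketch below shows how such a gadget address might be combined into a chain. Every number in it (the offset and all three addresses) is a hypothetical placeholder that would have to be replaced with values found for the actual binary.

```python
import struct


def p64(value: int) -> bytes:
    """Pack an address as 8 little-endian bytes."""
    return struct.pack("<Q", value)


# Hypothetical placeholders, NOT values from any real binary.
OFFSET = 40             # bytes of padding before the saved return address
POP_RDI_RET = 0x401233  # address of a "pop rdi; ret" gadget
ARGUMENT = 0x402010     # value that should end up in rdi
TARGET_FUNC = 0x401050  # function to call with that argument


def build_chain() -> bytes:
    return (
        b"A" * OFFSET
        + p64(POP_RDI_RET)  # overwrites the saved return address
        + p64(ARGUMENT)     # popped into rdi by the gadget
        + p64(TARGET_FUNC)  # the gadget's ret jumps here
    )


print(build_chain().hex())
```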
r2 fileName
- (pending)
DEBUGGING TOOLS
ltrace ./fileName
- Intercepts and shows library calls as the program runs.
- E.g. gets, puts, or printf from libc.
strace ./fileName
- Intercepts and shows system calls as the program runs.
- E.g. open, read, write, close, fork, execve, exit, or mmap from the Kernel.
gdb ./fileName
- A very powerful debugger that deserves a cheat-sheet of its own.
- A much better way of navigating through gdb is by using the Python-based plug-in called gdb-pwndbg [Link].
gdb-pwndbg fileName
- file fileName
- Loads the binary to be executed.
- info functions
- Lists the functions in the binary
- disassemble main
- Disassembles a function.
- info stack
- Shows the stack of a running or crashed execution.
- backtrace
- Similar to info stack.
- break main
- Adds a break-point to a function.
- break *0xffffffff
- Adds a break-point to a memory position.
- delete break
- Removes break-points.
- cyclic 100
- Creates a series of 100 characters for overflowing tests.
- Similar online tool [Link].
- cyclic -l XXXX
- Calculates the offset of the provided segment of characters within the cyclic pattern.
- run
- Executes the binary.
- If using break-points: n will baby-step to the “next” line (ni for the next instruction), and c will “continue” to the next break-point or run to the end if there are none.
- run < payload
- Loads a payload file into the standard input.
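The idea behind cyclic and cyclic -l is a pattern in which every 4-character window is unique, so the characters found in a crashed register pinpoint the offset. The sketch below is an independent minimal version built on a de Bruijn sequence, not the tool's own code; the function names merely echo the commands above.

```python
from itertools import islice
from string import ascii_lowercase


def de_bruijn(k, n):
    """Lazily yield a de Bruijn sequence B(k, n) as integers in range(k)."""
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                yield from a[1:p + 1]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic(length, n=4, alphabet=ascii_lowercase):
    """Return the first `length` characters of the pattern."""
    symbols = islice(de_bruijn(len(alphabet), n), length)
    return "".join(alphabet[i] for i in symbols)


def cyclic_find(chunk, length=10000, n=4):
    """Return the offset of the n-character chunk, or -1 if absent."""
    return cyclic(length, n).find(chunk[:n])


print(cyclic(32))
print(cyclic_find("faaa"))
```

If the program crashes with these four characters in the instruction pointer, the reported offset is the amount of padding needed before the return address.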
See this great tutorial for installing the tools Pwndbg + GEF + Peda at once [Link].
CRAFTING PAYLOADS MANUALLY
python2 -c 'print "A"*4 + "BBBB" + "\xef\xbe\xad\xde"' > payload
- For 32-bit, it will create: AAAABBBB followed by the little-endian bytes of the address 0xdeadbeef (print also appends a newline).
python2 -c 'print "A"*10 + "\xef\xbe\xad\xde\x00\x00\x00\x00"' > payload
- For 64-bit, it will create: AAAAAAAAAA followed by the little-endian bytes of the address 0x00000000deadbeef (print also appends a newline).
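The one-liners above rely on Python 2, where print emits raw bytes. In Python 3, print would encode characters above 0x7f as UTF-8 and corrupt the address, so the bytes have to be built and written explicitly. A minimal Python 3 sketch of the same two payloads follows; payload32 and payload64 are hypothetical helper names.

```python
import struct


def payload32(address: int) -> bytes:
    """AAAABBBB followed by a 4-byte little-endian address."""
    return b"A" * 4 + b"BBBB" + struct.pack("<I", address)


def payload64(address: int) -> bytes:
    """Ten A characters followed by an 8-byte little-endian address."""
    return b"A" * 10 + struct.pack("<Q", address)


print(payload32(0xdeadbeef).hex())  # 4141414142424242efbeadde
print(payload64(0xdeadbeef).hex())

# To produce a payload file, write the raw bytes instead of printing:
#   with open("payload", "wb") as handle:
#       handle.write(payload32(0xdeadbeef) + b"\n")
```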
shellcraft -l
- Lists the available payloads (assembly instructions) for getting code execution (or a “shell”) into the application runtime.
shellcraft i386.linux.sh
- Outputs the desired payload in HEX.
shellcraft i386.linux.sh -f a
- Outputs the payload in assembly.
See also msfvenom for more payloads at [Link].
BONUS
upx -h
upx -d fileName
- Tool for compressing and expanding executable files.
hexedit fileName
- View and edit files in hexadecimal or in ASCII.
Pwndbg + GEF + Peda
There is a tutorial for setting up these 3 tools at once [Link]. Highly recommended to have it ready in a Kali base image for Pentesting or CTF.
cd ~ && git clone https://github.com/apogiatzis/gdb-peda-pwndbg-gef.git
cd ~/gdb-peda-pwndbg-gef
./install.sh
./update.sh
IDA Free
It is not open-source, but this free tool, originally made for Windows and now also offering Linux and macOS builds, is probably the best visual decompiler [Link].
wget https://out7.hex-rays.com/files/idafree84_linux.run
chmod +x idafree84_linux.run
./idafree84_linux.run
Radare2 Cutter
A graphic user interface version of the well known reverse engineering framework [Link].
sudo apt install radare2-cutter -y
It offers a complete summary of the binary on the landing Dashboard, a graphical disassembly view, and the two most popular decompilers out of the box.
Unarguably, it has an impressively slick interface.
ROP Emporium
A must-know free source of binary exploitation challenges [Link]. It also has a walk-through for building the knowledge base necessary to complete the challenges.
Libc Database Search
To find which functions are available in a specific library, a leaked memory address might reveal which version of the library is being used. This database offers a search feature and also provides the libraries themselves, for local exploitation tests before running the exploit against a remote target.
More Vulnerabilities to Consider
- Integer Overflow/Underflow
- It occurs when an arithmetic operation produces a number beyond the maximum (or below the minimum) value its binary representation can hold, so the value wraps around.
- Format String Vulnerability
- It is very specific to languages like C and its printf family of functions (the f character stands for Format). If user input reaches the format string without being properly sanitised, it can lead to arbitrary memory reads or writes.
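The wrap-around described under Integer Overflow/Underflow can be reproduced outside C. Python integers never overflow on their own, so this sketch emulates 32-bit arithmetic with a mask and with ctypes; add_u32, sub_u32, and add_i32 are hypothetical helper names.

```python
import ctypes


def add_u32(a: int, b: int) -> int:
    """Unsigned 32-bit addition: keep only the low 32 bits."""
    return (a + b) & 0xFFFFFFFF


def sub_u32(a: int, b: int) -> int:
    """Unsigned 32-bit subtraction: keep only the low 32 bits."""
    return (a - b) & 0xFFFFFFFF


def add_i32(a: int, b: int) -> int:
    """Signed 32-bit addition, truncated the way a C int32_t would be."""
    return ctypes.c_int32(a + b).value


print(add_u32(0xFFFFFFFF, 1))  # overflow wraps to 0
print(sub_u32(0, 1))           # underflow wraps to 4294967295
print(add_i32(2147483647, 1))  # signed overflow wraps to -2147483648
```

A length check such as "size + 1 > limit" can therefore pass unexpectedly when size is already at the maximum value.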