Reverse engineering binaries is a very useful set of techniques that allows an attacker to extract sensitive information from, or execute code in, a local or remote executable.


BASIC ASSEMBLY KNOWLEDGE

Overview

  • Memory Layout
    • stack
      • Holds local variables, function arguments, and saved return addresses, which control the program’s flow.
      • It grows from a higher address to a lower address.
    • heap
      • Dynamically allocated memory (malloc).
      • It grows from a lower address to a higher address.
    • data
      • Static or global values.
    • code
      • Set of instructions to be executed.
  • Concepts
    • Endianness
      • Big-Endian
        • Most significant bytes are stored first.
      • Little-Endian
        • Least significant bytes are stored first.
    • Functions
      • prologs
        • The instructions at the start of a function that set up its stack frame (save the caller’s base pointer, reserve space for locals) so that it can return cleanly when it ends.
        • Shortcut: enter
      • epilogs
        • The instructions at the end of a function that clean up its stack frame and return to the caller.
        • Shortcut: leave
    • Canary (aka Stack Cookie or Stack Guard)
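      • A random value placed on the stack between the local variables and the saved return address. It is checked when the function returns; if its value has changed, the memory was tampered with and the program aborts.
      • A canary makes it much harder to overwrite the return address and take control of the program’s execution flow, because a linear overflow has to overwrite the canary first. It can still be bypassed if its value is leaked.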
    • Return Oriented Programming (ROP)
      • The idea of chaining together small snippets of existing assembly that each end in ret (“gadgets”), using control of the stack gained via a buffer overflow, to take over the execution of the program.
    • Procedure Linkage Table (PLT)
      • A table of small stubs through which a program calls functions that live in shared libraries; the addresses are resolved by the dynamic linker at run-time.
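
To make the two byte orders from the Endianness entry concrete, here is a minimal Python sketch. The value 0xDEADBEEF is only an illustrative constant; int.to_bytes and int.from_bytes are standard built-in methods.

```python
# Illustrative 32-bit value; any integer that fits in 4 bytes works.
value = 0xDEADBEEF

big = value.to_bytes(4, "big")        # most significant byte stored first
little = value.to_bytes(4, "little")  # least significant byte stored first

print(big.hex())     # deadbeef
print(little.hex())  # efbeadde

# Reading the same bytes with the wrong byte order yields a different number.
misread = int.from_bytes(little, "big")
print(hex(misread))  # 0xefbeadde
```

This is why addresses in a hex dump of a little-endian binary look reversed.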

Basics

  • Registers and their Operands
    • eip (32 bits) / rip (64 bits)
      • The instruction pointer: holds the address of the next instruction to be executed.
    • ebp (32 bits) / rbp (64 bits)
      • The base (frame) pointer: points to the “base” of the current function’s stack frame. Locals and arguments are addressed relative to it.
    • esp (32 bits) / rsp (64 bits)
      • The stack pointer: points to the “top” of the stack (the lowest memory address in use). It works in conjunction with push and pop.
  • Constructs
    • mov
      • Copies data from the “source” operand into the “destination” operand (written destination first, then source). A memory operand in brackets is dereferenced.
    • lea
      • Load Effective Address: computes the address of the “source” operand and writes it into the “destination” (in this order). It stores the pointer value itself, not the data it points to.
    • rop
      • Not an instruction: shorthand for Return-Oriented Programming (see Concepts), which is built around ret.
    • ret
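      • Returns from a function. It pops the return address (pointer) from the stack and jumps to it.
    • push
      • Places a value on top of the stack, moving the stack pointer to a lower address.
    • pop
      • Removes the value on top of the stack and stores it in the operand, moving the stack pointer to a higher address.
    • call
      • Calls a function: pushes the return address onto the stack and jumps to the function.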
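    • cmp
      • Basically the test of an IF statement. It compares two values by subtracting the second (right) from the first (left) and sets the flags accordingly without storing the result; a conditional jump usually follows.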
  • Operators
    • or, and, and xor
      • OR, AND, and XOR respectively.
    • mul
      • Multiply values.
    • div
      • Divides values.
    • shl and shr
      • Shifts bits to left and right respectively.
    • rol and ror
      • Rotate bits left and right respectively: bits shifted out of one end re-enter at the other.
    • nop
      • Does nothing.
  • 32 vs 64 bits
    • 32 bits
      • Parameters to functions are passed on the stack.
        • Function arguments are pushed onto the stack in reverse order (right to left).
    • 64 bits
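      • The first parameters to functions are passed via registers; any remaining ones go on the stack.
        • System V calling convention used on Linux (order of function parameters): RDI, RSI, RDX, RCX, R8, R9 for integers and pointers; [XYZ]MM0–7 for floating-point values.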
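
A minimal sketch of how esp, push, and pop interact, assuming a 32-bit little-endian machine. The ToyStack class, its base address 0x1000, and its 64-byte size are made up for illustration; a real stack is managed by the CPU and the operating system, not by Python.

```python
import struct


class ToyStack:
    """Toy model of a 32-bit x86 stack; addresses and sizes are illustrative."""

    def __init__(self, size=64, base=0x1000):
        self.base = base              # lowest address backed by the buffer
        self.mem = bytearray(size)
        self.esp = base + size        # empty stack: esp sits at the highest address

    def push(self, value):
        self.esp -= 4                 # the stack grows toward lower addresses
        off = self.esp - self.base
        self.mem[off:off + 4] = struct.pack("<I", value)  # little-endian store

    def pop(self):
        off = self.esp - self.base
        (value,) = struct.unpack("<I", self.mem[off:off + 4])
        self.esp += 4                 # releasing the slot moves esp back up
        return value


stack = ToyStack()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # 0x1038, 8 bytes below the starting 0x1040
print(hex(stack.pop()))  # 0x22222222, the last value pushed comes out first
```

The model has no bounds checks on purpose; it only shows the direction of growth and the last-in, first-out order.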

INFORMATION GATHERING

file fileName
  • ELF
    • In short: Linux executable.
  • 80386 or x86-64
    • 32-bit or 64-bit architecture respectively.
  • dynamically linked
    • It uses system libraries. It does not contain the libraries in the binary.
  • not stripped
    • Makes it easier to analyze because it reveals the function names after disassembled.
  • LSB
    • Least Significant Byte first = little-endian (the bytes look to be in reverse order).
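
The fields that file reports above come from the first bytes of the ELF header. The sketch below decodes them by hand; the numeric codes are the ones defined by the ELF specification, while describe_elf and the sample header are hypothetical names and data built only for this illustration.

```python
import struct

# Codes defined by the ELF specification (e_ident, e_type, e_machine).
ELF_CLASS = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "LSB (little-endian)", 2: "MSB (big-endian)"}
ELF_TYPE = {2: "executable", 3: "shared object or PIE"}
ELF_MACHINE = {3: "Intel 80386", 62: "x86-64"}


def describe_elf(header):
    """Decode the first 20 bytes of an ELF file into human-readable fields."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    ei_class, ei_data = header[4], header[5]
    order = ">" if ei_data == 2 else "<"
    # e_type and e_machine (2 bytes each) follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack(order + "HH", header[16:20])
    return {
        "class": ELF_CLASS.get(ei_class, "unknown"),
        "data": ELF_DATA.get(ei_data, "unknown"),
        "type": ELF_TYPE.get(e_type, "other"),
        "machine": ELF_MACHINE.get(e_machine, "other"),
    }


# Hand-built sample header: 64-bit, little-endian, ET_DYN, x86-64.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 62)
info = describe_elf(sample)
print(info)
```

For a real binary, pass the first 20 bytes read from the file opened in "rb" mode.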

See also a list of file signatures that file uses to identify the file type at [Link].

checksec --file=fileName
  • RELRO
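    • Relocation Read-Only: marks the relocation data (fully or partially including the Global Offset Table) read-only after start-up so its entries cannot be overwritten.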
  • Stack Canary
    • When disabled, checksec says “No canary found”. This means there is no protection against a buffer overflow reaching the return address.
  • NX
    • Aka the No-eXecute bit. When enabled, it does not allow shellcode to execute from the stack or heap.
  • PIC
    • Aka Position-Independent Code. Code that runs correctly wherever it is loaded, which allows shared code (libraries) to be mapped at any memory address.
  • PIE
    • Aka Position-Independent Executable. When disabled, checksec says “No PIE”, which means the program is loaded at the same memory address each time it is executed.
  • RUNPATH
    • At runtime, the dynamic linker searches the specified path for the required shared libraries before the standard locations such as /lib and /usr/lib.
strings fileName
  • It prints every sequence of printable characters found in the file to the terminal.
    • Useful for low hanging fruit information disclosure.
ldd fileName
  • Lists the shared libraries that a “dynamically linked” binary needs at runtime.
readelf -l fileName
readelf -a fileName
  • Extracts information about an ELF file: -l prints the program headers (segments), while -a prints everything (headers, sections, symbols, and other aspects).
nm fileName
  • Lists symbols from object files (executable).
objdump -d fileName > decompiled.asm
  • Used to disassemble an executable.
    • Alternatively use the GUI software called Ghidra [Link].
      • On Kali, it can be installed with sudo apt install -y ghidra
ropper --file=fileName --search "pop rdi"
  • Searches the binary for ROP gadgets containing the given instruction (here pop rdi) and prints their memory addresses.
r2 fileName
  • (pending)
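
As a rough illustration of what the strings step in the list above does, the sketch below scans raw bytes for runs of printable ASCII. It assumes a minimum length of 4 (the default in GNU strings) and counts only bytes 0x20 to 0x7e as printable; extract_strings and the sample data are hypothetical and not part of any tool.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Made-up binary blob with two interesting strings and one that is too short.
sample = b"\x00\x01flag{demo}\x00ab\x00/bin/sh\x00\xff"
found = extract_strings(sample)
print(found)  # ['flag{demo}', '/bin/sh']
```

For a real file, read it in "rb" mode and pass the bytes to extract_strings.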

DEBUGGING TOOLS

ltrace ./fileName
  • Intercepts and shows library calls as the program runs.
    • E.g. gets, puts, or printf from libc.
strace ./fileName
  • Intercepts and shows system calls as the program runs.
    • E.g. open, read, write, close, fork, execve, exit, or mmap from the Kernel.
gdb ./fileName
  • A very powerful debugger that deserves a cheat-sheet of its own.
  • A much better way of navigating through gdb is to use pwndbg, a gdb plug-in written in Python [Link].
gdb-pwndbg fileName
  • file fileName
    • Loads the binary to be executed.
  • info functions
    • Lists the functions in the binary
  • disassemble main
    • Disassembles a function.
  • info stack
    • Shows the call stack of a running or crashed execution.
  • backtrace
    • Similar to info stack.
  • break main
    • Adds a break-point to a function.
  • break *0xffffffff
    • Adds a break-point to a memory position.
  • delete break
    • Removes break-points.
  • cyclic 100
    • Creates a series of 100 characters for overflowing tests.
    • Similar online tool [Link].
  • cyclic -l XXXX
    • Looks up the given chunk of characters (e.g. the value that landed in eip after the crash) in the pattern and prints its offset.
  • run
    • Executes the binary.
    • If using break-points:
      • n will baby-step to the “next” source line; ni does the same for a single machine instruction.
      • c will “continue” to the next break-point, or run to the end if there are no more breaks.
  • run < payload
    • Executes the binary with the payload file redirected into its standard input.
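
The idea behind cyclic and cyclic -l above can be sketched in plain Python. This is not pwndbg's code: it is a standard de Bruijn sequence generator, assuming a lowercase alphabet and 4-character chunks (the actual tool may use a different chunk size, for example on 64-bit targets). The names de_bruijn, cyclic, and cyclic_find are illustrative.

```python
import itertools
import string
import struct


def de_bruijn(alphabet, n):
    """Yield a de Bruijn sequence in which no length-n chunk repeats."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic(length, n=4, alphabet=string.ascii_lowercase):
    """Return the first `length` characters of the pattern."""
    return "".join(itertools.islice(de_bruijn(alphabet, n), length))


def cyclic_find(chunk, n=4, search_len=10000):
    """Return the offset of `chunk` in the pattern, or -1 if not found."""
    return cyclic(search_len, n).find(chunk)


print(cyclic(16))  # aaaabaaacaaadaaa

# Hypothetical crash: eip holds 0x61616164, i.e. the bytes "daaa" in memory.
crashed_eip = 0x61616164
chunk = struct.pack("<I", crashed_eip).decode("ascii")
print(cyclic_find(chunk))  # 12, the number of bytes before the return address
```

Because every 4-character chunk is unique, the value found in a register after the crash pins down exactly how much padding precedes it.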

See this great tutorial for installing the tools Pwndbg + GEF + Peda at once [Link].


CRAFTING PAYLOADS MANUALLY

python2 -c 'print "A"*4 + "BBBB" + "\xef\xbe\xad\xde"' > payload
  • For 32-bit, it will create: AAAABBBB plus the binary for the address 0xdeadbeef.
python2 -c 'print "A"*10 + "\xef\xbe\xad\xde\x00\x00\x00\x00"' > payload
  • For 64-bit, it will create: AAAAAAAAAA plus the binary for the address 0x00000000deadbeef
shellcraft -l
  • Lists the available payloads (assembly instructions) for getting code execution (or a “shell”) into the application runtime.
shellcraft i386.linux.sh
  • Outputs the desired payload in HEX.
shellcraft i386.linux.sh -f a
  • Outputs the payload in assembly.
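
The python2 one-liners above can be reproduced in Python 3, where payloads must be built as bytes. This is a minimal sketch: pack_address is a hypothetical helper name, 0xDEADBEEF is the same placeholder address used above, and little-endian x86 targets are assumed.

```python
import struct


def pack_address(address, bits=32):
    """Pack an address as little-endian bytes for a 32-bit or 64-bit target."""
    fmt = "<I" if bits == 32 else "<Q"
    return struct.pack(fmt, address)


# 32-bit: AAAABBBB followed by the 4 bytes of 0xdeadbeef.
payload32 = b"A" * 4 + b"BBBB" + pack_address(0xDEADBEEF, 32)

# 64-bit: ten A characters followed by the 8 bytes of 0x00000000deadbeef.
payload64 = b"A" * 10 + pack_address(0xDEADBEEF, 64)

print(payload32)
print(payload64)
```

To save a payload, write it to a file opened in "wb" mode. Unlike the python2 print versions, this does not append a trailing newline, which matters when the target reads its input line by line.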

See also msfvenom for more payloads at [Link].


BONUS

upx -h
upx -d fileName
  • Tool for compressing (packing) and expanding executable files; -d decompresses a packed binary.
hexedit fileName
  • View and edit files in hexadecimal or in ASCII.

Pwndbg + GEF + Peda

There is a tutorial for setting up these 3 tools at once [Link]. It is highly recommended to have them ready in a Kali base image for pentesting or CTFs.

cd ~ && git clone https://github.com/apogiatzis/gdb-peda-pwndbg-gef.git
cd ~/gdb-peda-pwndbg-gef
./install.sh
./update.sh

IDA Free

It is not open-source, but this free tool, originally made for Windows and now offering Linux and macOS builds, is probably the best visual decompiler [Link].

wget https://out7.hex-rays.com/files/idafree84_linux.run
chmod +x idafree84_linux.run 
./idafree84_linux.run

Radare2 Cutter

A graphical user interface version of the well-known reverse engineering framework [Link].

sudo apt install radare2-cutter -y

It shows a complete summary of the binary on the landing Dashboard, renders the disassembly as a graph, and ships with the two most popular decompilers out of the box. Unarguably, it has an impressively slick interface.

ROP Emporium

A must-know free source of binary exploitation challenges [Link]. It also has a walk-through for building the knowledge base necessary to complete them.

Libc Database Search

To find which functions are available in a specific library, a leaked memory address of a linked function might reveal what version of the library is being used. This database has a search feature and also provides the libraries themselves, so an exploit can be tested locally before it is run against a remote target.

  • libc database search [Link]
  • libc-database [Link]
  • search engine source code [Link]

More Vulnerabilities to Consider

  • Integer Overflow/Underflow
    • It occurs when an arithmetic operation produces a number outside the range its binary representation can hold, so the value wraps around.
  • Format String Vulnerability
    • It is specific to languages like C and the printf family of functions (the f character stands for Format). If user input is passed as the format string itself instead of being properly sanitised, it can lead to arbitrary memory reads or writes.
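
The integer wrap-around described in the list above can be emulated in Python, whose own integers never overflow, by masking results to 32 bits. This is a sketch under that assumption: the helper names and the count and element-size values are made up, and the allocation scenario is a hypothetical example rather than a specific bug.

```python
MASK32 = 0xFFFFFFFF


def add_u32(a, b):
    """Add two values the way a 32-bit unsigned register would."""
    return (a + b) & MASK32


def mul_u32(a, b):
    """Multiply two values the way a 32-bit unsigned register would."""
    return (a * b) & MASK32


def to_s32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


print(add_u32(0xFFFFFFFF, 1))  # 0: the maximum unsigned value wraps to zero
print(to_s32(0x7FFFFFFF + 1))  # -2147483648: the maximum signed value wraps negative

# Hypothetical allocation size: count * element_size wraps to a tiny number,
# so a later copy of `count` elements would overflow the small buffer.
count, element_size = 0x40000001, 4
print(mul_u32(count, element_size))  # 4
```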