Crash Harnessing with Injected Code

Brandon Miller

There are many approaches to harnessing programs and instrumenting them for crash analysis and memory profiling. Each technique has benefits and drawbacks. Emulation is often the most reliable method, but requires the largest sacrifice in performance. Specialized hardware features, such as those in modern Intel processors, can provide code coverage, but don’t necessarily provide the ability to profile memory or monitor heap usage. There are also more advanced techniques such as binary re-compilation, in which frameworks such as McSema/Remill and Egalito lift compiled code to an intermediate representation, apply instrumentation, and re-compile. In this blog post I describe a simple alternative: a proof-of-concept that harnesses a target program and adds basic instrumentation by combining ptrace-based techniques with code injection to profile memory and monitor for crashes. The end result is a crash harness and an injected shared object that hooks imported functions to profile dynamic memory and detect conditions such as heap buffer overflows and use-after-free.

Process Trace and LD_PRELOAD

Before diving into more complex implementation details, I’d like to describe the ptrace system call and the LD_PRELOAD trick, the two Linux features that I based my design on. ptrace, or process trace, is a Linux system call that aids in debugging a running process. The best-known example of software that uses ptrace is the GNU Debugger (GDB). ptrace allows a tracer to attach to another process in order to trap system calls, read and write virtual memory, change register values, and more. LD_PRELOAD is an environment variable that, when set, instructs the Linux dynamic linker to load a shared object from the specified file path before all other imported libraries. An example of software that abuses the LD_PRELOAD trick is the Jynx rootkit.

Writing a Crash Harness

I started by developing a simple crash harness for x86-64 executables. The harness is designed like strace in that you run the harness and the harness runs the target program. This is achieved by forking and having the child process call PTRACE_TRACEME, which asks to be traced by its parent, before it executes the target program. The parent process calls waitpid in a loop to monitor the child’s status.

int main(int argc, char **argv)
{
    pid_t pid;

    if (argc < 2) {
        printf("./ich [cmd]\n");
        return 1;
    }

    if (init_crash_harness())
        return 1;

    pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (!pid) {
        /* Child: this won't return */
        spawn_process(&argv[1]);
    } else {
        /* Parent: a zero return from monitor_execution reports a crash */
        if (!monitor_execution(pid)) {
            display_crash_dump(pid);
        }
        ptrace(PTRACE_DETACH, pid, 0, 0);
    }

    return 0;
}
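
The monitoring side is not shown above. What follows is only a minimal sketch of what such a waitpid loop might look like, not the code from the harness. The name monitor_child is a hypothetical stand-in for monitor_execution, and two choices are assumptions of mine: swallowing SIGSTOP and SIGTRAP, and treating SIGILL as a crash signal alongside SIGSEGV. It keeps the convention used in main, where a zero return means the child crashed.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: SIGILL counts as a crash too, so that a deliberate illegal
 * instruction in the target also leads to a dump. */
static int is_crash_signal(int sig)
{
    return sig == SIGSEGV || sig == SIGILL;
}

/* Hypothetical stand-in for monitor_execution():
 * returns 0 if the child crashed, 1 if it ended any other way. */
int monitor_child(pid_t pid)
{
    int status;

    for (;;) {
        if (waitpid(pid, &status, 0) < 0)
            return 1;                   /* nothing left to wait for */

        if (WIFEXITED(status))
            return 1;                   /* normal exit */

        if (WIFSIGNALED(status))        /* terminated by a signal */
            return is_crash_signal(WTERMSIG(status)) ? 0 : 1;

        if (WIFSTOPPED(status)) {
            int sig = WSTOPSIG(status);

            if (is_crash_signal(sig))
                return 0;               /* leave it stopped for inspection */

            /* Assumption: swallow the startup SIGSTOP and the exec SIGTRAP,
             * and pass every other signal through to the child. */
            if (sig == SIGSTOP || sig == SIGTRAP)
                sig = 0;
            if (ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig) < 0)
                return 1;
        }
    }
}
```

Returning while the child is still stopped matters here: the tracer can only read registers and memory from a stopped tracee, so the dump has to happen before the child is resumed or detached.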

If the parent process observes the child receiving a SIGSEGV, it creates a crash dump displaying register values and virtual memory content. It also locates and dumps the base of the ELF by reading backward from rip toward lower addresses with PTRACE_PEEKDATA until the ELF header signature is found. The crash monitoring functionality is not unlike that of many other Linux crash harnesses.
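
To make the ELF base lookup concrete, here is a rough sketch of the backward scan. It assumes the header sits at the start of a page, so it steps one page at a time instead of one word at a time. The helper name find_elf_base is hypothetical and the harness may do this differently.

```c
#include <elf.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: walk down from addr one page at a time in a stopped
 * tracee until a page starts with the ELF magic. Returns 0 if an unreadable
 * page is reached first. */
uintptr_t find_elf_base(pid_t pid, uintptr_t addr)
{
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t page = addr & ~(page_size - 1);

    for (;;) {
        long word;

        errno = 0;                      /* -1 is also a valid data word */
        word = ptrace(PTRACE_PEEKDATA, pid, (void *)page, NULL);
        if (word == -1 && errno != 0)
            return 0;

        if (memcmp(&word, ELFMAG, SELFMAG) == 0)
            return page;

        if (page < page_size)
            return 0;                   /* reached the bottom of memory */
        page -= page_size;
    }
}
```

In the crash path, addr would be the rip value read from the stopped child. If rip points into a shared library at the time of the crash, a scan like this finds that library's header, not the main executable's.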

LD_PRELOAD Code Injection

After writing the core of the crash harness, I focused on adding functionality to assist in injecting a shared object into the target program at runtime using the LD_PRELOAD trick. This trick is fairly simple and can be carried out from bash by executing a command similar to the line below.

$ export LD_PRELOAD=/path/to/sauce.so && ./some_program

To avoid having to define the LD_PRELOAD environment variable manually during each run, the harness appends LD_PRELOAD to a copy of environ (the harness’ environment) and passes that copy to execve. The shared object itself is linked into the harness using .incbin. When running the harness, the shared object (described in the next section) is written to disk at a temporary path. The harness code that spawns the target process is below.

static void spawn_process(char **argv)
{
    char **env = NULL;
    char preload_env[256];
    size_t i = 0;

    memset(preload_env, '\0', sizeof(preload_env));
    snprintf(preload_env, sizeof(preload_env), "LD_PRELOAD=%s", HOOK_LIB_PATH);
    info("Setting up the environment: %s", preload_env);

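    /* Get count */
    while (environ[i] != NULL)
        i++;

    /* Two extra slots: one for LD_PRELOAD and one for the NULL terminator */
    env = (char **)malloc((i + 2) * sizeof(char *));
    if (env == NULL) {
        err("Failed to allocate environment");
        exit(1);
    }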

    /* Copy the environment variables */
    i = 0;
    while (environ[i] != NULL) {
        env[i] = environ[i];
        i++;
    }

    /* Append LD_PRELOAD */
    env[i] = preload_env;
    env[i+1] = NULL;

    info("Executing process (%s) ...\n", argv[0]);
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    kill(getpid(), SIGSTOP);
    execve(argv[0], argv, env);

    /* execve only returns on failure */
    err("Failed to execute binary");
    exit(1);
}
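
The step that writes the embedded shared object to disk is not shown in this post. The sketch below illustrates one way it could work, assuming the blob was embedded with the GNU assembler's .incbin directive. The file name hook.so, the symbol names, and the helper write_payload are all hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * With GNU as, a file can be embedded at build time roughly like this
 * (hook.so and both symbol names are hypothetical):
 *
 *     __asm__(".section .rodata\n"
 *             ".global hook_lib_start\n"
 *             "hook_lib_start:\n"
 *             ".incbin \"hook.so\"\n"
 *             ".global hook_lib_end\n"
 *             "hook_lib_end:\n"
 *             ".previous\n");
 *     extern const unsigned char hook_lib_start[], hook_lib_end[];
 */

/* Write len bytes of an embedded blob to path. Returns 0 on success. */
int write_payload(const char *path, const unsigned char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0700);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n <= 0) {                   /* treat any short-circuit as failure */
            close(fd);
            return -1;
        }
        data += n;                      /* write() may be partial; keep going */
        len -= (size_t)n;
    }
    return close(fd);
}
```

With the symbols from the comment, the call would look like write_payload(HOOK_LIB_PATH, hook_lib_start, (size_t)(hook_lib_end - hook_lib_start)), made before the fork so the file exists by the time the dynamic linker looks for it.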

Instrumentation Payload (Shared Object)

At this point, I had a crash harness capable of pre-loading a shared object into the debuggee and monitoring the debuggee for crashes. Next, I focused on developing the shared object that gets injected into the debuggee to hook libc imports and profile memory. I started by writing functions with the same names and prototypes as the libc imports I wanted to hook. Remember, the harness executes the target program with LD_PRELOAD, which tells the dynamic linker to load this shared object before libc.so (and any other library). The dynamic linker then fills in the global offset table with the addresses of the pre-loaded shared object’s versions of the target functions. As such, when the program calls malloc, it executes the pre-loaded shared object’s malloc, which reserves 8 bytes of additional memory at the beginning and at the end of the allocation for tags (a known sequence of random bytes). It does this by adding 16 bytes to the size parameter before calling the real libc:malloc, which it resolves using the following macro:

#define LOAD_SYM(sym, type, name) {         \
    if (!sym) {                             \
        sym = (type)dlsym(RTLD_NEXT, name); \
        if (!sym)                           \
            FAIL();                         \
    }                                       \
}

When libc:malloc returns, my malloc writes an 8-byte tag at the beginning and at the end of the allocation. It also stores metadata about each tagged allocation in a global linked list. Then, it advances the returned pointer by 8 bytes (past the start tag) and returns it to the target program. Likewise, my free iterates through the linked list to see if the allocation is tagged. If it is, it decrements the pointer by 8 bytes and calls libc:free to properly free the allocation.
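
The tagging scheme can be sketched in a few lines. This is an illustration under assumptions, not the code from the shared object: the tag bytes, the struct layout, and the names tagged_malloc and tagged_free are hypothetical, and the functions call malloc and free directly where the pre-loaded library would call the libc functions resolved through LOAD_SYM.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TAG_SIZE 8

/* Hypothetical tag value, standing in for the known random byte sequence. */
static const unsigned char TAG[TAG_SIZE] = {
    0xde, 0xc0, 0xad, 0xde, 0xef, 0xbe, 0xad, 0xde
};

struct alloc_meta {
    unsigned char *user;        /* pointer handed to the program */
    size_t size;                /* size the program asked for */
    struct alloc_meta *next;
};

static struct alloc_meta *alloc_list;

/* In the pre-loaded library this would be named malloc and would call the
 * real libc function instead of malloc directly. */
void *tagged_malloc(size_t size)
{
    unsigned char *raw;
    struct alloc_meta *meta;

    if (size > SIZE_MAX - 2 * TAG_SIZE)
        return NULL;
    raw = malloc(size + 2 * TAG_SIZE);
    meta = malloc(sizeof(*meta));
    if (!raw || !meta) {
        free(raw);
        free(meta);
        return NULL;
    }

    memcpy(raw, TAG, TAG_SIZE);                     /* start tag */
    memcpy(raw + TAG_SIZE + size, TAG, TAG_SIZE);   /* end tag */

    meta->user = raw + TAG_SIZE;
    meta->size = size;
    meta->next = alloc_list;
    alloc_list = meta;
    return meta->user;
}

/* Counterpart of the hooked free: tagged pointers are unlinked and released
 * from their real start, anything else is passed through unchanged. */
void tagged_free(void *ptr)
{
    struct alloc_meta **link = &alloc_list;

    while (*link) {
        struct alloc_meta *meta = *link;
        if (meta->user == ptr) {
            *link = meta->next;
            free(meta->user - TAG_SIZE);
            free(meta);
            return;
        }
        link = &meta->next;
    }
    free(ptr);
}
```

One side effect of shifting the pointer by 8 bytes is that the program receives memory that is 8-byte aligned, where glibc's malloc on x86-64 normally returns 16-byte-aligned memory. The sketch is also not thread-safe, since the list is modified without a lock.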

In addition to malloc and free, the pre-loaded shared object also hooks copy imports such as memcpy, strncpy, and more. My memcpy calls libc:memcpy and then iterates through the global linked list of tagged-allocation metadata, checking whether any tags have been altered. If a tag has been altered, the pre-loaded library forcefully crashes the program by executing an illegal instruction. This causes the crash harness to emit a crash dump.
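
A sketch of that post-copy check follows. It assumes a metadata record that stores the user pointer and requested size, with an 8-byte tag directly before and after the user region. The names find_corrupted and checked_memcpy and the tag bytes are hypothetical, and __builtin_trap is my choice for raising the illegal instruction (GCC and Clang emit ud2 for it on x86-64).

```c
#include <stddef.h>
#include <string.h>

#define TAG_SIZE 8

/* Hypothetical tag value, standing in for the known random byte sequence. */
static const unsigned char TAG[TAG_SIZE] = {
    0xde, 0xc0, 0xad, 0xde, 0xef, 0xbe, 0xad, 0xde
};

struct alloc_meta {
    unsigned char *user;        /* pointer handed to the program */
    size_t size;                /* size the program asked for */
    struct alloc_meta *next;
};

/* Return the first record whose start or end tag no longer matches,
 * or NULL if every tag is intact. */
const struct alloc_meta *find_corrupted(const struct alloc_meta *list)
{
    for (; list; list = list->next) {
        if (memcmp(list->user - TAG_SIZE, TAG, TAG_SIZE) != 0 ||
            memcmp(list->user + list->size, TAG, TAG_SIZE) != 0)
            return list;
    }
    return NULL;
}

/* Stand-in for the hooked memcpy: copy first, then audit every tracked
 * allocation and crash on purpose if a tag was overwritten. */
void *checked_memcpy(void *dst, const void *src, size_t n,
                     const struct alloc_meta *list)
{
    void *ret = memcpy(dst, src, n);

    if (find_corrupted(list))
        __builtin_trap();
    return ret;
}
```

Because the audit walks every tracked allocation, it catches corruption caused by this copy as well as corruption left behind by earlier writes that bypassed the hooks. The cost is a linear scan of the list on every hooked call.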

Conclusion

There are many limitations to this approach. For example, dynamic memory can be modified without using imported functions. If this occurs and a tag is tainted, the instrumentation library will not detect it until the next time the program calls a hooked import. Moreover, an out-of-bounds (OOB) write could land before the start tag or after the end tag, in which case the program could continue to run unless the write causes additional memory corruption or an access violation. Also, the LD_PRELOAD trick does not work for statically linked code, and some dynamic linkers (such as Android’s) don’t support it. Limitations aside, this approach can be adapted for other use cases such as visualizing heaps or altering execution of a program for debugging purposes. It can also be used in combination with other binary instrumentation techniques.

All code described in this blog post is contained in my crash harness, ich, which can be found here. Thank you for reading!