Automated RE of Kernel Configurations

Kconfig (short for kernel configuration) is a component of the Kbuild build system for the Linux kernel. The Linux kernel is highly customizable, and a configuration is required to build the kernel and generate kernel headers. In this blog post, I introduce a new Binary Ninja plugin that analyzes Linux kernel binaries to recover kernel configuration options.

There are many reasons that one might need to recover a Linux kernel configuration post-build. My motivation for this project is to make it easier to generate kernel headers for building loadable kernel modules (LKMs) that will load on target Linux devices where source isn’t available. Linux has several mechanisms that verify LKMs during load to ensure that they are compatible and won’t cause the kernel to become unstable. By recovering a Linux kernel’s build configuration, the kernel can be rebuilt from the upstream source and compatible kernel headers can be generated. These kernel headers can be used to build LKMs that will [hopefully] load on the target device.

Intro to Kconfig

Kbuild is the Linux kernel build system. It primarily exists to parse the Kconfig macro language and set the proper flags (based on the user-provided configuration options) during the build. Under the hood, it uses GNU make. The first step when building Linux is to create the .config file, which holds the configuration. During the build, these options are used to set C preprocessor definitions, define symbols, and more. A more thorough explanation of the kernel build process can be found here. The rest of this section focuses solely on the format of the generated .config file.

Linux build configuration begins by specifying the architecture for the platform the kernel is intended to run on. When the architecture is specified, Kbuild processes the corresponding Kconfig file. The Kconfig file consists of a custom macro language that Kbuild uses to know which configuration options to set automatically and which options to ask the user to set. Tools like menuconfig build a tree-like menu that the user can edit to change options. After all the options are supplied, the .config file gets generated. This is a text file that resembles the following format:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.19.208 Kernel Configuration
#

#
# Compiler: gcc-8 (Debian 8.3.0-6) 8.3.0
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=80300
CONFIG_CLANG_VERSION=0
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT="4.19.0-18-amd64"
...
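To make the format concrete, here is a minimal sketch of a reader for it. It is not part of the plugin: the function name parse_dot_config is hypothetical, and the sketch only assumes the two line shapes visible in the listing above (NAME=value and "# NAME is not set"). Section comments and blank lines are skipped.

```python
import re
from typing import Dict, Optional

# The two line shapes that carry information in a .config file.
_SET_RE = re.compile(r'^(CONFIG_[A-Za-z0-9_]+)=(.*)$')
_UNSET_RE = re.compile(r'^# (CONFIG_[A-Za-z0-9_]+) is not set$')


def parse_dot_config(text: str) -> Dict[str, Optional[str]]:
    """Map each option to its raw value, or to None when it is 'not set'."""
    options: Dict[str, Optional[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET_RE.match(line)
        if match:
            # Raw value is kept as-is, so strings keep their quotes.
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET_RE.match(line)
        if match:
            options[match.group(1)] = None
        # Anything else (blank lines, section comments) is ignored.
    return options


sample = '''\
#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_BUILD_SALT="4.19.0-18-amd64"
'''
parsed = parse_dot_config(sample)
```

Keeping "not set" as None rather than dropping the entry preserves the difference between an option that is explicitly disabled and one that never appears in the file.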

Reverse Engineering Configuration Options

Most configuration options can be recovered by analyzing the Linux kernel binary post-build. Doing this manually is time-intensive and tedious, and the effort grows with the number of options you need to reverse engineer. This section describes how you can reverse a config option manually, and how the Binary Ninja API can be leveraged to do it for you.

The first option I will use for demonstration is CONFIG_BUILD_SALT. In the upstream Linux kernel source, the sched_debug_header function prints the utsname strings through SEQ_printf, a wrapper around seq_printf. Counting the seq_file pointer and the format string, init_utsname()->release is the third argument to seq_printf, and on the Debian kernel used here that release string carries the same value as CONFIG_BUILD_SALT.

static void sched_debug_header(struct seq_file *m)
{
	u64 ktime, sched_clk, cpu_clk;
	unsigned long flags;

	local_irq_save(flags);
	ktime = ktime_to_ns(ktime_get());
	sched_clk = sched_clock();
	cpu_clk = local_clock();
	local_irq_restore(flags);

	SEQ_printf(m, "Sched Debug Version: v0.11, %s %.*s\n",
		init_utsname()->release,
		(int)strcspn(init_utsname()->version, " "),
		init_utsname()->version);

By locating and analyzing sched_debug_header in the Linux kernel binary, we can see that it corresponds with the code in the upstream kernel source, and can conclude that the third argument in the call to seq_printf is indeed a pointer to the utsname release string.

[Figure: sched_debug_header call to seq_printf]

If doing this manually, we would proceed to open our config file in a text editor and type CONFIG_BUILD_SALT="4.19.0-18-amd64". Instead, we’re going to use the Binary Ninja API to automate this operation by writing code that takes the following steps:

  1. Locate the sched_debug_header function
  2. Iterate through the function’s HLIL instructions to locate the first call to seq_printf
  3. Get the third argument for the call to seq_printf and verify that it is a pointer
  4. Get the string that the pointer is pointing to (the build salt)

    # Needs: import logging, from typing import Optional,
    # and from binaryninja import HighLevelILOperation.
    # to_ulong is a helper from the plugin (not shown here).
    def _recover_config_build_salt(self) -> Optional[str]:
        # Step 1: locate sched_debug_header by symbol name.
        syms = self.bv.get_symbols_by_name('sched_debug_header')
        if not syms:
            logging.error('Failed to lookup sched_debug_header')
            return None

        sched_debug_header = self.bv.get_function_at(syms[0].address)
        if not sched_debug_header:
            logging.error('Failed to get function sched_debug_header')
            return None

        syms = self.bv.get_symbols_by_name('seq_printf')
        if not syms:
            logging.error('Failed to lookup seq_printf')
            return None

        # Step 2: walk the HLIL looking for a direct call to seq_printf.
        for block in sched_debug_header.high_level_il:
            for instr in block:
                if instr.operation != HighLevelILOperation.HLIL_CALL:
                    continue

                if instr.dest.operation != HighLevelILOperation.HLIL_CONST_PTR:
                    continue

                if to_ulong(instr.dest.constant) == syms[0].address:
                    # Step 3: the third argument (index 2) must be a pointer.
                    if len(instr.params) < 3:
                        logging.error(
                            'seq_printf call has fewer than 3 arguments')
                        return None

                    if instr.params[
                            2].operation != HighLevelILOperation.HLIL_CONST_PTR:
                        logging.error(
                            'param3 of seq_printf call is not a pointer')
                        return None

                    # Step 4: read the string the pointer refers to.
                    s = self.bv.get_ascii_string_at(
                        to_ulong(instr.params[2].constant))
                    if not s:
                        logging.error('Failed to get build salt string')
                        return None

                    return s.value

        # No call to seq_printf was found in the function.
        logging.error('No call to seq_printf in sched_debug_header')
        return None

Thankfully, not all of the configuration options require analyzing code. Many configuration options can be determined based on the presence of a symbol for an exported function or global data variable. An example of this type of option is CONFIG_TICK_ONESHOT. By looking at the Linux upstream source code we can see that this option is used by a Makefile to determine whether or not to use the tick-broadcast-hrtimer.o object file as part of the kernel build.

obj-$(CONFIG_GENERIC_CLOCKEVENTS)		+= clockevents.o tick-common.o
ifeq ($(CONFIG_GENERIC_CLOCKEVENTS_BROADCAST),y)
 obj-y						+= tick-broadcast.o
 obj-$(CONFIG_TICK_ONESHOT)			+= tick-broadcast-hrtimer.o
endif

This means that if any symbols defined in tick-broadcast-hrtimer.c are in the resulting kernel build, then CONFIG_TICK_ONESHOT is set. The plugin checks for the function tick_program_event, which upstream defines in tick-oneshot.c, another object that the same Makefile builds only when CONFIG_TICK_ONESHOT is set. If the symbol is absent, the option is treated as not set. By writing code around the BN API, we can automate recovery of this option:

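    def _set_if_symbol_present(self, name: str) -> ConfigStatus:
        # Any symbol with this name means the object defining it was built.
        if self.bv.get_symbols_by_name(name):
            return ConfigStatus.SET
        # Assumes ConfigStatus also has a NOT_SET member; without this
        # return the method would fall through and yield None.
        return ConfigStatus.NOT_SET

    def _recover_config_tick_oneshot(self) -> ConfigStatus:
        return self._set_if_symbol_present('tick_program_event')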

The code above attempts to look up the tick_program_event symbol. If the lookup fails, then the CONFIG_TICK_ONESHOT configuration option is not set. If the lookup succeeds, then it is set. There are many types of configuration options. A large portion of them can be knocked out using the symbol lookup method. Others require analyzing code, data structures, and more.
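Because so many options reduce to "is this symbol present?", the check lends itself to a lookup table. The sketch below shows that idea outside of Binary Ninja, working on plain /proc/kallsyms-style text. SYMBOL_RULES, parse_kallsyms, and recover_by_symbols are hypothetical names, the addresses in the sample are made up, and only the CONFIG_TICK_ONESHOT rule comes from this post.

```python
from typing import Dict, Iterable, Set

# Hypothetical rule table: option name -> symbol whose presence implies it.
# Only this one entry is taken from the post; others would be added per option.
SYMBOL_RULES: Dict[str, str] = {
    'CONFIG_TICK_ONESHOT': 'tick_program_event',
}


def parse_kallsyms(lines: Iterable[str]) -> Set[str]:
    """Collect symbol names from '<address> <type> <name> [module]' lines."""
    names: Set[str] = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            names.add(fields[2])
    return names


def recover_by_symbols(symbols: Set[str],
                       rules: Dict[str, str]) -> Dict[str, bool]:
    """Return True for each option whose marker symbol is present."""
    return {option: symbol in symbols for option, symbol in rules.items()}


# Made-up addresses, for illustration only.
kallsyms = [
    'ffffffff810f2a40 T tick_program_event',
    'ffffffff810a1b20 T seq_printf',
]
result = recover_by_symbols(parse_kallsyms(kallsyms), SYMBOL_RULES)
```

A table like this keeps each new symbol-based option to a one-line addition, leaving hand-written analysis for the options that need it.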

What about /proc/config.gz?

RE of the kernel binary is not always necessary to gain access to the Linux kernel configuration. Sometimes kernels are built with the following configuration options:

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y

Kernels built with “in-kernel configuration support” embed the kernel configuration file in the kernel binary. On the running system, the configuration is exposed to user-space at /proc/config.gz. In this scenario, the config.gz file can be copied off of the device and used to reproduce the build. However, in my experience most distributed Linux kernels don’t use this configuration, which is why it is often necessary to resort to RE.
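When CONFIG_IKCONFIG is set, the embedded configuration can also be pulled straight out of an uncompressed kernel image, without a running system. The sketch below assumes the layout used by the kernel's own scripts/extract-ikconfig helper: gzip data preceded by the marker string IKCFG_ST. The function name extract_ikconfig is hypothetical, and a compressed image (for example a bzImage) would need to be unpacked first.

```python
import gzip
import zlib
from typing import Optional

IKCFG_START = b'IKCFG_ST'  # marker that precedes the embedded gzip data


def extract_ikconfig(image: bytes) -> Optional[str]:
    """Return the embedded config text, or None if it cannot be found."""
    start = image.find(IKCFG_START)
    if start < 0:
        return None
    payload = image[start + len(IKCFG_START):]
    # wbits=31 selects gzip framing. The decompressor stops at the end of
    # the gzip stream, so the bytes that follow it in the image are ignored.
    decompressor = zlib.decompressobj(31)
    try:
        data = decompressor.decompress(payload)
    except zlib.error:
        return None
    return data.decode('utf-8', errors='replace')


# Synthetic image for illustration: padding, marker, gzip data, end marker.
fake_image = (b'\x00' * 16 + IKCFG_START +
              gzip.compress(b'CONFIG_IKCONFIG=y\n') +
              b'IKCFG_ED' + b'\x00' * 16)
config_text = extract_ikconfig(fake_image)
```

If the marker is missing, the function returns None, which is the case where the analysis described in the rest of this post becomes necessary.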

Introducing bn-kconfig-recover

I have released a Binary Ninja plugin, bn-kconfig-recover, to automate recovery of kernel configuration options. Currently, this plugin is able to recover configuration options for general setup, the IRQ subsystem, the timer subsystem, and CPU/Task time and stats accounting. To use the plugin, create a kernel Binary Ninja database (BNDB) populated with symbols for exports from the kernel symbol table. The datavars branch of my bn-kallsyms plugin can be used to help apply symbols from /proc/kallsyms. Other methods for applying symbols exist as well (see the vmlinux-to-elf project). After creating the kernel BNDB, run the bn_kconfig_recover.py script headless, supplying the path to the kernel BNDB and the path for the output config file.

Once it is complete, it will create a configuration file containing entries for all supported configuration options.
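For a sense of what writing that file involves, here is a minimal sketch that renders recovered values in the .config line format shown earlier. It is not the plugin's code: format_option and render_config are hypothetical names, and the value mapping (None or False for "not set", True for y, integers bare, strings quoted) is an assumption made for illustration. Tristate m values are not handled.

```python
from typing import Dict, Union

Value = Union[bool, int, str, None]


def format_option(name: str, value: Value) -> str:
    """Render one option as a .config line."""
    if value is None or value is False:
        return f'# {name} is not set'
    if value is True:  # checked before int, since bool is a subclass of int
        return f'{name}=y'
    if isinstance(value, int):
        return f'{name}={value}'
    # Strings are quoted, with backslashes and quotes escaped.
    escaped = value.replace('\\', '\\\\').replace('"', '\\"')
    return f'{name}="{escaped}"'


def render_config(options: Dict[str, Value]) -> str:
    """Render all options, one per line, in insertion order."""
    return '\n'.join(format_option(n, v) for n, v in options.items()) + '\n'


recovered = {
    'CONFIG_TICK_ONESHOT': True,
    'CONFIG_COMPILE_TEST': None,
    'CONFIG_INIT_ENV_ARG_LIMIT': 32,
    'CONFIG_BUILD_SALT': '4.19.0-18-amd64',
}
output = render_config(recovered)
```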

Plugin Limitations

There are a few limitations to this plugin. First, the plugin is not complete. There are thousands of Linux configuration options, and adding support for all of them is a work in progress. I plan to continue adding support for more options one subsystem at a time. I will gladly accept pull requests from community contributors as well. Limitations of the approach itself include:

  • Many of the configuration options are dependent on symbols. The Linux kernel must provide symbols for exported functions and data variables in the kernel symbol table to support loading LKMs. However, if the kernel is built without LKM support (like Android kernels), the kernel doesn’t need to provide symbols and is built without a kernel symbol table. In this scenario, symbols required by bn-kconfig-recover would need to be applied manually in the BNDB. Depending on your use-case this could be a non-starter.
  • There are many kernel versions. This plugin has only been tested on 4.* kernels for x86-64. Development was done using a 4.19 kernel. As development progresses, I will likely need to change config option-specific heuristics to support multiple kernel versions and architectures. For now, there may be false positives when running the plugin on newer 5.* or old kernels (< 3.*).
  • Not all kernel developers follow the rules. Engineering teams often make proprietary modifications to the Linux source code. This can potentially cause recovery of certain config options to be inaccurate.

Conclusion

Recovering Linux kernel configurations is one example of many tedious reverse engineering tasks that can be automated. I believe this is a worthwhile pursuit that can aid in many scenarios, including LKM development, kernel exploit development, and interface compatibility development. My Binary Ninja plugin can be found here. If you are interested in this tool, feel free to follow the project, submit issues, and contribute pull requests. Thanks for reading!