Fuzzing SEGA Genesis Emulators

Game console emulators are complex pieces of software, made up of diverse sections of code. I recently found the urge to learn Golang and chose to write a fuzzer for SEGA Genesis emulators. After the fuzzer began shaking out bugs I found myself getting lost in Binary Ninja studying emulator internals. In this blog post I introduce RetroFuzz, disclose some of the bugs that it found, and describe analysis done with Binary Ninja scripting.

What is a SEGA Genesis Console Emulator? 

There are entire books written on emulation. In the context of this blog post, the term emulator refers to software-based emulators that emulate SEGA Genesis hardware and allow you to play its games on a traditional desktop running an Intel processor. SEGA console emulation consists of multiple facets to include code emulation, timing, interrupt handling, hardware component emulation, and much more. Code emulation is the concept of parsing and executing M68000 and Z80 code from the ROM. Timing includes emulating the strict timing requirements of the console by counting cycles and ensuring interrupts occur at the right moments to allow sprites and audio to be rendered properly and prevent lag. Interrupt handling consists of emulating hardware interrupts and ensuring the proper callback is invoked in the table. Hardware emulation consists of emulating the video display processor, audio chips, data buses, and more. All of these components must integrate seamlessly to run SMD/SG ROM’s. 

RetroFuzz 

RetroFuzz is a framework that I developed to automate fuzzing SEGA Genesis emulators by mutating SG ROM inputs and executing emulators under a very simple harness. RetroFuzz is no AFL. It’s not coverage guided, nor is it high performance. It is simple and slow, but has proved effective at what I designed it to do. I’ve designed RetroFuzz to be modular such that I can improve each component individually as time permits. The figure below depicts RetroFuzz’s current design. 

CLI 
Mutator 
Controller 
Exporter 
Worker 
Interactive UI 
Executer
RetroFuzz Core Design

RetroFuzz consists of a command line interface (CLI) that resembles AFL’s. It takes the path to an input template ROM, path to an output directory, and command string. You also have the option of specifying the amount of workers (tests to run in parallel) and a timeout for each test. For example, to fuzz Kega Fusion you might supply the following command: 

.\retrofuzz.exe -in .\jump.bin -out .\out\ -cmd "C:\Fusion\Fusion.exe @@" 

The @@ sub-string is replaced by the path to each generated input. The controller is responsible for spawning workers and keeping the interactive UI’s information updated. The controller repeatedly spawns workers in a wait group and waits on the wait group to complete after the worker limit is reached. The workers invoke the mutator to modify the template input ROM based on the mutation stage. Currently, the mutator stages consist of byte flips, nibble flips, bit flips, and an algorithm that generates M68000 instructions (basically just iterates the first 16 bits of the instruction at program start from 0-65535). After the ROM has been mutated, the worker exports the file to disk into the output directory. The worker then tasks the executor to execute the target emulator against the exported ROM. Currently, the execution logic is very simple. It leverages the Windows API by calling CreateProcess to execute the emulator as a child process. It then calls WaitOnSingleObject with a timeout to wait until the process has terminated or the timeout has been reached. If WaitOnSingleObject returns WAIT_OBJECT_0, RetroFuzz interprets the behavior as a crash. This inference can be made because the emulators fuzzed thus far are designed to continue to run even if the ROM fails to load. Therefore, the only reason the child process should signal the event is if it crashes. 

RetroFuzz Interactive UI

Fuzzing Results 

I didn’t expect the fuzzer to be very effective in its current state. I was reluctant to use the fuzzer heavily until I improved its components. However, at the end of each night I began running RetroFuzz against a emulator. Many of the emulators leak memory and bring the host to its knees within an hour of kicking off a run. Of the emulators that didn’t bog down my laptop, RetroFuzz began logging tens to hundreds of crashes. The two emulators that I’ve tested against so far are Kega Fusion and BlastEm. I have only analyzed a handful of crashes under a debugger. Many of the crashes were caused by the same bug, or similar bugs residing in multiple areas of the code. The most common problem area seems to reside in the M68000 instruction parsing and emulation. It found a OOB fetch in Fusion, a stack alignment bug in BlastEm that results in a overflow, and more.

OOB Fetch in Kega Fusion 

Bugs in Kega Fusion are especially interesting to me because Kega Fusion is closed source and probably the most popular SEGA Genesis emulator. RetroFuzz consistently crashed Kega Fusion during the bitflip stage while iterating through the ROM code. See the M68000 instruction at 0x22c. 

start : 
00000218 
øøøøø222 
øøøøø228 
øøøøø22c 
46fc270Ø 
13fc004000a1ØØ09 
13fc004000a1ØØØ3 
41faOb58 
3Ø3c0017 
323c800ø 
1218 
33c10øcØØ004 
06410100 
51c8fbf2 
move. w 
b 
move . 
move 
lea. 1 
move. w 
move. w 
b 
move. 
move. w 
addi.w 
dbf 
#340, ($a1ØØØ9) . 1 
#340, ($a1ØØØ3) . 1 
($ØØØØØd6e) , 
#317, dB 
(aØ)+, dl {Øxd6e} 
dl, ($CØØØØ4) . 1 
dB, ($-ØØØØ1eØ) 
{Øxd6f}
M68000 DBF Instruction that Triggers Kega Fusion Bug

When the fuzzer flipped a bit it changed the instruction at 0x22c from dbf d0, ($0x200) to dbf d0, ($-00001e0). The dbf instruction is commonly used in M68000 for loops. It first tests the condition. If the d0 register is -1, it continues to the next instruction at 0x230. If it is greater than -1, it takes a PC-relative branch by calculating the sum of the PC and the signed 16-bit integer contained in the opcode. In the case above, the 16-bit integer is 0xfbf2, or -1038. PC is 0x22e. This results in a branch destination of –0x1e0, 480 bytes before the start of the ROM. Kega Fusion reads the ROM into a heap buffer. This occurs early on in execution so the ROM is almost always near the base of its heap. You might see where this is going. 

Kega Fusion M68000 DBF Emulation Bug

Kega Fusion’s M68000 dbf instruction handler does no bounds checking to ensure that PC points within the bounds of the ROM. In this case, Fusion’s M68000 ROM code pointer decremented to a bad address, before the base of the heap. Fusion proceeded to crash after attempting to fetch the next instruction.

Additional Analysis

The OOB fetch bug lead me to a huge dispatch table in Kega Fusion that peaked my interest. This dispatch table contains entries for 65535 callback routines. Each routine emulates a specific M68000 instruction. Fusion’s M68000 instruction emulator starts by fetching the first 16 bits of the instruction from the ROM. It then uses this value as an index into the dispatch table to look up and invoke its callback. Fusion doesn’t contain symbols so I came up with an idea to apply names to the M68000 instruction routines. My idea was to write a Binary Ninja script that takes the following actions:

  1. Generate M68000 code in a byte array by iterating the first 16-bits from 0x0-0xFFFF and appending random data 
  2. Attempt to disassemble the opcode out of band
  3. If the opcode disassembles, parse the instruction text and extract the operation string 
  4. Construct a symbol in the format <operation>_<opcode>_68k_to_x86 
  5. Use the first 16-bits of the opcode byte array to lookup the function in Kega Fusion’s dispatch table 
  6. Apply the symbol/name to the function 
from binaryninja import *

class FusionSymbols(BackgroundTaskThread):
    TRANSLATION_TABLE_BASE = 0x4f11b0

    def __init__(self, bv):
        BackgroundTaskThread.__init__(self, "", True)
        self.bv = bv
        self.br = BinaryReader(bv)

    def build_dispatch_table(self):
        uint32 = self.bv.parse_type_string("void *")[0]
        for i in range (0, 0xffff):
            # Assign a pointer type to the entry
            entry = FusionSymbols.TRANSLATION_TABLE_BASE+(i*4)
            self.bv.define_user_data_var(entry, uint32)

            # Get address of target
            self.br.seek(entry)
            target = self.br.read32()

            # Make code
            self.bv.add_function(target)

    def name_function(self, index, operation):
        operation = operation.replace('.', '_')
        name = '{}_{}_68k_to_x86'.format(operation, hex(index))
        entry = FusionSymbols.TRANSLATION_TABLE_BASE + (index * 4)
        self.br.seek(entry)
        target = self.br.read32()
        print('entry: {} target: {} name: {}'.format(hex(entry), hex(target), name))
        self.bv.define_user_symbol(Symbol(SymbolType.FunctionSymbol, target, name))

    def generate_instructions(self):
        for i in range (0, 0xffff):
            opcode = struct.pack('>H', i)
            opcode += b'\x41\x41\x41\x41\x41\x41\x41\x41' # A gross hack
            try:
                instr = Architecture['M68000'].get_instruction_text(opcode, 0)
            except Exception:
                # Many instructions are invalid so we continue on exceptions
                pass

            index = struct.unpack('>H', opcode[0:2])[0]
            operation = str(instr[0][0]).rstrip()
            self.name_function(index, operation)

    def run(self):
        self.build_dispatch_table()
        self.generate_instructions()
        show_message_box('functionnamer', 'All done')

This script successfully applied names to more than 90% of the 65535 entries in the table and helped tremendously to improve analysis of Kega Fusion’s instruction emulation by allowing me to lookup Fusion’s implementation for each M68000 instruction by operation name.

Conclusion

I have open sourced RetroFuzz. I am just scratching the surface of where I’d like to go with this project. I plan to write a harness around dbgeng, improve performance, and implement more tailored mutation algorithms. I also hope to write a utility to filter out redundant bugs and emit crash reports. Maybe I’ll even go after NES or other console emulators. For now, I have released a ROM that crashes both Kega Fusion and BlastEm. You can find the project here. I appreciate you taking the time to read my post and am interested to know if anyone else has success using the fuzzer. I look forward to writing additional posts on this subject as the journey progresses.