Fuzzing SEGA Genesis Emulators

Game console emulators are complex pieces of software, made up of diverse sections of code. I recently found the urge to learn Golang and chose to write a fuzzer for SEGA Genesis emulators. After the fuzzer began shaking out bugs I found myself getting lost in Binary Ninja studying emulator internals. In this blog post I introduce RetroFuzz, disclose some of the bugs that it found, and describe analysis done with Binary Ninja scripting.

What is a SEGA Genesis Console Emulator? 

There are entire books written on emulation. In the context of this blog post, the term emulator refers to software-based emulators that emulate SEGA Genesis hardware and allow you to play its games on a traditional desktop running an Intel processor. SEGA console emulation consists of multiple facets to include code emulation, timing, interrupt handling, hardware component emulation, and much more. Code emulation is the concept of parsing and executing M68000 and Z80 code from the ROM. Timing includes emulating the strict timing requirements of the console by counting cycles and ensuring interrupts occur at the right moments to allow sprites and audio to be rendered properly and prevent lag. Interrupt handling consists of emulating hardware interrupts and ensuring the proper callback is invoked in the table. Hardware emulation consists of emulating the video display processor, audio chips, data buses, and more. All of these components must integrate seamlessly to run SMD/SG ROM’s. 

RetroFuzz 

RetroFuzz is a framework that I developed to automate fuzzing SEGA Genesis emulators by mutating SG ROM inputs and executing emulators under a very simple harness. RetroFuzz is no AFL. It’s not coverage guided, nor is it high performance. It is simple and slow, but has proved effective at what I designed it to do. I’ve designed RetroFuzz to be modular such that I can improve each component individually as time permits. The figure below depicts RetroFuzz’s current design. 

CLI 
Mutator 
Controller 
Exporter 
Worker 
Interactive UI 
Executer
RetroFuzz Core Design

RetroFuzz consists of a command line interface (CLI) that resembles AFL’s. It takes the path to an input template ROM, path to an output directory, and command string. You also have the option of specifying the amount of workers (tests to run in parallel) and a timeout for each test. For example, to fuzz Kega Fusion you might supply the following command: 

.\retrofuzz.exe -in .\jump.bin -out .\out\ -cmd "C:\Fusion\Fusion.exe @@" 

The @@ sub-string is replaced by the path to each generated input. The controller is responsible for spawning workers and keeping the interactive UI’s information updated. The controller repeatedly spawns workers in a wait group and waits on the wait group to complete after the worker limit is reached. The workers invoke the mutator to modify the template input ROM based on the mutation stage. Currently, the mutator stages consist of byte flips, nibble flips, bit flips, and an algorithm that generates M68000 instructions (basically just iterates the first 16 bits of the instruction at program start from 0-65535). After the ROM has been mutated, the worker exports the file to disk into the output directory. The worker then tasks the executor to execute the target emulator against the exported ROM. Currently, the execution logic is very simple. It leverages the Windows API by calling CreateProcess to execute the emulator as a child process. It then calls WaitOnSingleObject with a timeout to wait until the process has terminated or the timeout has been reached. If WaitOnSingleObject returns WAIT_OBJECT_0, RetroFuzz interprets the behavior as a crash. This inference can be made because the emulators fuzzed thus far are designed to continue to run even if the ROM fails to load. Therefore, the only reason the child process should signal the event is if it crashes. 

RetroFuzz Interactive UI

Fuzzing Results 

I didn’t expect the fuzzer to be very effective in its current state. I was reluctant to use the fuzzer heavily until I improved its components. However, at the end of each night I began running RetroFuzz against a emulator. Many of the emulators leak memory and bring the host to its knees within an hour of kicking off a run. Of the emulators that didn’t bog down my laptop, RetroFuzz began logging tens to hundreds of crashes. The two emulators that I’ve tested against so far are Kega Fusion and BlastEm. I have only analyzed a handful of crashes under a debugger. Many of the crashes were caused by the same bug, or similar bugs residing in multiple areas of the code. The most common problem area seems to reside in the M68000 instruction parsing and emulation. It found a OOB fetch in Fusion, a stack alignment bug in BlastEm that results in a overflow, and more.

OOB Fetch in Kega Fusion 

Bugs in Kega Fusion are especially interesting to me because Kega Fusion is closed source and probably the most popular SEGA Genesis emulator. RetroFuzz consistently crashed Kega Fusion during the bitflip stage while iterating through the ROM code. See the M68000 instruction at 0x22c. 

start : 
00000218 
øøøøø222 
øøøøø228 
øøøøø22c 
46fc270Ø 
13fc004000a1ØØ09 
13fc004000a1ØØØ3 
41faOb58 
3Ø3c0017 
323c800ø 
1218 
33c10øcØØ004 
06410100 
51c8fbf2 
move. w 
b 
move . 
move 
lea. 1 
move. w 
move. w 
b 
move. 
move. w 
addi.w 
dbf 
#340, ($a1ØØØ9) . 1 
#340, ($a1ØØØ3) . 1 
($ØØØØØd6e) , 
#317, dB 
(aØ)+, dl {Øxd6e} 
dl, ($CØØØØ4) . 1 
dB, ($-ØØØØ1eØ) 
{Øxd6f}
M68000 DBF Instruction that Triggers Kega Fusion Bug

When the fuzzer flipped a bit it changed the instruction at 0x22c from dbf d0, ($0x200) to dbf d0, ($-00001e0). The dbf instruction is commonly used in M68000 for loops. It first tests the condition. If the d0 register is -1, it continues to the next instruction at 0x230. If it is greater than -1, it takes a PC-relative branch by calculating the sum of the PC and the signed 16-bit integer contained in the opcode. In the case above, the 16-bit integer is 0xfbf2, or -1038. PC is 0x22e. This results in a branch destination of –0x1e0, 480 bytes before the start of the ROM. Kega Fusion reads the ROM into a heap buffer. This occurs early on in execution so the ROM is almost always near the base of its heap. You might see where this is going. 

Kega Fusion M68000 DBF Emulation Bug

Kega Fusion’s M68000 dbf instruction handler does no bounds checking to ensure that PC points within the bounds of the ROM. In this case, Fusion’s M68000 ROM code pointer decremented to a bad address, before the base of the heap. Fusion proceeded to crash after attempting to fetch the next instruction.

Additional Analysis

The OOB fetch bug lead me to a huge dispatch table in Kega Fusion that peaked my interest. This dispatch table contains entries for 65535 callback routines. Each routine emulates a specific M68000 instruction. Fusion’s M68000 instruction emulator starts by fetching the first 16 bits of the instruction from the ROM. It then uses this value as an index into the dispatch table to look up and invoke its callback. Fusion doesn’t contain symbols so I came up with an idea to apply names to the M68000 instruction routines. My idea was to write a Binary Ninja script that takes the following actions:

  1. Generate M68000 code in a byte array by iterating the first 16-bits from 0x0-0xFFFF and appending random data 
  2. Attempt to disassemble the opcode out of band
  3. If the opcode disassembles, parse the instruction text and extract the operation string 
  4. Construct a symbol in the format <operation>_<opcode>_68k_to_x86 
  5. Use the first 16-bits of the opcode byte array to lookup the function in Kega Fusion’s dispatch table 
  6. Apply the symbol/name to the function 
from binaryninja import *

class FusionSymbols(BackgroundTaskThread):
    TRANSLATION_TABLE_BASE = 0x4f11b0

    def __init__(self, bv):
        BackgroundTaskThread.__init__(self, "", True)
        self.bv = bv
        self.br = BinaryReader(bv)

    def build_dispatch_table(self):
        uint32 = self.bv.parse_type_string("void *")[0]
        for i in range (0, 0xffff):
            # Assign a pointer type to the entry
            entry = FusionSymbols.TRANSLATION_TABLE_BASE+(i*4)
            self.bv.define_user_data_var(entry, uint32)

            # Get address of target
            self.br.seek(entry)
            target = self.br.read32()

            # Make code
            self.bv.add_function(target)

    def name_function(self, index, operation):
        operation = operation.replace('.', '_')
        name = '{}_{}_68k_to_x86'.format(operation, hex(index))
        entry = FusionSymbols.TRANSLATION_TABLE_BASE + (index * 4)
        self.br.seek(entry)
        target = self.br.read32()
        print('entry: {} target: {} name: {}'.format(hex(entry), hex(target), name))
        self.bv.define_user_symbol(Symbol(SymbolType.FunctionSymbol, target, name))

    def generate_instructions(self):
        for i in range (0, 0xffff):
            opcode = struct.pack('>H', i)
            opcode += b'\x41\x41\x41\x41\x41\x41\x41\x41' # A gross hack
            try:
                instr = Architecture['M68000'].get_instruction_text(opcode, 0)
            except Exception:
                # Many instructions are invalid so we continue on exceptions
                pass

            index = struct.unpack('>H', opcode[0:2])[0]
            operation = str(instr[0][0]).rstrip()
            self.name_function(index, operation)

    def run(self):
        self.build_dispatch_table()
        self.generate_instructions()
        show_message_box('functionnamer', 'All done')

This script successfully applied names to more than 90% of the 65535 entries in the table and helped tremendously to improve analysis of Kega Fusion’s instruction emulation by allowing me to lookup Fusion’s implementation for each M68000 instruction by operation name.

Conclusion

I have open sourced RetroFuzz. I am just scratching the surface of where I’d like to go with this project. I plan to write a harness around dbgeng, improve performance, and implement more tailored mutation algorithms. I also hope to write a utility to filter out redundant bugs and emit crash reports. Maybe I’ll even go after NES or other console emulators. For now, I have released a ROM that crashes both Kega Fusion and BlastEm. You can find the project here. I appreciate you taking the time to read my post and am interested to know if anyone else has success using the fuzzer. I look forward to writing additional posts on this subject as the journey progresses.

Hooking Golden Axe VBlanks

If you’re a 90’s kid you’ve likely played Golden Axe either at the arcade or on the SMD. You know what it’s like to have to attempt to kick the creepy little elusive elves for magic pots. If you haven’t played Golden Axe in a while (or at all), see below.

Golden Axe Theives

In this blog post, I’m going to demonstrate the methods that I used to pull off a relatively simple ROM hack to equip Ax Battler with unlimited magic pots. I also am releasing a Binary Ninja script that I wrote to enumerate PC-relative branch table patterns that were not picked up by auto-analysis.

Initial Analysis

For this hack I chose to use Binary Ninja. I started by cloning a fork of a third-party 68k architecture plugin for BN, found here. This fork has fixes for updates made to the API, improvements from the original, and true support for native 68000 (described later). After dropping the ROM in BN, I manually created the interrupt vector table and created functions at program start, the vblank handler, and hblank handler. My goal was to find the logic responsible for maintaining Ax Battler’s magic counter in hopes that I could make a simple patch to max it out and prevent it from decrementing (unlimited magic). After kicking off initial analysis, I realized BN was successful at finding only a small amount of code (Ghidra and IDA yielded even worse results). After some inspection I found that PC-relative jump and call tables are used all over. These tables are sort of unconventional in that the entries contain instructions instead of addresses. They are setup as follows:

0xd50    jsr $10(pc, d0.w)
...
0xd60    rte
0xd62    bra ($00000db6)
0xd66    bra ($00004196)
...

The instruction at 0xd50 is a PC-relative jump to a location in the branch table beginning at 0xd62 (determined by the index value in d0). Likely because the value of d0 can’t be reliably determined, BN, Ghidra, and IDA all terminate disassembly at the return instruction at 0xd60 (leaving 0xd62 and on as undefined). Unwinding these tables manually would have been a pain, so I wrote a script. The script works by operating on BN’s medium level IL (MLIL) representation of the code. I started by writing a function to extract the base addresses of branch tables (depicted below).

def find_call_tables(self):
    base_addrs = []
    for func in self.bv:
        for block in func.medium_level_il.ssa_form:
            for instr in block:
                # If it's a call or jump and the operand contains addition
                branch_ops = [
                    MediumLevelILOperation.MLIL_CALL_UNTYPED_SSA,
                    MediumLevelILOperation.MLIL_JUMP,
                    MediumLevelILOperation.MLIL_GOTO
                ]
                if instr.operation in branch_ops:
                    if type(instr.dest) == long:
                        continue

                    mlil_add = MediumLevelILOperation.MLIL_ADD
                    mlil_const = MediumLevelILOperation.MLIL_CONST
                    if instr.dest.operation == mlil_add:
                        if instr.dest.operands[0].operation == mlil_const:
                            base_addrs.append(instr.dest.operands[0].constant)
    return base_addrs

This function iterates all MLIL instructions in search of instructions containing a branch operation (MLIL_JUMP MLIL_GOTO, or MLIL_CALL_UNTYPED_SSA). When it locates a branch operation, it checks if there is an addition operation taking place to compute the destination. If so, it takes the value of the first operand in the addition operation, which is the address of the base of the table. The function returns a list of branch table base addresses. Next, I wrote a function to iterate the tables, check for bra instructions (depicted below).

def disas_call_tables(self, base_addrs):
    count = 0
    for addr in base_addrs:
        i = addr
        while True:
            self.br.seek(i)
            opcode = self.br.read32be()
            if (opcode >> 24) == 0x60:
                if not self.bv.get_function_at(i):
                    self.bv.add_function(i)
                    count += 1
            else:
                break
            i += 4

    return count

This function is not as flashy, but it gets the job done. It simply iterates the entries in the branch tables (reading a dword at a time). It checks if the top byte of the dword is 0x60, the first byte of a bra instruction. Any entry that contains 0x60 is disassembled, if it hasn’t been disassembled already. Once a dword is hit that isn’t a bra instruction, the next table is processed. As a result, the call target and any other routines in the code path are also disassembled. After running this script a few times, it located and disassembled over 500 bra instructions (and callee’s). This obviously lead to a lot more code that was previously unknown about.

Locating the Magic Count Offset

After reversing the routine handling logic statically, I decided to take snapshots of RAM from Gens (the emulator) before and after using magic to see if I could derive the offset of the magic counter in RAM from a binary diff. After looking through the diff, I found that offset 0xfe80 changed from 0x0101, to 0x0000 after using my single magic pot at the beginning of the first level.

RAM Binary Diff

This seemed like a good candidate. I searched for references to 0xfffe80 (RAM begins at 0xff0000), and I immediately found a move.w #$100,($fffe80).w instruction. I suspected that this instruction might be the intial magic pot value that is set at the beginning of each level. Using BN, I ,changed the instruction in the hex view to move.w #$100, $(fffe80).w, exported the binary, and reset the emulator. Sure enough, Ax started the level with 6 magic pots. This was a step in the right direction, but I wanted to always have 6 magic pots. There were a few other references to 0xfffe80 that I likely could have patched/nopped to acheive my goal. Instead, I chose to place a hook in the vertical blank interrupt handler routine to branch to unused space at the end of the ROM. This method would provide the ability to execute my own code on every vblank interrupt. To add this hook, I searched for a good 6 byte instruction that I could overwrite with a jsr instruction. I like using jsr instructions (over bsr) because they are easy to apply in hex editors since the absolute address of the destination is contained in the bottom dword of the instruction. I found a 6-byte btst #1, ($ffc183).w instruction at 0xd36 that seemed like a good candidate to overwrite for my hook. I overwrote the instruction with a branch to an offset at the end of the ROM.

Trampoline to Hook Handler

After overwriting the btst instruction, I patched in the handler routine to execute the instruction that I overwrote, write 0x600 to the magic pot counter, and return back to the vblank code.

Hook Handler

At this point, I had one last step. Using Binary Ninja’s quick patch functionality, I edited the checksum routine to always branch to the code path for the good checksum (if you want to correct the checksum, bn-genesis contains functionality to do so).

Binary Ninja Quick Patch to Skip Checksum

Shoutouts and Additional Finds

While I was screwing around with Golden Axe, Jason Wright was hard at work improving Alex Forencich’s 68000 Binary Ninja plugin so he could do real work. Jason was working with binaries compiled for a later generation 68000 variant processor that consists of a 32-bit address bus. I explained issues with the plugin that I discovered with incorrect sign extension in move instructions on vanilla 68000, which consists of a 24-bit address bus. Jason took time to modify the plugin to correct the issue (with a minor contribution from myself). Many thanks to both Alex and Jason!

Interestingly, Ghidra made the same mistake. Incorrect disassembly of three consecutive move instructions are depicted below.

Ghidra Incorrect Disassembly

While this may be correct for 68040, I selected 68000 for the processor. These instructions should disassemble to:

00001a52 11  d8  fe  83    move.b     (A0)+,(DAT_fffe83 ).w
00001a56 11  d8  fe  80    move.b     (A0)+,(DAT_fffe80 ).w
00001a5a 31  fc  80        move.w     #-0x7ffe ,(DAT_ffc104 ).w
         02  c1  04

Conclusion

After exporting the patched binary from BN and throwing it into Gens, we now have unlimited magic to unleash on the Death Adder! I have added the call table enumeration functionality to bn-genesis. It can be found here. I also wrote a similar script for IDA, which can be found here. This script was not near as effective as the BN script, because it searches explicitly for jsr and jmp instructions, instead of operating on IL. I’m sure it could be made better by re-writing it to use IDA’s microcode. I’ll leave that as an excercise for the reader 🙂 As always, thanks for reading!

Unlimited Potion FTW!

SMD Hacking with Ghidra

It’s been two weeks since the National Security Agency publicly released GHIDRA, a powerful software reverse engineering framework. Despite that GHIDRA comes with Motorola 68000 and 6502 processor support, the NSA didn’t release any of their retro game hacking scripts leaving the community to wonder how they have been spending our tax dollars. Today, I’d like to share a couple GHIDRA scripts that I wrote along with this blog post to serve as an introduction to GHIDRA for the ROM hacking community.

Introduction to GHIDRA

GHIDRA is a powerful static software analysis tool similar to IDA Pro or Binary Ninja, tools used primarily for analyzing compiled code. It supports multiple architectures to include Motorola 68000, the processor used by the SEGA Genesis console. GHIDRA takes a IL approach by transforming native code into P-code, the tools intermediate representation. Much of the analysis is done on the IL, allowing the majority of its core features (to include the decompiler and integrated assembler) to work across the 20+ supported architectures. There are way to many great features to cover in a blog post, and I’m still learning myself. If you haven’t already, I encourage you to start with ghidra.re to learn the basics.

Loading a SEGA Genesis ROM

To get started, create a project and import a SEGA Genesis ROM binary. You will be prompted to provide the language. Select 68000 (the default variant). This is all that is required to load the binary. You should be ready to use the Code Browser. To do so, drag the binary over top of the dragon in the tool chest. The ROM will open in the Code Browser, and you will be asked if you want to run initial analysis on the binary. Click no. Because you are loading a flat file, initial analysis will not work correctly without telling GHIDRA more information on the file format. You have two options, do everything manually or download my loader script into your ghidra_scripts folder and run it with the script manager. The loader script will setup the vector table and ROM header before kicking off initial disassembly starting at the functions with entries in the vector table.

Patching the ROM

Now that the ROM has been loaded and initial analysis has been ran, we can analyze our ROM and modify instructions. In this blog post I’ll demonstrate a simple patch in Sonic The Hedgehog to jump to the end of the game. For clarity, I’ve labeled the MainGameLoop and the call targets for each game mode in the jump table. To branch to the end of the game let’s modify the branch instruction to the normal level game mode (GM_Level) to branch to GM_Ending, instead. To edit a instruction, simply click on the instruction and type Ctrl+Shift+G. Type a new M68K instruction and hit Enter. GHIDRA even has tab completion!

Checksum Fixup

Sonic, like most SEGA games, calculates the ROM checksum at runtime. Therefore, if you export the patched binary and try to run it in an emulator without fixing the checksum, it won’t run. Instead, you get a red screen. This leads me to my second GHIDRA script. A script that calculates and fixes the ROM checksum. Simply download the script into your ghidra_scripts directory and run it with the Script Manager. After calculating the new checksum, export the binary by hitting O and save it to a file. Then, load the patched ROM into your favorite emulator to be underwhelmed.