2

Let's say I want to reverse engineer an executable that interprets some data type. I want to see how the program interacts with the file and what is stored in it. If decompilation is not an option, I have disassembly and debugging left. With disassembly, I would have to read through 200,000 lines of assembly, which would be rather tedious, especially if I needed to hand-code it back.

From my experience debugging with gdb, all I was able to do was see when a thread is created and inspect the stack, neither of which seems very useful to me.

Is my approach to debugging incorrect? If so, what can I do with a debugger like gdb (avoiding paid debuggers or software) to find the kind of information I am looking for? Can someone give me a pointer for the sake of orientation?

John K

2 Answers

1

A few other options for understanding file formats:

  • afl-analyze can help you analyze the file format, assuming you can instrument the binary with the normal afl instrumentation.

  • Taint analysis can often help you isolate the instructions that operate on the file data. There's a wide variety of tools available here, such as panda, triton, angr, and the pintracer in moflow.
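
To give a feel for what byte-level analysis of a file format means, here is a toy sketch in Python. It is not how afl-analyze or any of the taint tools above is implemented; it only mimics the rough idea of flipping one input byte at a time and checking whether the program's behavior changes. The parser `parse_record`, its made-up format (2-byte magic `JK`, 1-byte length, payload), and the helper `classify_bytes` are all hypothetical, and the return value stands in for the execution-path feedback a real tool would get from instrumentation.

```python
def parse_record(data):
    """Hypothetical toy parser: 2-byte magic, 1-byte length, then payload."""
    if len(data) < 3 or data[:2] != b"JK":
        return ("error", "bad magic")
    length = data[2]
    payload = data[3:3 + length]
    if len(payload) != length:
        return ("error", "truncated")
    return ("ok", length)


def classify_bytes(parser, sample):
    """Return offsets whose mutation changes the parser's outcome."""
    baseline = parser(sample)
    sensitive = []
    for i in range(len(sample)):
        mutated = bytearray(sample)
        mutated[i] ^= 0xFF  # flip every bit of one byte
        if parser(bytes(mutated)) != baseline:
            sensitive.append(i)
    return sensitive


sample = b"JK\x03abc"
sensitive_offsets = classify_bytes(parse_record, sample)
print(sensitive_offsets)  # offsets of the magic and length fields
```

Offsets that change the outcome are likely structural (magic values, lengths, checksums), while offsets that do not are likely plain payload. That distinction tells you which parts of the file to chase in the disassembly first.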

broadway
  • Alright, that might sound like an alternative for some of the common problems. However, I am talking about something like a private developer who has made his own format for storing things. What I really want to achieve is to find out how something is implemented through debugging. I might just try debug, backtrace, disassemble, interpret, repeat, if that's the way to do it. – John K May 21 '16 at 20:17
  • 1
    Most people probably set breakpoints on file I/O functions (assuming input is via file I/O) and trace through the code from there, trying to locate the primary parsing code. – broadway May 21 '16 at 20:18
  • I found the name of the function that is responsible for doing so. Do I just browse assembly code from the address onwards? – John K May 21 '16 at 20:20
  • 1
    Mostly, paying attention to the code which interacts with the buffer where the file data is stored. – broadway May 21 '16 at 20:28
1

99% of RE is figuring out what not to RE.

Assuming there are 200k lines of assembly, using a debugger can quickly narrow that down to the few thousand that actually process the data you're interested in. Try setting a breakpoint on syscalls like open() or read() to see when the program interacts with your file, then follow the buffer that the data is read into.

Once you locate where your data comes into the program, static analysis becomes a much less daunting task.
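
As a rough illustration of the breakpoint approach, here is a minimal sketch using gdb's Python scripting, loaded inside gdb with `source trace_read.py`. It rests on several assumptions that may not hold for your target: x86-64 Linux with the System V calling convention (arguments in `$rdi`, `$rsi`, `$rdx`), and a program that reaches the file through the libc `read` symbol. The file name and the helpers `should_stop` and `describe_read` are made up for this example.

```python
# trace_read.py: log read() calls and stop on the ones that look like file input.
try:
    import gdb  # only importable inside a gdb session
except ImportError:
    gdb = None

STDIO_FDS = {0, 1, 2}  # stdin, stdout, stderr: usually not the data file


def should_stop(fd, count, min_count=1):
    """Decide whether a read() call is worth stopping at."""
    return fd not in STDIO_FDS and count >= min_count


def describe_read(fd, buf, count):
    """Format the arguments of a read() call for the log."""
    return f"read(fd={fd}, buf={buf:#x}, count={count})"


if gdb is not None:
    class ReadEntry(gdb.Breakpoint):
        """Breakpoint on read(); assumes x86-64 System V argument registers."""

        def stop(self):
            fd = int(gdb.parse_and_eval("$rdi"))
            buf = int(gdb.parse_and_eval("$rsi"))
            count = int(gdb.parse_and_eval("$rdx"))
            print(describe_read(fd, buf, count))
            # Returning False lets the program continue without stopping.
            return should_stop(fd, count)

    ReadEntry("read")
```

When gdb stops, `finish` runs until `read` returns, `x/64xb <buf>` dumps the bytes that were just read, and `rwatch *(char *)<buf>` stops the program the next time that byte is read, which is usually somewhere in the parsing code. If the program never hits the `read` symbol (for example because it uses `mmap` or buffered stdio), `catch syscall read` or a breakpoint on a different function may be a better starting point.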

Tom Cornelius