
Given a binary and only using a tool like ndisasm, how can I find main()? I don't want to use smart tools like IDA Pro because I'm doing this exercise to learn.

perror
drum
  • I believe I already answered this in my answer to this question: Reversing ELF 64-bit LSB executable, x86-64 ,gdb. Feel free to edit your question to ask for more details if something is missing. – perror Apr 25 '14 at 22:32
  • @perror And a fantastic answer it was. It's a shame when people don't even pay attention to the help you've given them. – Jonathon Reinhart Apr 25 '14 at 23:00
  • What platform/OS/compiler do you want to handle? Give us some concrete examples. – Igor Skochinsky Apr 26 '14 at 00:05
  • @perror Great answer. Would it be much different on Windows? Also, is it impossible to tell the entry point without this piece of information? – drum Apr 26 '14 at 04:06
  • Yes, it would be very different. In fact, what I describe in this answer is bound to the gcc compiler. So, if you consider a different compiler, the layout may be totally different. But, if you are looking for the main function in an MS-Windows context, you need to edit your question to specify it. It will help to get a more accurate answer. – perror Apr 26 '14 at 08:33
  • @perror I have to correct you on that one. It is not bound to the compiler, rather to the binary file format : ELF, PE, ... which the compiler has nothing to do with. – yaspr Apr 26 '14 at 09:27
  • No, it is really bound to the compiler. I am not speaking about the entry point, but the location of the main procedure which takes place after the loading of the dynamic libraries. Each compiler has its own function to do this. So, it is bound to the compiler. – perror Apr 26 '14 at 09:57
  • @perror From that perspective I agree. But I suppose you should've specified it in your comment. Technically speaking the location of the main function isn't important as long as it exists & it is referenced. So, compilers can put it anywhere they want it will eventually be found. I didn't look at your first comment, I wouldn't have answered otherwise, nice job ! – yaspr Apr 26 '14 at 10:42

1 Answer


This is quite tricky and takes a LOT of patience. I'll assume that you're trying to find the main function as it is defined in C, not the entry point of your program. It's very hard to spot main just by reading through the raw disassembly. But here's a way: first check the header of the binary file you're trying to disassemble. Below is the output of readelf -h on a random file. If the header isn't damaged (on purpose or not), you'll find the Entry point address there.

  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400440
  Start of program headers:          64 (bytes into file)
  Start of section headers:          4680 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         8
  Size of section headers:           64 (bytes)
  Number of section headers:         35
  Section header string table index: 32
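
Since ndisasm knows nothing about ELF, it can help to pull the entry point out of the header by hand and translate it to a file offset. Here is a minimal sketch of that, assuming a little-endian ELF64 file like the one above; the function name `entry_point` and the constant names are mine, not part of any tool.

```python
import struct

# ELF64 file header (64 bytes) and program header (56 bytes), little-endian.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1  # program header type of a loadable segment


def entry_point(image: bytes):
    """Return (entry_vaddr, entry_file_offset) for a little-endian ELF64 image.

    Hypothetical helper for illustration; other ELF classes are rejected.
    """
    fields = ELF64_HEADER.unpack_from(image, 0)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")

    e_entry, e_phoff = fields[4], fields[5]
    e_phentsize, e_phnum = fields[9], fields[10]

    # Find the loadable segment whose file-backed range contains the entry point.
    for i in range(e_phnum):
        p_type, _, p_offset, p_vaddr, _, p_filesz, _, _ = ELF64_PHDR.unpack_from(
            image, e_phoff + i * e_phentsize
        )
        if p_type == PT_LOAD and p_vaddr <= e_entry < p_vaddr + p_filesz:
            return e_entry, e_entry - p_vaddr + p_offset

    raise ValueError("entry point is not inside any PT_LOAD segment")
```

For the header shown above (entry 0x400440), and if the first loadable segment maps file offset 0 at 0x400000 as is common for non-PIE x86-64 executables, this would give file offset 0x440. Something like `ndisasm -b 64 -e 1088 -o 0x400440 ./a.out` (with `./a.out` standing in for your file) would then start decoding at the entry point.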

This address usually points to the first chunk of code executed at run time (the _start function), which sets up the parameters of main (the command-line arguments) before main is called. Another technique is to run your program under a debugger (GDB, for instance) and single-step from the entry point until main is reached.
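
To make the step from _start to main more concrete: with gcc and glibc on x86-64 Linux (an assumption on my part, other toolchains differ), _start typically loads the address of main into rdi right before calling __libc_start_main. The sketch below scans the first bytes of _start for the two usual encodings of that load. The function name `guess_main` is hypothetical, and because this is a raw byte scan rather than a real disassembly, treat the result as a candidate to confirm in ndisasm.

```python
import struct

MOV_RDI_IMM32 = b"\x48\xc7\xc7"  # mov rdi, imm32         (typical non-PIE)
LEA_RDI_RIP = b"\x48\x8d\x3d"    # lea rdi, [rip + rel32] (typical PIE)
MASK64 = 0xFFFFFFFFFFFFFFFF


def guess_main(start_code: bytes, start_vaddr: int, window: int = 64):
    """Return a candidate address for main, or None if no pattern matches.

    start_code:  raw bytes beginning at _start
    start_vaddr: virtual address of _start (the ELF entry point)
    """
    code = start_code[:window]
    # Both instructions are 7 bytes long: 3 opcode bytes + 4-byte operand.
    for i in range(len(code) - 6):
        opcode = code[i:i + 3]
        (operand,) = struct.unpack_from("<i", code, i + 3)
        if opcode == MOV_RDI_IMM32:
            # The immediate is the absolute address (sign-extended to 64 bits).
            return operand & MASK64
        if opcode == LEA_RDI_RIP:
            # rip-relative: displacement is relative to the next instruction.
            return (start_vaddr + i + 7 + operand) & MASK64
    return None
```

If nothing matches, the binary was probably built with a different compiler or C library, and you are back to reading _start by hand or stepping through it in a debugger.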

I have to warn you though: if you're dealing with ELF binaries, things can turn out to be more complicated, as they contain constructor and destructor tables (ctors and dtors) which hold pointers to functions that are executed before and after the main function. Statically linked binaries also bring some undocumented weirdness. And of course, some programs do without a main function altogether and call whatever they wish.

yaspr
  • This example shows nicely that what we learn in programming class is wrong: a program starts at the function _start, not main. In fact, you can do a LOT of stuff before main even gets called the first time – clockw0rk Sep 07 '22 at 12:11
  • Well, most programmers have no clue how a binary file is structured. They make assumptions based on "high level" programming models that could, and most likely will, end up being wrong once you dig into the nitty gritty details of what a computer program is. That's why you shouldn't believe teachers or instructors and still RTFM yourself. – yaspr Sep 08 '22 at 11:49