Assuming I have a binary file containing code for an unknown CPU, can I somehow detect the CPU architecture? I know the output depends mostly on the compiler, but I think that for most CPU architectures there should be a lot of CALL/RETN/JMP/PUSH/POP opcodes (statistically more than others). Or should I instead search the code for patterns specific to a particular CPU, rather than counting opcode occurrences?
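
The opcode-counting idea can be sketched as below. This is only an illustration under stated assumptions: the candidate table covers a few single-byte x86 encodings (0xC3 near RET, 0xE8 CALL rel32, 0xE9 JMP rel32, 0x55 and 0x5D for PUSH/POP of EBP), and the name `opcode_ratios` is hypothetical. Other architectures would need their own tables, and data bytes that happen to match are counted too.

```python
from collections import Counter

# Assumed candidate signature: a few single-byte x86 encodings.
# Other architectures would need different tables.
X86_CANDIDATES = {
    0xC3: "RET",
    0xE8: "CALL rel32",
    0xE9: "JMP rel32",
    0x55: "PUSH EBP/RBP",
    0x5D: "POP EBP/RBP",
}

def opcode_ratios(data: bytes, candidates=X86_CANDIDATES) -> dict:
    """Return the fraction of bytes in `data` equal to each candidate byte."""
    if not data:
        return {name: 0.0 for name in candidates.values()}
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    total = len(data)
    return {name: counts[byte] / total for byte, name in candidates.items()}
```

For uniformly random data each byte value would appear with a ratio near 1/256, so ratios well above that for several candidates at once are a hint, not proof, that the table's architecture is the right one.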

tripleee
n3vermind
  • If you have a binary file but don't know for which CPU, how can you see opcodes? If you know how to translate from binary to opcode, then you already know which CPU you have. (Or at least which family -- e.g. Z80, Intel, ARM, Motorola MC-680XX.) – Jongware Oct 08 '13 at 12:48
  • Read the magic, then the file format. – Stolas Oct 08 '13 at 13:03
  • (Stolas) In embedded systems you often don't have a magic, or the magic is something the vendor invented. (Jongware) You can see opcodes (common patterns of bytes) without actually knowing what they are, in much the same way that you can determine whether a file is compressed or encrypted without being able to decrypt or decompress it. – joxeankoret Oct 08 '13 at 13:14
  • @jongware I think that you confuse opcode with assembler instruction. – n3vermind Oct 08 '13 at 14:11
  • @n3vermind: .. if you don't know the CPU, then how can you be sure you are looking at 'opcodes'? ARMs, for example, would be easy (all opcodes are 4 bytes and most start with 0xE0), except you have Thumb modes to consider. A statistical approach may work -- but you always have the code/data dichotomy that makes disassembling hard even when you know the CPU type. – Jongware Oct 08 '13 at 15:04
  • @Jongware you're absolutely right, but it is a rather different problem from that of the subject. Anyway, the first step, to my mind, is to investigate the entropy of the binary. – n3vermind Oct 08 '13 at 20:00
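
Two of the heuristics mentioned in the comments, entropy and the ARM byte pattern, could be sketched roughly as follows. The function names are hypothetical. The ARM check assumes 32-bit ARM mode (not Thumb), little-endian byte order, and code aligned to offset 0 of the buffer. It relies on the fact that the top four bits of an ARM instruction hold the condition field, where 0xE means "always".

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 up to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

def arm_always_fraction(data: bytes) -> float:
    """Fraction of aligned 4-byte words whose condition nibble is 0xE.

    Assumes little-endian ARM mode, so the most significant byte of
    each word sits at offset 3 within the word.
    """
    words = len(data) // 4
    if words == 0:
        return 0.0
    hits = sum(1 for i in range(words) if data[4 * i + 3] >> 4 == 0xE)
    return hits / words
```

Entropy close to 8 bits per byte suggests compressed or encrypted content rather than plain machine code, in which case statistics over opcodes are unlikely to help until the data is unpacked. A high `arm_always_fraction` is only a hint, since Thumb code and embedded data will lower it.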