19

Note: I am surprised that this hasn't been asked before, and if it has I could not find it in a search.

I've been on tons of websites, I've read tons of articles, and I have heard tons of explanations. Most of them were good, but they were all either to broad or too complicated or just plain bad. So my question is , how does a compiler work?

If this is a difficult, broad question, please tell me. But if not, please answer the question.

Dynamic
  • 5,746
  • 4
    Too broad, at least the "How does it work" part. There are whole books written on that topic. – Oded Nov 08 '11 at 17:46
  • @Oded: Ok, all edited! – Dynamic Nov 08 '11 at 17:51
  • 1
    http://en.wikipedia.org/wiki/Compiler would be the Wikipedia link that is trivial to find, what specifically are you wondering? The question is broad enough that I'd be tempted to give the smart alec response of, "Compilers translate code from one language to another," as that is the general idea that has a lot of nuances within that once one starts to look at what does that really involve. – JB King Nov 08 '11 at 17:57
  • @JB King: I guess my question is "How does the compiler convert code from high-level to low-level" – Dynamic Nov 08 '11 at 17:59
  • May I suggest looking into loaders, linkers, and interpreters? Interpreters also convert code from high level to low level in a different way while loaders and linkers are used with compilers to create executables and libraries to some extent. – JB King Nov 08 '11 at 18:08
  • I think this is a valid question, which can be answered at a high level. – Dima Nov 08 '11 at 18:15
  • 1
    Yes and I think it has been answered pretty well. – Jeremy Nov 08 '11 at 18:28
  • 1
    Any explanation of how a compiler works will either be too broad or too complicated. It's a complicated subject, and the compilers classes were the hardest computer-related courses I ever took. – David Thornley Nov 08 '11 at 18:52
  • 1
    @David Of course compilers are complicated, and you cannot explain all the details of how they work here. However, I am sure you had a basic high-level understanding of what a compiler is or how it works before you took your compiler course. – Dima Nov 08 '11 at 19:06

2 Answers2

26

A compiler is a program that translates the source code for another program from a programing language into executable code.

The source code is typically in a high-level programming language (e. g. Pascal, C, C++, Java, Perl, C#, etc.). The executable code may be a sequence of machine instructions that can be executed by the CPU directly, or it may be an intermediate representation that is interpreted by a virtual machine (e. g. Java byte code).

In short, a compiler converts a program from a human-readable format into a machine-readable format.

As to how a compiler works, that is indeed complicated. There are books and university courses on the subject. I will attempt to briefly outline the main stages of the process, but this will be a very cursory overview.

  1. Lexing - break up the text of the program into "tokens". The tokens are the "words" of the programming language, such as identifiers (keywords, variable names, function names, etc.) or operators (=, *, &, etc.).
  2. Parsing - convert the sequence of tokens into a parse tree, which is a data structure representing various language constructs: type declarations, variable declarations, function definitions, loops, conditionals, expressions, etc.
  3. Optimization - evaluate constant expressions, optimize away unused variables or unreachable code, unroll loops if possible, etc.
  4. Translate the parse tree into machine instructions (or JVM byte code).

Again, I stress that this is a very brief description. Modern compilers are very smart, and, consequently, very complicated.

Dima
  • 11,832
  • 2
    Actually, it transform a language into another. Early C++ compiler did compile to C. The same goes for Vala compiler. Java compiler compiles into bytecode that isn't executable without a JVM's JIT compiler. – deadalnix Nov 08 '11 at 17:57
  • 1
    @deadalnix IMHO, the point is that you go from non-executable code to executable code. I would argue that C-front was not a compiler but a front-end to the C compiler. Or a stage in the compilation process, if you will. Virtual machines blur the boundary between "executable" and "non-executable", of course. Here I would simply consider executable code to be whatever goes into the virtual machine, like the byte code, and abstract away whatever goes on inside the VM, like JIT. – Dima Nov 08 '11 at 18:02
  • 1
    @Dima, it doesn't have to be from non-executable code to executable code. For instance you cannot execute JVM byte code directly on Windows machines. –  Nov 08 '11 at 21:29
  • 1
    @Thorbjørn Ravn Andersen: but the byte code is executable by the JVM. Isn't the whole point of a "virtual machine" to look like a real machine to the programmer? – Dima Nov 08 '11 at 22:04
  • @Dima, modern JVM's actually compile byte codes into machine code - yes, another compiler. Hence the distinction of "source language" and “executable code" is at best blurry. –  Nov 08 '11 at 22:08
  • @Thorbjørn Ravn Andersen: Yes, I know. But the question was posed at a high level of detail, at which we can just treat a JVM as a black box that executes byte code, without going into the details of JIT, etc. Just as we can simply say that a CPU executes machine code produced by a C compiler, without going into the details of microcode. – Dima Nov 09 '11 at 00:59
  • 2
    I would argue that traditionally a compiler converted a program from a human-readable format into a machine-readable format, just as Dima said. Variations such as Cfront converting C++ to C or javac converting Java to bytecode are more advanced topics which should probably be left until after explaining the basic, traditional concept to someone unfamiliar with it. – Carson63000 Nov 09 '11 at 03:03
  • Today a reasonably large number of compilers don't compile to executable code; many compile from one high-level language to another. Here's an incomplete still impressive list of languages that can be compiled to javascript: https://github.com/jashkenas/CoffeeScript/wiki/List-of-languages-that-compile-to-JS – Aloso Mar 29 '17 at 02:36
5

A compiler is a computer program (or set of instructions) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program.

Compilers bridge source programs in high-level languages with the underlying hardware. A compiler requires:

  1. Determining the correctness of the syntax of programs
  2. Generating correct and efficient object code
  3. Run-time organization
  4. Formatting output according to assembler and/or linker conventions.