76

Question:

"Certain properties of a programming language may require that the only way to get the code written in it be executed is by interpretation. In other words, compilation to a native machine code of a traditional CPU is not possible. What are these properties?"

Compilers: Principles and Practice by Parag H. Dave and Himanshu B. Dave (May 2, 2012)

The book gives no clue about the answer. I tried to find the answer in Concepts of Programming Languages (Sebesta), but to no avail. Web searches were of little help too. Do you have any clue?

Raphael
  • 72,336
  • 29
  • 179
  • 389
  • 32
    Famously, Perl can't even be parsed. Other than that, the claim seems to be trivially wrong without further assumptions: if there is an interpreter, I can always bundle interpreter and code in one executable, voila. – Raphael Sep 02 '14 at 11:04
  • 1
    To add context to my earlier comment: the question is moot because of the existence of a universal Turing machine (which carries over to every (conceptually) Turing-complete language). – Raphael Sep 02 '14 at 13:39
  • 4
    @Raphael: Nice idea, but ... 1) You're assuming the code is available prior to being executed. That doesn't hold for interactive use. Sure, you can use just-in-time compilation to native code on bash statements or PostScript stack contents, but it's a pretty crazy idea. 2) Your idea doesn't actually compile the code: the bundle isn't a compiled version of the code, but still an interpreter for the code. – reinierpost Sep 02 '14 at 16:53
  • 7
    In the good old days I had self editing gwbasic programs (gwbasic stores basic programs in a kind of bytecode). I currently can't think of a sane way to compile those to native machine code while retaining their ability to edit themselves. – PlasmaHH Sep 02 '14 at 20:03
  • 15
    @PlasmaHH: Self Modifying Code goes back to 1948. The first compiler was written in 1952. The concept of self-modifying code was invented in native machine code. – Mooing Duck Sep 02 '14 at 21:49
  • 10
    @reinierpost Raphael is taking a theoretical stand on this issue. It has the merit of showing the conceptual limitations of the question. Compiling is translation from language S to language T. Language T could be an extension of S to which interpreting code in some other language can be added. So bundling S and its interpreter is a program in language T. It seems absurd to an engineer, but it shows that it is not easy to formulate the question meaningfully. How do you distinguish an acceptable compiling process from an unacceptable one (such as Raphael's) from an engineering point of view? – babou Sep 02 '14 at 22:23
  • 2
    @MooingDuck: So the compiler magically transforms logic for manipulating bytecode into logic for manipulating machine code? I don't think so. All you've proven is that the interpreted language doesn't have any unique capability not achievable in a compiled language... you haven't succeeded in compiling the first. – Ben Voigt Sep 03 '14 at 00:47
  • 1
    So far nobody here has defined the term "compile". However, the term "interpret" is clear: to interpret language $S$ in language $T$ means that we have a program $i$ in $T$, called the interpreter, such that $i(p)$ does in $T$ whatever $p$ does in $S$, where $p$ is a (syntactic representation of) program in $S$. Now, can someone define "compile" equally well? – Andrej Bauer Sep 03 '14 at 06:34
  • 1
    @babou: I'm not disputing the merit of this answer, but I think it overstates its point. Computers are not Turing machines and the difference is what makes this question interesting. Computers can do more than manipulate symbols on a tape; not all programs run on a single machine; not all machine languages are Turing complete; etcetera. – reinierpost Sep 03 '14 at 08:53
  • 1
    "but it's a pretty crazy idea" -- oh no, my shell is crazy! On a more serious note, if the application is interactive you just incorporate I/O into the bundle and forward the input to the interpreter. Also, my proposed program does compile: some pre-prepared code and possible dynamically input code are translated into machine instructions executed by the CPU. That said, I think all the comments and answers should be sufficient to show you that your question is ill-posed and one needs to be more careful about what exactly to ask. – Raphael Sep 03 '14 at 10:58
  • 2
    I find this somewhat curious: "Certain properties of a programming language may require that the only way to get the code written in it be executed is by interpretation. In other words, compilation to a native machine code of a traditional CPU is not possible. What are these properties?" What, I wonder, do they suspect happens to the native binaries when the computer executes them? Perhaps that the bytes run themselves? – Patrick87 Sep 03 '14 at 16:24
  • 1
    @MooingDuck: Sure, but my gwbasic program was modifying gwbasic bytecode. I somewhat doubt that it works unchanged when you compile it to native machine code. – PlasmaHH Sep 03 '14 at 18:30
  • 1
    @PlasmaHH: Ah, now I see why people are objecting to what I said. I interpreted you as saying self-modifying code could only be done in a scripting language. But instead, you meant exactly what you said: you can't think of a sane way to do it in a compiled language, making no claims about ASM. My bad. – Mooing Duck Sep 03 '14 at 18:36
  • You might be interested to learn about Befunge which was created specifically to make compilation hard, although apparently there still exist compilers for it. I have no idea how much they actually compile program logic into machine language, and how much is still interpreted at runtime. There are a number of Befunge programs in posts on PCG SE. – MvG Sep 05 '14 at 11:19

9 Answers

63

The distinction between interpreted and compiled code is probably a fiction, as underlined by Raphael's comment:

the claim seems to be trivially wrong without further assumptions: if there is
an interpreter, I can always bundle interpreter and code in one executable ...

The fact is that code is always interpreted, by software, by hardware or a combination of both, and the compiling process cannot tell which it will be.

What you perceive as compilation is a translation process from one language $S$ (for source) to another language $T$ (for target). And, the interpreter for $S$ is usually different from the interpreter for $T$.

The compiled program is translated from one syntactic form $P_S$ to another syntactic form $P_T$, such that, given the intended semantics of the languages $S$ and $T$, $P_S$ and $P_T$ have the same computational behavior, up to a few things that you are usually trying to change, possibly to optimize, such as complexity or simple efficiency (time, space, surface, energy consumption). I am trying not to talk of functional equivalence, as it would require precise definitions.

Some compilers have actually been used simply to reduce the size of the code, not to "improve" execution. This was the case for a language used in the Plato system (though they did not call it compiling).

You may consider your code fully compiled if, after the compiling process, you no longer need the interpreter for $S$. At least, that is the only way I can read your question, as an engineering rather than theoretical question (since, theoretically, I can always rebuild the interpreter).

One thing that may raise a problem, afaik, is meta-circularity. That is when a program will manipulate syntactic structures in its own source language $S$, creating program fragments that are then interpreted as if they had been part of the original program. Since you can produce arbitrary program fragments in the language $S$ as the result of arbitrary computation manipulating meaningless syntactic fragments, I would guess you can make it nearly impossible (from an engineering point of view) to compile the program into the language $T$, so that it now generates fragments of $T$. Hence the interpreter for $S$ will be needed, or at least the compiler from $S$ to $T$ for on-the-fly compiling of generated fragments in $S$ (see also this document).
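
A minimal sketch of this kind of meta-circular behaviour, in Python, with exec standing in for the interpreter of $S$ (the helper name build_fragment is invented for illustration): the fragment only comes into existence at run time, so no translation of it into $T$ can be prepared in advance.

    # A program that manufactures source code in its own language at run time
    # and hands it to the interpreter.  A compiler to a target language T cannot
    # translate the fragment ahead of time: it does not exist yet.

    def build_fragment(n):
        # arbitrary computation producing a syntactic fragment of the source language
        body = " + ".join(f"x[{i}]" for i in range(n))
        return f"def summer(x):\n    return {body}\n"

    source = build_fragment(4)        # "def summer(x): return x[0] + x[1] + x[2] + x[3]"
    namespace = {}
    exec(source, namespace)           # the interpreter for S is invoked here
    print(namespace["summer"]([1, 2, 3, 4]))   # -> 10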

But I am not sure how this can be formalized properly (and do not have time right now for it). And impossible is a big word for an issue that is not formalized.

Further remarks

Added after 36 hours. You may want to skip this very long sequel.

The many comments on this question show two views of the problem: a theoretical view that sees it as meaningless, and an engineering view that is unfortunately not so easily formalized.

There are many ways to look at interpretation and compilation, and I will try to sketch a few. I will attempt to be as informal as I can manage.

The Tombstone Diagram

One of the early formalizations (early 1960s to late 1990s) is the T or tombstone diagram. These diagrams presented, as composable graphical elements, the implementation language of the interpreter or compiler, the source language being interpreted or compiled, and the target language in the case of compilers. More elaborate versions can add attributes. These graphic representations can be seen as axioms and inference rules, usable to mechanically derive the generation of a processor from a proof of its existence from the axioms, à la Curry-Howard (though I am not sure that was done in the sixties :).

Partial evaluation

Another interesting view is the partial evaluation paradigm. I am taking a simple view of programs as a kind of function implementation that computes an answer given some input data. Then an interpreter $I_S$ for the language $S$ is a program that takes a program $p_S$ written in $S$ and data $d$ for that program, and computes the result according to the semantics of $S$. Partial evaluation is a technique for specializing a program of two arguments $a_1$ and $a_2$, when only one argument, say $a_1$, is known. The intent is to have a faster evaluation when you finally get the second argument $a_2$. It is especially useful if $a_2$ changes more often than $a_1$, as the cost of partial evaluation with $a_1$ can be amortized over all the computations where only $a_2$ is changing.

This is a frequent situation in algorithm design (often the topic of the first comment on SE-CS), when some more static part of the data is pre-processed, so that the cost of the pre-processing can be amortized on all applications of the algorithm with more variable parts of the input data.

This is also the very situation of interpreters, as the first argument is the program to be executed, and it is usually executed many times with different data (or has subparts executed many times with different data). Hence it becomes a natural idea to specialize an interpreter for faster evaluation of a given program by partially evaluating it on this program as its first argument. This may be seen as a way of compiling the program, and there has been significant research work on compiling by partial evaluation of an interpreter on its first (program) argument.
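
As a toy illustration (the mini-language and all names here are made up, and the "specializer" is the crudest possible one): an interpreter $I_S$ is fixed on its program argument, and the residual function of the data alone plays the role of the compiled program. A real partial evaluator would also unfold the dispatch loop so that only the arithmetic remains.

    def interpret(program, data):
        # I_S: run a list of instructions over a single accumulator,
        # e.g. [("add", 3), ("mul", 2)] applied to the input value
        acc = data
        for op, arg in program:
            if op == "add":
                acc += arg
            elif op == "mul":
                acc *= arg
            else:
                raise ValueError(f"unknown op {op!r}")
        return acc

    def specialize(interpreter, program):
        # the crudest possible specializer: fix the first argument and
        # return a residual function of the data
        return lambda data: interpreter(program, data)

    prog = [("add", 3), ("mul", 2)]
    compiled = specialize(interpret, prog)     # the "compiled" version of prog
    print(interpret(prog, 5), compiled(5))     # -> 16 16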

The Smn theorem

The nice point about the partial evaluation approach is that it does take its roots in theory (though theory can be a liar), notably in Kleene's Smn theorem. I am trying here to give an intuitive presentation of it, hoping it will not upset pure theoreticians.

Given a Gödel numbering $\varphi$ of recursive functions, you can view $\varphi$ as your hardware, so that, given the Gödel number $p$ (read: object code) of a program, $\varphi_p$ is the function defined by $p$ (i.e. computed by the object code on your hardware).

In its simplest form, the theorem is stated in wikipedia as follows (up to a small change in notation):

Given a Gödel numbering $\varphi$ of recursive functions, there is a primitive recursive function $\sigma$ of two arguments with the following property: for every Gödel number $q$ of a partial computable function $f$ with two arguments, the expressions $\varphi_{\sigma(q,x)}(y)$ and $f(x,y)$ are defined for the same combinations of natural numbers $x$ and $y$, and their values are equal for any such combination. In other words, the following extensional equality of functions holds for every $x$: $\;\;\varphi_{\sigma(q,x)} \simeq \lambda y.\varphi_q(x,y).\,$

Now, taking $q$ as the interpreter $I_S$, $x$ as the source code of a program $p_S$, and $y$ as the data $d$ for that program, we can write: $\;\;\varphi_{\sigma(I_S,p_S)} \simeq \lambda d.\varphi_{I_S}(p_S,d).\,$

$\varphi_{I_S}$ may be seen as the execution of the interpreter $I_S$ on the hardware, i.e., as a black-box ready to interpret programs written in language $S$.

The function $\sigma$ may be seen as a function that specializes the interpreter $I_S$ for the program $p_S$, as in partial evaluation. Thus the Gödel number $\sigma(I_S,p_S)$ may be seen as object code that is the compiled version of program $p_S$.

So the function $\;C_S = \lambda q_S.\sigma(I_S,q_S)$ may be seen as a function that takes as argument the source code of a program $q_S$ written in language $S$, and returns the object code version for that program. So $C_S$ is what is usually called a compiler.
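
For concreteness, here is what the trivial choice of $\sigma$ looks like as a Python sketch (hypothetical names, Python's exec standing in for $I_S$): the "compiler" merely emits a new standalone script that carries the source text along and hands it back to the interpreter.

    def bundle_compile(source_text, out_path="bundled_program.py"):
        # C_S as pure bundling: "object code" that still contains the source
        # verbatim, plus a call to the interpreter.  It always works, which is
        # exactly why it is unsatisfying as a notion of compilation.
        with open(out_path, "w") as f:
            f.write("SOURCE = " + repr(source_text) + "\n")
            f.write("exec(SOURCE)\n")   # the interpreter still does all the work
        return out_path

    bundle_compile('print("hello from the bundled program")')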

Some conclusions

However, as I said: "theory can be a liar", or actually seem to be one. The problem is that we know nothing of the function $\sigma$. There are actually many such functions, and my guess is that the proof of the theorem may use a very simple definition for it, which might be no better, from an engineering point of view, than the solution proposed by Raphael: to simply bundle the source code $q_S$ with the interpreter $I_S$. This can always be done, so that we can say: compiling is always possible.

Formalizing a more restrictive notion of what is a compiler would require a more subtle theoretical approach. I do not know what may have been done in that direction. The very real work done on partial evaluation is more realistic from an engineering point of view. And there are of course other techniques for writing compilers, including extraction of programs from the proof of their specification, as developed in the context of type-theory, based on the Curry-Howard isomorphism (but I am getting outside my domain of competence).

My purpose here has been to show that Raphael's remark is not "crazy", but a sane reminder that things are not obvious, and not even simple. Saying that something is impossible is a strong statement that does require precise definitions and a proof, if only to have a precise understanding of how and why it is impossible. But building a proper formalization to express such a proof may be quite difficult.

This said, even if a specific feature is not compilable, in the sense understood by engineers, standard compiling techniques can always be applied to parts of the program that do not use such a feature, as is remarked in Gilles' answer.

To follow on Gilles' key remark that, depending on the language, some things may be done at compile time, while others have to be done at run time, thus requiring specific code, we can see that the concept of compilation is actually ill-defined, and is probably not definable in any satisfactory way. Compilation is only an optimization process, as I tried to show in the partial evaluation section, when I compared it with static data preprocessing in some algorithms.

As a complex optimization process, the concept of compilation actually belongs to a continuum. Depending on the characteristics of the language, or of the program, some information may be available statically and allow for better optimization. Other things have to be postponed to run time. When things get really bad, everything has to be done at run time, at least for some parts of the program, and bundling source code with the interpreter is all you can do. So this bundling is just the low end of the compiling continuum. Much of the research on compilers is about finding ways to do statically what used to be done dynamically. Compile-time garbage collection seems a good example.

Note that saying that the compilation process should produce machine code is no help. That is precisely what the bundling can do, as the interpreter is machine code (well, things can get a bit more complex with cross-compilation).

babou
  • 19,445
  • 40
  • 76
  • 3
    "impossible is a big word" A very very big word. =) – Brian S Sep 02 '14 at 18:43
  • 3
    If one defines "compilation" to refer to a sequence of steps which take place entirely before an executing program receives its first input, and interpretation as being the process of having data control program flow via means which are not part of the program's abstract machine model, then for a language to be compiled it must be possible for the compiler to identify, before execution begins, every possible meaning a language construct could have. In languages where a language construct could have an unbounded number of meanings, compilation won't work. – supercat Sep 02 '14 at 18:52
  • @BrianS No, it's not, and it's impossible to prove otherwise ;) – Michael Gazonda Sep 03 '14 at 17:18
  • @supercat That still isn't a definition. What is the 'meaning' of a language construct? – Rhymoid Sep 04 '14 at 12:26
  • I love the concept of viewing a compiler/interpreter as some kind of partial execution! – Bergi Sep 04 '14 at 13:19
  • @Rhymoid: Piece of the source code which is supposed to perform some action. Since languages' execution model combines compiled code with run-time interpretation, I would posit that a metric to distinguish "essentially compiled" from "interpreted code" would be to compare the number of instructions executed by non-library code to perform a task to the number of instructions executed in an effort to decide what to do. In C, a statement like x=5; will store a 5 into a location whose address can typically be ascertained with a single instruction. In JavaScript's non-strict dialect... – supercat Sep 04 '14 at 16:33
  • ...the 5 may end up being stored in many different places, depending upon the context in which the method containing the assignment was called. The cost of figuring out where the "5" should go would very likely exceed the cost of actually storing a "5" in C by at least two orders of magnitude. The "use strict"; dialect of JavaScript is much more compiler-friendly, since it eliminates many of the context dependencies which plague the non-strict dialect. – supercat Sep 04 '14 at 16:38
17

The question is not actually about compilation being impossible. If a language can be interpreted¹, then it can be compiled in a trivial way, by bundling the interpreter with the source code. The question is asking what language features make this essentially the only way.

An interpreter is a program that takes source code as input and behaves as specified by the semantics of that source code. If an interpreter is required as part of the language's runtime environment, it means that the language includes a way to interpret source code at run time. This feature is usually called eval: either eval exists as a primitive, or it can be encoded in some way. Languages known as scripting languages usually include an eval feature, as do most Lisp dialects.

Just because a language includes eval doesn't mean that the bulk of it can't be compiled to native code. For example, there are optimizing Lisp compilers that generate good native code and nonetheless support eval; eval'ed code may be interpreted, or may be compiled on the fly.
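
For instance, in CPython the code handed to eval is itself compiled to bytecode before it runs; a small illustrative snippet using the built-in compile (not taken from any particular Lisp system):

    import dis

    source = "x * x + 1"
    code_obj = compile(source, "<dynamic>", "eval")   # compiled on the fly
    print(eval(code_obj, {"x": 7}))                   # -> 50
    dis.dis(code_obj)                                 # show the generated bytecode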

eval is the ultimate needs-an-interpreter feature, but there are other features that require something short of an interpreter. Consider some typical phases of a compiler:

  1. Parsing
  2. Type checking
  3. Code generation
  4. Linking

eval means that all these phases have to be performed at runtime. There are other features that make native compilation difficult. Taking it from the bottom, some languages encourage late linking by providing ways in which functions (methods, procedures, etc.) and variables (objects, references, etc.) can depend on non-local code changes. This makes it difficult (but not impossible) to generate efficient native code: it's easier to keep object references as calls in a virtual machine, and let the VM engine handle the bindings on the fly.
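
A rough Python sketch of such late binding (names made up): the call target is resolved through a table at every call, so native code for the caller has to keep that table and that lookup, which is exactly the small piece of "VM engine" that survives into the executable.

    registry = {}

    def register(name, fn):
        registry[name] = fn

    def call(name, *args):
        return registry[name](*args)    # resolved dynamically, on every call

    register("greet", lambda who: f"hello, {who}")
    print(call("greet", "world"))

    register("greet", lambda who: f"bonjour, {who}")   # late re-binding
    print(call("greet", "world"))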

Generally speaking, reflection tends to make languages difficult to compile to native code. An eval primitive is an extreme case of reflection; many languages don't go that far, but nonetheless have a semantics defined in terms of a virtual machine, allowing for example code to retrieve a class by name, inspect its inheritance, list its methods, call a method, etc. Java with JVM and C# with .NET are two famous examples. The most straightforward way to implement these languages is by compiling them to bytecode, but there are nonetheless native compilers (many just-in-time) that compile at least program fragments that don't use advanced reflection facilities.
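
For illustration, those reflective operations look like this in Python (the choice of collections.Counter is arbitrary); each step relies on class and method names still being available at run time:

    import importlib, inspect

    cls = getattr(importlib.import_module("collections"), "Counter")   # retrieve a class by name
    methods = [name for name, _ in inspect.getmembers(cls, callable)]  # inspect its methods
    print(methods[:4])
    c = cls("abracadabra")
    print(getattr(c, "most_common")(2))   # call a method by name -> [('a', 5), ('b', 2)]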

Type checking determines whether a program is valid. Different languages have different standards for how much analysis is performed at compile time vs run time: a language is known as “statically typed” if it performs many checks before starting to run the code, and “dynamically typed” if it doesn't. Some languages include a dynamic cast feature or an unmarshall-and-typecheck feature; these features require embedding a typechecker in the runtime environment. This is orthogonal to the requirement of including a code generator or an interpreter in the runtime environment.

¹ Exercise: define a language that cannot be interpreted.

Gilles 'SO- stop being evil'
  • 43,613
  • 8
  • 118
  • 182
  • (1) I disagree about bundling an interpreter with source code counting as compiling, but the rest of your post is excellent. (2) Totally agree about eval. (3) I don't see why reflection would make languages difficult to compile to native code. Objective-C has reflection, and (I assume) it is typically compiled. (4) Vaguely related note, C++ template metamagic is typically interpreted rather than compiled then executed. – Mooing Duck Sep 03 '14 at 16:36
  • Just occurred to me, Lua is compiled. The eval simply compiles the bytecode, and then as a separate step the binary executes the bytecode. And it definitely has reflection in the compiled binary. – Mooing Duck Sep 03 '14 at 16:42
  • On a Harvard Architecture machine, compilation should yield code which never has to be accessed as "data". I would posit that information from the source file which ends up having to be stored as data rather than code isn't really "compiled". There's nothing wrong with a compiler taking a declaration like int arr[] = {1,2,5}; and generating an initialized-data section containing [1,2,5], but I wouldn't describe its behavior as translating [1,2,5] into machine code. If nearly all of a program has to get stored as data, what part of it would really be "compiled"? – supercat Sep 03 '14 at 23:05
  • 2
    @supercat That's what mathematicians and computer scientists mean by trivial. It fits the mathematical definition, but nothing interesting happens. – Gilles 'SO- stop being evil' Sep 03 '14 at 23:13
  • @Gilles: If the term "compile" is reserved for translation to machine instructions (without retention of associated "data") and one accepts that in a compiled language, the behavior of the array declaration isn't to "compile" the array, then there are some languages in which it is impossible to compile any meaningful fraction of the code. – supercat Sep 03 '14 at 23:19
  • +1, but I would remove the "parse" step from the list of things that an eval compiler needs to do at runtime and that requires an interpreter. Not only is parsing (string to AST) usually a thing that doesn't require an interpreter, but also many homoiconic languages do have eval functions that are given an AST to execute instead of a string. – Bergi Sep 04 '14 at 13:15
  • @supercat: so is a compiler that uses constant pools in some formal sense compiling "less of" the code than a compiler that uses immediates, since it accesses code as data? Given the code int foo = 0xCAFEBABE;, is one of them only compiling int foo = whereas the other one compiles the whole statement? Then, what if we "compile" our source to Harvard-architecture pure code that first uses only immediates to re-create the original source in RAM and then interprets it? ;-) – Steve Jessop Sep 04 '14 at 18:05
  • @SteveJessop: The issue would be with the allocation of data RAM. For a read-only array, code could legitimately produce a method which, rather than allocating RAM, simply said e.g. return (index & 2) ? ((index & 1) ? 5 : 0) : ((index & 1) ? 6 : 8), but languages where code can be stored in a modifiable array will ultimately require interpreting the content of that array. – supercat Sep 04 '14 at 18:44
  • @supercat: I may not be following, but are you saying that your definition of "compiled code" requires that it doesn't allocate memory? Or at least that no branch is dependent on values stored in the memory allocated? If so that rules out techniques that normally would be considered "compiled", such as a memoized function, a dynamically computed LUT, or internal use by the implementation of strings/containers/whatever. I don't see how to formally distinguish between those and a "compiled-in interpreter" of the kind proposed: it's all code that writes data and then looks at it later. – Steve Jessop Sep 04 '14 at 18:57
  • @SteveJessop: As I think about it, I think the biggest issue is that "pure" compiled code generally accesses memory at addresses which are either located at fixed addresses relative to either a stack frame or a global memory base, or which correspond to array references or pointer indirections in user code. Interpreted languages require indirect data memory accesses which do not form part of the language semantics. – supercat Sep 04 '14 at 19:03
  • C# generics is another feature that requires some degree of reflection: there is one copy of native code generated for class instances and one for every kind of struct instance. Within the same binary, it works in a straightforward manner, but to create new concrete types across from an external binary, metadata are required. Java doesn't support generics for scalar types so it can reuse the version for class references without keeping the data (type erasure) and C++ templates are compile-time, so they aren't present in the binary for re-instantiation, as far as I know. – Theodoros Chatzigiannakis Sep 06 '14 at 13:28
13

I think the authors are assuming that compilation means

  • the source program doesn't need to be present at run-time, and
  • no compiler or interpreter needs to be present at run-time.

Here are some sample features that would make it problematic if not "impossible" for such a scheme:

  1. If you can interrogate the value of a variable at run-time, by referring to the variable by its name (which is a string), then you will need the variable names to be around at run time.

  2. If you can call a function/procedure at run-time, by referring to it by its name (which is a string), then you will need the function/procedure names at run-time.

  3. If you can construct a piece of program at run-time (as a string), say by running another program, or by reading it from a network connection etc., then you will need either an interpreter or a compiler at run-time to run this piece of program.

Lisp has all three features. So, Lisp systems always have an interpreter loaded at run-time. Languages like Java and C# have function names available at run time, and tables to look up what they mean. Probably languages like Basic and Python also have variable names at run time. (I am not 100% sure about that.)
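
A small Python sketch of the three features (illustrative only); each one forces names, or an eval mechanism, to be kept around at run time:

    x = 42

    def hello():
        return "hi"

    print(globals()["x"])         # 1. interrogate a variable by its name (a string)
    print(globals()["hello"]())   # 2. call a function by its name (a string)
    code = "x + 1"                # 3. a piece of program constructed at run time
    print(eval(code))             # -> 43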

Uday Reddy
  • 4,294
  • 1
  • 18
  • 23
  • What if the "interpreter" is compiled into the code? For example, using dispatch tables to call virtual methods, are these an example of interpretation or compilation? – Erwin Bolwidt Sep 03 '14 at 03:03
  • 2
    "no compiler or interpreter needs to be present at run-time", eh? Well if that's true, then in a deep sense, C can't be "compiled" on most platforms either. The C runtime doesn't have very much to do: startup, to set up stacks and so forth, and shutdown for atexit processing. But it still has to be there. – Pseudonym Sep 03 '14 at 04:14
  • 1
    "Lisp systems always have an interpreter loaded at run-time." – Not necessarily. Many Lisp systems have a compiler at runtime. Some don't even haven an interpreter at all. – Jörg W Mittag Sep 03 '14 at 10:39
  • 2
    Nice try, but http://en.wikipedia.org/wiki/Lisp_machine#Technical_overview. They do compile Lisp and are designed to execute the result efficiently. – Peter - Reinstate Monica Sep 03 '14 at 13:33
  • @Pseudonym: The C runtime is a library, not a compiler nor interpreter. – Mooing Duck Sep 03 '14 at 16:43
  • @PeterSchneider: I believe he meant Lisp compiler/binaries, not Lisp Machines. – Mooing Duck Sep 03 '14 at 16:44
  • @UdayReddy: I don't think 1 & 2 are problematic for compiled languages, as most compiled languages have mechanisms for doing these. Namely: dll linking often uses these features. – Mooing Duck Sep 03 '14 at 16:46
  • @MooingDuck - So then what separates a library from a compiler or interpreter? – Michael Gazonda Sep 03 '14 at 17:21
  • @MGaz: This type of library has absolutely nothing to do with source code whatsoever. They have... virtually nothing in common with compilers or interpreters. It's merely a series of function binaries that you call into. You know, like the Win32 library. – Mooing Duck Sep 03 '14 at 17:25
  • @MooingDuck: The runtime is more than the runtime library, even for compiled languages. It loads and modifies code (e.g. resolves references to dynamic libraries), prepares the environment for the actual program to run, and then starts it. The C++ runtime is more sophisticated than the C runtime, but only on tiny systems does a C program run standalone. Usually a lot happens before the first user code is executed. – Peter - Reinstate Monica Sep 04 '14 at 04:16
  • @PeterSchneider: I don't think that the C runtime resolves dynamic libraries, I'm pretty sure that's an OS facility. The only preparation I'm aware that it does (In C) is the initialization of globals, and (potentially) passing bits between main and the OS. There may be other prep, but I can't imagine what or why. The C runtime also sets up handlers for various signals, and possibly a tiny bit for the console, but I'd expect that most of the console bits are regular library calls. None of this involves C code. – Mooing Duck Sep 04 '14 at 16:44
  • @MooingDuck After googling a bit and reading the terrible wikipedia article http://en.wikipedia.org/wiki/Runtime_system, I think there are several meanings of the term. 1. The point in time when a prog runs. 2. The runtime (in particular system) libraries, e.g. glibc. 3. The runtime environment, i.e. everything that's needed to successfully run a given executable. The third obviously encompasses many parts of the OS and related services like a dynamic loader. The runtime libs provide an API to (parts of) the runtime environment.-- Would that cover it? – Peter - Reinstate Monica Sep 04 '14 at 17:12
  • @PeterSchneider: I think it gets subjective in this area, but it dawns on my that it's all irrelevant to the point I was making: compilers and interpreters deal in source code translation. The part of the C runtime that does setup and shutdown does not need source code and thus C doesn't "require a compiler or interpreter at runtime" as Pseudonym claimed. – Mooing Duck Sep 04 '14 at 17:39
  • Exactly: the reason the CRT library isn't a C compiler is that you can't give it C source code and get back an executable! (Or even an object format other than an executable, for those who want to distinguish between compiling and linking) It's not an interpreter because you can't give it C source code and get back the output of the program. It might be difficult to define precisely what compilation is, but I don't think this is one of the difficult cases. OTOH, I suppose you could argue that the dynamic loader is a "compiler" between two extremely similar languages neither of which is C. – Steve Jessop Sep 04 '14 at 17:50
  • @MooingDuck IIUC Pseudonym suggests that the runtime environment for a C program in a hosted environment, e.g. a PC, is formally an interpreter (as opposed to a freestanding environment where the machine code is executed "directly"). It's just not a C interpreter: It just so happens that the language being interpreted is understood by the CPU so that running a program amounts to some setup followed by a large "eval", with the executable as argument. I see his point. Also the OS can suspend and kill a program etc; the prog runs completely under its control. – Peter - Reinstate Monica Sep 04 '14 at 17:56
  • @PeterSchneider: If you interpret it as the hosting counting partially as an interpreter, that's silly because it makes every theoretical programming language impossible to compile (including assembly). If you interpret it as the CPU interpreting ASM, that's even more silly, because then even raw bytecode is "interpreted", and the word "compile" has no meaning whatsoever. While I can see that discussion happening, it shouldn't be part of a comparison of interpretation vs compilation. – Mooing Duck Sep 04 '14 at 18:03
8

It is possible the current replies are "overthinking" the statement/answers. Possibly what the authors are referring to is the following phenomenon: many languages have an "eval"-like command, e.g. see JavaScript's eval, and its behavior is commonly studied as a special part of CS theory (e.g. in Lisp). The function of this command is to evaluate a string in the context of the language definition; therefore, in effect, it has a similarity to a "built-in compiler". The compiler cannot know the contents of the string until runtime; therefore compiling the result of eval into machine code is not possible at compile time.

Other answers point out that the distinction between interpreted and compiled languages can blur significantly in many cases, especially with more modern languages such as Java with its "just-in-time compiler", aka HotSpot (JavaScript engines, e.g. V8, increasingly use the same technique). "Eval-like" functionality is certainly one of the features that blur it.

vzn
  • 11,034
  • 1
  • 27
  • 50
3

LISP is a terrible example, as it was conceived as a sort of higher-level "machine" language as a base for a "real" language. Said "real" language never materialized. LISP machines were built on the idea of doing (much of) LISP in hardware. As a LISP interpreter is just a program, it is in principle possible to implement it in circuitry. Not practical, perhaps; but far from impossible.

Furthermore, there are lots of interpreters programmed in silicon, normally called "CPU". And it is often useful to interpret (not yet existing, not at hand, ...) machine codes. E.g. Linux's x86_64 port was first written and tested on emulators. There were full distributions on hand when the chips came to the market, even if just for early adopters/testers. Java is often compiled to JVM code; the JVM is an interpreter which would not be too hard to write in silicon.

Most "interpreted" languages are compiled to an internal form, which is optimized and then interpreted. This is e.g. what Perl and Python do. There are also compilers for meant-to-be interpreted languages, like the Unix shell. It is possible on the other hand to interpret traditionally compiled languages. One somewhat extreme example I saw was an editor which used interpreted C as extension language. It's C could run normal, but simple, programs with no issues.

On the other hand, modern CPUs even take the "machine language" input and translate it into lower-level instructions, which are then reordered and optimized (i.e., "compiled") before being handed off for execution.

This whole "compiler" vs "interpreter" distinction is really moot, somewhere in the stack there is an ultimate interpreter which takes "code" and executes it "directly". The input from the programmer undergoes transformations along the line, which of those is called "compiling" is just drawing an arbitrary line in the sand.

vonbrand
  • 14,004
  • 3
  • 40
  • 50
1

I would presume the main feature of a programming language that makes a compiler for the language impossible (in a strict sense, see also self-hosting) is the self-modification feature, meaning the language allows the source code to be changed during run time (something a compiler, generating fixed and static object code, cannot do). A classic example is Lisp (see also Homoiconicity). Similar functionality is provided using a language construct such as eval, included in many languages (e.g. JavaScript). Eval actually calls the interpreter (as a function) at run-time.

In other words, the language can represent its own meta-system (see also Metaprogramming).

Note that language reflection, in the sense of querying the meta-data of certain source code, and possibly modifying only that meta-data (something like Java's or PHP's reflection mechanisms), is not problematic for a compiler, since it already has those meta-data at compile time and can make them available to the compiled program, as needed.

Another feature that makes compilation difficult, or not the best option (but not impossible), is the typing scheme used in the language (i.e. dynamic typing vs static typing and strong typing vs loose typing). This makes it difficult for the compiler to have all the semantics at compile time, so effectively a part of the compiler (in other words, an interpreter) becomes part of the generated code and handles the semantics at run time. This is, in other words, not compilation but interpretation.

Nikos M.
  • 957
  • 6
  • 16
1

The reality is that there is a big difference between interpreting some Basic program and executing assembler. And there are areas in-between with P-code / byte-code with or without (just-in-time) compilers. So I will try to summarise some points in the context of this reality.

  • If how source code is parsed depends on run-time conditions, writing a compiler may become impossible, or so hard that nobody will bother.

  • Code that modifies itself is in the general case impossible to compile.

  • A program that uses an eval-like function usually cannot be completely compiled in advance (if you regard the string fed to it as part of the program), although if you're going to run the eval'ed code repeatedly it may still be useful to have your eval-like function invoke the compiler. Some languages provide an API for the compiler to make this easy (see the sketch after this list).

  • The ability to refer to things by name doesn't preclude compilation, but you do need tables (as mentioned). Calling functions by name (like IDispatch) requires a lot of plumbing, to the point where I think most people would agree that we're effectively talking about a function call interpreter.

  • Weak typing (whatever your definition) makes compilation harder and perhaps the result less efficient, but often not impossible, unless different values trigger different parses. There is a sliding scale here: if the compiler can't deduce the actual type, it will need to emit branches, function calls and such that wouldn't otherwise be there, effectively embedding bits of interpreter in the executable.
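
A sketch of the compile-once-run-many point from the eval bullet above, using CPython's built-in compile (the exact timings will of course vary):

    import timeit

    source = "sum(i * i for i in range(100))"
    compiled = compile(source, "<user-code>", "eval")

    slow = timeit.timeit(lambda: eval(source), number=2000)     # re-parses and re-compiles each call
    fast = timeit.timeit(lambda: eval(compiled), number=2000)   # reuses the code object
    print(f"string: {slow:.3f}s  precompiled: {fast:.3f}s")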

FrankW
  • 6,589
  • 4
  • 26
  • 42
-1

I feel the original question is not well formed. The authors of the question may have intended to ask a somewhat different question: What properties of a programming language facilitate writing a compiler for it?

For example, it's easier to write a compiler for a context-free language than a context-sensitive language. The grammar which defines a language can also have issues that make it challenging to compile, such as ambiguities. Such issues can be resolved but require extra effort. Similarly, languages defined by unrestricted grammars are harder to parse than context-sensitive languages (see Chomsky Hierarchy). To my knowledge most widely used procedural programming languages are close to context-free, but have a few context-sensitive elements, making them relatively easy to compile.

Georgie
  • 107
  • 1
  • 2
    The question is clearly intending to oppose/compare compilers and interpreters. While they may work differently, and usually do except for @Raphael's limit case above, they have exactly the same problems regarding syntax analysis and ambiguity. So syntax cannot be the issue. I also believe that syntactic problems are not usually the major concern in compiler writing nowadays, though they have been in the past. I am not the downvoter: I prefer commenting. – babou Sep 03 '14 at 08:33
-1

The question has a correct answer so obvious that it's typically overlooked as being trivial. But it does matter in many contexts, and is the primary reason why interpreted languages exist:

Compiling source code into machine code is impossible if you don't yet have the source code.

Interpreters add flexibility, and in particular they add the flexibility of running code that wasn't available when the underlying project was compiled.

tylerl
  • 107
  • 1
  • 2
    "I lost the source code" is not a property of a programming language but of a particular program, so that doesn't answer the question. And you definitely need a citation for the claim that avoiding loss of the source code is "the primary reason why interpreted languages exist", or even a reason why they exist. – David Richerby Nov 11 '14 at 09:28
  • 1
    @DavidRicherby I guess the use case tylerl has in mind is interactive interpretation, i.e. code entered at runtime. I agree, though, that that is out of the scope of the question since it's not a feature of the language. – Raphael Nov 11 '14 at 11:24
  • @DavidRicherby and Raphael, I say that the author of this post implies (what I describe in my answer) the self-modification feature, which of course is a language construct by design and not an artifact of some specific program – Nikos M. Jan 17 '16 at 22:46