66

In languages like C, the programmer is expected to insert calls to free. Why doesn't the compiler do this automatically? Humans do it in a reasonable amount of time (ignoring bugs), so it is not impossible.

EDIT: For future reference, here is another discussion that has an interesting example.

Milton Silva
  • And that, kids, is why we teach you computability theory. ;) – Raphael Jan 12 '17 at 20:44
  • This isn't a computability problem as humans can't decide in all cases either. It's a completeness problem; deallocation statements contain information that, if removed, can't be fully recovered by analysis unless that analysis includes information about the deployment environment and expected operation, which C source code doesn't contain. – Nat Jan 13 '17 at 07:16
  • No, it's a computability problem. It is undecidable whether a given piece of memory should be deallocated. For a fixed program, no user input or other external interference. – Andrej Bauer Jan 13 '17 at 07:25
  • Comments are not for extended discussion; this conversation has been moved to chat. All comments that do not specifically address the question and how it can be improved will be deleted on sight. – Raphael Jan 14 '17 at 13:01
  • It is not clear if this question is about “the” C “compiler”, the compilers of “languages like C” or something even more general. The conclusion “so it is not impossible” needs a stricter logical derivation: it assumes that the programme contains everything that the programmer knows about it, which may include intention, future development and operating environment. – PJTraill Jan 15 '17 at 17:59
  • @BorisTreukhov, please take it to the chatroom. No, I don't think Andrej is saying that escape analysis is "impossible" (though determining exactly what that means in this context is a bit unclear to me). Perfectly precise escape analysis is undecidable. To all: please take it to the chatroom. Please only post comments here that are aimed at improving the question -- other discussion and commentary should be posted in the chatroom. – D.W. Jan 16 '17 at 03:41
  • C++ does do this, via RAII, which uses block scoping to see when to destroy an object. But it can still be fooled at run-time by smart pointers taking cyclic ownership of each other (the compiler-side still works exactly as expected in this case, since reference counting is done at run-time). – Mark K Cowan Jan 16 '17 at 17:20
  • @MarkKCowan It's not just smart pointers. Anything that puts a pointer in another object can fool the automatic storage. It's undecidable to do this safely in an unrestricted type system like that of C++. – Theodoros Chatzigiannakis Jan 17 '17 at 12:27
  • Also it is worth noting that linear types (like in Clean or Rust) help with automatic resource management. – Maja Piechotka Jan 18 '17 at 01:32

12 Answers

87

Because it's undecidable whether the program will use the memory again. This means that no algorithm can correctly determine when to call free() in all cases: any compiler that tried to do this would necessarily produce some programs with memory leaks, some programs that continued to use memory after it had been freed, or both. Even if you ensured that your compiler never did the second thing and let the programmer insert calls to free() to fix the resulting leaks, knowing when to call free() under that compiler would be even harder than knowing when to call free() under a compiler that didn't try to help.
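
To make the undecidability concrete, here is a minimal C sketch (the buffer contents and the 'y' check are invented for illustration): whether the allocation is used again depends on input that only exists at run time, so no compile-time analysis can pin down the earliest correct point for free().

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *buf = malloc(6);
    if (buf == NULL)
        return 1;
    snprintf(buf, 6, "hello");
    printf("%s\n", buf);

    /* Whether buf is needed again depends on run-time input, so the
     * earliest safe point for free(buf) is not a compile-time fact. */
    if (getchar() == 'y')
        printf("%s again\n", buf);  /* buf is still live on this path */

    free(buf);  /* safe only here, after both paths have converged */
    return 0;
}
```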

David Richerby
59

As David Richerby rightly noted, the problem is undecidable in general. Object liveness is a global property of the program, and may in general depend on the inputs to the program.

Even precise dynamic garbage collection is an undecidable problem! All real-world garbage collectors use reachability as a conservative approximation to whether or not an allocated object will be needed in the future. It's a good approximation, but it's an approximation nonetheless.
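
To see the gap between reachability and liveness in a hedged sketch (the global name here is invented), consider a block that stays reachable through a global pointer even though the program will never read it again:

```c
#include <stdlib.h>

static char *cache;  /* global root: whatever it points to is reachable */

void compute(void) {
    cache = malloc(1024);
    /* ... fill and use cache ... */
}

int main(void) {
    compute();
    /* The program never touches cache again, so the block is dead,
     * yet it is still reachable through the global.  A collector
     * based on reachability must keep it; deciding true liveness
     * here is exactly the undecidable problem above. */
    return 0;
}
```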

But that's only true in general. One of the most notorious cop-outs in the computer science business is "it's impossible in general, therefore we can't do anything". On the contrary, there are many cases where it's possible to make some headway.

Implementations based on reference counting come so close to "the compiler inserting deallocations" that it can be hard to tell the difference. Clang/LLVM's automatic reference counting (ARC, used for Objective-C and Swift) is a famous example.
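
As a rough sketch of the mechanism, written out by hand in C (the Object type and the retain/release names are invented; ARC inserts the equivalent operations for you at the points where ownership changes):

```c
#include <stdlib.h>

typedef struct {
    int refcount;
    /* ... payload ... */
} Object;

Object *obj_new(void) {
    Object *o = malloc(sizeof *o);
    if (o != NULL)
        o->refcount = 1;   /* the creator holds the first reference */
    return o;
}

void obj_retain(Object *o) {
    o->refcount++;         /* a new owner takes a reference */
}

void obj_release(Object *o) {
    if (--o->refcount == 0)
        free(o);           /* the last owner triggers the free */
}
```

With such calls inserted automatically, the effect is nearly indistinguishable from the compiler placing deallocations itself; the well-known limitation is that reference counting alone cannot reclaim cycles.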

Region inference and compile-time garbage collection are active areas of research. The problem turns out to be much easier in declarative languages like ML and Mercury, where most objects cannot be modified after they are created.

Now, on the topic of humans, there are three main ways that humans manage allocation lifetimes manually:

  1. By understanding the program and the problem. Humans can put objects with similar lifetimes in the same allocation object, for example (see the arena sketch after this list). Compilers and garbage collectors must infer this, but humans have more precise information.
  2. By selectively using nonlocal book-keeping (e.g. reference counting) or other special allocation techniques (e.g. zones) only when needed. Again, a human can know this where a compiler must infer it.
  3. Badly. Everyone knows of real-world deployed programs that have slow leaks, after all. Or if they don't, sometimes programs and internal APIs need to be restructured around memory lifetimes, decreasing reusability and modularity.
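
A minimal arena (zone) sketch for item 1, assuming invented names and ignoring alignment: objects with the same lifetime share one backing block, and a single call releases them all.

```c
#include <stdlib.h>

/* Bump-pointer arena: every allocation made from it shares the
 * lifetime of the arena itself (alignment handling omitted). */
typedef struct {
    char  *base;
    size_t used, cap;
} Arena;

int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->used = 0;
    a->cap  = cap;
    return a->base != NULL;
}

void *arena_alloc(Arena *a, size_t n) {
    if (a->cap - a->used < n)
        return NULL;               /* arena exhausted */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

void arena_free(Arena *a) {
    free(a->base);                 /* one free() for every object inside */
    a->base = NULL;
}
```
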
Pseudonym
  • Comments are not for extended discussion. If you wish to discuss declarative vs functional, please do it in chat. – Gilles 'SO- stop being evil' Jan 15 '17 at 19:42
  • This is by far the best answer to the question (which too many answers do not even address). You could have added a reference to the pioneering work of Hans Boehm on conservative GC: https://en.wikipedia.org/wiki/Boehm_garbage_collector. Another interesting point is that data liveness (or usefulness in an extended sense) can be defined with respect to an abstract semantics or to an execution model. But the topic is really wide. – babou Jan 22 '17 at 13:15
30

It's an incompleteness problem, not an undecidability problem

While it's true that the optimal placement of deallocation statements is undecidable, that's simply not the issue here. Since it's undecidable for both humans and compilers, it's impossible to always knowingly select the optimal deallocation placement, regardless of whether the process is manual or automatic. And since no one's perfect, a sufficiently advanced compiler should be able to outperform humans at guessing approximately optimal placements. So, undecidability isn't why we need explicit deallocation statements.

There are cases in which external knowledge informs deallocation statement placement. Removing those statements is then equivalent to removing part of the operational logic, and asking a compiler to automatically generate that logic is equivalent to asking it to guess what you're thinking.

For example, say that you're writing a Read-Evaluate-Print-Loop (REPL): the user types in a command, and your program executes it. The user can allocate/deallocate memory by typing commands into your REPL. Your source code would specify what the REPL should do for each possible user command, including deallocation when the user types in the command for it.

But if C source code doesn't provide an explicit command for deallocation, then the compiler would need to infer that it should perform the deallocation when the user inputs the appropriate command into the REPL. Is that command "deallocate", "free", or something else? The compiler has no way of knowing what you want the command to be. Even if you write logic to look for that command word and the REPL finds it, the compiler has no way of knowing that it should respond with deallocation unless you explicitly tell it to in the source code.
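
A toy version of such a REPL in C (the command words "alloc", "free", and "quit" are made up for this sketch): the only reason the string "free" leads to a call to free() is that the programmer wrote that mapping explicitly.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char cmd[32];
    char *block = NULL;

    while (scanf("%31s", cmd) == 1) {
        if (strcmp(cmd, "alloc") == 0 && block == NULL) {
            block = malloc(256);   /* the user asked for memory */
        } else if (strcmp(cmd, "free") == 0) {
            free(block);           /* only this line ties the word
                                      "free" to the call free() */
            block = NULL;
        } else if (strcmp(cmd, "quit") == 0) {
            break;
        }
    }
    free(block);                   /* release anything still held */
    return 0;
}
```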

tl;dr: The problem is that C source code doesn't provide the compiler with this external knowledge. Undecidability isn't the issue, because it's present whether the process is manual or automated.

Nat
  • Comments are not for extended discussion; this conversation has been moved to chat. All further comments that do not specifically address shortcomings of this answer and how they can be fixed will be deleted on sight. – Raphael Jan 15 '17 at 11:37
24

Currently, none of the posted answers are fully correct. It's not impossible to do this, but adding that feature would restrict memory allocation patterns.

Why don't compilers automatically insert deallocations?

  1. Some do. (I'll explain later.)

  2. Trivially, you can call free() just before the program exits. But there's an implied need in your question to call free() as soon as possible.

  3. The problem of when to call free() in an arbitrary C program, as soon as the memory is unreachable, is undecidable, i.e. for any algorithm providing the answer in finite time, there is a case it does not cover. This -- like many other undecidability results about arbitrary programs -- can be proved by reduction from the Halting Problem (see the sketch after this list).

  4. An undecidable problem cannot always be solved in finite time by any algorithm, whether a compiler or a human.

  5. Humans (try to) write in a subset of C programs whose memory correctness can be verified by their own algorithm (themselves).

  6. Some languages accomplish #1 by building #5 into the compiler. They don't allow programs with arbitrary uses of memory allocation, but rather a decidable subset of them. Forth and Rust are two examples of languages with more restrictive memory allocation than C's malloc(), and they can (1) detect when a program falls outside their decidable subset and (2) insert deallocations automatically.
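
A sketch of the reduction mentioned in #3, with a hypothetical mystery() standing in for an arbitrary computation (the function names are invented): deciding whether free(p) can be moved above the call amounts to deciding whether mystery() ever returns.

```c
#include <stdlib.h>

void mystery(void);   /* arbitrary code; may or may not ever return */

void f(void) {
    char *p = malloc(1);
    mystery();
    if (p != NULL)
        *p = 'x';     /* reached only if mystery() returns */
    free(p);
    /* Hoisting free(p) above mystery() is correct exactly when
     * mystery() never returns; deciding that is the Halting
     * Problem, so earliest-possible placement is undecidable. */
}
```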

rici
Paul Draper
  • I understand how Rust does it. But I never heard of a Forth that did this. Can you elaborate? – Milton Silva Jan 15 '17 at 11:52
  • @MiltonSilva, Forth -- at least its most basic, original implementation -- has only a stack, not a heap. Allocation and deallocation are just moves of the stack pointer, a task which the compiler can easily do. Forth was made to target very simple hardware, and sometimes non-dynamic memory is all that is workable. It's obviously not a workable solution for non-trivial programs. – Paul Draper Jan 16 '17 at 01:11
10

"Humans do it, so it's not impossible" is a well-known fallacy. We do not necessarily understand (let alone control) the things that we create - money is a common example. We tend to overestimate (sometimes dramatically) our chances of success in technological matters, especially when human factors seem to be absent.

Human performance in computer programming is very poor, and the study of computer science (lacking in many professional education programs) helps us understand why this problem does not have a simple fix. We may some day, perhaps not too far away, be replaced by artificial intelligence on the job. Even then, there will not be a general algorithm that gets deallocation right, automatically, all the time.

André Souza Lemos
  • The fallacy of accepting the premise of human fallibility and yet assuming that human-created thinking machines may still be infallible (i.e. better than humans) is less well-known but more intriguing. The only assumption from which action can proceed is that the human mind has the potential to compute perfectly. – Wildcard Jan 17 '17 at 05:19
  • 1. I never said that thinking machines might be infallible. Better than humans is what they already are, in many cases. 2. The expectation of perfection (even potential) as a prerequisite for action is an absurdity. – André Souza Lemos Jan 17 '17 at 07:07
  • "We may some day, perhaps not too far away, be replaced by artificial intelligence on the job." This, in particular, is nonsense. Humans are the source of intent in the system. Without humans, there is no purpose for the system. "Artificial Intelligence" could be defined as the apparency of intelligent present-time decision by machines, brought about in fact by intelligent decisions of a programmer or system designer in the past. If there is no maintenance (which must be done by a person), AI (or any system that's left uninspected and fully automatic) will fail. – Wildcard Jan 17 '17 at 09:16
  • Intent, in humans as in machines, always comes from outside. – André Souza Lemos Jan 17 '17 at 13:44
  • Entirely untrue. (And also, "outside" does not define a source.) Either you're stating that intent as such doesn't actually exist, or you are stating that intent exists but doesn't come from anywhere. Perhaps you believe that intent can exist independent of purpose? In which case you misunderstand the word "intent." Either way, an in-person demonstration would shortly change your mind on this subject. I will drop off after this comment as words alone cannot bring about an understanding of "intent," so further discussion here is pointless. – Wildcard Jan 22 '17 at 11:37