66

In languages like C, the programmer is expected to insert calls to free. Why doesn't the compiler do this automatically? Humans do it in a reasonable amount of time (ignoring bugs), so it is not impossible.

EDIT: For future reference, here is another discussion that has an interesting example.

Milton Silva
  • And that, kids, is why we teach you computability theory. ;) – Raphael Jan 12 '17 at 20:44
  • This isn't a computability problem as humans can't decide in all cases either. It's a completeness problem; deallocation statements contain information that, if removed, can't be fully recovered by analysis unless that analysis includes information about the deployment environment and expected operation, which C source code doesn't contain. – Nat Jan 13 '17 at 07:16
  • No, it's a computability problem. It is undecidable whether a given piece of memory should be deallocated. For a fixed program, no user input or other external interference. – Andrej Bauer Jan 13 '17 at 07:25
  • Comments are not for extended discussion; this conversation has been moved to chat. All comments that do not specifically address the question and how it can be improved will be deleted on sight. – Raphael Jan 14 '17 at 13:01
  • It is not clear if this question is about “the” C “compiler”, the compilers of “languages like C” or something even more general. The conclusion “so it is not impossible” needs a stricter logical derivation: it assumes that the programme contains everything that the programmer knows about it, which may include intention, future development and operating environment. – PJTraill Jan 15 '17 at 17:59
  • @BorisTreukhov, please take it to the chatroom. No, I don't think Andrej is saying that escape analysis is "impossible" (though determining exactly what that means in this context is a bit unclear to me). Perfectly precise escape analysis is undecidable. To all: please take it to the chatroom. Please only post comments here that are aimed at improving the question -- other discussion and commentary should be posted in the chatroom. – D.W. Jan 16 '17 at 03:41
  • C++ does do this, via RAII, which uses block scoping to see when to destroy an object. But it can still be fooled at run-time by smart pointers taking cyclic ownership of each other (the compiler-side still works exactly as expected in this case, since reference counting is done at run-time). – Mark K Cowan Jan 16 '17 at 17:20
  • @MarkKCowan It's not just smart pointers. Anything that puts a pointer in another object can fool the automatic storage. It's undecidable to do this safely in an unrestricted type system like that of C++. – Theodoros Chatzigiannakis Jan 17 '17 at 12:27
  • Also it is worth noting that linear types (like in Clean or Rust) help with automatic resource management. – Maja Piechotka Jan 18 '17 at 01:32

12 Answers

87

Because it's undecidable whether the program will use the memory again. This means that no algorithm can correctly determine when to call free() in all cases: any compiler that tried to do this would necessarily produce some programs with memory leaks, some programs that continued to use memory after it had been freed, or both. Even if you ensured that your compiler never did the second thing and let the programmer insert calls to free() to fix the resulting leaks, knowing when to call free() under that compiler would be even harder than knowing when to call free() under a compiler that didn't try to help.
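
To make the undecidability concrete, here is a minimal C sketch (the buffer contents and the 'y' check are invented for illustration): whether the allocation is used again depends on input that only exists at run time, so no compile-time analysis can pin down the earliest correct point for free().

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *buf = malloc(6);
    if (buf == NULL)
        return 1;
    snprintf(buf, 6, "hello");
    printf("%s\n", buf);

    /* Whether buf is needed again depends on run-time input, so the
     * earliest safe point for free(buf) is not a compile-time fact. */
    if (getchar() == 'y')
        printf("%s again\n", buf);  /* buf is still live on this path */

    free(buf);  /* safe only here, after both paths have converged */
    return 0;
}
```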

David Richerby
59

As David Richerby rightly noted, the problem is undecidable in general. Object liveness is a global property of the program, and may in general depend on the inputs to the program.

Even precise dynamic garbage collection is an undecidable problem! All real-world garbage collectors use reachability as a conservative approximation to whether or not an allocated object will be needed in the future. It's a good approximation, but it's an approximation nonetheless.
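
To see the gap between reachability and liveness in a hedged sketch (the global name here is invented), consider a block that stays reachable through a global pointer even though the program will never read it again:

```c
#include <stdlib.h>

static char *cache;  /* global root: whatever it points to is reachable */

void compute(void) {
    cache = malloc(1024);
    /* ... fill and use cache ... */
}

int main(void) {
    compute();
    /* The program never touches cache again, so the block is dead,
     * yet it is still reachable through the global.  A collector
     * based on reachability must keep it; deciding true liveness
     * here is exactly the undecidable problem above. */
    return 0;
}
```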

But that's only true in general. One of the most notorious cop-outs in the computer science business is "it's impossible in general, therefore we can't do anything". On the contrary, there are many cases where it's possible to make some headway.

Implementations based on reference counting come so close to "the compiler inserting deallocations" that it can be hard to tell the difference. Clang/LLVM's automatic reference counting (ARC, used for Objective-C and Swift) is a famous example.
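
As a rough sketch of the mechanism, written out by hand in C (the Object type and the retain/release names are invented; ARC inserts the equivalent operations for you at the points where ownership changes):

```c
#include <stdlib.h>

typedef struct {
    int refcount;
    /* ... payload ... */
} Object;

Object *obj_new(void) {
    Object *o = malloc(sizeof *o);
    if (o != NULL)
        o->refcount = 1;   /* the creator holds the first reference */
    return o;
}

void obj_retain(Object *o) {
    o->refcount++;         /* a new owner takes a reference */
}

void obj_release(Object *o) {
    if (--o->refcount == 0)
        free(o);           /* the last owner triggers the free */
}
```

With such calls inserted automatically, the effect is nearly indistinguishable from the compiler placing deallocations itself; the well-known limitation is that reference counting alone cannot reclaim cycles.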

Region inference and compile-time garbage collection are active areas of research. The problem turns out to be much easier in declarative languages like ML and Mercury, where most objects cannot be modified after they are created.

Now, on the topic of humans, there are three main ways that humans manage allocation lifetimes manually:

  1. By understanding the program and the problem. Humans can put objects with similar lifetimes in the same allocation object, for example (see the arena sketch after this list). Compilers and garbage collectors must infer this, but humans have more precise information.
  2. By selectively using nonlocal book-keeping (e.g. reference counting) or other special allocation techniques (e.g. zones) only when needed. Again, a human can know this where a compiler must infer it.
  3. Badly. Everyone knows of real-world deployed programs that have slow leaks, after all. Or if they don't, sometimes programs and internal APIs need to be restructured around memory lifetimes, decreasing reusability and modularity.
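
A minimal arena (zone) sketch for item 1, assuming invented names and ignoring alignment: objects with the same lifetime share one backing block, and a single call releases them all.

```c
#include <stdlib.h>

/* Bump-pointer arena: every allocation made from it shares the
 * lifetime of the arena itself (alignment handling omitted). */
typedef struct {
    char  *base;
    size_t used, cap;
} Arena;

int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->used = 0;
    a->cap  = cap;
    return a->base != NULL;
}

void *arena_alloc(Arena *a, size_t n) {
    if (a->cap - a->used < n)
        return NULL;               /* arena exhausted */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

void arena_free(Arena *a) {
    free(a->base);                 /* one free() for every object inside */
    a->base = NULL;
}
```
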
Pseudonym
  • Comments are not for extended discussion. If you wish to discuss declarative vs functional, please do it in chat. – Gilles 'SO- stop being evil' Jan 15 '17 at 19:42
  • This is by far the best answer to the question (which too many answers do not even address). You could have added a reference to the pioneering work of Hans Boehm on conservative GC: https://en.wikipedia.org/wiki/Boehm_garbage_collector. Another interesting point is that data liveness (or usefulness in an extended sense) can be defined with respect to an abstract semantics or to an execution model. But the topic is really wide. – babou Jan 22 '17 at 13:15
30

It's an incompleteness problem, not an undecidability problem

While it's true that the optimal placement of deallocation statements is undecidable, that's simply not the issue here. Since it's undecidable for both humans and compilers, it's impossible to always knowingly select the optimal deallocation placement, regardless of whether the process is manual or automatic. And since no one's perfect, a sufficiently advanced compiler should be able to outperform humans at guessing approximately optimal placements. So, undecidability isn't why we need explicit deallocation statements.

There are cases in which external knowledge informs deallocation statement placement. Removing those statements is then equivalent to removing part of the operational logic, and asking a compiler to automatically generate that logic is equivalent to asking it to guess what you're thinking.

For example, say that you're writing a Read-Evaluate-Print-Loop (REPL): the user types in a command, and your program executes it. The user can allocate/deallocate memory by typing commands into your REPL. Your source code would specify what the REPL should do for each possible user command, including deallocation when the user types in the command for it.

But if C source code doesn't provide an explicit command for deallocation, then the compiler would need to infer that it should perform the deallocation when the user inputs the appropriate command into the REPL. Is that command "deallocate", "free", or something else? The compiler has no way of knowing what you want the command to be. Even if you write logic to look for that command word and the REPL finds it, the compiler has no way of knowing that it should respond with deallocation unless you explicitly tell it to in the source code.
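
A toy version of such a REPL in C (the command words "alloc", "free", and "quit" are made up for this sketch): the only reason the string "free" leads to a call to free() is that the programmer wrote that mapping explicitly.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char cmd[32];
    char *block = NULL;

    while (scanf("%31s", cmd) == 1) {
        if (strcmp(cmd, "alloc") == 0 && block == NULL) {
            block = malloc(256);   /* the user asked for memory */
        } else if (strcmp(cmd, "free") == 0) {
            free(block);           /* only this line ties the word
                                      "free" to the call free() */
            block = NULL;
        } else if (strcmp(cmd, "quit") == 0) {
            break;
        }
    }
    free(block);                   /* release anything still held */
    return 0;
}
```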

tl;dr: The problem is that C source code doesn't provide the compiler with this external knowledge. Undecidability isn't the issue, because it's present whether the process is manual or automated.

Nat
  • Comments are not for extended discussion; this conversation has been moved to chat. All further comments that do not specifically address shortcomings of this answer and how they can be fixed will be deleted on sight. – Raphael Jan 15 '17 at 11:37
24

Currently, none of the posted answers are fully correct. It's not impossible to do this, but adding that feature would restrict memory allocation patterns.

Why don't compilers automatically insert deallocations?

  1. Some do. (I'll explain later.)

  2. Trivially, you can call free() just before the program exits. But there's an implied need in your question to call free() as soon as possible.

  3. The problem of when to call free() in an arbitrary C program, as soon as the memory is unreachable, is undecidable, i.e. for any algorithm providing the answer in finite time, there is a case it does not cover. This -- like many other undecidability results about arbitrary programs -- can be proved by reduction from the Halting Problem (see the sketch after this list).

  4. An undecidable problem cannot always be solved in finite time by any algorithm, whether a compiler or a human.

  5. Humans (try to) write in a subset of C programs whose memory correctness can be verified by their own algorithm (themselves).

  6. Some languages accomplish #1 by building #5 into the compiler. They don't allow programs with arbitrary uses of memory allocation, but rather a decidable subset of them. Forth and Rust are two examples of languages with more restrictive memory allocation than C's malloc(), and they can (1) detect when a program falls outside their decidable subset and (2) insert deallocations automatically.
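
A sketch of the reduction mentioned in #3, with a hypothetical mystery() standing in for an arbitrary computation (the function names are invented): deciding whether free(p) can be moved above the call amounts to deciding whether mystery() ever returns.

```c
#include <stdlib.h>

void mystery(void);   /* arbitrary code; may or may not ever return */

void f(void) {
    char *p = malloc(1);
    mystery();
    if (p != NULL)
        *p = 'x';     /* reached only if mystery() returns */
    free(p);
    /* Hoisting free(p) above mystery() is correct exactly when
     * mystery() never returns; deciding that is the Halting
     * Problem, so earliest-possible placement is undecidable. */
}
```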

rici
Paul Draper
  • I understand how Rust does it. But I never heard of a Forth that did this. Can you elaborate? – Milton Silva Jan 15 '17 at 11:52
  • @MiltonSilva, Forth -- at least its most basic, original implementation -- has only a stack, not a heap. Allocation and deallocation are just moves of the stack pointer, a task which the compiler can easily do. Forth was made to target very simple hardware, and sometimes non-dynamic memory is all that is workable. It's obviously not a workable solution for non-trivial programs. – Paul Draper Jan 16 '17 at 01:11
10

"Humans do it, so it's not impossible" is a well-known fallacy. We do not necessarily understand (let alone control) the things that we create - money is a common example. We tend to overestimate (sometimes dramatically) our chances of success in technological matters, especially when human factors seem to be absent.

Human performance in computer programming is very poor, and the study of computer science (lacking in many professional education programs) helps us understand why this problem does not have a simple fix. We may some day, perhaps not too far away, be replaced by artificial intelligence on the job. Even then, there will not be a general algorithm that gets deallocation right, automatically, all the time.

André Souza Lemos
  • The fallacy of accepting the premise of human fallibility and yet assuming that human-created thinking machines may still be infallible (i.e. better than humans) is less well-known but more intriguing. The only assumption from which action can proceed is that the human mind has the potential to compute perfectly. – Wildcard Jan 17 '17 at 05:19
  • 1. I never said that thinking machines might be infallible. Better than humans is what they already are, in many cases. 2. The expectation of perfection (even potential) as a prerequisite for action is an absurdity. – André Souza Lemos Jan 17 '17 at 07:07
  • "We may some day, perhaps not too far away, be replaced by artificial intelligence on the job." This, in particular, is nonsense. Humans are the source of intent in the system. Without humans, there is no purpose for the system. "Artificial Intelligence" could be defined as the apparency of intelligent present-time decision by machines, brought about in fact by intelligent decisions of a programmer or system designer in the past. If there is no maintenance (which must be done by a person), AI (or any system that's left uninspected and fully automatic) will fail. – Wildcard Jan 17 '17 at 09:16
  • Intent, in humans as in machines, always comes from outside. – André Souza Lemos Jan 17 '17 at 13:44
  • Entirely untrue. (And also, "outside" does not define a source.) Either you're stating that intent as such doesn't actually exist, or you are stating that intent exists but doesn't come from anywhere. Perhaps you believe that intent can exist independent of purpose? In which case you misunderstand the word "intent." Either way, an in-person demonstration would shortly change your mind on this subject. I will drop off after this comment as words alone cannot bring about an understanding of "intent," so further discussion here is pointless. – Wildcard Jan 22 '17 at 11:37