Safe programming languages (PL) are gaining popularity. What is the formal definition of a safe PL? For example, C is not safe, but Java is safe. I suspect that the property “safe” should be applied to a PL implementation rather than to the PL itself. If so, how do we define what a safe PL implementation is?
-
Comments are not for extended discussion; this conversation has been moved to chat. – Gilles 'SO- stop being evil' Jul 06 '18 at 19:19
-
"the property “safe” should be applied to a PL implementation rather than to the PL itself" - you can call a PL safe if a safe implementation of it exists. – Dmitry Grigoryev Apr 29 '19 at 08:08
10 Answers
There is no formal definition of "safe programming language"; it's an informal notion. Rather, languages that claim to provide safety usually provide a precise formal statement of what kind of safety is being claimed/guaranteed/provided. For instance, the language might provide type safety, memory safety, or some other similar guarantee.

-
14 As an addendum, if we talk about C vs Java as in the OP's post: it's memory safety that is guaranteed in Java and not in C. Type safety is provided by both in their own way. (Yeah, a lot of people reading this already know that, but maybe some don't). – Walfrat Jul 03 '18 at 12:31
-
17@Walfrat That's part of it. Java also has no undefined behaviour, which is something we expect of a language that calls itself 'safe'. As for type systems, I don't think a strong static type system is what people tend to mean by 'safe'. Dynamically typed languages like Python are generally 'safe' after all. – Max Barraclough Jul 03 '18 at 13:24
-
3 My definition of type safety is the compilation check that handles that. That may not be the formal definition. Note that I said "type safety", not "safe". For me a "safe" language refers to "my" definition of "type and memory safety", and I think it may be the most widespread one. Of course I'm not talking of some pitfalls like reflection or void pointers in C that compilation cannot handle. Another possible definition of safe is a program that doesn't crash with a segmentation fault, like the uninitialized pointer in C. Things like that are generally guaranteed in Python and Java. – Walfrat Jul 03 '18 at 13:52
-
7@Walfrat All that gets you though is a language where the syntax is well defined. It doesn't guarantee that execution is well defined - and the number of times I've seen a JRE crash, I can tell you flat out that as a system it isn't "safe". In C on the other hand, MISRA has put work into avoiding undefined behaviour to get a safer subset of the language, and compilation of C into assembler is much better defined. So it does depend on what you consider "safe". – Graham Jul 03 '18 at 15:55
-
5 @MaxBarraclough - "Java also has no undefined behaviour" - Java has no undefined behaviour in the language definition in the sense used in the C specifications (although it does allow some code to produce values that do not have a single predefined value, e.g. accessing a variable that is being modified in another thread, or accessing a double or long while it is being modified in another thread, which isn't guaranteed not to produce half of one value mixed in some unspecified way with half of another), but the API specification does have undefined behavior in some cases. – Jules Jul 06 '18 at 10:18
-
2Sure, Java allows for indeterminate values to arise from unsynchronised code. It would be impractical to do otherwise with Java's model. That isn't anywhere close to the dangers posed by undefined behaviour, where anything can happen. Also, Java can't completely isolate you from the underlying platform, or indeed the state of your specific device. "C:" doesn't exist on Linux, for instance. What do you mean by "undefined behavior in some cases"? – Max Barraclough Jul 06 '18 at 11:19
-
@Walfrat: What is memory safety for you? Memory leaks can be created in Java as well as in C (e.g. improper use of callbacks). And what about type safety? Reflection in Java can be abused just as easily as direct memory writes in C (e.g. "making 2 + 2 = 5"). – hoffmale Jul 06 '18 at 23:29
-
1@hoffmale: Try those reflection tricks in Java Applets sometime. They don't work anymore. – Joshua Jul 07 '18 at 15:00
-
1 @hoffmale Memory leaks have nothing to do with memory safety. From Java, you can call native code and then all bets are off. However, the reflective abuse of the integer cache is far from what you can do in C: overwriting some global data after you explicitly requested hacking via setAccessible, versus overwriting anything by completely innocuous-looking code. The former never happens by mistake (and can be easily forbidden), the latter is a frequent problem. +++ I'm afraid Java goes as far as practical, but the only way to get 100% memory safety is keeping the computer off. – maaartinus Jul 08 '18 at 17:27
-
1 @Joshua Forget applets, they're long dead. Applets never allowed setAccessible (except maybe because of a bug I haven't heard about). Anyway, just forget them. – maaartinus Jul 08 '18 at 17:30
-
1 In my evil youth I wrote a Java class that let you call memory.leak(object) and that object was leaked - it was stashed away in a linked list starting with a private variable, with no way for the programmer to ever access it, but visible to the garbage collector so it could never be collected. – gnasher729 Nov 22 '23 at 18:23
If you can get your hands on a copy of Benjamin Pierce's Types and Programming Languages, in the introduction he has a good overview of various perspectives on the term "safe language".
One proposed interpretation of the term that you might find interesting is:
“A safe language is completely defined by its programmer’s manual.” Let the definition of a language be the set of things the programmer needs to understand in order to predict the behavior of every program in the language. Then the manual for a language like C does not constitute a definition, since the behavior of some programs (e.g., ones involving unchecked array accesses or pointer arithmetic) cannot be predicted without knowing the details of how a particular C compiler lays out structures in memory, etc., and the same program may have quite different behaviors when executed by different compilers.
So, I would be hesitant to use the term "unsafe" to refer to a programming language implementation. If a construct left undefined by a language produces different behavior in different implementations, one of the implementations might produce behavior that is more expected than the others', but I wouldn't call it "safe".
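For instance, here is a minimal C sketch of the layout-dependence Pierce describes (my own illustration, not from the book):

#include <stdio.h>

struct point { int x; int y; };

int main(void) {
    struct point p = { 1, 2 };
    int *q = &p.x;
    /* Undefined behavior: q[1] steps past the member x. Whether this
       prints p.y, padding, or something else entirely depends on how a
       particular compiler lays out the struct, so the programmer's
       manual alone cannot predict the result. */
    printf("%d\n", q[1]);
    return 0;
}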

-
8The Halting Problem of course says that no matter the language, there will always be programs whose behavior is not predictable from the language definition. So any definition that hinges on "predict the behavior of every program in the language" is fundamentally flawed for any Turing-complete language. – MSalters Jul 03 '18 at 09:53
-
16 @MSalters This is a popular misunderstanding of the halting problem. The undecidability of the halting problem implies that it is impossible to mechanically derive the behavior of an arbitrary program in a Turing-complete language. But it is possible that for any given program, the behavior is predictable. It's just that you can't make a computer program that makes this prediction. – Gilles 'SO- stop being evil' Jul 03 '18 at 12:22
-
7 @Gilles: That is not the case. Suppose there exists a proof of non-termination for every non-terminating program. Then you can enumerate the proofs of non-termination to find whether a given program halts. So the halting problem is decidable. Contradiction. Thus some non-terminating programs are not provably non-terminating. – Kevin Jul 03 '18 at 14:12
-
9@Gilles: I'm perfectly aware of the fact that many programs are trivially proven to halt or not. But the statement here is literally about the behavior of every program. The proof of the Halting Theorem shows that there exists at least one program for which that is not true. It's just a non-constructive proof, it won't tell you which program is undecidable. – MSalters Jul 03 '18 at 14:27
-
3Halting is predictable behavior though. If the language definition leads to a certain program halting, then the implementation will too. That's generally not what you want, but it's not "unsafe" either. – gardenhead Jul 03 '18 at 14:40
-
8@MSalters I think the implied bit is that the statement is about the small-scale behavior of the program, rather than large-scale, emergent behavior. For example, take the Collatz conjecture. The individual steps of the algorithm are simple and well defined, but the emergent behavior (how many iterations until stopping, and if it does at all), is anything but. -- "Predict" is being used informally here. It might be better written as "know how any given statement in an arbitrary program will execute." – R.M. Jul 03 '18 at 15:36
-
@MSalters A counterexample of a "predictable" behavior would be the C programming language: There are some statements which are valid but which can be compiled in a different way by different compilers because the C standard does not define what exactly these statements mean. This means that it is possible to write a valid C program which runs in a totally different way when different compilers are used. – Martin Rosenau Jul 03 '18 at 16:27
-
I have read about some statement which behaves differently in the latest Java version compared to older versions. Because Java is said to be a "safe" language (according to the initial question) some other kind of "safety" has to be meant. – Martin Rosenau Jul 03 '18 at 16:30
-
1 Halting-problem-related complexities may be resolved by saying that a program is safe if it can be guaranteed not to produce bad outputs or behaviors. If a program seems to hang, observation may not be able to prove that it wouldn't eventually produce unacceptable output, but in many cases proving that a program will either produce correct output, hang, or fail in controlled fashion, and will definitely not do anything else, will be easier than proving that it will always "work". In many cases, however, the simpler proof would meet requirements. – supercat Jul 03 '18 at 21:42
-
3@MSalters Predictable doesn't mean provable in any particular logic. – Gilles 'SO- stop being evil' Jul 03 '18 at 21:46
-
@MartinRosenau: In some languages, it may be possible to guarantee that a particular program will not output some particular text, e.g. "Justin Bieber", when run by a conforming implementation. They might not be able to guarantee everything about a program's behavior, but could guarantee some things it wouldn't do. By contrast, modern compiler writers interpret "Undefined Behavior" as an invitation for a compiler to behave in totally nonsensical fashion not bound by laws of time and causality. – supercat Jul 03 '18 at 21:47
-
@MSalters How about this definition? A language specification S is safe if we can prove the following: for all programs P and implementations I1 and I2 satisfying S, the result of running P in I1 and I2 is the same (ie same output, both halt or not). Proving this statement doesn't require us to solve the halting problem. Of course, this still leaves most languages unsafe because the specification doesn't tell you when you will run out of memory... – stewbasic Jul 03 '18 at 22:51
-
@MSalters The undecidability of the halting problem proves that there is no one algorithm that can predict whether any program halts; it does not show that there exists a program for which no algorithm can predict whether it halts. In fact whether any program halts is predicted by one of these two algorithms: const True and const False. The halting problem doesn't say you can't predict the behaviour of every program, it just says you can't use the same algorithm for each one (and can't compute which algorithm to use from the program you want to use it on). – Ben Jul 05 '18 at 00:24
-
2@Gilles In my opinion, it is your statement which raises more red flags about a misconception (a misconception, by the way, that I'm sure you do not actually hold!). I do not know of any specific program whose behavior is "predictable" in any sense I'm aware of that does not boil down to either (i) the program can be evaluated and its behavior observed, or (ii) a proof of the program's behavior can be found in some logical system. Both (i) and (ii) are mechanically derivable properties. This is not so surprising, if you believe that humans are no more powerful than Turing machines. – Caleb Stanford Jul 05 '18 at 00:44
-
2On rereading the comments, I think that @MSalters and Gilles are both correct, and have misinterpreted each other. MSalters' point is that no language can be "safe" in the sense that every program's behavior is predictable, if "predictable" includes whether or not the program halts. In particular, if every program you write comes with a guarantee that it will halt or not halt, then the language cannot be Turing complete. And Gilles' point is that for many particular programs, it may still be possible to know the halting behavior. This is also true, but it can't be true for every program. – Caleb Stanford Jul 05 '18 at 00:54
-
I'm sure people will TL;DR my comments at this point, but it's worth noting: there are many examples of concrete programs for which we don't know whether or not they halt; and we can construct concrete programs for which this is unprovable in ZFC. HOWEVER, such programs are in a sense "useless" precisely because their behavior is unknown to us. It turns out that in practice, we only need programs for which we already have in mind a proof of what they do and why they do it correctly. This is the whole idea behind "formally verified software". – Caleb Stanford Jul 05 '18 at 00:58
-
1@Ben Your point is correct, but the halting problem gets worse: in any formal system, no matter how strong (say, ZFC or stronger), assuming the system is correct, there are specific programs which do not halt, but for which we can't prove they do not halt. In particular, consider the program that searches for a proof of False from the axioms of ZFC. You can write it yourself in your favorite programming language. This supports MSalters' overall point. – Caleb Stanford Jul 05 '18 at 01:02
-
3@MSalters We can certainly predict many behaviours for arbitrary programs in many Turing-complete languages. For example we can predict that no lambda calculus programs will write to the filesystem (purity); that Java code won't dereference unallocated memory (memory safety); etc. These are "trivial" properties, since they hold for all programs in those languages. The halting problem, Rice's theorem, etc. only apply to non-trivial properties (those which only apply to some programs). A "safe language" would be one where "safety" (however it's defined, e.g. memory safety) is a trivial property. – Warbo Jul 06 '18 at 00:18
-
@Warbo: Given a possibly-halting() function and a certainly-unsafe() function, the simple program possibly-halting(); certainly-unsafe() already has safety as a non-trivial property. – MSalters Jul 06 '18 at 07:02
-
2 @MSalters Of course, that's why any language which allows us to write a certainly-unsafe function is an unsafe language. Any language which allows us to write a possibly-unsafe function is also unsafe. Safety is about what's possible, not some particular outcome. C is not memory safe, since it allows unsafe memory usage; regardless of what any particular C program does. Java is memory safe since no program can use memory unsafely; regardless of what any particular Java program does. By analogy: speeding cars on icy roads are unsafe regardless of whether or not any particular one crashes. – Warbo Jul 06 '18 at 09:53
-
@6005 This is certainly an interesting discussion. Unfortunately, the aforementioned book does not contain any more formal description than the cited one. I guess that it means determinism. A PL specification defines program semantics in the form of a relation from inputs to outputs. If 2 outputs are related to 1 input, both outputs are allowed by the specification. This does not incorporate non-termination. A PL specification is deterministic iff for every program, the corresponding relation is a partial function. I would like to see a generally accepted definition if it exists. – beroal Jul 16 '18 at 12:31
When we call a language “safe” in some respect, that formally means that there’s a proof that no well-formed program in the language can do something we consider dangerous. The word “safe” is also used less formally, but that’s what people here understand your question to mean. There are many different definitions of properties we want a “safe” language to have.
A few important ones are:
- Andrew Wright and Matthias Felleisen’s definition of “type soundness”, which is cited in many places (including Wikipedia) as an accepted definition of “type safety,” and their 1994 proof that a subset of ML meets it.
- Michael Hicks lists several definitions of “memory safety” here. Some are lists of types of errors that cannot occur, and some are based on treating pointers as capabilities. Java guarantees that none of those errors are possible (unless you explicitly use a feature marked unsafe) by having a garbage collector manage all allocations and deallocations. Rust makes the same guarantee (again, unless you explicitly mark code as unsafe) through its affine type system, in which each value is either owned or borrowed, and an owned value can be moved at most once.
- Similarly, thread-safe code is usually defined as code that cannot exhibit certain kinds of bugs involving threads and shared memory, including data races and deadlocks. These properties are often enforced at the language level: Rust guarantees in its type system that data races cannot occur; C++ guarantees that its std::shared_ptr smart pointers to the same object in multiple threads will not delete the object prematurely or fail to delete it when the last reference to it is destroyed; C and C++ additionally have atomic variables built into the language, with atomic operations guaranteed to enforce certain kinds of memory consistency if used correctly. MPI restricts interprocess communication to explicit messages, and OpenMP has syntax to ensure that access to variables from different threads is safe.
- The property that memory will never leak is often called safe-for-space. Automatic garbage collection is one language feature to ensure this.
- Many languages guarantee that their operations will have well-defined results and their programs will be well-behaved. As supercat’s example shows, C does this for unsigned arithmetic (guaranteed to wrap around safely) but not signed arithmetic (where overflow is allowed to cause arbitrary bugs, because C needed to support CPUs that do wildly different things when signed arithmetic overflows), and yet the language sometimes silently converts unsigned quantities to signed ones; see the sketch after this list.
- Functional languages have a large number of invariants that any well-formed program is guaranteed to maintain, for example, that pure functions cannot cause side effects. These may or may not be described as “safe.”
- Some languages, such as SPARK or OCaml, are designed to facilitate proving program correctness. This may or may not be described as “safe” from bugs.
- Proofs that a system cannot violate a formal security model (hence the quip, “Any system that’s provably secure probably isn’t.”)
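A minimal C sketch of that unsigned-versus-signed arithmetic point (an illustration only, not taken from any of the cited sources):

#include <inttypes.h>
#include <limits.h>
#include <stdio.h>

int main(void) {
    uint32_t u = UINT32_MAX;
    u = u + 1;                   /* well defined: unsigned arithmetic wraps, u is now 0 */
    printf("%" PRIu32 "\n", u);  /* prints 0 on every conforming implementation */

    int s = INT_MAX;
    /* s = s + 1; */             /* undefined behavior: signed overflow may do anything */
    (void)s;                     /* suppress unused-variable warnings */
    return 0;
}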

-
1This may or may not be described as “safe” from bugs. Would you please elaborate this a little? What do you mean by "from bugs"? – Nobody Jul 04 '18 at 08:05
-
2@scaaahu Here’s an example of a website that refers to software formally proven correct as “provably safe.” In this context, it’s referring to software to prevent aircraft from colliding, so it means safe from collisions. – Davislor Jul 04 '18 at 09:01
-
1I am accepting this answer because it lists types of safety. The type that I had in mind is memory safety. – beroal Jul 16 '18 at 13:32
-
While this answer lists some helpful links and a lot of examples, most of them are completely messed up. Garbage collection ensures that memory will never leak, or not using "unsafe" blocks automatically gives you safety, or signed overflow being Undefined Behavior because C compilers need to support some weird CPUs, seriously? And just a brief word for Ada / SPARK, which is the only one of the mentioned languages that does take safety seriously. – user7860670 Apr 13 '19 at 23:08
-
1OCaml is not designed to facilitate proofs of correctness. It may have a sophisticated type system that can make many guarantees, but OCaml isn't a proof assistant. (From the Curry-Howard point of view, OCaml's type system is an inconsistent logic, unlike the type system of e.g. Inria's Coq.) – Del Jan 21 '20 at 23:20
-
@TypesLogicsCats Correct, and neither is SPARK. I know people who’ve taken advantage of OCaml’s well-specified behavior to write provably-correct code. – Davislor Jan 22 '20 at 00:48
-
@user7860670 You appear to be working with a narrower definition of “safe” than the one I gave in the first sentence of my answer. – Davislor Jan 22 '20 at 00:51
-
Thanks for listing the types of safety. I wonder whether what 'const' does in C++ constitutes one more type? – AlexGenkin Jul 20 '20 at 17:08
-
@AlexGenkin You might look into Static Single Assignment form for more detail on that. – Davislor Jul 20 '20 at 18:12
-
One more language you could add to your list: Delphi, a low-level programming language that lets you build programs that will never leak memory. All memory allocations go through a memory manager. If you forget to free some object, the memory manager will scream at you. (A "side effect" of this memory manager is that under Delphi, in general, the heap memory is as fast as the stack memory (in C++).) – IceCold Aug 03 '20 at 15:07
Safe is not binary, it's a continuum.
Informally speaking, safety is defined in opposition to bugs, the two most often mentioned being:
- Memory Safety: the language and its implementation prevent a variety of memory-related errors such as use-after-free, double-free, out-of-bounds access, ...
- Type Safety: the language and its implementation prevent a variety of type-related errors such as unchecked casts, ... (see the sketch at the end of this answer)
Those are not the sole classes of bugs that languages prevent; data-race freedom or deadlock freedom is rather desirable, proofs of correctness are pretty sweet, etc.
Simply incorrect programs are rarely considered "unsafe" however (only buggy), and the term safety is generally reserved for guarantees affecting our ability to reason about a program. Thus, C, C++ or Go, having Undefined Behavior, are unsafe.
And of course, there are languages with unsafe subsets (Java, Rust, ...) which purposefully delineate areas where the developer is responsible for maintaining the language guarantees and the compiler is in "hands-off" mode. The languages are still generally dubbed safe, despite this escape hatch, a pragmatic definition.
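As a minimal C sketch of the unchecked casts mentioned above (an illustration; the final read is itself Undefined Behavior, which is precisely the point):

#include <stdio.h>

int main(void) {
    double d = 3.14;
    void *v = &d;        /* any object pointer converts to void* without complaint */
    int *ip = v;         /* ...and back to an unrelated pointer type, unchecked */
    printf("%d\n", *ip); /* reinterprets part of a double as an int: a type error
                            the compiler never flags */
    return 0;
}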

-
1 Most programming language implementations have unsafe features (e.g. Obj.magic in OCaml). And in practice, these are really needed. – Basile Starynkevitch Jul 03 '18 at 16:44
-
4 @BasileStarynkevitch: Indeed. I would think that any language with FFI necessarily contains some level of unsafe, as calling a C function will require "pinning" GC'ed objects and manually ensuring that the signatures on both sides match up. – Matthieu M. Jul 03 '18 at 17:06
While I don't disagree with D.W.'s answer, I think it leaves one part of "safe" unaddressed.
As noted, there are multiple types of safety promoted. I believe it's good to understand why there are multiple notions. Each notion is associated with the idea that programs suffer especially from a certain class of bugs, and that programmers would be unable to make this specific kind of bug if the language blocked the programmer from doing so.
It should be noted that these different notions therefore correspond to different classes of bugs, and these classes are neither mutually exclusive nor do they cover all forms of bugs. Just to take D.W.'s two examples: the question of whether a certain memory location holds a certain object is both a question of type safety and of memory safety.
A further criticism of "safe languages" follows from the observation that banning certain constructs deemed dangerous leaves the programmer with the need to come up with alternatives. Empirically, safety is better achieved by good libraries: using code that's already field-tested saves you from making new bugs.

-
10It's rather off-topic for this site, because software engineering is not really a science, but I disagree with your empirical statement. Using good libraries won't save you in unsafe languages, because you aren't protected from using them wrong. Safe languages let you get more guarantees from the library author and let you gain more assurance that you're using them correctly. – Gilles 'SO- stop being evil' Jul 03 '18 at 21:49
-
3I'm with MSalters on this. -"Safe languages let you get more guarantees from the library author and let you gain more assurance that you're using them correctly." This is a non sequitur for all practical purposes. – Captain Giraffe Jul 04 '18 at 00:05
A fundamental difference between C and Java is that if one avoids certain easily-identifiable features of Java (e.g. those in the Unsafe namespace), every possible action one may attempt--including "erroneous" ones--will have a limited range of possible outcomes. While this limits what one can do in Java--at least without using the Unsafe namespace--it also makes it possible to limit the damage that can be caused by an erroneous program, or--more importantly--by a program which would correctly process valid files but is not particularly guarded against erroneous ones.
Traditionally, C compilers would process many actions in Standard-defined fashion in "normal" cases, while processing many corner cases "in a manner characteristic of the environment". If one were using a CPU which would short out and catch fire if numerical overflow occurred and wanted to avoid having the CPU catch fire, one would need to write code to avoid numerical overflow. If, however, one were using a CPU which would perfectly happily truncate values in two's-complement fashion, one didn't have to avoid overflows in cases where such truncation would result in acceptable behavior.
Modern C takes things a step further: even if one is targeting a platform which would naturally define a behavior for something like numerical overflow where the Standard would impose no requirements, overflow in one portion of a program may affect the behavior of other parts of the program in arbitrary fashion not bound by the laws of time and causality. For example, consider something like:
#include <stdint.h>

void foo(uint16_t x);  /* defined elsewhere; assume it must never see x >= 50000 */

uint32_t test(uint16_t x)
{
    if (x < 50000) foo(x);  /* guard: only call foo for small x */
    return x*x; // Note x will promote to "int" if that type is >16 bits.
}
A "modern" C compiler given something like the above might conclude that since the computation of x*x would overflow if x is greater than 46340, it can perform the call to "foo" unconditionally. Note that even if it would be acceptable to have a program abnormally terminate if x is out of range, or have the function return any value whatsoever in such cases, calling foo() with an out-of-range x might cause damage far beyond either of those possibilities. Traditional C wouldn't provide any safety gear beyond what the programmer and underlying platform supplied, but would allow safety gear to limit the damage from unexpected situations. Modern C will bypass any safety gear that isn't 100% effective at keeping everything under control.

-
Unsigned arithmetic is defined in C. It's modular arithmetic. Try an example with signed integer arithmetic. – David Thornley Jul 03 '18 at 18:06
-
3 @DavidThornley: Perhaps my example was too subtle. If int is 32 bits, then x will get promoted to signed int. Judging from the rationale, the authors of the Standard expected that non-weird implementations would treat signed and unsigned types in equivalent fashion outside of some specific cases, but gcc sometimes "optimizes" in ways that break if a uint16_t by uint16_t multiply yields a result beyond INT_MAX, even when the result is being used as an unsigned value. – supercat Jul 03 '18 at 20:17
-
@DavidThornley: IMHO, there's no good reason a quality implementation for non-weird hardware should process a multiply like this in any fashion other than either using modular arithmetic or raising an implementation-defined, possibly-fatal, signal. While there are some cases where allowing computations to yield somewhat non-deterministic values may allow useful optimizations (e.g. allowing x+1 > y to be optimized to x >= y), the additional "optimization" benefits gained by jumping the rails entirely are relatively minor. – supercat Jul 03 '18 at 20:23
-
4 Good example. This is one reason why we should always (on GCC or Clang) compile with -Wconversion. – Davislor Jul 03 '18 at 22:31
-
@supercat They're probably less minor than you think. E.g. they allow a lot of redundant tests to be elided from inline functions and macros. – user253751 Jul 04 '18 at 05:08
-
@immibis: A lot of redundant tests can be eliminated under an "overflow yields partially-non-deterministic-values" model, and a lot more could be eliminated--without impacting safety--if the language included "checked assume" directives which would invite a compiler to generate code that would force abnormal termination as soon as it is apparent that a condition hadn't been or wouldn't be met. If code includes a "checked assumption" that x is less than 1000, a compiler could then see if code that only had to handle x values less than that could be smaller than code... – supercat Jul 04 '18 at 15:48
-
...to handle arbitrary values, and then--if it would be--include code to abnormally terminate the program if x was too big. Unlike an assert, however, the checked assume would not compel a compiler to include a check in cases where it wouldn't benefit code generation elsewhere. From what I can tell, the primary goal of jump-the-rails philosophy is to synthesize assumptions about values, but there's not much correlation between things that can be inferred that way, versus things a compiler would benefit from knowing. Further, jump-the-rails philosophy compels programmers to add tests... – supercat Jul 04 '18 at 15:51
-
...which would not otherwise be necessary, and which compilers may not be able to optimize out, thus degrading executable code quality. – supercat Jul 04 '18 at 15:52
-
@Davislor: I just tested the above code on godbolt's compiler with -Wconversion and it produces no diagnostics. – supercat Jul 04 '18 at 15:57
-
@supercat With trivial emendations to make it an MCVE, gcc 7.3.0 with -std=c11 -Wall -Wextra -Wpedantic -Wconversion gave me the diagnostic warning: conversion to 'uint32_t {aka unsigned int}' from 'int' may change the sign of the result [-Wsign-conversion]. The -Wconversion flag enabled this diagnostic. – Davislor Jul 04 '18 at 18:47
-
@supercat Clang 6.0.0 with -std=c11 -Wall -Wextra -Wpedantic -Wconversion did not give me any diagnostic either. – Davislor Jul 04 '18 at 18:48
-
@supercat I also get the diagnostic on Godbolt's version of GCC 8.1, with the same flags. – Davislor Jul 04 '18 at 19:01
-
2 @Davislor: Ah, I just noticed that godbolt reversed the order in which compiler versions are listed, so selecting the last version of gcc in the list yields the latest rather than the earliest. I don't think the warning is particularly helpful, since it's prone to flag a lot of situations like return x+1; which shouldn't be problematic, and casting the result to uint32_t would stifle the message without fixing the issue. – supercat Jul 04 '18 at 19:12
-
@supercat It’s not a perfect solution, I agree, but at least it gets me to take another look at that code. In this case, I might miss the subtle pitfall the second time too, but I’d be more likely to write (uint32_t)x*x, which would fix the real problem, just because it’s the quickest fix: it saves me having to type an extra pair of parentheses. And it should make me wonder, “Wait, why does the compiler say that’s a conversion from int?” – Davislor Jul 04 '18 at 20:24
-
2@supercat Eliminating tests is pointless if the compiler is required to put the tests back in a different place. – user253751 Jul 04 '18 at 23:05
-
3 @immibis: A "checked assume" directive may allow a compiler to replace many tests, or a check that would be performed many times within a loop, with a single test which may be hoisted outside the loop. That's better than requiring programmers to add checks that would not be needed in machine code for a program to meet requirements, for the purpose of ensuring that a compiler doesn't "optimize out" checks that are necessary to meet requirements. – supercat Jul 05 '18 at 00:45
There are several layers of correctness in a language. In order of increasing abstraction:
- Few programs are error-free (only those for which correctness can be proven). Others have already mentioned that error containment is therefore the most concrete safety aspect. Languages which run in a virtual machine, like Java and .NET, are generally safer in this respect: program errors are normally intercepted and handled in a defined way.1
- At the next level, errors detected at compile time instead of at run time make a language safer. A syntactically correct program should also be semantically right as much as possible. Of course the compiler cannot know the big picture, so this concerns the detail level. Strong and expressive data types are one aspect of safety on this level. One could say the language should make it hard to make certain kinds of errors (type errors, out-of-bounds access, uninitialized variables etc.). Run-time type information, like arrays which carry length information, avoids errors. I programmed Ada 83 in college and found that a compiling Ada program typically contained perhaps an order of magnitude fewer errors than the corresponding C program. Just take Ada's ability to let the user define integer types which are not assignable without explicit conversion: whole space ships have crashed because feet and meters were confused, which one could trivially avoid with Ada (see the sketch after this list).
- At the next level, the language should provide means to avoid boilerplate code. If you have to write your own containers, or their sorting, or their concatenation, or if you must write your own string::trim(), you'll make mistakes. Since the abstraction level rises, this criterion involves the language proper as well as the language's standard library.
- These days the language should provide means for concurrent programming on the language level. Concurrency is hard to get right and perhaps impossible to do correctly without language support.
- The language should provide means for modularization and collaboration. The strong, elaborate, user-defined types from above help create expressive APIs.
- Somewhat orthogonally, the language definition should be intelligible; the language and libraries should be well documented. Bad or missing documentation leads to bad and wrong programs.
1 But because the correctness of the virtual machine usually cannot be proven, such languages may, somewhat paradoxically, not be suitable for very strict safety requirements.
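Ada's mutually incompatible numeric types can be imitated, less elegantly, even in C by wrapping values in distinct struct types, which the compiler refuses to mix; a rough sketch of the idea (the names meters, feet and add_meters are invented for illustration):

typedef struct { double value; } meters;
typedef struct { double value; } feet;

meters add_meters(meters a, meters b)
{
    return (meters){ a.value + b.value };  /* C99 compound literal */
}

/* meters m = { 5.0 };
   feet f = { 3.0 };
   add_meters(m, f);   rejected at compile time: feet is not meters */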

-
1 +1 For the clear layer-by-layer explanation. A question for you about "Whole space ships have crashed because feet and meters were confused, which one could trivially avoid with Ada": are you talking about Mars Probe Lost Due to Simple Math Error? Do you happen to know the language they were using for that space ship? – Nobody Jul 05 '18 at 03:43
-
2@scaaahu Yes, I think I was referring to that. No, I don't know the language. Actually, reading the report, it seems that data sent by the probe was processed by software on earth producing a data file which was then used to determine thrust levels. Simple language typing is not applicable in this scenario. Btw., they had multiple issues with the ground software and data file format, a confusion which prevented early detection of the problem. So the story is not a direct argument for strong typing but still a cautionary tale. – Peter - Reinstate Monica Jul 05 '18 at 07:08
"Please, do not say that every PL has unsafe commands. We can always take a safe subset."
Every language I know of has ways of writing illegal programs that can be (compiled and) run. And every language that I know of has a safe subset. So, what's your question really?
Safety is multidimensional and subjective.
Some languages have lots of operations that are "unsafe". Others have fewer such operations. In some languages, the default way of doing something is inherently unsafe. In others, the default way is safe. In some languages, there is an explicit "unsafe" subset. In other languages, there is no such subset at all.
In some languages, "safety" refers exclusively to memory safety — a service offered by the standard library and/or the runtime where memory access violations are made difficult or impossible. In other languages, "safety" explicitly includes thread safety. In other languages, "safety" refers to the guarantee that a program will not crash (a requirement that includes not allowing uncaught exceptions of any kind). Finally, in many languages "safety" refers to type safety — if the type system is consistent in certain ways, it is said to be "sound" (incidentally, Java and C# do not have completely sound type systems).
And in some languages, all the different meanings of "safety" are considered subsets of type safety (e.g. Rust and Pony achieve thread safety through properties of the type system).
-
I would say that being Turing complete logically precludes any sort of absolute safety. Tautologically stated, if you can write any program then you can write any program. – Max Power Nov 20 '23 at 05:48
Take this C code:
int* p;      /* uninitialised: p may point anywhere */
*p = 1;      /* undefined behaviour: write through a wild pointer */
C is a very unsafe language. p is uninitialised, so it could point anywhere. Likely effects are an immediate crash, overwriting memory somewhere with no bad side effect, or overwriting something important, and then Things Go Wrong. Now we change it to
int* p = NULL;   /* initialised, but to a pointer that must not be dereferenced */
*p = 1;          /* undefined behaviour: store through NULL */
On many implementations this will crash, because storing to NULL traps on those implementations. But you may have an optimising compiler that throws this and any following code away!
Now in a safe language, the first example would not compile: the compiler must prove that every variable has been assigned a value before it is used. In the second example, assigning NULL is only allowed if p is a nullable variable, and assigning through such a pointer is not allowed unless the value has first been successfully tested against NULL, so the invalid assignment cannot happen.
A safe language also needs to handle the case that a pointer becomes invalid. Examples:
int* p = malloc(100);   /* from <stdlib.h>: allocate 100 bytes */
int* q = p;             /* second pointer to the same block */
free(p);                /* now both p and q dangle */
Now both p and q are invalid and assignment through either will be unsafe; a safe language prevents this. Or
int* p;
{ int x; p = &x; }   /* p points to x, whose lifetime ends at the closing brace */
*p = 2;              /* undefined behaviour: x no longer exists */
At the time of the assignment p has changed from a valid pointer to an invalid one because x doesn’t exist anymore. A safe language will prevent this.

-
There are very rare cases where one would want to purposefully do what you labeled an invalid pointer operation. It may not be used in a typical desktop app, but there is a use somewhere, which is why the compiler allows it; see Chesterton's Fence. In any case it is good to remember that by the time the source code has been preprocessed, parsed, compiled, optimized, linked and decoded into machine micro-operations, all high-level language safety mechanisms have been stripped out. – Max Power Nov 28 '23 at 04:15
All languages are converted to the same instruction set for execution, so at a very low-level view no language can be more or less inherently safe than another.
The words "safe" and "safety" have in recent decades been mutilated by certain politically oriented parts of English-speaking society, such that the common usage has almost no definition. However, for technical matters I define "safety" and "safe" as a device(physical or procedural) that prevents the unintentional use of something or that makes accidental use substantially more difficult; and the state of being under the protection of such a device.
So, a safe language has some device to limit a particular class of bugs. Limits by their nature often come with some other inconvenience or outright lack of ability. This inconvenience or inability to do certain things may or may not be an acceptable tradeoff, and must be judged for each situation.
This also should not be taken to imply that "unsafe" languages will result in bugs. For example, I don't have safety corks on my forks and yet I have, without much effort, managed for decades to avoid stabbing my eye while eating. It is certainly less effort to use bare forks with a bit of care and skill than the effort that would be needed to use the fork-cork. (The fork-cork is an old obscure pop culture reference.) The saved effort frees up resources I can use to improve other parts of my life.

-
Moving code from less safe Objective-C to safer Swift, the code often became shorter and more readable. Probably helped by lots of syntactic sugar that made safety easy. – gnasher729 Nov 23 '23 at 17:15
-
@gnasher729 Overall syntax is mostly independent of safety mechanisms, but I can see how being readable reduces risk. (Hazard mitigation and risk reduction are not synonymous with safe.) The benefits seem to be worth the costs in your case. (Maybe Obj-C is just a poor language all around, I don't know it well.) Those languages are highly abstracted from the underlying machine and more concerned with the efficacy of writing code than the efficiency of execution, or ease of writing a compiler for new systems. E.g. they aren't good picks at all for running bare metal without the support of an OS. – Max Power Nov 28 '23 at 04:02