18

I was reading this article. The author talks about "The Blub Paradox". He says programming languages vary in power. That makes sense to me. For example, Python is more powerful than C/C++. But its performance is not as good as that of C/C++.

Is it always true that more powerful languages must necessarily have lesser possible performance when compared to less powerful languages? Is there a law/theory for this?

auspicious99
  • 264
  • 3
  • 14
Shashank V M
  • 391
  • 2
  • 24
  • 70
    The definition of "powerful" is apparently quite subjective. I've always considered C and C++ to be much more powerful than Python because they give you lower-level access to what happens in your system. In Python you're much more limited to the functionality the language makes available to you. It's pretty telling that some Python libraries are written in C. Python might be easier to code in and more concise and expressive, but that's not the same as being powerful. A grenade might be easier to make and wield than an atom bomb, but I don't think anyone would argue it's a more powerful weapon. – Bernhard Barker Jun 21 '20 at 18:00
  • 4
    The reason some Python libraries are written in C is that for many, many years, the only available Python implementation had abysmal performance. They are written in C by necessity, not by choice. Compare e.g. to the PyPy Python implementation, where those same libraries are written in either Python or RPython. – Jörg W Mittag Jun 21 '20 at 18:28
  • 1
    Ada, C++, and Fortran compilers for super-computers, especially the vector oriented ones like NEC SX series, some with computer specific extensions, can generate very fast code. – rcgldr Jun 21 '20 at 19:17
  • 5
    @rcgldr: But at least with C++, code written for supercomputers tends to avoid language constructs that lead to slow code. I've certainly improved a few such programs by converting C++ stuff to C, and recompiling with the C++ compiler. – jamesqf Jun 22 '20 at 03:06
  • @jamesqf - I was mostly thinking of Fortran for vector computers, going back to the Cray-1 days. – rcgldr Jun 22 '20 at 04:19
  • 19
    I saw an interview once with Charles Oliver Nutter, the lead developer of JRuby. In this interview, he mentioned that he met the team at Oracle that was responsible for one sub-component of one of the three garbage collectors of the Oracle HotSpot JVM, and he realized that this team was bigger than all core teams of all Ruby implementations combined. That is what influences performance the most: how much you invest in performance. This should be a tautology, but for some reason is still surprising to many people: if you invest in performance, you get performance. – Jörg W Mittag Jun 22 '20 at 08:23
  • 2
    @JörgWMittag It's not just a matter of speed. Python's data model simply lacks any notion of memory beyond an abstract place where objects reside. Anything that needs direct hardware access necessarily requires an extension to the Python implementation itself. – chepner Jun 22 '20 at 14:47
  • 2
    You may edit your question to define "more powerful" as something like "more high-level and quick and easy to use". – Panzercrisis Jun 22 '20 at 16:32
  • 8
    Saying Python is more powerful than C/C++ is a bit like watching someone heat up a frozen TV dinner in nine minutes and thinking that they've somehow outperformed a Michelin starred chef. Because... quick and easy? – J... Jun 23 '20 at 11:56
  • You are twisting the words from the essay: he wasn't saying that higher level languages (e.g. Python) are necessarily slower, in every possible case than lower level ones (e.g. C). It's a heuristic, a rule of thumb. And those aren't a good fit for a deterministic boolean "this is true and this is false" outlook. 80% of the time it works every time. – Jared Smith Jun 23 '20 at 12:17
  • @Jörg W Mittag: But I think that example rather misses the point. Of course a high-performance garbage collector performs better than a low-performance one (kinda by definition :-)), but will a language with the high-performance collector beat one with no garbage collector at all? Or ~20K lines using C++ strings to process text, vs ~1K lines of flex & yacc to do the job ~100 times faster? – jamesqf Jun 23 '20 at 16:37
  • 3
    Until you define "more powerful" & "performance" & "slower", you haven't asked a question. – philipxy Jun 24 '20 at 06:35
  • @BernhardBarker But you're not talking about grenade vs. atomic bomb. You're talking about tools that make it easier to build an atomic bomb. One allows you to go from idea to implementation in a few days, the other goes through years of tweaking, but gives you more bombs per day. The resulting product has little to do with the tools. – Luaan Jun 24 '20 at 07:12
  • Python is more powerful than C/C++! The words "power" and "powerful" need a new entry in Oxford dictionaries. – Delphi.Boy Jun 26 '20 at 17:48

6 Answers

56

This is simply not true. And part of why it's false is that the premise isn't well formed.

There is no such thing as a fast or slow language. The expressive power of a language is purely a function of its semantics. It is independent of any particular implementation.

You can talk about the performance of code generated by GCC, or about the performance of the CPython interpreter. But those are properties of specific implementations, not of the languages themselves. You could write a very slow C compiler, and you can write Python implementations that are quite fast (like PyPy).

So the answer to the question "is more power necessarily slower?" is no, if only because you or I could go write a deliberately slow C compiler: it would accept the same language, with the same expressive power as GCC, yet produce code that runs slower than a typical Python implementation.

The real question is "why do more powerful languages tend to have slower implementations?" The reason is that, if you're comparing C and Python, the difference in power is largely abstraction. When you do something in Python, a lot more happens implicitly behind the scenes. More work to do means more time.
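
As a rough illustration of that hidden work, here is a minimal sketch using CPython's dis module (the add function is invented purely for the example):

    import dis

    def add(a, b):
        # In C, adding two ints compiles down to a single machine add instruction.
        # Here the compiler cannot know what a and b are, so it emits a generic
        # "binary add" bytecode that dispatches on the operand types at run time
        # (int, float, str, any class defining __add__, ...).
        return a + b

    dis.dis(add)            # prints the generic add opcode
    print(add(1, 2))        # 3
    print(add("ab", "cd"))  # abcd -- same bytecode, very different work underneath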

But there are also lots of social elements at play. People who need high performance choose low-level languages, so that they have fine-grained control over what the machine is doing. This has led to the idea that low-level languages are faster. But for most people, writing in C vs Python will have pretty comparable performance, because most applications don't require that you eke out every last millisecond. This is particularly true once you account for the extra checks that C programmers add by hand in order to program defensively. So just because lots of specialists have built fast things in C and C++ doesn't mean they're faster for everything.

Finally, some languages have zero-cost abstractions. Rust does this, using its type system to ensure memory safety without needing runtime garbage collection. And Go has garbage collection, but it is fast enough that you get performance in the same ballpark as C while still getting the extra power.

The TL;DR is that more powerful languages are slower in some cases and faster in others; there is no firm rule, and there are plenty of exceptions and complications.

Kyle Jones
  • 8,091
  • 2
  • 29
  • 51
Joey Eremondi
  • 29,754
  • 5
  • 64
  • 121
  • 11
    First you say "this is simply not true", and then you say "when you do something in Python, there is a lot more that is implicit that is happening behind the scenes", which flatly contradicts the first statement. It is true, and you are providing the correct reason for it. Yes, there are fast and slow languages, as you clearly demonstrate. Why you come to the opposite conclusion is beyond me. – Peter - Reinstate Monica Jun 22 '20 at 07:38
  • 12
    @Peter-ReinstateMonica The point is that a language by itself cannot be "fast" or "slow". It's its implementation, its compiler or interpreter, that will produce code that runs faster or slower. Even C, with the same compiler, can produce slower or faster executions depending on the optimization level. Which one is the speed of the language? You can have languages that are easier to implement efficiently because they are less abstract, but nothing prohibits you from implementing an abstract language so that it produces really fast execution. – bracco23 Jun 22 '20 at 09:55
  • @bracco23, are you disagreeing that Python is inherently slower than C++ (modulo cases when the C++ compiler is just bad)? – RiaD Jun 22 '20 at 10:27
  • 10
    @bracco23 Re "nothing prohibit you to implement an abstract language so that you produce really fast execution": jmite actually gave a good reason which prevents it ("here is a lot more that is implicit that is happening behind the scenes"). It's inherent in languages which e.g. carry run time information with their types etc. (even in C++ RTTI carries a small cost). – Peter - Reinstate Monica Jun 22 '20 at 11:58
  • 1
    @RiaD If you say "modulo cases when c++ compiler is just bad", you also need to allow "modulo cases when Python compiler/interpreter is just bad". And yes, that's exactly the point. – Bergi Jun 22 '20 at 16:21
  • 6
    @Bergi That's wrong though: an optimal Python implementation (given today's machines and software engineering state of the art) will be slower than an optimal C++ implementation. The Python implementation simply has more work to do at runtime, and there's only so much we can do to optimise it. – Konrad Rudolph Jun 22 '20 at 19:57
  • 1
    The issue here is that none of this has anything to do with what Paul Graham was talking about, namely practice, not theory. Which probably makes the question a poor fit for this site. In theory there's no inherent trade-off between level of abstraction and performance, in practice there often is. The OPs framing though is certainly an exaggerated one. – Jared Smith Jun 22 '20 at 20:17
  • @RiaD: C++ code (probably) runs really slow on LISP machines, since they're designed from the ground-up to run LISP. It's about the compiler and hardware. Languages are just a set of symbols, they don't have a speed. – Mooing Duck Jun 22 '20 at 22:58
  • 9
    I have to disagree with this answer, and my view is similar to Peter's. There is such a thing as a slow language, a language that cannot be implemented efficiently. For example, I can say that my language contains a primitive operation that is the busy beaver function, or to be less extreme some function that has very bad time complexity. Languages specify behavior, but if that behavior cannot be implemented efficiently then there will be no efficient implementation of the language. JS engines today are a great example of optimized implementations of an inefficient language. – Mario Carneiro Jun 23 '20 at 04:25
  • 2
    @JaredSmith I don't know about abstraction -- that's a very general and perhaps hard-to-define term. Let's look at a concrete example, virtual functions. C does not have those. Each function call to a statically linked function is essentially a jump to a (simplified) compile-time known, hard-wired address. Calling a virtual function in C++, by contrast, necessarily (in the general case) needs an indirection because the address must be looked up at run time! That is necessarily slower, there is no way around it. (Which is the reason that C++ retained non-virtual functions at all.) – Peter - Reinstate Monica Jun 23 '20 at 07:18
  • @Peter-ReinstateMonica virtual function lookup overhead is a great example. But as for abstraction being too vague and general, that speaks to my point: Graham was gesturing at a general concept of (and I know I'm being imprecise again here) "higher-level" languages being generally slower than "lower-level" languages when writing code that is idiomatic for the language. The OP framed it more precisely to try to get a definitive answer, but in doing so went way beyond what Paul Graham was saying in order to turn a heuristic "rule-of-thumb" into a falsifiable conjecture. – Jared Smith Jun 23 '20 at 12:12
  • C is a "slow language" on (now ancient) 6502- and Z80-based computers - compared to assembly, a "fast language". – user253751 Jun 23 '20 at 16:13
  • 1
    @Mario Soo C++ is a "slow language" then because there are certain parts of its specification that cannot be implemented efficiently (hello streams)? And how do you define a "slow feature" anyhow? What if you can avoid that slow feature anyhow and implement the same functionality fast in the language itself? What if the "fast language" can't provide the equivalent functionality? – Voo Jun 23 '20 at 18:11
  • 1
    @Peter You're aware that HotSpot for example can optimize virtual calls to a simple jump as well given certain circumstances? So much for "necessarily slower". – Voo Jun 23 '20 at 18:14
  • 1
    @Voo Yes, C++ is a slow language if you use its slow features. Exceptions and RTTI are also famous examples of slow C++. Of course C++ is a monster language containing every language feature from every discipline, so it's not really a contradiction that C++ can be simultaneously a fast language and a slow language, if you focus on different subsets of it. – Mario Carneiro Jun 23 '20 at 18:24
  • 3
    @Voo Regarding optimizing virtual calls, indeed a very common approach to compiling inefficient languages is to recognize when it is being used in a specific way that doesn't require the full power of the inefficient pattern and can be optimized to a more restrictive and efficient pattern. Type specialization in V8 is like this. As long as you are not really using the inefficient feature, you can maybe optimize it away. But if you actually use the feature as it was designed, the optimizer can't do anything about it. – Mario Carneiro Jun 23 '20 at 18:32
  • @Mario So C++ is slow, except if you don't use certain features of C++. But actually it's not certain features it's only if you use certain features in specific ways and so on. Also I see you're thinking that HotSpot only optimizes virtual calls if there's a single implementation - that's actually not the case (at which point, how many different implementations do you need that you're using virtual calls "correctly"?). Seems like a not particularly useful way to categorize languages if you need dozens of asterisks everywhere.. – Voo Jun 23 '20 at 19:11
  • 3
    @Voo it's useful, but necessarily fuzzy. It's a heuristic, not a hard and fast rule. – Jared Smith Jun 23 '20 at 19:20
  • @Jared Why is it useful if it has no practical purpose? (Judging the speed of implementations is useful, but languages? Only if you are a fanboy and want to argue how much better language X over Y is). Particularly after we already had to concede that it depends on the usage as well and not just the language itself. – Voo Jun 23 '20 at 19:20
  • 3
    Tell a gamedev who just spent a 70 hour week tweaking their C++ for perf just to get an acceptable frame rate that they should have used Python instead and see what they say. Just because the signal to noise ratio is low doesn't mean it's zero. – Jared Smith Jun 23 '20 at 19:24
  • @Jared Nobody is arguing that certain implementations of C++ aren't faster than say CPython for things like game development, that's not the point. But if languages are fast or slow, what about languages that are only fast for very specific things like say CUDA? Is CUDA fast or slow now? (What about things where CUDA is fast but only if you have the correct hardware?). But really this is just sophistry, even if I enjoy the argument :-) – Voo Jun 23 '20 at 19:28
  • 3
    There are parts of specification which will necessarily make a proper implementation slower. For example, having garbage collector will make any implementation slower than requiring the programmer to manually handle memory. Similarly, dynamic typing is inherently slower than static typing. Interpreted vs compiled languages are another example. – Noctiphobia Jun 23 '20 at 20:30
  • So garbage collection is a good example here. Languages don't say "you have to have a garbage collector", that's an implementation choice. But the language will say that free doesn't have to be marked, and now the implementation needs to either statically determine where to free (like Rust), or dynamically determine it (with a GC). But the only reason Rust works is because it has a type system that is expressive enough for programmers to give the necessary information to the compiler. In theory, you could static analyze java code and figure out ownership, but in practice that's impossible. – Mario Carneiro Jun 23 '20 at 23:46
  • @Voo "how many different implementations do you need that you're using virtual calls "correctly"": Strictly speaking, if you get enough implementations the table jump implemetation will become slower than an actual indirect jump. Exactly where this cutoff is depends on cache locality so very application dependent. But you've forgotten that the compiler doesn't necessarily know all the jump targets in advance, for example if you dynamically load code or link with external code that is compiled separately. – Mario Carneiro Jun 23 '20 at 23:55
  • 3
    Also, this approach of assuming that if a language feature isn't used then it has no performance cost is very expensive in terms of compiler development. It's all well and good to say that Python can be just as fast as C with a compiler that notices that you actually wrote C code in python syntax, but this puts near magical analysis requirements on the compiler, and this is only reasonable if you know that the compiler has received a lot of optimization development time (like gcc/llvm). So it is also reasonable to characterize languages based on the speed of a "naive" base implementation. – Mario Carneiro Jun 24 '20 at 00:05
  • 1
    @MarioCarneiro Re "In theory, you could static analyze java code and figure out ownership, but in practice that's impossible.": In theory, I strongly suspect that would be halting problem-hard (maybe dependent on how you formalize it). As you pointed out, rust can avoid GC exactly because it requires sufficient information from the programmer; that's a property of the language which allows a more efficient implementation. – stewbasic Jun 24 '20 at 05:11
  • 1
    @stewbasic I expect you are right, although that has never stopped people from writing optimizers that do it anyway if the need is great enough. They just use heuristics until the benchmark numbers improve. I would argue that C++ is in exactly this space: it is a slow language which has had a lot of master class optimization work put into it in order to keep up the impression that it is a fast language. There is no way you could get good performance out of a naive C++ compiler on normal looking C++ code. – Mario Carneiro Jun 24 '20 at 05:27
  • @KonradRudolph One of the main counters to that is putting much of what is interpreted the first time in a .pyc so it doesn't need to be interpreted again the second time. An attempt to achieve the best of both worlds, which works well if done right. Of course, a lot in software is not done right. – Mast Jun 24 '20 at 07:22
  • 2
    @Mast When I said that Python has to do more at runtime I wasn’t talking about byte compilation, which is a trivial, one-off up-front cost. I was talking about the need to keep track of type information, look up names dynamically, etc. .pyc files really only provide runtime savings for very short-running scripts. – Konrad Rudolph Jun 24 '20 at 07:48
  • 1
    @Voo yeah, me too. The problem is that we should be hashing this out over adult beverage of choice for our entertainment, I worry that we've wandered a bit afield from the meat of the question. – Jared Smith Jun 24 '20 at 16:46
20

Is it always true that more powerful languages must necessarily have lesser possible performance when compared to their less powerful counterparts? Is there a law/theory for this?

First off, we need to make one thing clear: languages don't have "performance".

A particular program written in a particular programming language executed on a particular machine in a particular environment under particular conditions using a particular version of a particular implementation of the programming language has a particular performance. This does not mean that all programs written in that language have a particular performance.
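
As a concrete sketch of what "a particular performance" means (Python here only because the question mentions it; the snippet is purely illustrative):

    import sys
    import timeit

    # The same source text, measured. The number printed is a property of this
    # particular implementation (CPython, PyPy, ...), this machine, this load --
    # not of "the Python language" in the abstract.
    snippet = "sum(i * i for i in range(10_000))"

    seconds = timeit.timeit(snippet, number=1_000)
    print(sys.implementation.name, f"{seconds:.3f}s")
    # Run the identical file under CPython and under PyPy and you will usually
    # see very different numbers for exactly the same program.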

The performance that you can attain with a particular implementation is mostly a function of how many resources, how much money, how many engineers, etc. are invested to make that implementation fast. And the simple truth is that C compilers have more money and more resources invested in them than Python implementations. However, that does not mean that a Python implementation cannot be fast. A typical Python implementation has about as many full-time engineers as a typical C compiler vendor has full-time custodians that re-fill the developers' coffee machines.

Personally, I am more familiar with the Ruby community, so I will give some examples from there.

The Hash class (Ruby's equivalent to Python's dict) is written in 100% C in YARV. In Rubinius, however, it is written (mostly) in Ruby (relying only on a Tuple class that is partially implemented using VM primitives).

The performance of Hash-intensive benchmarks running on Rubinius is not significantly worse than running on YARV, which means that at least for those particular combinations of benchmark, language, operating system, CPU, environment, load, etc. Ruby is about as fast as C.
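
A rough CPython analogue of that comparison (a sketch, not the Rubinius benchmark; the deliberately naive PyDict class below merely stands in for a hash table "written in the language itself"):

    import timeit

    class PyDict:
        """A naive pure-Python hash table, standing in for a Hash written in the
        language itself (as in Rubinius), rather than in C (as in YARV or CPython)."""
        def __init__(self, nbuckets=1024):
            self.buckets = [[] for _ in range(nbuckets)]
        def __setitem__(self, key, value):
            bucket = self.buckets[hash(key) % len(self.buckets)]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))
        def __getitem__(self, key):
            for k, v in self.buckets[hash(key) % len(self.buckets)]:
                if k == key:
                    return v
            raise KeyError(key)

    def bench(d):
        for i in range(1000):
            d[i] = i
        return sum(d[i] for i in range(1000))

    print("built-in (C) dict :", timeit.timeit(lambda: bench({}), number=200))
    print("pure-Python table :", timeit.timeit(lambda: bench(PyDict()), number=200))
    # On CPython the C-implemented dict wins by a wide margin; on a JIT-ing
    # implementation such as PyPy the gap typically narrows a lot -- the gap is
    # a property of the implementation, not of the language.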

Another example is TruffleRuby. The TruffleRuby developers set up an interesting benchmark: they found two Ruby libraries that use lots of Ruby idioms that are thought to be notoriously hard to optimize, such as runtime reflection, dynamically calculating method names to call, and so on. Another criterion they used was that the Ruby library should have an API-compatible replacement written as a YARV C extension, thus indicating that the community (or at least one person in it) deemed the pure Ruby version too slow.

What they then did was create some benchmarks that heavily rely on those two APIs, and run them with the C extensions on YARV and with the pure Ruby versions on TruffleRuby. The result was that TruffleRuby could execute the benchmarks on average at 0.8x the performance of YARV with the C extensions, and at best up to 21x that of YARV. In other words, TruffleRuby was able to optimize the Ruby code to the point where it was on average comparable to C, and in the best case over 20x faster than C.

[I am simplifying here; you can read the whole story in a blog post by the lead developer: Pushing Pixels with JRuby+Truffle.]

That does not, however, mean that we can simply say "Ruby is 20x faster than C". It does show that clever implementations of languages like Ruby (and Python, PHP, ECMAScript, etc. are not much different in that regard) can achieve comparable, and sometimes even better, performance than C.

There are more examples that demonstrate how throwing money at the problem increases performance. E.g. until companies like Google started to develop entire complex applications in ECMAScript (GMail, Google Docs, Google Wave [RIP], MS Office online, etc.), nobody really cared about ECMAScript performance. Sure, there were browser benchmarks, and browser vendors tried to improve them bit by bit, but there was no serious effort to build a fundamentally high-performance ECMAScript engine. Until Google built V8. Suddenly, all other vendors also invested heavily in performance, and within just a few years, ECMAScript performance had increased by a factor of 10 across all implementations. But the language had not changed at all in that time! So, the exact same language suddenly became "10 times faster", just by throwing money at it.

This should show that performance is not an inherent characteristic of the language.

One last example is Java. The original JVM by Sun was dog-slow. Along came a couple of Smalltalk guys who had developed a high-performance Smalltalk VM (the Animorphic Smalltalk VM) and noticed that Smalltalk and Java were very similar, and they could easily build a high-performance JVM using the same ideas. Sun bought the company (which is ironic, because the same developers had already built the high-performance Self VM based on the same ideas while employed at Sun, but Sun let them go just a couple of years earlier because they wanted to focus on Java and not Self as their new language), and the Animorphic Smalltalk VM became the Sun HotSpot JVM, still the most widely-used JVM to date.

(Interestingly, the team that built V8 includes key people of the team that built HotSpot, and the ideas behind V8 are – not surprisingly – also based on the Animorphic Smalltalk VM.)

Lastly, I would also like to point out that we have only talked about languages and language implementations (interpreters, compilers, VMs, …) here. But there is a whole environment around those. For example, modern CPUs contain quite a lot of features that are specifically designed to make C-like languages fast, e.g. branch prediction, memory prefetching, or memory protection. None of these features really help languages like Java, ECMAScript, PHP, Python, or Ruby. Some (e.g. memory protection) even have the potential to slow them down. (Virtual memory can impact garbage collection performance, for example.) The thing is: these languages are memory-safe and pointer-safe; they don't need memory protection because they fundamentally do not allow the operations that memory protection protects against in the first place!

On a CPU and an OS that were designed for such languages, it would be much easier to achieve higher performance. If you really wanted to do a fair benchmark between, say, C and Python, you would have to run the Python code on a CPU that has received just as many optimizations for Python as our current mainstream CPUs have for C.

Jörg W Mittag
  • 6,140
  • 23
  • 25
  • 4
    Apart from the argument about throwing more money at an implementation, one may also consider what the same amount of money is spent on. Higher-level languages might want to focus on implementing the high-level features at all, and all the documentation, debuggability, and tooling surrounding this, rather than squeezing the last bit of performance out of micro-optimisations. It's developer efficiency that matters more for the more powerful languages. – Bergi Jun 22 '20 at 16:31
  • 7
    Your TruffleRuby example is close to making the same mistake you caution against: you say "... can achieve comparable, and sometimes even better, performance than C." But it's not C in general you're comparing against. It's actually one specific implementation of some functions, compiled by a specific compiler for a given ISA, running on some specific hardware. You haven't ruled out the possibility of big optimizations to the version that uses some C. Perhaps it's reasonable to take it as "the best C can do on that platform", perhaps not. – Peter Cordes Jun 22 '20 at 19:36
  • Hardware features like branch prediction (which enables out-of-order speculative execution) are important for running pretty much any machine code, regardless of whether it was compiled ahead-of-time by a C compiler, generated on the fly by a JVM JIT-compiler, or was compiled ahead-of-time for a Python interpreter written in C (CPython). Memory prefetching is important for any code that loops over arrays of anything. Even if it loops very slowly, OoO exec can't hide the entire DRAM latency, so a slow interpreter would still stall for a while. – Peter Cordes Jun 22 '20 at 19:41
  • 2
    CPUs that execute high-level bytecode directly or have features dedicated to supporting it have been tried and rejected (e.g. ARM Jazelle, or Lisp machines) in favour of traditional CPUs that execute normal machine code, leaving it to software to JIT-compile. – Peter Cordes Jun 22 '20 at 19:42
  • Time is also money, part of the reason C and C++ are so much faster is not just the money but that their compilers have a decade long (or longer!) lead time on a lot of the other languages still in use. – Jared Smith Jun 23 '20 at 12:20
  • 1
    An additional example wrt Java might be certain concurrent problems that are actually vastly faster (and easier to implement) in Java than C++ because you can take advantage of the GC. Cliff Click from Azul systems talked about that in some ancient blog post - I think it was about his concurrent hashmap and the advantages of not having to worry about memory leaks and thereby avoiding fences. – Voo Jun 23 '20 at 18:09
  • @PeterCordes Itanium had a comfy spot somewhere in the middle, but it hit the biggest problem of them all - it's a new thing people have to learn, and it isn't widely applicable. Compatibility almost always wins. – Luaan Jun 24 '20 at 07:04
  • @Voo There's plenty of examples like that. For example, in .NET, the heap behaves pretty much like a stack for memory allocation; this generally means that allocating memory is much faster and cheaper than in a typical C/C++ allocator, and often the savings are big enough that they swallow the cost of the GC. But you'll always get the same response from the C++ crowd - you could do that manually in C++. Which is true, but also completely missing the point - you could do all that in assembly, but we don't code in assembly much, do we? :) – Luaan Jun 24 '20 at 07:06
  • 3
    @Luaan: The real advantage of GC for concurrent problems is that GC solves the deallocation problem. In C++, the consumer side of a queue has to do the freeing (or queue the nodes again for later free). It thus has to make sure that no other thread could still have a pointer to the node before handing that memory back to the OS or using it for something else. This is hard, unless you already have a GC mechanism to lean on. – Peter Cordes Jun 24 '20 at 07:09
  • 1
    @Luaan Also very true, but I was thinking about not having to worry about deallocating nodes as Peter describes. If you want a non-blocking hashmap implementation, this is pretty hard to do without a GC that handles it in the background. – Voo Jun 24 '20 at 07:44
8

TL;DR: Performance is a matter of Mechanical Sympathy and of Doing Less. Less flexible languages generally do less work and are more mechanically sympathetic, hence they generally perform better out of the box.

Physics Matters

As Jörg mentioned, CPU designs today co-evolved with C. This is especially telling for the x86 instruction set, which features SSE instructions specifically tailored for NUL-terminated strings.

Other CPUs could be tailored for other languages, and that may give an edge to such other languages, but regardless of the instruction set there are some hard physics constraints:

  • The size of transistors. The latest CPUs feature 7nm, with 5nm being experimental. Size immediately places an upper bound on density.
  • The speed of light, or rather the speed of electricity in the medium, places an upper bound on the speed of transmission of information.

Combining the two places an upper bound on the size of L1 caches, in the absence of 3D designs – which suffer from heat issues.

Mechanical Sympathy is the concept of designing software with hardware/platform constraints in mind, and essentially to play to the platform's strengths. Language Implementations with better Mechanical Sympathy will outperform those with lesser Mechanical Sympathy on a given platform.

A critical constraint today is being cache-friendly, notably keeping the working set in the L1 cache, and typically GCed languages use more memory (and more indirections) compared to languages where memory is manually managed.
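
As a small illustration of what those extra indirections cost in memory alone, here is a CPython sketch (exact numbers vary by version and platform):

    import sys
    from array import array

    n = 100_000
    boxed  = [float(i) for i in range(n)]   # list of pointers to separately boxed float objects
    packed = array("d", range(n))           # contiguous raw doubles, a C-like layout

    boxed_bytes  = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
    packed_bytes = sys.getsizeof(packed)

    print(f"list of floats: ~{boxed_bytes / n:.0f} bytes/element, one pointer hop each")
    print(f"array('d')    : ~{packed_bytes / n:.0f} bytes/element, contiguous")
    # The contiguous layout is what caches and prefetchers are designed for; the
    # pointer-chasing layout is what an "everything is an object" runtime tends to produce.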

Less (Work) is More (Performance)

There's no better optimization than removing work.

A typical example is accessing a property:

  • In C, value->name is a single instruction (a mov with a constant offset, or a lea if only the address is needed).
  • In Python or Ruby, the same typically involves a hash table lookup.

Such an instruction executes in about 1 CPU cycle, while an optimized hash table lookup takes at least 10 cycles.
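
A small CPython sketch of that difference (illustrative only; the class names are invented for the example, and even the faster variant is nowhere near a single instruction):

    import timeit

    class WithDict:
        def __init__(self):
            self.name = "x"          # stored in a per-instance hash table (__dict__)

    class WithSlots:
        __slots__ = ("name",)        # fixed layout, somewhat closer to a C struct field
        def __init__(self):
            self.name = "x"

    d, s = WithDict(), WithSlots()
    print(d.__dict__)                # {'name': 'x'} -- the dictionary behind d.name

    print("dict-backed :", timeit.timeit(lambda: d.name, number=2_000_000))
    print("slots-backed:", timeit.timeit(lambda: s.name, number=2_000_000))
    # __slots__ skips the per-instance dictionary, which usually shaves a little off,
    # but both still go through attribute-lookup machinery rather than one mov/lea.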

Recovering performance

Optimizers, and JIT optimizers, attempt to recover the performance left on the table.

I'll take the example of two typical optimizations for JavaScript code:

  • NaN-tagging is used to store a double OR a pointer in 8 bytes. At run-time, a check is performed to know which is which. This avoids boxing doubles, eliminating a separate memory allocation and an indirection, and thus is cache-friendly.
  • The V8 VM optimizes dynamic property lookups by creating a C-like struct for each combination of properties on an object, hence going from hash table lookup to type-check + lea – and possibly lifting the type-check much earlier.

Thus, to some extent, even highly flexible languages can be executed efficiently... so long as the optimizer is smart enough, or the developer makes sure to massage the code to just hit the optimizer's sweet spot.
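
To give a feel for the first of those tricks, here is a rough, purely illustrative sketch of the NaN-tagging idea using Python's struct module (the constants and helper names are invented for the example; real engines are considerably more careful):

    import struct

    QNAN = 0x7FF8_0000_0000_0000   # quiet-NaN bit pattern; its payload bits are unused

    def box_pointer(addr):
        # Stuff a (hypothetical) 48-bit pointer into the NaN payload -- still 8 bytes.
        assert addr < (1 << 48)
        return QNAN | addr

    def is_pointer(bits):
        # Exponent all ones, quiet bit set, non-zero payload: treat it as a tagged
        # pointer (real engines canonicalise genuine NaNs so this never collides).
        return (bits & QNAN) == QNAN and (bits & ((1 << 48) - 1)) != 0

    def as_double(bits):
        return struct.unpack("<d", struct.pack("<Q", bits))[0]

    # An ordinary double round-trips untouched...
    bits = struct.unpack("<Q", struct.pack("<d", 3.14))[0]
    print(is_pointer(bits), as_double(bits))          # False 3.14

    # ...while a tagged "pointer" fits in the same 8 bytes, with no separate allocation.
    tagged = box_pointer(0x1234)
    print(is_pointer(tagged), hex(tagged & 0xFFFF))   # True 0x1234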

There is no faster language...

... there are just languages that are easier to write fast programs in.

I'll point to a series of 3 blog articles from 2018:

I think the latter article is the key point. More flexible languages can be made to run efficiently with expert knowledge and time. This is costly, and the result is typically brittle.

The main advantage of less flexible languages – statically typed, with tighter control over memory – is that they make optimizing their performance more straightforward.

When the language's semantics already closely match the platform's sweet spot, good performance comes straight out of the box.

Bolpat
  • 103
  • 4
Matthieu M.
  • 529
  • 4
  • 13
6

In general, it's about what the language and its implementors are trying to do.

C has a long culture of keeping things as close to the hardware as possible. It doesn't do anything that can't easily be translated into machine code at compile time. It was intended as a multi-platform kind of low-level language. As time went on (and it was a lot of time!), C became sort of a target language for compilers in turn - it was a relatively simple way to get your language to compile for all the platforms that C compiled for, which was a lot of platforms. And C ended up being the API system of choice for most desktop software - not because of any inherent qualities in the way C calls things or shares header files or whatever, but simply because the barrier to introducing a new way is very high. So again, the alternatives usually sacrifice performance for other benefits - just compare C-style APIs with COM.

That isn't to say that C wasn't used for development, of course. But it's also clear that people were well aware of its shortcomings, since even people doing "hard-core" stuff like OS development always tried to find better languages to work with - LISP, Pascal, Objective-C etc. But C (and later C++) remained at the heart of most system-level stuff, and the compilers were continuously tweaked to squeeze out extra performance (don't forget there's ~50 years of C by now). C wasn't significantly improved in capabilities over that time; that was never seen as particularly important, and would conflict with the other design pillars.

Why do you design a new language? To make something better. But you can't expect to get everything better; you need to focus. Are you looking for a good way to develop GUIs? Build templates for a web server? Resolve issues with reliability or concurrency? Make it easier to write correct programs? Now, out of some of those, you may get performance benefits. Abstraction usually has costs, but it can also mean you can spend more of your time performance tweaking small portions of code.

It's definitely not true that using a low-level language (like C) will net you better performance. What is true is that, if you really really want to, you can reach the highest performance with a low-level language. As long as you don't care about the cost, maintainability and all that. Which is where economies of scale come in - if you can have 100 programmers improve performance for 100M programmers through a low-level tweak, that might be a great payoff. The same way, a lot of smart people working on a good high-level language can greatly increase the output of a lot more people using that language.

There is a saying that a sufficiently powerful compiler will be able to eliminate all the costs of high-level languages. In some sense, it's true - every problem eventually needs to be translated to a language the CPU understands, after all. Higher level abstractions mean you have fewer constraints to satisfy; a custom .NET runtime, for example, doesn't have to use a garbage collector. But of course, we do not have unlimited capacity to work on such compilers. So as with any optimisation problem, you solve the issues that are the most painful to you, and bring you the most benefit. And you probably didn't start the development of a new, high level language, to try to rival C in "raw" power. You wanted to solve a more specific problem. For example, it's really hard to write high-performance concurrent code in C. Not impossible, of course. But the "everything is shared and mutable by default" model means you have to either be extremely careful, or use plenty of guards everywhere. In higher level languages, the compiler or runtime can do that for you, and decide where those can be omitted.

More powerful programming languages tend to have slower implementations because fast implementations were never a priority, and may not be cost effective. Some of the higher level features or guarantees may be hard to optimise for performance. Most people don't think performance should trump everything - even the C and C++ people are using C or C++, after all. Languages often trade run-time, compile-time and write-time performance. And you don't even have to look at languages and their implementations to see that - for example, compare the original Doom engine with Duke Nukem 3D. Doom's levels need significant compile-time - Duke's can be edited in real-time. Doom had better runtime performance, but it didn't matter by the time Duke launched - it was fast enough, and that's all that matters when you're dealing with performance on a desktop.

What about performance on a server? You might expect a much stronger focus on performance in server software. And indeed, for things like database engines, that's true. But at the same time, servers are flooded with software like PHP or Node.js. Much of what's happening in server-space shifted from "squeeze every ounce of performance from this central server node" to "just throw a hundred servers at the problem". Web servers were always designed for high concurrency (and decentralisation) - that's one big reason why HTTP and the web were designed to be state-less. Of course, not everyone got the memo, and it's handy to have some state - but it still makes decoupling state from a particular server much easier. PHP is not a powerful language. It's not particularly nice to work with. But it provided something people needed - simple templating for their web sites. It took quite a while for performance to become an important goal, and it was further "delayed" by sharding, caching, proxying etc. - which were very simple to do thanks to the limitations of PHP and HTTP.

But surely, you'll always write an OS in C/C++? Well, for the foreseeable future on the desktop, sure. But not because of raw performance - the trump card is compatibility. Many research OSes have cropped up over time that provide greater safety, security, reliability and performance (particularly in highly concurrent scenarios). A fully memory managed OS makes many of the costs of managed memory go away; better memory guarantees, type safety and runtime type information allow you to elide many runtime checks and costs with task switching etc. Immutability allows processes to share memory safely and easily, at very low cost (heck, many of Unix's strengths and weaknesses come from how fork works). Doing compilation on the target computer means you can't spend so much time optimising, but it also means you are targeting a very specific configuration - so you can always use the best available CPU extensions, for example, without having to do any runtime checks. And of course, safe dynamic code can bring its own performance benefits too (my software 3D renderer in C# uses that heavily for shader code; funnily enough, thanks to all the high-level language features, it's much simpler, faster and more powerful than e.g. the Build engine that powers Duke Nukem 3D - at the cost of extra memory etc.).

We're doing engineering here (poor as it may be). There's trade-offs to be had. Unless squeezing every tiny bit of performance out of your language gives you the greatest possible benefit, you shouldn't be doing it. C wasn't getting faster to please C programmers; it was getting faster because there were people who used it to work on stuff that made things faster for everyone else. That's a lot of history that can be hard to beat, and would you really want to spend the next 50 years catching up with some low-level performance tweaks and fixing tiny incompatibilities when nobody would want to use your language in the first place because it doesn't provide them with any real benefit over C? :)

Luaan
  • 181
  • 1
  • 6
  • 1
    "C has a long culture of keeping things as close to the hardware as possible. It doesn't do anything that could easily be translated into machine code at compile time. It was intended as a multi-platform kind of low level language." – I think that relationship has been reversed recently. Now, it's the CPU vendors who work hard to present emulation layers to make their CPUs look like a C Abstract Machine, precisely because C is not "portable assembly" or "close to the machine", but "close to the machine, as long as that machine is the PDP-11". – Jörg W Mittag Jun 25 '20 at 11:21
1

I reject the premise of "More powerful programming languages tend to have slower implementations."

"Power" is subjective. Is it faster? More robust? More exact? More efficient? More capable?

  • A nuclear warhead is very powerful, but not very precise.
  • An acupuncture needle is very precise, and can be very powerful, but it is only leveraging the underlying neural system.
  • Lisp is very powerful, very precise, and yet, (some) people find it an awkward language.
  • APL is very very powerful, precise, and succinct. But it requires a special keyboard (or mapping), and is sometimes labelled as too difficult to teach (though it's probably fairer to say it's not for everyone).
  • Pascal isn't very powerful, but is fairly precise. It was designed as a teaching language, and also an experiment to prove that a one-pass compiler is practical. (Leave it to Microsoft to distribute a 3-pass compiler for a 1-pass language.)
  • Python, Perl, Java, etc. These are easier to write in for most people; there are loads of libraries, tutorials, and online projects for examination. Many of these languages don't have "pointers" as such, but do have "references", which are more consistent with the language -- you don't have to bother with pointer arithmetic, wrap-around, and other implementation-specific details. Indeed, these were meant to run on most, if not all, hardware. They are an abstraction up from C and C compilers, making their programs more widely applicable without recompiling. But they lose some performance for this flexibility.
  • Turing machines: the most powerful, and yet, when was the last time you wrote a program as one? Performance is awful, because, in all but pathological cases, there are better implementations.
  • GOL (Game Of Life): since it's Turing complete, it's just as powerful, yet the performance is worse than a direct Turing machine implementation in the same context.
  • 1
    I don't understand how this answers the question. I can't understand what premise in the question you are rejecting. The question doesn't directly state any premises that I can see. The question doesn't mention "precise" etc. so I don't see how those are relevant. If you can [edit] your answer to make it clearer how it addresses the question, I encourage you to do so. – D.W. Jan 08 '21 at 05:29
  • The premise is "More powerful programming languages tend to have slower implementations." Respectfully, I can't understand why that wasn't obvious, but I've made it explicit in my answer. – Quantum Mechanic Jan 08 '21 at 13:23
  • I guess the challenging part for me is understanding the connection between the bullet points and that premise. Many of the bullet points talk about "precise", "awkward", "succinct", "teaching", "easier to write in", yet none of those have anything to do with performance. Perhaps they are intended to relate to whether the language is powerful or not? Trying to say that the language is powerful in one way but not in another? If so, maybe it'd help to make that connection more explicit? I ask because this was flagged as not an answer, and I'm trying to form a position on that. – D.W. Jan 08 '21 at 20:42
  • 1
    Also, it seems to me that to back up your rejection of the premise, we'd need to have some evidence that more powerful languages don't tend to have slower performance, e.g., analyze more powerful languages and provide some evidence that most of them don't have slower performance. I don't see any support for that in this answer. – D.W. Jan 08 '21 at 20:44
1

The phenomenon you describe as one language being more "powerful" than another one is what we call a "high-level" language vs. "low-level" languages.

But what is the meaning of "level" in this context? In other words, what do they refer to as being a high or low level of?

They refer to levels of abstraction. C/C++ is a language with a low level (of abstraction). Python has a higher level (of abstraction).

The fact that high-level (of abstraction) languages tend to be slower than low-level (of abstraction) ones is known as the abstraction penalty:

High-level languages intend to provide features which standardize common tasks, permit rich debugging, and maintain architectural agnosticism; while low-level languages often produce more efficient code through optimization for a specific system architecture. Abstraction penalty is the cost that high-level programming techniques pay for being unable to optimize performance or use certain hardware because they don't take advantage of certain low-level architectural resources. High-level programming exhibits features like more generic data structures and operations, run-time interpretation, and intermediate code files; which often result in execution of far more operations than necessary, higher memory consumption, and larger binary program size. For this reason, code which needs to run particularly quickly and efficiently may require the use of a lower-level language, even if a higher-level language would make the coding easier. In many cases, critical portions of a program mostly written in a high-level language can be hand-coded in assembly language, leading to a much faster, more efficient, or simply reliably functioning optimised program.
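
The last sentence of that quote is easy to demonstrate in miniature (a sketch: in CPython the built-in sum happens to be implemented in C, so pushing a hot loop into it is a tiny version of "hand-code the critical portion in a lower-level language"):

    import timeit

    def interpreted_total(n):
        # The whole loop runs step by step in the bytecode interpreter.
        total = 0
        for i in range(n):
            total += i
        return total

    def c_backed_total(n):
        # The loop itself runs inside the C-implemented built-in.
        return sum(range(n))

    print("interpreted loop:", timeit.timeit(lambda: interpreted_total(100_000), number=200))
    print("C-backed loop   :", timeit.timeit(lambda: c_backed_total(100_000), number=200))
    # Same language, same algorithm; the faster version simply moves the inner
    # loop below the abstraction penalty -- the pattern the quote describes.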

References:

Pankaj Surana, Meta-compilation of language abstractions