Do theorem provers demonstrate their own correctness?

Question

I am not very well-versed in the world of theorem proving, much less automated theorem proving, so please correct me if anything I say or assume in my question is wrong.

Basically, my question is: are automated theorem provers themselves ever formally proven to work with another theorem prover, or is there just an underlying assumption that any theorem prover was just implemented really really well, extensively tested & reviewed, etc. and so it "must work"?

If so, does there always remain some underlying doubt in any proof proven by a formally verified automated theorem prover, as the formal verification of that theorem prover still relies on assuming that the non-formally verified theorem prover was correct in its verification of the former theorem prover, even if it might technically be wrong - as it was not formally verified itself? (That is a mouthful of a question, apologies.)

I am thinking of this question in much the same vein as bootstrapping compilers.

This is indeed interesting and important - I'm not nearly knowledgeable enough to give an answer here, but you might find J. C. Davis' Ph.D. thesis interesting. – Noah Schweber, Jan 21 '20 at 16:39
If a theorem prover can prove its own consistency, doesn't that mean it is inconsistent? – yters, Jan 22 '20 at 18:30
@yters I'd guess it can at least prove its own relative consistency. – senderle, Jan 23 '20 at 02:29
@yters: That's why the question isn't really well-defined. After all, there is no way to formalize the correctness of a theorem prover built in the real world since the most we can do is to express that notion in some syntactic form that intuitively seems to correspond to what we want it to mean. And then of course under mild assumptions, no reasonable theorem prover can prove its own consistency, much less arithmetical soundness (which is one precisely specifiable aspect of correctness), unless it proves $0=1$. I'm not sure why none of the existing answers touched on this aspect. – user21820, Jan 24 '20 at 08:10
You cannot prove a proof language's strong normalization ("every program terminates with a unique result") property in itself unless it is in fact inconsistent. Or unless the proof language is so weak that it does not even allow you to encode Peano arithmetic... – xuq01, Jan 24 '20 at 17:35

score 46 · Accepted Answer · edited Nov 30 '22 at 19:18

I recommend reading Pollack's How to believe a machine-checked proof. It explains how proof assistants are designed to minimize the amount of critical code.

There are many levels of formal verification (that's the phrase you're looking for in place of "proven") of a proof assistant:

1. Verify that the algorithms used by the proof assistant are correct.
2. Verify that the implementation of (the critical core of) the proof assistant is correct.
3. Verify that the compiler for the language in which the proof assistant is implemented is correctly designed and implemented.
4. Verify that the hardware on which the proof assistant runs is correctly designed and built.
5. Compute the probability that a cosmic ray passes through the CPU and tricks your proof assistant every time you run it.
6. Estimate the likelihood that you are insane.

People put serious effort into these (well, at least the first four). For example, steps 1 and 2 are addressed in Coq Coq Correct!, and steps 3 and 4 in the amazing award-winning CompCert project.

edited Nov 30 '22 at 19:18 by Taimoor Zaeem

answered Jan 21 '20 at 16:45 by Andrej Bauer

Estimate the likelihood that you are part of a simulation by very powerful beings that are living in a finite world but are simulating us in an apparently infinite world, and so there is no real model of PA but we cannot ever figure that out because the simulation tricks us. – user21820 Jan 22 '20 at 10:16

"Coq Coq Correct!" is the greatest title I've seen in a long while. – Édouard Jan 22 '20 at 12:23

@user21820: I worry about your personal value of point 6 above. – Andrej Bauer Jan 22 '20 at 12:42

I do feel that swapping 5 and 6 probably leads to a more sane approach to proving proof systems =) – Cort Ammon Jan 22 '20 at 15:15

@AndrejBauer: Heh. I was joking in my comment. But more seriously, we don't have any evidence that there is a real-world model of PA, due to the impossibility of performing 100% accurate computations involving arbitrary-length strings. Assuming PA sure works well in terms of producing theorems that appear to be true at human-testable scales, but 'unsoundness' of PA at untestable scales is far more likely than cosmic ray trickery and my insanity. =) – user21820 Jan 22 '20 at 16:57

The Intro to Isabelle slides (pdf) discuss "If I Prove It on the Computer, It Is Correct, Right?" That is more about the validity of a proof than about the theorem prover, but I think it is still worth mentioning the two points "logic could be inconsistent" and "theorem could mean something else" (page 17). – BurnsBA Jan 22 '20 at 17:42

@user21820: you have a strange notion of "real-world model of PA". Are you imagining an infinite string of beads somewhere, that somehow we can comprehend all at once to constitute a model of PA? – Andrej Bauer Jan 22 '20 at 22:12

@AndrejBauer: No I am not. Having a real-world model of PA means that we have a conceptual type whose members are potentially constructible entities in the real world, that is closed under the operations of addition and multiplication, such that it (with those operations) satisfies the axioms of PA. Equivalently, we need a conceptual type of strings that is closed under concatenation and satisfies TC. No such thing is known to exist. We can imagine binary strings stored in extendible electronic memory, but it would fail at extremely large string lengths. – user21820 Jan 23 '20 at 02:49

We're deeply off topic here. – Andrej Bauer Jan 23 '20 at 07:13

No it's not off-topic; it's directly relevant to your point (1). I don't think it is fair to give much weight to point (5) if one cannot even have comparable confidence of soundness of the underlying formal system. – user21820 Jan 23 '20 at 17:47

Nobody is discussing whether we can verify every theorem, but rather how to trust a proof assistant when it confirms that a specific given proof is valid. If the proof assistant runs out of resources, it crashes. Therefore, we will always detect the fact that our idealized mathematical model (which presupposes unlimited resources) is out of sync with reality. In other words, we do not need to know whether there "really is a model of $\mathbb{N}$ in our universe" in order to trust the outputs of a proof assistant. – Andrej Bauer Jan 23 '20 at 18:10

And in any case, if you're going to worry about something, you could start with worrying about the computational complexity of the algorithms that run inside a proof assistant, as those will present an actual bottleneck. Philosophical worries about storing infinitely many beads into the universe are far down on the list of things that worry people who actually implement proof assistants, rather than just philosophise about them. – Andrej Bauer Jan 23 '20 at 18:14

It occurred to me that @user21820 may have misunderstood my point 1. I did not mean by that that we need to show soundness of PA or other foundational theories, but rather that the underlying algorithms (for example, normalization procedures) for the type theory implemented in the proof assistant are correct. – Andrej Bauer Jan 24 '20 at 06:59

@AndrejBauer: Based on your last comment, yes... you should edit your post to clarify, because for most logicians "underlying formal theory" would mean "underlying foundational system"... And of course I agree that the algorithmic issues far outweigh potential unsoundness in the foundational system. (Though it would be nice if you don't keep misconstruing my point as being about "infinitely many beads".) Thanks! – user21820 Jan 24 '20 at 08:03

Ok, I fixed it. I hope that clarifies things a bit. Regarding infinitely many beads, I honestly never understood what people meant when they wanted a "real-world model of PA". It just seems such a misguided idea that I might as well imagine an infinite stream of beads. – Andrej Bauer Jan 24 '20 at 13:25

score 10 · Answer 2 · answered Jan 21 '20 at 16:42

What you need is the idea of "the trusted core". Quoting "A verified runtime for a verified theorem prover":

In many theorem provers, the trusted core—the code that must be right to ensure faithfulness—is quite small. As examples, HOL Light is an LCF-style system whose trusted core is 400 lines of Objective Caml, and Milawa is a Boyer-Moore style prover whose trusted core is 2,000 lines of Common Lisp. These cores are so simple we may be able to prove their faithfulness socially, or perhaps even mechanically as Harrison did for HOL Light.

Once you assure yourself the core is correct, it can be used to verify the correctness of some extensions, and then those extensions can be used...
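
To make the "trusted core" idea concrete, here is a minimal sketch of an LCF-style kernel for a toy logic with implication only. Everything in it (the names `Theorem`, `assume`, `imp_intro`, `modus_ponens`, and the derived rules `identity` and `chain`) is made up for illustration and is not taken from HOL Light or Milawa. Note also that Python cannot really hide the constructor the way an ML abstract type does; the `_KERNEL_KEY` check only imitates that by convention.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Imp:
    left: "Formula"
    right: "Formula"


Formula = Union[Var, Imp]

# ---- trusted core: only the code in this section has to be right ----

_KERNEL_KEY = object()  # stands in for ML's abstract-type boundary (an assumption)


class Theorem:
    """A sequent `hyps |- concl`; meant to be built only by the kernel rules."""
    __slots__ = ("hyps", "concl")

    def __init__(self, key, hyps, concl):
        if key is not _KERNEL_KEY:
            raise TypeError("Theorem values can only be produced by kernel rules")
        object.__setattr__(self, "hyps", frozenset(hyps))
        object.__setattr__(self, "concl", concl)

    def __setattr__(self, name, value):
        raise AttributeError("Theorem is immutable")


def assume(p):
    """p |- p"""
    return Theorem(_KERNEL_KEY, {p}, p)


def imp_intro(p, th):
    """From G |- q derive G - {p} |- p -> q."""
    if not isinstance(th, Theorem):
        raise TypeError("expected a Theorem")
    return Theorem(_KERNEL_KEY, th.hyps - {p}, Imp(p, th.concl))


def modus_ponens(th_imp, th_arg):
    """From G |- p -> q and D |- p derive G u D |- q."""
    if not (isinstance(th_imp, Theorem) and isinstance(th_arg, Theorem)):
        raise TypeError("expected Theorems")
    c = th_imp.concl
    if not isinstance(c, Imp) or c.left != th_arg.concl:
        raise ValueError("modus ponens does not apply")
    return Theorem(_KERNEL_KEY, th_imp.hyps | th_arg.hyps, c.right)


# ---- untrusted extensions: bugs here can fail, but cannot forge a Theorem ----

def identity(p):
    """Derived rule: |- p -> p."""
    return imp_intro(p, assume(p))


def chain(th_pq, th_qr):
    """Derived rule: from |- p -> q and |- q -> r derive |- p -> r."""
    p = th_pq.concl.left
    th_q = modus_ponens(th_pq, assume(p))  # p |- q
    th_r = modus_ponens(th_qr, th_q)       # p |- r
    return imp_intro(p, th_r)              # |- p -> r
```

The derived rules never construct a `Theorem` directly; they only call the three kernel rules. If `chain` were wrong, it would raise an error or return a different (but still validly derived) theorem, which is the sense in which only the core needs to be trusted.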

score 6 · Answer 3 · answered Jan 24 '20 at 04:10

While this may trend close to self-advertisement, this is essentially the topic of my recent paper Metamath Zero: The Cartesian Theorem Prover (video), and the analogy with bootstrapping compilers is spot on. The introduction of the paper lays out what is needed to make this happen, and it's only a problem of engineering.

As Andrej says, there are several components that go into a "full bootstrap", and while many of the parts have been done separately, the theorem provers that are used by the community are only correct in the sense that Linux is correct: we've used it for a while and there are no bugs we have found so far, except the bugs that we did find and fix.

The issue, as ever, is that because the cost of producing verified software is high, verified programs tend to be simplistic or simplified from the "real thing", and so there remains a gap between what people use and what people prove theorems about. A "small trusted kernel" setup is necessary but not sufficient, because unless you have a full formal specification (with proofs) of the programming language, the untrusted part can still interfere with the trusted part, and even if the barrier is air-tight, you have communication problems when the untrusted part is in control - for example, the kernel may flag an error that the untrusted part ignores, or the kernel may never be shown some apparent assertion at the source level.
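
The first of those communication problems can be sketched in a few lines. This is a toy illustration only: `toy_kernel_check`, `careless_driver`, and `careful_driver` are hypothetical names, and the "kernel" just checks claims of the form a + b = c, which stands in for real proof checking.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Report:
    accepted: List[str]
    rejected: List[str]


def toy_kernel_check(claim):
    """Trusted part (toy): accept (a, b, c) only if a + b == c, else raise."""
    a, b, c = claim
    if a + b != c:
        raise ValueError(f"kernel rejects {a} + {b} = {c}")


def careless_driver(claims, kernel_check):
    """Untrusted front end with a bug: it swallows the kernel's rejection."""
    accepted = []
    for name, claim in claims:
        try:
            kernel_check(claim)
        except ValueError:
            pass  # BUG: the kernel flagged an error and we ignore it
        accepted.append(name)
    return Report(accepted, [])


def careful_driver(claims, kernel_check):
    """Front end that reports a claim as proved only if the kernel accepted it."""
    accepted, rejected = [], []
    for name, claim in claims:
        try:
            kernel_check(claim)
        except ValueError:
            rejected.append(name)
        else:
            accepted.append(name)
    return Report(accepted, rejected)
```

With the claims `[("two_plus_two", (2, 2, 4)), ("bogus", (2, 2, 5))]`, the kernel behaves identically in both runs, yet `careless_driver` reports `bogus` as accepted. The kernel is correct; the overall system is not.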

The good news is that projects of this scale have become feasible in the past few years, so I am very hopeful that we can get production quality theorem provers with verified backends soon-ish.

The CakeML project should get a mention here: they have an ML compiler verified in HOL4, that is capable of running a HOL verifier written in ML. (Unfortunately HOL4 is more than just its logical system, so there is some work to be done to make this realistic.)
The Milawa kernel is written in a subset of ACL2, with a bunch of bootstrapping stages before closing the loop (being able to prove theorems about this same ACL2 subset). This is the only actual theorem prover bootstrap I know, but it doesn't go down to machine code, it stays at the level of Lisp, and from what I understand it's not actually performant enough for production work. It has since been verified down to machine code, but that part of the proof was done in HOL4 so it's not actually bootstrapping at the machine code level.
Coq recently made some strides towards this with Coq Coq Correct!, but it doesn't cover the full Coq kernel (it leaves out, for example, recent additions such as SProp, and the module system). (Aside, if there are any Coq experts reading this: if you know any place where the complete formal system implemented by Coq is written down, I would really like to see it. Formalized is nice but informal might even be better, as long as it is complete and precise.) You also can't connect it to CompCert as Andrej suggested, because the typechecking algorithm is only described abstractly, and certainly not in C, which is what CompCert expects as input.
Metamath Zero is still under construction but the goal is to prove correctness down to the machine code level, inside its own logic. (I also can't help but mention it's currently about 8 orders of magnitude faster than "the other guys" but we'll see if that holds up until the end of the project.)

@user76284 That's what I keep telling people! It's the difference between ~18 hours to check CakeML and 200ms for MM0 to check set.mm (which is not a proof of correctness of a theorem prover so it's not directly comparable, but it is an entire library of ZFC set theory, and predicted to be larger by at least an order of magnitude from the actual compiler proof). The project work so far comes in at about 2ms. — Mario Carneiro, Jan 27 '20 at 06:42

score 3 · Answer 4 · answered Jan 24 '20 at 09:00

In case you are not aware, there is a possibility that a theorem prover is implemented 100% correctly and run on a faultless machine and proves itself arithmetically inconsistent, even though it is not. The other existing answers focus on the issue of verifying that the theorem prover runs exactly as it was designed to run, but do not address this aspect of "correctness".

More precisely, every theorem prover is based on some underlying foundational system $S$. For $S$ to be considered a reasonable foundational system, $S$ must interpret arithmetic, meaning that there is a computable translation $ι$ of sentences over PA to sentences in the language of $S$ such that for every arithmetical sentence $Q$ we have that if PA proves $Q$ then $S$ proves $ι(Q)$. Gödel's incompleteness theorem can be easily extended to any such (computable) foundational system $S$, and states the following: $ \def\con{\text{Con}} $

If $S$ does not prove $ι(0=1)$, then $S$ does not prove $ι(\con(S))$.

Here $\con(S)$ is the arithmetical sentence that encodes the (arithmetical) consistency of $S$. Namely, $S$ does not prove $ι(0=1)$ iff $\mathbb{N} ⊨ \con(S)$.

So first of all, this implies that $S$ cannot possibly prove its own consistency unless it is inconsistent. In this sense it is impossible for any theorem prover to prove that it is itself correct. It may be able to prove that its execution on a particular machine satisfies certain specifications (as mentioned in other answers), but that is far from proving correctness of the theorems that it proves. In other words, anyone who wants confidence in a theorem prover's outputs still must examine those specifications and be somehow convinced that they ensure correctness of the outputs. That part can never be covered by the theorem prover itself.

Secondly, if $S$ is deductively closed for arithmetical sentences, meaning that if $S$ proves $ι(Q)$ and $ι(Q⇒R)$ then $S$ also proves $ι(R)$, then let $A(S)$ be the arithmetical theorems of $S$, namely all arithmetical sentences $Q$ such that $S$ proves $ι(Q)$, and let $S'$ be a formal system that proves the same theorems as $S$ plus $ι(R)$ for every arithmetical sentence $R$ such that $A(S)+¬\con(S)$ proves $R$. Then $S'$ is still deductively closed for arithmetical sentences, but $S'$ proves $ι(¬\con(S))$ and hence $ι(¬\con(S'))$.

Note that $S'$ proves everything that $S$ does, and if $S$ is consistent then $S'$ is also consistent but proves that it is itself inconsistent. Now how can we know that the foundational system we started with was not like such an $S'$? We cannot know, without assumptions beyond $S$. Simply put, it is possible that a theorem prover proves that it proves nonsense including $0=1$, even though it does not!
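
The two claims about $S'$ can be unpacked roughly as follows. This is only a sketch, and it relies on two assumptions not spelled out above: that PA can verify that every theorem of $S$ is a theorem of $S'$ (which depends on how the provability predicate of $S'$ is formalized), and that "consistent" means that $S$ does not prove $ι(0=1)$. Here `\iota` is the translation $ι$ and `\mathrm{Con}` is the same sentence as $\con$.

```latex
\begin{align*}
&\text{(a)}\quad A(S) + \neg\mathrm{Con}(S) \vdash \neg\mathrm{Con}(S)
  &&\Longrightarrow\quad S' \vdash \iota(\neg\mathrm{Con}(S)) \\
&\phantom{\text{(a)}}\quad \mathrm{PA} \vdash \neg\mathrm{Con}(S) \to \neg\mathrm{Con}(S')
  &&\Longrightarrow\quad S' \vdash \iota(\neg\mathrm{Con}(S')) \\[4pt]
&\text{(b)}\quad S' \vdash \iota(0=1)
  &&\Longrightarrow\quad S \vdash \iota(0=1)
     \ \text{ or } \ A(S) + \neg\mathrm{Con}(S) \vdash 0=1 \\
&\phantom{\text{(b)}}\quad A(S) + \neg\mathrm{Con}(S) \vdash 0=1
  &&\Longrightarrow\quad A(S) \vdash \mathrm{Con}(S)
     \quad\Longrightarrow\quad S \vdash \iota(\mathrm{Con}(S))
\end{align*}
```

In (a), the second line uses that $S'$ proves $ι$ of every PA theorem and is deductively closed for arithmetical sentences. In (b), the final conclusion contradicts the incompleteness theorem quoted above whenever $S$ does not prove $ι(0=1)$, so $S'$ can prove $ι(0=1)$ only if $S$ already does.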

To summarize:

No reasonable theorem prover can prove that it itself produces only consistent outputs.
One cannot determine without assumptions beyond those coded into a theorem prover that it does not produce false theorems.

I also wish to say a bit about verification according to specifications. This is necessarily conditional; all you can prove is that, if the theorem prover is run on a machine that satisfies certain properties, then the theorem prover produces outputs that satisfy certain specifications. Can you know for sure whether the machine on which you run the theorem prover actually satisfies those properties? No. At some point, you are just going to have to trust or believe something that you cannot prove.

answered Jan 24 '20 at 09:00 by user21820

Regarding the Gödel's theorem argument, I think it is a source of confusion for many whenever you talk about a self-verifying system, and phrasing it negatively like this (in the sense "it's been proved that the task is impossible") is driving the wrong point. It certainly is possible to give formal mathematical statements that, to a human, convince them of correctness of the verifier. For example, if I'm working in PA, then I can prove (in PA) that if verifier $X$ proves $\varphi$, then $PA\vdash \varphi$. ... – Mario Carneiro Jan 25 '20 at 00:58
I could also prove that $\varphi$ is true for any particular $\varphi$ (because PA proves all its finite fragments), or we could go to ZF and prove $\varphi$ there (because ZF models PA). If you look at any of these theorems from an informal outside perspective, one where theories much stronger than ZF are used routinely, these are clearly clearing the bar of "acceptable truth", and they are entirely formal statements. – Mario Carneiro Jan 25 '20 at 01:01
@MarioCarneiro: It is not a source of confusion, because this kind of goal is exactly what Hilbert wanted until Gödel cleared that up. And it seems to me that many people (excluding people who already know the incompleteness theorems) do not understand that it is truly impossible to verify consistency within the same system. And I don't see your point about using ZF to prove consistency (or soundness) of PA, because it just hides the issue under the infinite regression carpet. Whatever system S underlies the theorem prover, you can obviously prove it 'correct' in S + (S is sound), but not S... – user21820 Jan 25 '20 at 01:41
After all, the title of the question here is "Do theorem provers demonstrate their own correctness?" Furthermore, adding consistency statements or moving to a stronger system cannot increase your confidence in the original one. By the way, "PA proves all its finite fragments" is trivial, so it seems you typed wrongly; PA proves consistency of every finite fragment. Still, that fact cannot be observed within PA itself, which is the whole point of being unable to verify self correctness. Even if the asker knows all this, I'm sure many other readers don't. – user21820 Jan 25 '20 at 02:11
Regarding the titular question, I think the problem depends on what "correctness" actually means. Of a theorem prover implementation, I would distinguish between "correctness" meaning that it checks theorems according to the intended formal system, and "soundness" meaning that the theorems so validated are actually true. Only the latter property is unprovable in the same axiom system. – Mario Carneiro Jan 25 '20 at 09:36
"Whatever system S underlies the theorem prover, you can obviously prove it 'correct' in S + (S is sound), but not S..." So? I would take that proof any day over no proof at all, and arguing that because S can't prove S is consistent the entire enterprise is pointless is actively harmful. In practice, there are real concerns about bugs in big programs that we have to worry about, and there are a variety of ways to modulate the statements in question so that you get something almost as good as a proof of consistency (for example, a proof of soundness in PA+Con(PA)). – Mario Carneiro Jan 25 '20 at 09:40
@MarioCarneiro: As I said in my last comment, you may read "correctness" in a certain way, but it is very clear that most readers do not understand the issues I stated in my answer. Furthermore, suppose you use your theorem prover to prove that your theorem prover only outputs theorems of some specific formal system. That still is just some symbolic string that you have to read and interpret and see whether it says what you want it to say. There is no verified connection between the theorem prover output and the fact of 'implementation correctness' that you want. – user21820 Jan 25 '20 at 09:44
I'm not saying, and never ever said, that everything is pointless given the incompleteness theorems. But one must be actually aware of the results so that one does not have an incorrect (and actively misleading) view about what one can (or has) achieved. As for my comment about correctness provable in S + ( S is sound ), it is to emphasize the point that we can prove anything by adopting the conclusion as an assumption. It is also known as "begging the question". – user21820 Jan 25 '20 at 09:47
Regarding the last point about connecting the text in the computer to math in the mind of the reader, you are absolutely correct that this is unverified, I don't dispute it. There is nothing we can do, save making the text as clear and unambiguous as possible, and educating the reader on intended semantics. – Mario Carneiro Jan 25 '20 at 09:47
Right; that's why I think it is important for people to know all this, rather than to just avoid talking about it just because all the logicians already know this. – user21820 Jan 25 '20 at 09:48
Fair enough. A logical analysis of anything to do with self verification must take into account the limitations imposed by Gödel's theorems. My point is that it's not actually that hard to sidestep it and get something that is approximately as good as self verification in the colloquial sense. – Mario Carneiro Jan 25 '20 at 09:50
"PA proves consistency of every finite fragment. Still, that fact cannot be observed within PA itself, which is the whole point of being unable to verify self correctness." It is observable within the system itself, because I don't care about all proofs, I care about some particular proof of some particular statement with some particular finite number of quantifiers. For example, say I run some proof of the Feit-Thompson theorem in the verifier, and it says yes. Does that mean the theorem is true? ... – Mario Carneiro Jan 25 '20 at 09:52
Let's say it was proven in PA; then I can look at the proof and see that it uses only 17 quantifiers. Then I can prove (in PA) that every statement provable using fewer than 18 quantifiers (that is, in the subsystem $I\Sigma_{17}$) is true, and conclude that F-T is true. – Mario Carneiro Jan 25 '20 at 09:54
@MarioCarneiro: Let us continue this discussion in chat. – user21820 Jan 25 '20 at 10:05

score 0 · Answer 5 · answered Jan 23 '20 at 15:35

This depends on the underlying axioms of the theorem prover. If a prover is based on primitive set theory, it will have few axioms. It can also have optional axioms such as the axiom of choice, which may or may not be useful for higher-level proofs. From these primitive axioms and theorems such as Peano arithmetic, we can build up the familiar algebra of ordinary and complex and floating point arithmetic. A further feature is the use of abstract types to represent theorems. If you have not proved a theorem, you cannot make use of it because that type will not exist yet. A further level of complexity is to write the core of the prover inside the prover itself. This may be used to give confidence that the prover written in itself satisfies the same proof traces as the prover running on a general-purpose computer.
