35

As of 2018, mathematical proofs are still validated by human consensus: give the proof to a few capable humans, and if none of them can find any errors, then they agree that the proof is correct and it can be published.

This surely is not a foolproof way of deciding mathematical truths.

One would think that by now there would be some standard way for proofs to be written and computer-checked for accuracy.

I read about a few computer proof-checking systems a few years ago that hoped to check things like Wiles's proof of Fermat's Last Theorem or the Classification of Finite Simple Groups. But not much more has been heard about them.

What has happened to this endeavour? Have mathematicians lost interest in having proofs computer-checked?

It used to be that mathematicians strove for the formalisation of all mathematics. And yet it seems that they prefer the loose formalism of current mathematics, which cannot be translated into computer language because much of it is ambiguous.

I would have thought that by 2018 every proof submitted to a mathematics journal would be written in a form that could be automatically checked, and conversely that every unproved conjecture would be stated in a language that could be fed into a computer.

Even an amateur could then submit a proof to a journal and, at the press of a button, see whether the proof was correct or garbage.
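To make the idea concrete, a machine-checkable submission might look something like this hypothetical sketch in Lean 4 syntax (the names `Nat.add_comm` and `rfl` come from Lean's core library; this is an illustration, not a claim about what journals accept):

```lean
-- The kernel checks these statements mechanically; no referee
-- judgement is needed for correctness (only for interest).
theorem two_plus_two : 2 + 2 = 4 := rfl

-- Reusing an already-formalized library lemma as a building block:
example (m n : Nat) : m + n = n + m := Nat.add_comm m n
```

Pressing the "check" button then amounts to running the Lean kernel over the file: it either accepts every step or reports exactly which one fails.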

Do you know what the current state of this is?

zooby
  • 4,343
  • 10
The refereeing process in maths journals is as much about deciding which papers are of sufficient interest and novelty as about their correctness. – Angina Seng Apr 21 '18 at 18:34
  • 5
    If you are interested in the history behind this you might want to read about the "A.I. Winter", wherein many human tasks proved to be surprisingly difficult to automate with computers. Math has turned out to be a lot more James T. Kirk and a lot less Spock than people suspected. – JonathanZ Apr 21 '18 at 19:30
  • 9
    Such systems exist, but they only work for proofs where every single step is provided. Most math proofs in journals aren't written quite so formally. If they were, they would be much longer. For example, you can't just say "let X be the Smith-Jones ideal of [whatever]". You'd actually have to define it precisely, and probably define what an ideal is as well. –  Apr 21 '18 at 20:59
  • 2
    The title is misleading; the obvious answer is "there is such a system". The only question remains: is the system usable enough to become such a database of mathematical truths? The answer is no -- it is not. Now, there are two explanations for this: either (1) mathematicians are too cocksure of their proofs, when they should not be, and they need to just go through the effort of writing everything down; or (2) the systems are too stupid, and we need better tools first. I myself think the truth is somewhere in the middle. – Caleb Stanford Apr 22 '18 at 00:46
  • 4
    Let it be known that there are many whose research agenda is entirely centered around formally verifying statements in Coq, Isabelle, and other theorem provers; sometimes, these efforts focus on mathematical theorems. Yet, it is mainly computer scientists, not mathematicians (except some category theory and logic types) who are enthusiastic about doing the gruntwork. – Caleb Stanford Apr 22 '18 at 00:47
  • 8
    If you have ever taken a course in Coq you will understand why the mainstream of mathematicians have not accepted it. It is extremely tedious. Often, the proof you write in Coq has little to do with the intuitive reason you already knew the statement was true; and there is much ado about minor details. Even the most trivial of statements can end up being a long proof when entered into the theorem prover. – Caleb Stanford Apr 22 '18 at 00:48
  • I must say that this question gave rise to very interesting answers, even if I used to know something (a very modest amount of something) about these matters, it has been a good while since I last thought about it. – Francesco Apr 22 '18 at 03:36
  • There is a large gap between mathematical language as practiced by mathematicians and mathematical language as understood by contemporary proof assistants to be bridged; see https://mathoverflow.net/questions/155909/wanted-a-coq-for-the-working-mathematician for the case of Coq, for example. – darij grinberg Apr 22 '18 at 03:59

8 Answers

20

Have a look at the Flyspeck project, in which the proof of Kepler's sphere-packing conjecture was checked by computer, precisely to avoid the problems that Martin Argerami denies exist in the human review process, problems that are significant when a solution necessarily involves large-scale computation.

Rob Arthan
  • 48,577
16

They haven't been forgotten: these systems are still being actively researched and developed. I personally know someone who works on the Lean theorem prover. In addition, you can see that Coq is under active development.

qwr
  • 10,716
They seem to be good enough at the moment, but the actual process of translating all the proofs into a big database seems to have stalled. And not many mathematicians check their proofs with them. – zooby Apr 21 '18 at 18:18
  • 1
    @zooby This is because it is quite difficult and tedious. Now, the Coq people say "of course it is difficult and tedious work, but it must be done"; the mathematicians say, "it is difficult and tedious because Coq and other tools have not yet reached the level of utility that is necessary; and it is unnecessarily hard to write down a proof and verify it in Coq". Which of the two positions do you agree with? – Caleb Stanford Apr 22 '18 at 00:43
  • Lean sounds like a great idea, but their own FAQ says they are still in alpha phase (I admire their honesty). – darij grinberg Apr 22 '18 at 04:01
  • @darijgrinberg I'm not claiming Lean is ready for actual use. Just that it is in development. – qwr Apr 22 '18 at 16:37
11

I would suggest having a look at Freek Wiedijk's list tracking which of the "top 100 mathematical theorems" have so far been formalized in various theorem prover / proof checker systems such as HOL, Isabelle, Coq, Mizar, Metamath, ProofPower, nqthm/ACL2, PVS, and NuPRL/MetaPRL.

http://www.cs.ru.nl/~freek/100/

And also "what might be the smallest proof checker" (500 lines of Python):

https://en.wikipedia.org/wiki/Metamath#Proof_checkers

Those should give you hours (if not days) of reading material :-)
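To give a flavour of how small such a trusted kernel can be, here is a toy checker (a hypothetical sketch, far more primitive than mmverify.py and not any real system's format): it accepts a proof only if every line is a stated axiom or follows from two earlier lines by modus ponens.

```python
# Toy proof checker for a Hilbert-style propositional system.
# Formulas are nested tuples: ("->", p, q) means "p implies q".

def check_proof(axioms, proof):
    """Return True iff every line of `proof` is justified: it is an
    axiom, or some earlier line `p` and earlier line ("->", p, line)
    let it follow by modus ponens."""
    proved = set()
    for formula in proof:
        ok = formula in axioms or any(
            ("->", premise, formula) in proved for premise in proved
        )
        if not ok:
            return False
        proved.add(formula)
    return True

# Example: from axioms P and P -> Q, derive Q.
P, Q = "P", "Q"
axioms = {P, ("->", P, Q)}
print(check_proof(axioms, [P, ("->", P, Q), Q]))  # True
print(check_proof(axioms, [Q]))                   # False: Q is unjustified
```

The point of such a tiny kernel is that it can be audited by eye; real systems like Metamath keep the verifier nearly this small and push all the mathematical content into the axiom and proof database.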

  • 3
But, please, please, please, note that Freek's list does not claim to be representative in any way: the theorems on it are just a fun selection identified by two mathematicians as the "top 100 theorems" at some millennium event. The list is great fun and has provided inspiration for many of us who work on mechanized theorem proving. – Rob Arthan Apr 21 '18 at 21:03
  • 1
    Thanks for the list. Interesting. Well, they're getting to 19th century maths. I guess it will take some time to get to 21st century maths. The good thing is once a lot of these things are done, newer proofs can just use these old proofs as libraries! – zooby Apr 21 '18 at 22:24
  • For the record about whose list this is, and its arbitrariness: 'at a mathematics conference in July, 1999, Paul and Jack Abad presented their list of "The Hundred Greatest Theorems."' Source: http://pirate.shu.edu/~kahlnath/Top100.html ("The Hundred Greatest Theorems"), by Nathan W. Kahl (currently at Seton Hall University), who commented: 'The list is of course as arbitrary as [...] "Top 100" or "Best 100" lists'. Freek Wiedijk's page lists which theorems have formalizations in various systems. – MarnixKlooster ReinstateMonica Apr 25 '18 at 16:14
9

"One would think". No, why would "one think"?

What you claim is sloppy is not so. If a proof matters to enough people, the scrutiny is deep, and there are very few cases where wrong proofs get through. In fact, for results that are important enough, it is very common for people to produce many different proofs, which removes any lingering doubt that the result might be wrong.

More importantly, the main part of the peer-review process is to assess the significance of the results. An automated system publishing lots and lots of correct but useless mathematics would be detrimental to the discipline.

Martin Argerami
  • 205,756
  • 9
You assume they are not wrong. But how can one possibly know whether the thousands of pages of the Classification of Finite Simple Groups or of Fermat's Last Theorem contain a mistake? It would be like looking at the source code of Windows XP and deciding whether it contained a bug without running it. – zooby Apr 21 '18 at 18:17
  • 2
    Tom Hales' proof of the Kepler conjecture is an example where the Annals of Mathematics concluded after several years that a review by humans was infeasible. – Rob Arthan Apr 21 '18 at 18:20
  • 5
But we do "run it". If there is a "bug" in any of those proofs, it would very likely be found either by all the people who have worked on the proof or by someone finding a counterexample. And let me ask you this: who would prove that the proof checker has no bugs, and how? – Martin Argerami Apr 21 '18 at 18:21
  • 8
First you make the proof-checker code very simple, so that it can easily be checked by humans. Then translate the statement of the theorem into the computer language. (This has to be checked, but is much simpler than checking a 1000-page proof.) Then enter the proof to be checked. Then we can say with almost 100% certainty that the theorem (as entered) either is or is not established by the supposed proof. Secondly, you can "prove" proof checkers by running them through another proof checker (or themselves)! – zooby Apr 21 '18 at 18:25
  • Why would having lots of correct proofs be "detrimental to the discipline"? Also, no-one said journals can't choose what to put in their journals. But at least they know it would be correct. – zooby Apr 21 '18 at 18:30
  • 8
Do you have any experience with writing software? Writing software without bugs is extremely hard (I would say impossible, unless you are writing a "Hello World" program), and one decides that the software has close to no bugs by running it, not by looking carefully at the code. Besides, how do you know that it is at all possible to write a general proof checker with "very simple" code? And who would translate the 1000-page proof into language understandable by the proof checker, and how would you avoid bugs in that process? – Martin Argerami Apr 21 '18 at 18:51
  • 1
    As for "detrimental", math is not about constructing as many correct proofs as possible. It is about developing new ideas that somehow bring more clarity and new perspectives. The correctness of proofs is a step, not the main goal. – Martin Argerami Apr 21 '18 at 18:52
  • 4
@MartinArgerami: writing software without bugs is hard. But designing software that is resilient to bugs and mitigates their presence has been a hot research topic for decades, and computer science has made a lot of progress on it. I think you should study some of the literature. I agree with you about the development of new ideas, but in areas where there is reasonable cause for doubt about the correctness of proofs, it makes sense for mathematicians to know something about what has been done to provide trustworthy computation. – Rob Arthan Apr 21 '18 at 19:24
  • 8
    @MartinArgerami It is very well possible to have a reliable proof checker, by (1) having a very simple kernel of the basic proof steps that are allowed, and (2) implementing the same simple kernel multiple times. It is possible to have a reliable translation by having a few axioms and definitions which directly match the standard versions from the literature. For example, see Metamath: a 4-page specification, a dozen cleanroom implementations. And in particular, you'd be interested in the Metamath book section 1.2. – MarnixKlooster ReinstateMonica Apr 21 '18 at 19:57
  • I don't follow the logic in the last paragraph. Papers must be judged both for correctness and significance. If someone uses an automated tool to assess the correctness of a paper, how does that imply they will automatically assume the paper is significant too? – David K Apr 21 '18 at 23:45
  • 1
    "It would be like looking at the source code of Windows XP and knowing if it contained a bug without running it." Though surely mistakes are often made when constructing proofs, I think that this analogy is very far from the truth. It is harder to look at computer code and tell if there is a bug than it is to look at a mathematical proof and find a flaw. See this question on CS for a discussion. – wgrenard Apr 22 '18 at 06:56
  • @MartinArgerami Writing software without bugs is possible, because it is abstracted into a proof which can actually be verified by computer (so-called formal verification). Some software like miTLS, seL4, INTEGRITY, and even Microsoft's HTTP.sys is partially or fully formally verified. And yes, it is very difficult (something crazy like $10k per line of code on average, to the point that it vastly exceeds the cost of EAL7+ verification). Also, even a hello world program can have bugs if the compiler or standard library are buggy! – forest Apr 22 '18 at 07:26
7

The practical value is not necessarily high enough to warrant the cost.

An automated proof-checking tool can help identify flawed proofs, but it cannot give an absolute guarantee that a valid one is correct. It can produce a concrete demonstration that an old proof is invalid (which may indeed be the simpler task), but a "valid" verdict is only as trustworthy as the tool itself.

The reason for this is the bootstrapping process. J. Brazile's answer provides an excellent context for describing this issue. That answer references mmverify.py, a proof checker in 500 lines of Python. It may be easy enough to convince yourself that 500 lines of Python are correct, but how do we execute the Python? The official implementation of Python, CPython, is over 100,000 lines of code. An error in any one of them could cause the proof checker to erroneously report "valid" for an invalid proof.

Likewise, CPython depends upon your operating system. Linux is hard to measure, but a reasonable estimate would be about 1.5 million lines of code. In theory, one bad line in that OS could cause bad results.

Indeed, we see this with seL4, a microkernel whose claim to fame is that it has been formally verified as "correct" (in the Isabelle/HOL theorem prover). However, the wording of the claim is very precise: if seL4 is compiled with a standards-conforming compiler, then it will behave as the specification says it will. Compilers are hard to write. Tremendously hard.

And that doesn't even include issues like the infamous FDIV bug. There is absolutely no guarantee that the hardware implements things correctly. Granted they do an amazingly good job, but it's not mathematically provably perfect.

So, in the end, you can indeed improve the apparent validity of your proof by using a proof-checking tool. However, the tool cannot make your proof certain. It may help you find things you missed along the way, but once it is "right", it still might not actually be right.

The question then becomes: how valuable is this extra assurance? It cannot give you perfect certainty, for the reasons above, and there is a cost, as Nonyme mentioned, in terms of readability. As it happens, the mathematical community at large has not yet found that cost a reasonable one to expect of mathematicians.

Cort Ammon
  • 3,343
  • ... and to go even further: let's say that we manage to implement our entire stack properly, from hardware up through all languages and to UI. (And note that to even claim that we did so "properly" requires an objective language for software APIs.) Even then, there's the possibility that the Universe says no! – Quelklef Sep 24 '21 at 07:30
5

This is really a question calling for opinion so here's my opinion.

Mathematics is a human endeavor. Mathematicians work to understand some interesting and useful abstract notions. You can even start a philosophical discussion about whether mathematical objects "exist". Whether they are interesting or useful is clearly a matter of opinion.

We reason as carefully as we can to convince each other that our theorems are interesting and true (in some usually implicit but unspecified formal sense). The best proofs tell readers why a theorem is true, not just that it's true. It's that kind of understanding that leads to new mathematics.

The foundations on which we build our inferences are always under construction even as we build. What satisfied Archimedes or Euler or Cauchy wouldn't do today. Gödel proved that Hilbert's hope that we could establish rigor once and for all was vain. Sufficient unto the day is the rigor thereof.

Computer (assisted) proof is an interesting and active part of contemporary mathematics, but will never replace humans reasoning mathematically. How would you know (prove?) that the engine in your computer prover was properly programmed, and that you properly programmed the input you fed it? With another program ...?

Ethan Bolker
  • 95,224
  • 1
You would state your proof like this: "In this proof we assume logical system X and axioms Y." So every proof will assume some axioms. It is far, far simpler to check a proof checker, since it is usually built on about 10 easily checkable axioms. All the rest of the code (the higher-level formalisms) can be checked with the same proof checker. It's not as complicated as you make out. – zooby Apr 21 '18 at 18:38
  • 1
    @zooby You've addressed the last paragraph of my answer. But the first paragraphs are what matter to me. I wouldn't want to spend any time putting the proofs in my papers into a formal system for checking. I'd rather trust the referees, and spend my time doing more mathematics. – Ethan Bolker Apr 21 '18 at 18:41
  • Ah I think you're assuming that a computer readable proof will be unreadable to humans. This is not the case with most proof checkers. They will look very similar to normal proofs. So they will convey "why" something is true just as much as a traditional proof. Also, one could add comments to a proof if one wished which can be ignored by a computer. – zooby Apr 21 '18 at 18:46
  • @EthanBolker: sure, how you go about your mathematical work is your prerogative and I respect that. But your comment about how one would know or prove that a program used to verify a proof is trustworthy a question for computer science and a lot of progress on it has been made over the last 50 years. Please see the links in my answer to this question and my other comments. – Rob Arthan Apr 21 '18 at 20:22
4

For the same reason that in Computer Science not all papers come with a clear-cut implementation: there is more to research than the end result.

In mathematics, human proofs take a lot of shortcuts. These are usually small shortcuts, but if at every single step you had to expand every statement into an unambiguous formula, your research would become unreadable and the process of creating new research would become really tedious. There are plenty of Ph.D. students who do exactly that for a large part of their research, but only once they have a clean "usual" proof on paper.

A proof of a theorem in a theorem prover is ideal, but a lot of work. The same can be said of Computer Science algorithms, which are usually implemented as a barely working script that needs a lot of elbow grease before it can be used. But because the underlying idea works, and can be useful to build upon, it gets published, even though no final product can be shipped.

Nonyme
  • 141
1

Mathematical proofs rest heavily on assumptions: one assumes n variables, introduces various constants, and builds from there. The problems with creating a system for computer-checking mathematical proofs are these:

Most written proofs are not stated precisely; they are derived from other proofs that the reader is assumed to understand. Building up the complete backlog of dependencies for a proof would be a tedious process, because everything is stacked on everything else.

A proof imposes particular conditions at every stage, treating the problem part by part rather than as a whole, without considering the complete environment. Making the system aware of all these sub-cases would again be tedious.

Proof checkers are being built, but the most they can do is validate the results derived in a proof by pushing the stated conditions to their extreme values. Having such a system would be a good thing, but we cannot validate whether the conditions themselves are suitable for the theorems and proofs against which they are being tested.

On a personal note, I think it would completely destroy the idea of the societies and groups formed to establish mathematical proofs and results. (Automation taking its toll, eh?)