What is the motivation for the axioms for Propositional Calculus in Mendelson's "Introduction to Mathematical Logic"?

Question

On pp. 26-27 of his Introduction to Mathematical Logic (5th edition), Elliott Mendelson writes:

If $\mathscr{B}$, $\mathscr{C}$, and $\mathscr{D}$ are wfs of $\mathrm{L}$, then the following are axioms of $\mathrm{L}$:
(A1) $(\mathscr{B} \Rightarrow (\mathscr{C} \Rightarrow \mathscr{B}))$
(A2) $((\mathscr{B} \Rightarrow (\mathscr{C} \Rightarrow \mathscr{D})) \Rightarrow ((\mathscr{B}\Rightarrow\mathscr{C})\Rightarrow(\mathscr{B}\Rightarrow\mathscr{D})))$
(A3) $(((\neg\mathscr{C})\Rightarrow(\neg\mathscr{B}))\Rightarrow(((\neg\mathscr{C})\Rightarrow\mathscr{B})\Rightarrow \mathscr{C}))$

In my prior experience, axioms have always been chosen so as to closely mirror our intuitions. E.g. "things that are equal to the same thing are also equal to each other."

In contrast, I find the axioms quoted above completely opaque.

I fail to see the purpose of grounding a theory on such opaque statements. Sure, one can derive "results" from them, but who cares? In other words, what makes the results derived from axioms A1, A2, and A3 above more worthy of our interest than results derived from some other random set of equally obscure axioms?

EDIT: Thank you for all the answers and comments! They have indeed been extremely instructive. I now have a better idea of what is tripping me up. It all boils down to the conflict I perceive between the word "axiom", as I have come to understand it from my prior experience (e.g. with Euclid's axioms or Peano's axioms), and Mendelson's A1, A2, and A3.

More specifically, the "axioms" that Mendelson proposes lack two "defining features" of my prior understanding of the word "axiom."

The first of these is that axioms are supposed to codify "plain common sense." No special training is required to accept them as true.

The second defining feature of my prior idea of an "axiom" is that it cannot be justified by appeal to even more obviously true facts. An "axiom" is not only obviously true, but also irreducibly so.

After reading your comments and answers, I think that I can come up with a reasonable explanation for the first discrepancy. It goes something like this: in this field the word "axiom" is chosen only (or primarily) for its connotation as "starting point". All other traditional connotations of the word "axiom" (such as "being commonsensical") are left out. In this new usage, "axiom" is, basically, a "term of art". It is suggestive, in the same way that words like "expression," "clause," and "statement" are when used to describe a computer "language". The meanings of all these technical words have only a distant resemblance to those of their natural language namesakes.

I have a harder time disposing of the second issue. What still confuses me is that, in discussions of these axioms, I often come across appeals to using truth tables as a way to convince oneself that these axioms are true. This suggests to me that these "axioms" may be true, but they are certainly not fundamental, since we are justifying them by appeal to something else, namely whatever we appeal to when we use truth tables. More specifically, it appears as though the real axioms here (in the sense of "rock-bottom principles") are the truth tables for $\neg$ and $\Rightarrow$.

Take a look at this: https://math.stackexchange.com/questions/320437/how-to-demystify-the-axioms-of-propositional-logic?rq=1 (esp. Andreas Blass’ answer) — Vivaan Daga, Mar 30 '22 at 19:16
Following @VoiletFlame's suggestion, the Wikipedia section on this was useful for me: https://en.wikipedia.org/wiki/Deduction_theorem#Conversion_from_proof_using_the_deduction_meta-theorem_to_axiomatic_proof — Alan, Mar 30 '22 at 19:25
Mendelson is probably assuming that you’re experienced enough in logic that these axioms do not strike you as completely opaque. It’s possible that this book is a step above the ideal level for you at the moment. — Kevin Carlson, Mar 30 '22 at 22:58
It is not an issue of "clear vs opaque": the issue is to find a SOUND & COMPLETE (in the technical sense) set of axioms/rules. Some approaches prefer a minimal set of connectives (Frege, Russell, Mendelson), while others prefer to treat every connective separately (Hilbert-Bernays, Natural Deduction). — Mauro ALLEGRANZA, Mar 31 '22 at 09:12
@KevinArlin's comment is ultimately the right answer. To make it clear why, note that it is easy to understand Fitch-style systems, and it is also easy to translate from Fitch-style to Hilbert-style. Once you view Hilbert-style axioms in terms of their Fitch-style meaning, then all these axioms become trivially obvious. (A1) is restatement in a subcontext and (A2) is modus ponens in a subcontext and (A3) is classical contradiction elimination. However, Hilbert-style is good for studying logic but not for using logic. — user21820, Mar 31 '22 at 16:06
Mendelson's text, like many others, is about studying logic, not using logic, and so the primary motivation for the deductive system given in such texts is to make it easiest to study them, and hence there is little to no concern to make the system user-friendly. In fact, Hilbert-systems as given in such texts simply cannot be used by humans or even computer proof assistants. So the point is, if you truly want to understand Hilbert-style systems and how their axioms are devised, you would have to first already know how to use logic, such as via Fitch-style reasoning. — user21820, Mar 31 '22 at 16:17
But after devising these axioms based on Fitch-style reasoning, "what makes the results derived from axioms A1, A2, and A3 above more worthy of our interest"? Because they are sufficient to guarantee semantic-completeness (for PL). Similarly, there is an incredible variety of deductive systems for FOL, but invariably each one given in any logic text will be semantically-complete for FOL. — user21820, Mar 31 '22 at 16:21
The edit has a very 19th-century approach to axioms. If you know the definition of a topological space, then you know that axioms in modern math definitely don’t need to be self-evident. This is true even for the axioms of a group, insofar as “a group” is in itself not a self-evident concept. (If you don’t know these definitions, then you almost certainly aren’t as experienced as Mendelson is going to expect.) — Kevin Carlson, Mar 31 '22 at 16:37
@user21820 "In fact, Hilbert-systems as given in such texts simply cannot be used by humans or even computer proof assistants." This is just wrong. Such systems were not only the first used by humans, they were the first used by computers for proofs in propositional calculus, and one such study inspired the LISP programming language following Newell and Simon's work. — Doug Spoonwood, Mar 31 '22 at 19:03
This question also doesn't have an answer in the link, since the question concerns this particular axiom set. The linked answer concerns a distinct axiom set. — Doug Spoonwood, Mar 31 '22 at 19:10
@DougSpoonwood: When I said "cannot be used", I mean "cannot be used for real full-scale mathematics", which is what I was talking about. And regarding the linked post, anyone who understands the contents of that post would also trivially be able to apply the technique to any reasonable target Hilbert-style system. — user21820, Apr 01 '22 at 07:40

score 11 · Accepted Answer · answered Mar 31 '22 at 02:12

I always think of the first axiom as a kind of Conditionalization:

$P$

$\therefore Q \to P$

Conditionalization allows you, in effect, to bring results inside a certain context. That is, once we know that $P$ is true, then within the context of $Q$, we still have $P$.

Now, instead of as an inference, we write this as a single statement $P \to (Q \to P)$, but note that together with the Modus Ponens (MP) inference rule that almost all these kinds of axiom systems have, you can of course now make exactly the above inference:

$P$

$P \to (Q \to P)$ (axiom)

$\therefore Q \to P$ (MP)

The second axiom is a kind of Conditionalized Modus Ponens. In effect, it points out that we can still do Modus Ponens inside a certain context ($P$). That is, the Conditionalized version of Modus Ponens:

$Q \to R$

$Q$

$\therefore R$

becomes:

$P \to (Q \to R)$

$P \to Q$

$\therefore P \to R$

And again, we can capture this with a single statement: $((P \to (Q \to R)) \land (P \to Q)) \to (P \to R)$ which by Exportation is equivalent to:

$(P \to (Q \to R)) \to ((P \to Q) \to (P \to R))$

(and you really want the latter, since you want to express everything with $\to$'s and $\neg$'s)
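
The Exportation step can be confirmed by brute force over truth values. The following is a minimal Python sketch, not part of the original answer; the function names (`implies`, `exportation_holds`, `conjunctive_form`, `exported_form`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b


def exportation_holds() -> bool:
    """Check that (A and B) -> C agrees with A -> (B -> C) on every row."""
    return all(
        implies(a and b, c) == implies(a, implies(b, c))
        for a, b, c in product([False, True], repeat=3)
    )


def conjunctive_form(p: bool, q: bool, r: bool) -> bool:
    # ((P -> (Q -> R)) and (P -> Q)) -> (P -> R)
    return implies(implies(p, implies(q, r)) and implies(p, q), implies(p, r))


def exported_form(p: bool, q: bool, r: bool) -> bool:
    # (P -> (Q -> R)) -> ((P -> Q) -> (P -> R))
    return implies(implies(p, implies(q, r)), implies(implies(p, q), implies(p, r)))


rows = list(product([False, True], repeat=3))
forms_agree = all(conjunctive_form(*row) == exported_form(*row) for row in rows)
print(exportation_holds(), forms_agree)
```

This only checks the semantic equivalence; inside the axiom system itself, conjunction is not available as a primitive, which is why the exported form is the one that appears as an axiom.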

Because of these two properties, we can prove the Deduction Theorem as a result of these two axioms:

If $\Gamma, \phi \vdash \psi$, then $\Gamma \vdash \phi \to \psi$

That is: If we can derive $\psi$ from $\Gamma$ within the further context of $\phi$, then we can derive the conditionalized version of $\psi$, i.e. $\phi \to \psi$ from $\Gamma$ alone.

This important meta-logical result is why you so often see these first two axioms included in the various axiom systems.
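
As a small illustration of what A1, A2, and MP can do together, here is a minimal Python sketch of a proof checker, applied to the standard five-step derivation of $P \to P$. It is only a sketch under stated assumptions: formulas are encoded as nested tuples, and all names (`imp`, `is_a1`, `is_a2`, `check_proof`) are hypothetical, not taken from Mendelson or from any library.

```python
from typing import Union

# An atom is a string; an implication X -> Y is the tuple ("->", X, Y).
Formula = Union[str, tuple]


def imp(a: Formula, b: Formula) -> Formula:
    return ("->", a, b)


def is_imp(f: Formula) -> bool:
    return isinstance(f, tuple) and len(f) == 3 and f[0] == "->"


def is_a1(f: Formula) -> bool:
    # Instance of B -> (C -> B)
    return is_imp(f) and is_imp(f[2]) and f[1] == f[2][2]


def is_a2(f: Formula) -> bool:
    # Instance of (B -> (C -> D)) -> ((B -> C) -> (B -> D))
    if not (is_imp(f) and is_imp(f[1]) and is_imp(f[1][2])):
        return False
    b, c, d = f[1][1], f[1][2][1], f[1][2][2]
    return f[2] == imp(imp(b, c), imp(b, d))


def check_proof(steps) -> bool:
    """Each step is (formula, why). why is "A1", "A2", or ("MP", i, j),
    meaning: step i is X and step j is X -> formula (0-based indices)."""
    proved = []
    for formula, why in steps:
        if why == "A1":
            ok = is_a1(formula)
        elif why == "A2":
            ok = is_a2(formula)
        elif isinstance(why, tuple) and len(why) == 3 and why[0] == "MP":
            _, i, j = why
            ok = (0 <= i < len(proved) and 0 <= j < len(proved)
                  and proved[j] == imp(proved[i], formula))
        else:
            ok = False
        if not ok:
            return False
        proved.append(formula)
    return True


p = "P"
pp = imp(p, p)
proof_of_p_implies_p = [
    (imp(imp(p, imp(pp, p)), imp(imp(p, pp), pp)), "A2"),  # B=P, C=P->P, D=P
    (imp(p, imp(pp, p)), "A1"),                            # B=P, C=P->P
    (imp(imp(p, pp), pp), ("MP", 1, 0)),
    (imp(p, pp), "A1"),                                    # B=P, C=P
    (pp, ("MP", 3, 2)),
]
print(check_proof(proof_of_p_implies_p))
```

With the Deduction Theorem, the same result is immediate, since $P \vdash P$ holds trivially; the five explicit steps show what that meta-theorem saves.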

Now, the third axiom is, by itself, probably the most understandable one: it expresses the Reductio Ad Absurdum: we can prove $P$ if we can show that its opposite $\neg P$ leads to a contradiction (i.e. if $\neg P$ leads to both $Q$ and $\neg Q$ for some statement $Q$):

$\neg P \to Q$

$\neg P \to \neg Q$

$\therefore P$

And once again, we can capture this with a single statement (reversing the two premises): $((\neg P \to \neg Q) \land (\neg P \to Q)) \to P$ which (again by Exportation) is equivalent to:

$(\neg P \to \neg Q) \to ((\neg P \to Q) \to P)$

Now, what's cool about this is that together, these three axioms actually become a complete system, as I am sure other Answers will point out. But I thought I would try to provide you with a more conceptual understanding of the axioms themselves, which is, I believe, what you were really asking for.

But why A2 ... in Polish CCbCcdCCbcCbd... instead of A2' CCbcCCbCcdCbd? Anything which has {A1, A2'} as a subset of the axioms or as a subset of the theorems also has a deduction metatheorem, and the proof is no harder than if we had {A1, A2}! And why A3 ... CCNcNbCCNcbc ... instead of A3' CCNcbCCNcNbc? A3' also expresses reductio ad absurdum! — Doug Spoonwood, Mar 31 '22 at 05:39
@Bram28: I just made some comments on the question along similar lines as your answer, so feel free to include any bits from them into your answer! — user21820, Mar 31 '22 at 16:45
I found this post extremely instructive. Thank you! In particular, it is the first time I come across "exportation." I cannot quite explain why, but I find it downright cool. It is not just that it is (of course) true, but that it is so "interpretable", it just "fits right in." In fact, it immediately suggests to me that A1 can be rendered as $(\mathscr{B}\wedge\mathscr{C})\Rightarrow\mathscr{B}$, which strikes me as more "obviously true" than the original A1 (admittedly, a 100% subjective opinion). — kjo, Apr 01 '22 at 10:08
@kjo: Your idea in your comment is incorrect, because often the purpose of having axioms involving only ⇒,⊥,¬ is that you can define the other boolean operations in terms of those. As I stated in my other comments, this is not for practical usage of the system but to make it easier to study the system as a mathematical object. So you cannot use your version of (A1) because it simply won't work the same. — user21820, Apr 01 '22 at 14:05
@user21820: maybe I did not use the right wording. I did not mean to replace A1 with my version. Rather, I mean that A1 can be understood as a way to render the (more compelling) $(\mathscr{B}\wedge\mathscr{C})\Rightarrow\mathscr{B}$ in a form that uses only $\Rightarrow$, $\neg$, and $\bot$. — kjo, Apr 01 '22 at 16:44
@kjo: Ah, then that's better, but as I said in my earlier comments the true meaning of (A1) is restatement. There is not really any other meaning, nor is there a need to use conjunction to explain (A1). I think it's best if you actually look at a Fitch-style system to understand Hilbert-style. — user21820, Apr 02 '22 at 18:13
@user21820: Thank you! I really appreciate your help (and patience)! — kjo, Apr 03 '22 at 12:46

score 9 · Answer 2 · edited Mar 30 '22 at 22:52

Hilbert himself cites the relevant axioms as follows (see his The Foundations of Mathematics in From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931 edited by Jean van Heijenoort, Harvard University Press, 1967; I have updated the notation for several symbols):

I. Axioms of implication

$A\to (B\rightarrow A)$ (introduction of an assumption)

$(A\rightarrow (A\rightarrow B))\rightarrow (A\rightarrow B)$ (omission of an assumption)

$(A\rightarrow (B\rightarrow C))\rightarrow (B\rightarrow (A\rightarrow C))$ (interchange of assumptions)

$(B\rightarrow C)\rightarrow ((A\rightarrow B)\rightarrow (A\rightarrow C))$ (elimination of a proposition).

II. Axioms about $\wedge$ and $\vee$

$A\wedge B\rightarrow A$;

$A\wedge B\rightarrow B$;

$A \rightarrow (B \rightarrow A \wedge B)$

$A\rightarrow A\vee B$;

$B\rightarrow A\vee B$;

$((A\rightarrow C)\wedge (B\rightarrow C))\rightarrow ((A\vee B)\rightarrow C)$.

III. Axioms of negation

$(A\rightarrow B\wedge\neg B)\rightarrow\neg A$ (principle of contradiction);

$\neg(\neg A)\rightarrow A$ (principle of double negation).

The axioms of groups I, II, and III are nothing but the axioms of the propositional calculus. From 11 and 12 there follows, in particular, the formula

$(A\wedge\neg (A))\rightarrow B$

and further the logical principle of excluded middle,

$((A\rightarrow B)\wedge (\neg A\rightarrow B))\rightarrow B$.

We are mainly interested in the implicational fragment. The names Hilbert gives the axioms provide guidance. The first axiom, for instance, can be explicated as follows: we already have a proposition $A$, so if we assume $B$, we can write $B\to A$; and similarly for the others.

Looking from the perspective of natural deduction may help grasp the ideas. Dosen gives the following correspondences (see his A Historical Introduction to Substructural Logics in Substructural Logics edited by Peter Schroder-Heister and Kosta Dosen, Clarendon Press, 1993):

The 1st axiom, introduction of an assumption, corresponds to thinning.

The 2nd axiom, omission of an assumption, corresponds to contraction.

The 3rd axiom, interchange of assumptions, corresponds to permutation.

The 4th axiom, elimination of a proposition, corresponds to what Dosen calls association; he notes that it is related to the cut rule, which I think is the more illuminating description.

The basis that Hilbert set out has been later rearranged into handier systems; Mendelson's is just one of them.

Thank you, the historical perspective is really helpful here. Contrary to other areas of mathematics (e.g. group theory) which I have been able to study without needing to know much historical background, I'm finding that without some historical background, the study of logic is a bit bewildering. (I have a similar problem with statistics, by the way.) — kjo, Mar 31 '22 at 11:39
Glad to help. In the case of foundational and logical matters, the original intentions and meanings of statements are wiped away in time, and clueless textbook expositions remain. Hence, it is quite rewarding to read the original arguments and discussions as far as it is possible. — Tankut Beygu, Mar 31 '22 at 11:50

score 3 · Answer 3 · answered Mar 30 '22 at 19:38

Note: This answer works with $\neg,\to$ as the base connectives; the word "calculus" always refers to a Hilbert-style proof calculus for propositional logic.
The system presented in Mendelson is a sound and complete Hilbert-style propositional calculus. It is easy to check that all the axioms are logical validities, and since modus ponens is a valid inference rule, one has the following theorem:
Soundness: Given a set of assumptions $T$, if there is a proof of $\phi$ from $T$, then $T$ semantically entails $\phi$.
One obviously wants this theorem to hold, so one cannot pick just any random set of axioms. Still, one could pick any set of tautologies. This is where completeness comes in; one also wants the calculus to satisfy: Completeness: If $T$ semantically entails $\phi$, then there is a proof of $\phi$ from $T$.
If we want both of these to hold, we cannot take just any random set of tautologies as axioms. One might still want to take the set of all tautologies instead of the seemingly opaque axiom schemas. As it turns out (take a look at the proof of completeness), one actually requires much less, and if you work hard at removing redundancies, you can arrive at a similar set of schemas (which is much nicer than throwing in every tautology!).
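
The claim that the three axiom schemas are logical validities can be checked mechanically. Below is a minimal Python sketch, with hypothetical function names, that evaluates each schema on all eight truth assignments to $\mathscr{B}$, $\mathscr{C}$, $\mathscr{D}$. This suffices for every instance of a schema, because the truth value of an instance depends only on the truth values of the wfs substituted for the three letters.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b


def a1(b: bool, c: bool, d: bool) -> bool:
    # B => (C => B); d is unused but kept for a uniform signature
    return implies(b, implies(c, b))


def a2(b: bool, c: bool, d: bool) -> bool:
    # (B => (C => D)) => ((B => C) => (B => D))
    return implies(implies(b, implies(c, d)), implies(implies(b, c), implies(b, d)))


def a3(b: bool, c: bool, d: bool) -> bool:
    # ((not C) => (not B)) => (((not C) => B) => C); d is unused
    return implies(implies(not c, not b), implies(implies(not c, b), c))


def not_an_axiom(b: bool, c: bool, d: bool) -> bool:
    # (B => C) => (C => B), included only as a contrast case
    return implies(implies(b, c), implies(c, b))


def is_tautology(schema) -> bool:
    """True when the schema evaluates to True on every truth assignment."""
    return all(schema(*row) for row in product([False, True], repeat=3))


for name, schema in [("A1", a1), ("A2", a2), ("A3", a3), ("contrast", not_an_axiom)]:
    print(name, is_tautology(schema))
```

The contrast case fails when $\mathscr{B}$ is false and $\mathscr{C}$ is true, which is the kind of formula soundness rules out as an axiom.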

J126 · Answer 4 · 2022-03-30T22:41:18.380

You refer to the axioms as "opaque" and "obscure", but they are not that at all, once you get used to logic.

For example, $P \Rightarrow (Q \Rightarrow P)$ is trying to say "If $P$ is true, then no matter what $Q$ is, $Q$ implies $P$". This tells you the nature of $\Rightarrow$. If $P$ is true, then $Q \Rightarrow P$ is always true. You can also think about this in terms of functions. If I know that $P$ is a non-empty set, with element $a \in P$, then I can define a function from any other set $Q$ by $f(x) = a$.

The next says that if $P \Rightarrow (Q \Rightarrow R)$, then $P \Rightarrow Q$ will imply that $P \Rightarrow R$. So, if $P$ is enough to prove that $Q$ will imply $R$, and if $P$ implies $Q$ is true, then we must have $R$ based only on the fact $P$ is true. The function analogy works again. The assumption is every element $x \in P$ gives us a function $f_x : Q \to R$. Then, from a function $g: P \to Q$, we get a function $P \to R$ by $a \mapsto f_a(g(a))$.
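
The function analogy can be made concrete in code. The following is a minimal Python sketch, with hypothetical names `a1` and `a2`, in which each axiom is read as the type of a function and the function body plays the role of its justification. Python does not enforce these type hints at runtime, so they serve only as annotation.

```python
from typing import Callable, TypeVar

P = TypeVar("P")
Q = TypeVar("Q")
R = TypeVar("R")


def a1(a: P) -> Callable[[Q], P]:
    """P => (Q => P): from an element a of P, build the constant function on Q."""
    return lambda _x: a


def a2(f: Callable[[P], Callable[[Q], R]]) -> Callable[[Callable[[P], Q]], Callable[[P], R]]:
    """(P => (Q => R)) => ((P => Q) => (P => R)): send a to f_a(g(a))."""
    return lambda g: lambda a: f(a)(g(a))


# Illustrative choices of f and g (assumptions, not from the answer).
def add(x: int) -> Callable[[int], int]:
    return lambda y: x + y


def times_ten(x: int) -> int:
    return x * 10


composed = a2(add)(times_ten)  # maps a to add(a)(times_ten(a))
identity = a2(a1)(a1)          # maps a to a1(a)(a1(a)), which is a
print(a1(5)("ignored"), composed(3), identity("z"))
```

The last definition mirrors the usual derivation of $P \Rightarrow P$ from the first two axioms: combining one use of `a2` with two uses of `a1` yields the identity function.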

One cares because we need to make sure that the foundations of our mathematical system produce what we want to consider "math", and that they match our intuition for the rules of logic we use when producing proofs. For most mathematicians, this is a case of "birds don't need to know aerodynamics to fly". But someone has to keep track of the foundational theory to make sure it works correctly.

Thank you! This is the first time that I've come across the analogy between implications and functions. I tried Google to learn more about it, but I could not come up with a good search strategy. Where/how can I learn more about this analogy? — kjo, Mar 31 '22 at 10:56
@kjo If you look up type theory as a foundation for math, the function type is both implication and functions. The Homotopy Type Theory book (HoTT Book) has this in the first chapter. By the way, I didn't mean my first sentence to be condescending. I just think that most math is opaque until you spend a lot of time with it. I was hoping to encourage spending some time. — J126, Mar 31 '22 at 14:48
Thank you again. And no worries, I didn't take your first sentence as condescending. I took it as being in the same vein as von Neumann's quote, which I rather like. — kjo, Mar 31 '22 at 14:57

score 1 · Answer 5 · edited Mar 31 '22 at 11:34

In other words, what makes the results derived from axioms A1, A2, and A3 above more worthy of our interest than results derived from some other random set of equally obscure axioms?

The results derived from some other sound and complete set of axioms for classical propositional calculus aren't, ceteris paribus, any more or less worthy of our interest than results derived from A1, A2, and A3. However, getting those results won't be equally easy in every system that could be used (there is no end to such possible systems). Take a look at the exercises in Mendelson's book and see how many of the other systems you find easier to use.

Sure, one can derive "results" from them, but who cares?

Among other things, such axiomatic systems have been used to test and help develop theorem provers. Studying such systems has led to the development of proof strategies in automated reasoning, which have had applications elsewhere. The above axiom set, I would argue, also makes it easier to automatically produce tautologies than most other known axiom sets. I think people who have studied other axiom sets would also tend to agree that the above system is easier to use than most other axiom sets in the literature. Why is that?

As Bram has pointed out, (A1) and (A2) allow us to prove a deduction metatheorem. The proof for this system is also simpler than the proof for some other systems. Also, as Bram has pointed out, (A3) captures reductio ad absurdum, and the combination of the two makes reasoning in propositional calculus relatively easy.

In other words, the above system is a relatively easy system in comparison to some others for deriving results, at least in my experience. Now that does still leave some open questions though...

What's the motivation for using A2: ((B⇒(C⇒D))⇒((B⇒C)⇒(B⇒D))) instead of A2': ((B⇒C)⇒((B⇒(C⇒D))⇒(B⇒D)))? A2 expresses the principle that '⇒' distributes over itself in one direction, while A2' does not. Also, Frege used something close to A2. So A2 might be used, in part, for historical reasons.

What's the motivation for using A3: (((¬C)⇒(¬B))⇒(((¬C)⇒B)⇒C)) instead of A3': (((¬C)⇒B)⇒(((¬C)⇒(¬B))⇒C))? A3 allows for a short derivation of the law of Clavius: (((¬C)⇒C)⇒C), while the derivation using A3' as an axiom instead would be longer and take some more work.
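
A sketch of that short derivation, assuming the standard lemma (X⇒X), which is provable from A1 and A2 alone:

1. ((¬C)⇒(¬C)) [lemma, with X := ¬C]

2. (((¬C)⇒(¬C))⇒(((¬C)⇒C)⇒C)) [A3, with B := C]

3. (((¬C)⇒C)⇒C) [modus ponens, 1 and 2]

The corresponding instance of A3' is (((¬C)⇒C)⇒(((¬C)⇒(¬C))⇒C)), where the lemma sits in the inner antecedent rather than the outer one, so a single modus ponens no longer suffices.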

Fangru Shao · Answer 6 · 2022-10-19T08:28:15.860

Forgive me for writing the axioms in the way I am used to:

A1. $\phi \to (\psi \to \phi)$

A2. $(\phi \to (\psi \to \chi)) \to ((\phi \to \psi) \to (\phi \to \chi))$

A3. $(\neg \phi \to \psi) \to ((\neg \phi \to \neg \psi) \to \phi)$

We now use the deduction theorem to rewrite the axioms to help us think.

A1. $\phi \vdash (\psi \to \phi)$

If we have $\phi$, we could add any premise $\psi$ to $\phi$.

A2. $\phi \to (\psi \to \chi), \phi \to \psi \vdash (\phi \to \chi)$

It is actually a strong version of hypothetical syllogism. By A1, $\psi \to \chi$ entails $\phi \to (\psi \to \chi)$, so replacing the premise $\phi \to (\psi \to \chi)$ with the stronger premise $\psi \to \chi$ gives the hypothetical syllogism we always see.

HS. $\psi \to \chi, \phi \to \psi \vdash (\phi \to \chi)$

A3. $\neg \phi \to \psi, \neg \phi \to \neg \psi \vdash \phi$

This is proof by contradiction. If under the assumption $\neg \phi$ we can infer both $\neg \psi$ and $\psi$, then the contradiction leads us to $\phi$.

Both A2 and A3 are already stated in their strongest form, in the sense that the converses of these two axioms also hold. It also amazes me that just these three axioms form a complete system.
