Intuition on an inductive proof of gcd Bezout identity (from Apostol: Math, Analysis 2ed)

Question

I've done proofs in discrete mathematics, but I'm still at the stage where proofs with more than a few steps make me uncomfortable.

From Apostol's Mathematical Analysis [2nd Ed.] on page 5, we have

Theorem 1.6. Every pair of integers $a$ and $b$ has a common divisor $d$ of the form $$ d = ax + by $$ where $x$ and $y$ are integers. Moreover, every common divisor of $a$ and $b$ divides this $d$.

The proof (with my questions throughout) goes as follows:

Proof. First assume that $a \geq 0, b \geq 0$ and use induction on $n = a + b$. If $n = 0$ then $a = b = 0$, and we can take $d = 0$ with $x = y = 0$. Assume, then, that the theorem has been proved for $0, 1, 2, ..., n - 1$.

I am a little confused about taking $n$ to be $a + b$, since it's not obvious that all pairs $\{a, b\}$ would be covered by induction for all combinations of $a, b \in \mathbb{Z}$.

By symmetry, we can assume $a \geq b$. If $b = 0$ take $d = a, x = 1, y = 0$.

OK.

If $b \geq 1$ we can apply the induction hypothesis to $a - b$ and $b$, since their sum is $a = n - b \leq n - 1$. Hence there is a common divisor $d$ of $a - b$ and $b$ of the form $d = (a - b)x + by$.

I'm going to let $a' = a - b$, let $b' = b$ and let $d' = a'x + b'y$. (I wish Apostol did something like this to make his proofs clearer.)

I don't understand this logical step. Why does the fact that $a' + b' \leq n - 1$ imply that $d'$ exists and is a common divisor of $a'$ and $b'$? This seems like a huge leap.

This $d$ also divides $(a - b) + b = a$, so $d$ is a common divisor of $a$ and $b$ and we have $d = ax + (y-x)b$, a linear combination of $a$ and $b$.

At this point I am clueless. Why does $d$ divide $a$ and why does this imply it also divides $b$? And where does Apostol get $y-x$ from??

To complete the proof we need to show that every common divisor divides $d$. Since a common divisor divides $a$ and $b$, it also divides the linear combination $ax + (y-x)b = d$. This completes the proof if $a \geq 0$ and $b \geq 0$. If one or both of $a$ and $b$ is negative, apply the result just proved to $|a|$ and $|b|$.

Why not just do the entire proof with absolute values from the beginning?

Soft question: is it normal for authors to be very terse and not explain or give motivation for any steps? How do you go about trying to understand proofs that require a higher level of intuition than you currently have?

Also, in the step where they say that $d$ also divides $(a-b)+b\quad (=a)$, they are using the fact that if $d$ divides $m$ and $d$ divides $n$ (where $d,m,n$ are integers), then $d$ divides $m+n$ (in fact, $d$ divides $sm+tn$ for any integers $s,t$). As for why $d$ divides $b$, that was from the definition of that $d$ (it is a common divisor of $a-b$ and $b$). — Minus One-Twelfth, Dec 27 '19 at 23:57
As for why a $d$ exists that divides both $a'$ and $b'$, that follows from the induction assumption. — Minus One-Twelfth, Dec 28 '19 at 00:02
Note that there is no such thing as a perfect proof, so your comments about how you wish he would use $a', b', |a|, |b|,$ etc., don't really mean much. As you follow a proof, you should be writing down all the steps yourself. If you want to use $a', b',$ etc. then do so, and try to fill in any logical steps that might not be clear. This will become especially important when you start to read research articles. --- Often, with practice, you will be able to fill in the logical steps fairly easily, but even with lots of practice, you'll sometimes find truly deficient proofs.... — Brian Moehring, Dec 28 '19 at 00:13
While Apostol is one of my favorite authors, I think this proof of his is unnecessarily complicated. One should always provide simple proofs if possible. For example the default approach of analyzing the set $A=\{ax+by\mid x, y\in\mathbb{Z},\ ax+by>0\}$ works fine. The set $A$ has a least element $d$ by well ordering principle and one can check that $d$ is indeed the gcd. — Paramanand Singh, Dec 28 '19 at 14:50
@Paramanand That is exactly the sort of proof that I linked to in my answer. — Bill Dubuque, Dec 28 '19 at 21:11

Bill Dubuque · Accepted Answer · 2023-03-20T23:35:14.980

Theorem 1.6. Every pair of integers $a$ and $b$ has a common divisor $d$ of the form $$ d = ax + by $$ where $x$ and $y$ are integers. Moreover, every common divisor of $a$ and $b$ divides this $d$.

The proof (with my questions throughout) goes as follows:

Proof. First assume that $a \geq 0, b \geq 0$ and use induction on $n = a + b$. If $n = 0$ then $a = b = 0$, and we can take $d = 0$ with $x = y = 0$. Assume, then, that the theorem has been proved for $0, 1, 2, ..., n - 1$.

I am a little confused about taking $n$ to be $a + b$, since it's not obvious that all pairs $\{a, b\}$ would be covered by induction for all combinations of $a, b \in \mathbb{Z}$.

Define the height $h$ of a point $(a,b)\in\Bbb N^2$ by $\,h(a,b) := a+b.\,$ We prove by strong induction on height that the statement $P(a,b)$ is true for all points $\,(a,b)\in\Bbb N^2.\,$ Since this type of induction often proves puzzling to students I will explain it from a geometric viewpoint to help aid intuition.

The points $(x,y)$ of height $\:\!n\:\!$ satisfy $\,x+y = n\,$ i.e. $\,y = n -x,\,$ so they are the lattice points on the line segment $\ell_n$ of slope $\,-1\,$ from $(0,n)$ to $(n,0)\,$ in the first quadrant. If we rotate the plane $\,45^\circ $ counter-clockwise then $\ell_n$ is the $n$'th horizontal line in the partition of the first quadrant (looking up from the origin).

These lines $\ell_n$ form a partition of $\Bbb N^2,\,$ so to prove that the statement $P$ is true for all points in $\Bbb N^2\,$ it suffices to prove that the statement $P$ is true for all points on each line $\,\ell_n,\,$ which we do by complete induction on $\,n,\,$ lifting the truth of $P$ on lower height lines $\ell_k,\ k < n\,$ up to the line $\,\ell_n.\,$
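The partition into height lines can be checked on a small grid. The following Python sketch is only an illustration; the helper names `line` and `reduce_point` are my own and appear nowhere in Apostol. It confirms that each point of height at most $6$ lies on exactly one $\ell_n$, and that the reduction $(a,b)\mapsto(a-b,b)$ used in the proof always lands on a strictly lower line.

```python
def line(n):
    """Lattice points (a, b) with a, b >= 0 and a + b == n, i.e. the segment l_n."""
    return [(a, n - a) for a in range(n + 1)]


def reduce_point(a, b):
    """One step of the proof: swap so that a >= b, then pass to (a - b, b)."""
    if a < b:
        a, b = b, a
    return (a - b, b)


MAX_HEIGHT = 6  # arbitrary small bound chosen for the illustration

# Each point of N^2 with height <= MAX_HEIGHT lies on exactly one line l_n.
points = [(a, b) for a in range(MAX_HEIGHT + 1)
          for b in range(MAX_HEIGHT + 1) if a + b <= MAX_HEIGHT]
covered = [p for n in range(MAX_HEIGHT + 1) for p in line(n)]
assert sorted(points) == sorted(covered)

# A point on l_n whose smaller coordinate is >= 1 reduces to a lower line.
for n in range(1, MAX_HEIGHT + 1):
    for a, b in line(n):
        if min(a, b) >= 1:
            a2, b2 = reduce_point(a, b)
            assert a2 + b2 < n
```

Points with a zero coordinate are not reduced at all: they are the case $b = 0$ that the proof settles directly.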

By symmetry, we can assume $a \geq b$. If $b = 0$ take $d = a, x = 1, y = 0$. If $b \geq 1$ we can apply the induction hypothesis to $a - b$ and $b$, since their sum is $a = n - b \leq n - 1$. Hence there is a common divisor $d$ of $a - b$ and $b$ of the form $d = (a - b)x + by$.

I'm going to let $a' = a - b$, let $b' = b$ and let $d' = a'x + b'y$. (I wish Apostol did something like this to make his proofs clearer.)

I don't understand this logical step. Why does the fact that $a' + b' \leq n - 1$ imply that $d'$ exists and is a common divisor of $a'$ and $b'$? This seems like a huge leap.

$h(a',b') = h(a\!-\!b,b) = \color{#c00}a\!-\!b\!+\!\color{#c00}b = \color{#c00}n\!-\!b <n $ (by $\,b\ge 1)$ so $\:\!(a',b')\:\!$ is on lower height line $\,\ell_{n-b}\,$ so $P(a',b')$ is true (our induction hypothesis is that $P$ is true for all points on lower height lines).

Here $P(a,b) := [\![\,d\mid a,b\,$ and $\,d = ax+by\,$ for some $\,x,y\in\Bbb Z\,]\!],\,$ so $\,P(a',b')$ $\,\Rightarrow\,d\mid a',b'\,$ i.e. $\,d\mid a\!-\!b,\,b\,$ and $\,d = a'x+b'y = (a-b)x+by$.

This $d$ also divides $(a - b) + b = a$, so $d$ is a common divisor of $a$ and $b$ and we have $d = ax + (y-x)b$, a linear combination of $a$ and $b$.

At this point I am clueless. Why does $d$ divide $a$ and why does this imply it also divides $b$? And where does Apostol get $y-x$ from??

Here we are transforming the lower height statement $P(a',b')$ into the form $P(a,b)$ at height $\,n.\,$ At lower height we have $\,d\mid a\!-\!b,\,b\,$ so $\,d\mid (a\!-\!b)+b = a,\,$ hence $\,d\mid a,b,\,$ which is what we need for $\,P(a,b)\,$ at height $n$. Similarly we lift the linear combination by rearranging it into the sought form, i.e. $\,d = (a\!-\!b)x + by = ax+b(y\!-\!x) = ax+by'$ with $\,y' := y\!-\!x,\,$ which is the required $P(a,b)$ form.
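It may help to see the induction unrolled as a recursion. Below is a minimal Python sketch of the nonnegative case, with each branch mirroring one sentence of the proof. The function name `bezout_nonneg` is my own, and the recursion is meant for small inputs only, since its depth grows with $a+b$.

```python
import math


def bezout_nonneg(a, b):
    """For integers a, b >= 0 return (d, x, y) with d == a*x + b*y,
    where d is a common divisor of a and b."""
    if a < b:
        # "By symmetry, we can assume a >= b": solve for (b, a), swap coefficients.
        d, x, y = bezout_nonneg(b, a)
        return d, y, x
    if b == 0:
        # "If b = 0 take d = a, x = 1, y = 0."
        return a, 1, 0
    # Induction hypothesis applied to (a - b, b), whose height is a < a + b.
    d, x, y = bezout_nonneg(a - b, b)
    # Lift: d = (a - b)*x + b*y = a*x + b*(y - x).
    return d, x, y - x


# Sanity check against the standard library on a small grid.
for a in range(30):
    for b in range(30):
        d, x, y = bezout_nonneg(a, b)
        assert d == a * x + b * y
        assert d == math.gcd(a, b)
```

For instance, `bezout_nonneg(7, 3)` returns `(1, 1, -2)`, i.e. $1 = 7\cdot 1 + 3\cdot(-2)$. The last `return` is exactly where the $y - x$ comes from.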

To complete the proof we need to show that every common divisor divides $d$. Since a common divisor divides $a$ and $b$, it also divides the linear combination $ax + (y-x)b = d$. This completes the proof if $a \geq 0$ and $b \geq 0$. If one or both of $a$ and $b$ is negative, apply the result just proved to $|a|$ and $|b|$.

Why not just do the entire proof with absolute values from the beginning?

Because peppering sign handling throughout the proof would obfuscate the essence of the matter, which has nothing to do with signs. As you've seen, the proof can be challenging to understand already without this extra complexity.
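To see how little the signs contribute, here is a hedged Python sketch in which the sign handling is a thin wrapper around the nonnegative case. Both function names are hypothetical, and `bezout_nonneg` is just the subtractive recursion from the proof, for small inputs only. The wrapper uses $|a| = \pm a$ to move the sign onto the coefficient.

```python
import math


def bezout_nonneg(a, b):
    """Nonnegative case, following the subtractive induction of the proof."""
    if a < b:
        d, x, y = bezout_nonneg(b, a)
        return d, y, x
    if b == 0:
        return a, 1, 0
    d, x, y = bezout_nonneg(a - b, b)
    return d, x, y - x


def bezout(a, b):
    """Arbitrary integers: apply the nonnegative result to |a| and |b|."""
    d, x, y = bezout_nonneg(abs(a), abs(b))
    # |a|*x == a*(sign(a)*x), and likewise for b, so only the coefficients change.
    sign_a = -1 if a < 0 else 1
    sign_b = -1 if b < 0 else 1
    return d, sign_a * x, sign_b * y


for a in range(-12, 13):
    for b in range(-12, 13):
        d, x, y = bezout(a, b)
        assert d == a * x + b * y
        assert d == math.gcd(a, b)
```

All of the mathematical content sits in `bezout_nonneg`. The wrapper only flips signs, which is why the proof defers that step to its last sentence.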

Soft question: is it normal for authors to be very terse and not explain or give motivation for any steps? How do you go about trying to understand proofs that require a higher level of intuition than you currently have?

Yes, unfortunately many proofs are presented completely unmotivated so you have to "reverse engineer" them to discover the underlying intuition.

The intuition is obfuscated in this presentation. The key idea is that sets of integers closed under subtraction are closed under remainder so closed under gcd, so they are precisely the multiples of their least positive element (= gcd of all elements), as is easily proved by descent using the Euclidean algorithm (in subtractive form (as here) or remainder form). This is explained in elementary language in this answer. It will be clarified if you study ring theory in a course on abstract algebra (viz. Euclidean domains are PIDs, e.g. see this answer).
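The "least positive element" description can be tested by brute force on small numbers. The Python sketch below is only a numerical illustration under stated assumptions, not a proof. The function name is hypothetical, $a$ and $b$ are assumed not both zero, and the coefficients are searched only in the window $|x|,|y|\le\max(|a|,|b|)$, which is large enough to contain a Bezout pair.

```python
from math import gcd


def least_positive_combination(a, b):
    """Brute-force the least positive value of a*x + b*y over a bounded window.

    Assumes a and b are not both zero. Returns that value together with
    the full set of combinations found in the window."""
    bound = max(abs(a), abs(b))
    window = range(-bound, bound + 1)
    combos = {a * x + b * y for x in window for y in window}
    return min(c for c in combos if c > 0), combos


d, combos = least_positive_combination(12, 42)
assert d == gcd(12, 42)
# Every combination found is a multiple of the least positive one.
assert all(c % d == 0 for c in combos)
```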

Thank you @BillDubuque for your incredible clarity! I just have one last gripe; could you please tell me if the following explanation is correct: when "lifting" the statement $P(a', b')$ to $P(a, b)$, we cannot do it in one fell swoop, i.e. there is no single induction step that "lifts" an entire line $\ell_{n - 1}$ (for example) to $\ell_n$. Instead, we have to do the induction step for each lattice point $(a, b)$ on the line $\ell_n$. For example, if $n = 7$ then we need to do the induction step on $(0, 7), (1, 6), (2, 5)$, etc. — Jeremy Lindsay, Dec 28 '19 at 11:08
@Jeremy Good question. Note that the proof of the inductive step starts by assuming that $\,(a,b)\,$ is any point of height $n,\,$ so the inductive inference - in one fell swoop - lifts the truth up to all points of height $n,\,$ i.e. up to the entire line segment $\,\ell_n.\,$ Thus we can view the induction as working on lines $\ell_n\,$ indexed by $\Bbb N\,$ (vs. points indexed by $\,\Bbb N^2$). This dimensional reduction allows us to use normal ($1$-dimensional) induction on the $\Bbb N$-indexed lines $\,\ell_n.$ — Bill Dubuque, Dec 28 '19 at 21:32

score 4 · Answer 2 · answered Dec 28 '19 at 00:07

I am a little confused about taking $n$ to be $a+b$, since it's not obvious that all pairs $\{a,b\}$ would be covered by induction for all combinations of $a,b\in\mathbb{Z}$.

Note that at this point in the proof we've already restricted our attention to non-negative integers $a,b$, according to the very first statement "First assume that $a\ge0$, $b\ge0$". The proof will come back to all integers at the very end. But for now $a,b$ are non-negative. For any such non-negative integers $a,b$, their sum $n=a+b$ is also a non-negative integer. So induction on $n\ge0$ will cover all possible pairs $\{a,b\}$ that we're currently considering.

Why does the fact that $a'+b'\le n-1$ imply that $d$ exists and is a common divisor of $a'$ and $b'$?

He didn't say that yet. But he will justify it in the next paragraph. For now, here's what has been said, using your notation for more clarity.

Let $a'=a-b$ and let $b'=b$. Then by the induction hypothesis there exists $d=a'x+b'y$ satisfying the conclusion of the theorem for $a'$ and $b'$, which in particular means that $d$ is a common divisor of both $a'=a-b$ and $b'=b$. Note that I intentionally used the notation of "$d$" rather than "$d'$" for this new number.

Before we move on to the next part, let me reiterate where we are. For now, this $d$ has been found for $a'=a-b$ and $b'=b$, but not for $a$ and $b$ yet. However, as the next step, we will show that the very same $d$ works for $a$ and $b$ too.

Why does $d$ divide $a$ and why does this imply it also divides $b$?

Since $a=(a-b)+b=a'+b'$ and we know that $d$ divides both $a'$ and $b'$, it also divides their sum. And it divides $b=b'$ from the previous step.

And where does Apostol get $y-x$ from?

From $d=a'x+b'y=(a-b)x+by=ax-bx+by=ax+b(y-x)$.

Why not just do the entire proof with absolute values from the beginning?

That's effectively exactly what he did by saying that first of all we consider the case of $a,b\ge0$.

score 2 · Answer 3 · answered Dec 28 '19 at 00:01

Your crucial problem with this proof would appear to be at the point where you say: "I don't understand this logical step. Why does the fact that $a'+b'\le n-1$ imply that $d$ exists and is a common divisor of $a'$ and $b'$? This seems like a huge leap."

If you consider the first paragraph of the proof you will see it stated that we are assuming that the theorem is true for every pair whose sum is at most $n-1$.

Since $(a-b)+b$ is less than $a+b=n$ we can therefore assume the theorem to be true for $(a-b)$ and $b$ and that is precisely what Apostol has done.
