Theorem 1.6. Every pair of integers $a$ and $b$ has a common divisor $d$ of the form
$$ d = ax + by $$
where $x$ and $y$ are integers. Moreover, every common divisor of $a$ and $b$ divides this $d$.
The proof (with my questions throughout) goes as follows:
Proof. First assume that $a \geq 0, b \geq 0$ and use induction on $n = a + b$. If $n = 0$ then $a = b = 0$, and we can take $d = 0$ with $x = y = 0$. Assume, then, that the theorem has been proved for $0, 1, 2, ..., n - 1$.
I am a little confused about taking $n$ to be $a + b$, since it's not obvious that all pairs $\{a, b\}$ would be covered by induction for all combinations of $a, b \in \mathbb{Z}$.
Define the height $h$ of a point $(a,b)\in\Bbb N^2$ by $\,h(a,b) := a+b.\,$ We prove by strong induction on height that the statement $P(a,b)$ is true for all points $\,(a,b)\in\Bbb N^2.\,$ Since this type of induction often puzzles students, I will explain it from a geometric viewpoint to aid intuition.
The points $(x,y)$ of height $\:\!n\:\!$ satisfy $\,x+y = n,\,$ i.e. $\,y = n -x,\,$ so they are the lattice points on the line segment $\ell_n$ of slope $\,-1\,$ from $(0,n)$ to $(n,0)\,$ in the first quadrant. If we rotate the plane $\,45^\circ $ counter-clockwise then $\ell_n$ is the $n$-th horizontal line in the partition of the first quadrant (looking up from the origin).
These lines $\ell_n$ form a partition of $\Bbb N^2,\,$ so to prove that the statement $P$ is true for all points in $\Bbb N^2\,$ it suffices to prove that the statement $P$ is true for all points on each line $\,\ell_n,\,$ which we do by complete induction on $\,n,\,$ lifting the truth of $P$ on lower height lines $\ell_k,\ k < n\,$ up to the line $\,\ell_n.\,$
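As a small sanity check of the partition claim, here is a sketch in Python. The helper name `points_of_height` and the cutoff `N` are my own illustrative choices, not part of Apostol's text. It lists the lattice points on each line $\ell_n$ and confirms that the lines $\ell_0, \dots, \ell_N$ cover every point of height at most $N$ exactly once.

```python
def points_of_height(n):
    """Lattice points (a, b) with a, b >= 0 and a + b == n, i.e. the line l_n."""
    return [(a, n - a) for a in range(n + 1)]

N = 5  # illustrative cutoff; any non-negative integer works
seen = [p for n in range(N + 1) for p in points_of_height(n)]

# No point appears on two different lines ...
assert len(seen) == len(set(seen))
# ... and together the lines cover every point of height <= N.
assert set(seen) == {(a, b) for a in range(N + 1)
                     for b in range(N + 1) if a + b <= N}
```

So induction on $n$ visits every pair $(a,b)$ with $a,b\ge 0$: the pair sits on exactly one line, namely $\ell_{a+b}$.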
By symmetry, we can assume $a \geq b$. If $b = 0$ take $d = a, x = 1, y = 0$.
If $b \geq 1$ we can apply the induction hypothesis to $a - b$ and $b$, since their sum is $a = n - b \leq n - 1$. Hence there is a common divisor $d$ of $a - b$ and $b$ of the form $d = (a - b)x + by$.
I'm going to let $a' = a - b$, let $b' = b$ and let $d' = a'x + b'y$. (I wish Apostol did something like this to make his proofs clearer.)
I don't understand this logical step. Why does the fact that $a' + b' \leq n - 1$ imply that $d'$ exists and is a common divisor of $a'$ and $b'$? This seems like a huge leap.
$h(a',b') = h(a\!-\!b,b) = \color{#c00}a\!-\!b\!+\!\color{#c00}b = \color{#c00}n\!-\!b <n $ (by $\,b\ge 1)$, so $\:\!(a',b')\:\!$ lies on the lower-height line $\,\ell_{n-b},\,$ hence $P(a',b')$ is true (our induction hypothesis is that $P$ is true for all points on lower-height lines).
Here $P(a,b) := [\![\,d\mid a,b\,$ and $\,d = ax+by\,$ for some $\,x,y\in\Bbb Z\,]\!],\,$ so $\,P(a',b')$ $\,\Rightarrow\,d\mid a',b'\,$ i.e. $\,d\mid a\!-\!b,\,b\,$ and $\,d = a'x+b'y = (a-b)x+by$.
This $d$ also divides $(a - b) + b = a$, so $d$ is a common divisor of $a$ and $b$ and we have $d = ax + (y-x)b$, a linear combination of $a$ and $b$.
At this point I am clueless. Why does $d$ divide $a$ and why does this imply it also divides $b$? And where does Apostol get $y-x$ from??
Here we are transforming the lower height statement $P(a',b')$ into the form $P(a,b)$ at height $\,n.\,$ At lower height we have $\,d\mid a\!-\!b,\,b\,$ so $\,d\mid (a\!-\!b)+b = a,\,$ hence $\,d\mid a,b,\,$ which is what we need for $\,P(a,b)\,$ at height $n$. Similarly we lift the linear combination by rearranging it into the sought form, i.e. $\,d = (a\!-\!b)x + by = ax+b(y\!-\!x) = ax+by'$ with $\,y' = y\!-\!x,\,$ which is the required $P(a,b)$ form.
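The induction can be read as a recursive procedure. Below is a minimal Python sketch of that reading; the function name `bezout_nonneg` is hypothetical, and it is meant only as an illustration for small non-negative inputs, not as an efficient gcd routine. Each branch corresponds to one step of the proof: the symmetry reduction, the case $b = 0$, and the lift from $(a-b,\,b)$ to $(a,\,b)$ using $y' = y - x$.

```python
from math import gcd

def bezout_nonneg(a, b):
    """Return (d, x, y) with d = a*x + b*y and d a common divisor of a and b.

    Assumes a >= 0 and b >= 0, as in the first part of the proof.
    """
    if a < b:
        # "By symmetry, we can assume a >= b": solve for (b, a), swap coefficients.
        d, x, y = bezout_nonneg(b, a)
        return d, y, x
    if b == 0:
        # Case b = 0: take d = a, x = 1, y = 0 (this also covers a = b = 0).
        return a, 1, 0
    # Case b >= 1: (a - b, b) has smaller height, so recurse (induction hypothesis).
    d, x, y = bezout_nonneg(a - b, b)
    # d = (a - b)*x + b*y = a*x + b*(y - x)
    return d, x, y - x

for a, b in [(12, 18), (7, 0), (0, 0), (35, 21)]:
    d, x, y = bezout_nonneg(a, b)
    assert d == a * x + b * y == gcd(a, b)
```

The recursion terminates because the height $a + b$ strictly decreases in the last branch, which is exactly why strong induction on $n = a + b$ works.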
To complete the proof we need to show that every common divisor divides $d$. Since a common divisor divides $a$ and $b$, it also divides the linear combination $ax + (y-x)b = d$. This completes the proof if $a \geq 0$ and $b \geq 0$. If one or both of $a$ and $b$ is negative, apply the result just proved to $|a|$ and $|b|$.
Why not just do the entire proof with absolute values from the beginning?
Because peppering sign handling throughout the proof would obfuscate the essence of the matter, which has nothing to do with signs. As you've seen, the proof can be challenging to understand already without this extra complexity.
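To see how little the signs matter, note that if $d = |a|x + |b|y$ then $d = a(\pm x) + b(\pm y)$, flipping the sign of a coefficient exactly when the corresponding input is negative. Here is a sketch of that reduction in Python. The name `bezout` and the iterative (loop-based) subtractive form are my own choices for illustration, not Apostol's presentation.

```python
def bezout(a, b):
    """Return (d, x, y) with d = a*x + b*y, for arbitrary integers a and b."""
    # Invariants: r0 = |a|*x0 + |b|*y0 and r1 = |a|*x1 + |b|*y1.
    r0, x0, y0 = abs(a), 1, 0
    r1, x1, y1 = abs(b), 0, 1
    while r1 != 0:
        if r0 < r1:
            # Symmetry step: swap so that r0 >= r1.
            r0, x0, y0, r1, x1, y1 = r1, x1, y1, r0, x0, y0
        else:
            # Subtractive step: replace (r0, r1) by (r0 - r1, r1).
            r0, x0, y0 = r0 - r1, x0 - x1, y0 - y1
    # Undo the absolute values: |a|*x0 = a*(sign(a)*x0), and likewise for b.
    sign_a = -1 if a < 0 else 1
    sign_b = -1 if b < 0 else 1
    return r0, sign_a * x0, sign_b * y0

d, x, y = bezout(-12, 18)
assert d == 6 and d == (-12) * x + 18 * y
```

The sign handling is confined to the first two and last three lines of the function; the loop itself never sees a negative input, which mirrors how the proof defers signs to its final sentence.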
Soft question: is it normal for authors to be very terse and not explain or give motivation for any steps? How do you go about trying to understand proofs that require a higher level of intuition than you currently have?
Yes, unfortunately many proofs are presented completely unmotivated so you have to "reverse engineer" them to discover the underlying intuition.
The intuition is obfuscated in this presentation. The key idea is that a set of integers closed under subtraction is closed under remainder, hence closed under gcd, so it consists precisely of the multiples of its least positive element (= gcd of all elements). This is easily proved by descent using the Euclidean algorithm, in subtractive form (as here) or remainder form. This is explained in elementary language in this answer. It will be clarified if you study ring theory in a course on abstract algebra (viz. Euclidean domains are PIDs, e.g. see this answer).
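A quick numerical illustration of the key idea, as a Python sketch. The set $\{ax + by\}$ is closed under subtraction, and the code looks at a finite window of it; the helper name `combos`, the sample values and the coefficient bound are arbitrary choices of mine. A finite check like this proves nothing, but it shows the pattern the theorem asserts.

```python
from math import gcd

def combos(a, b, bound):
    """All values a*x + b*y with |x| <= bound and |y| <= bound (a finite window)."""
    rng = range(-bound, bound + 1)
    return {a * x + b * y for x in rng for y in rng}

a, b = 12, 18
S = combos(a, b, 10)
least_positive = min(s for s in S if s > 0)

# The least positive combination is the gcd ...
assert least_positive == gcd(a, b)
# ... and every combination in the window is a multiple of it.
assert all(s % least_positive == 0 for s in S)
```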