
[I did notice similar questions have been asked here before, but I couldn't find an answer I could grasp as a beginner, so I chose to post this question]

I'm just starting to teach myself linear algebra with *Linear Algebra and Group Theory*. The book opens with the concept of the determinant, defining even & odd permutations via the example of a {3 by 3 array} and then stating the following equation for the general case:

$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1k} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2k} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nk} & \cdots & a_{nn}\end{vmatrix}=\displaystyle\sum_{(p_1, p_2, \ldots, p_n)}(-1)^{[p_1, p_2, \ldots, p_n]}a_{1p_1}a_{2p_2}\cdots a_{np_n}$$

where $[p_1, p_2, \ldots, p_n]$ is the number of inversions of the permutation $p_1, p_2, \ldots, p_n$.
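To make the right-hand side concrete, here is a minimal Python sketch (function names are mine, not from the book) that counts inversions and evaluates the sum over permutations directly:

```python
from itertools import permutations

def inversions(p):
    """Count pairs (i, j) with i < j but p[i] > p[j]."""
    return sum(1 for i in range(len(p))
                 for j in range(i + 1, len(p)) if p[i] > p[j])

def det_by_permutations(a):
    """Determinant of the n-by-n array a via the sum over permutations."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        term = (-1) ** inversions(p)
        for i in range(n):
            term *= a[i][p[i]]  # pick entry p[i] from row i
        total += term
    return total

# 2-by-2 check: a11*a22 - a12*a21 = 1*4 - 2*3
print(det_by_permutations([[1, 2], [3, 4]]))  # -2
```

For $n = 2$ this reduces to exactly the two-term expression $a_{11}a_{22} - a_{12}a_{21}$ worked out below.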

Since the book gives no justification for the above equation for an {$n$ by $n$ array}, and the concept of the determinant feels a bit odd in the first place, I tried to investigate it myself:

[Step 1]: I started with a {2 by 2 array} first, by taking a system of two equations in two unknowns:

$$(eq1):\ a_{11}x_1+a_{12}x_2=b_1\\(eq2):\ a_{21}x_1+a_{22}x_2=b_2$$ $$A=\begin{Vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{Vmatrix}$$ with the determinant $\det A$

[Step 2]: I then tried to eliminate {$x_2$ from $eq1$} and {$x_1$ from $eq2$} as follows:

$$(eq1 \cdot a_{22})-(eq2 \cdot a_{12}):\quad (a_{11}a_{22}-a_{12}a_{21})x_1 = b_1a_{22}-b_2a_{12}\\(eq2 \cdot a_{11})-(eq1 \cdot a_{21}):\quad (a_{11}a_{22}-a_{12}a_{21})x_2 = b_2a_{11}-b_1a_{21}$$
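As a quick numerical sanity check (with coefficient values chosen arbitrarily by me), the elimination above can be verified in a few lines of Python:

```python
# Arbitrary 2-by-2 system: a11*x1 + a12*x2 = b1, a21*x1 + a22*x2 = b2
a11, a12, a21, a22 = 2.0, 3.0, 5.0, 7.0
b1, b2 = 1.0, 4.0
det = a11 * a22 - a12 * a21

# eq1*a22 - eq2*a12 leaves only x1; eq2*a11 - eq1*a21 leaves only x2
x1 = (b1 * a22 - b2 * a12) / det
x2 = (b2 * a11 - b1 * a21) / det

# Both original equations should now be satisfied
assert abs(a11 * x1 + a12 * x2 - b1) < 1e-12
assert abs(a21 * x1 + a22 * x2 - b2) < 1e-12
```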

[Step 3]: I noticed that the two coefficients above both give me the determinant of the array, so I then postulated the following statement: 

the determinant is {the coefficient of the unknown $x_k$ in the $k$th row} after eliminating the other unknowns from the $k$th row by {multiplying} and {subtracting other rows} in the array.

that is to say, if I have an $n$th order array $$N=\begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1k} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2k} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nk} & \cdots & a_{nn}\end{Vmatrix}$$ I can eventually transform $N$ into $$\begin{Vmatrix} \det N & 0 & \cdots & 0 & \cdots & 0\\ 0 & \det N & \cdots & 0 & \cdots & 0\\ \vdots & \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 0 & \cdots & \det N\end{Vmatrix}$$
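One way I can test this claim mechanically: multiplying $N$ by its matrix of cofactors (transposed) yields exactly the diagonal array above. A small pure-Python sketch (the helper names are mine, and the determinant here is computed by first-row expansion only for the check):

```python
def minor(a, i, j):
    """The array a with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def adjugate(a):
    """Transposed matrix of cofactors."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

N = [[2, 1, 0], [1, 3, 2], [0, 1, 4]]
print(matmul(N, adjugate(N)))  # det(N) = 16 on the diagonal, zeros elsewhere
```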

[Step 4]: I tested my statement with a {3 by 3 array} and it seems to work. And the idea of odd & even permutations seems to become more intuitive, as it has to do with the order of subtraction, which depends on the row of the unknowns.

So here come my questions:

  1. if my guess is right, how do I construct the permutation formula at the beginning of the question for an {$n$ by $n$ array}, without defining the determinant through a set of formalistic operations in the first place?
  2. I've seen multiple answers discussing the geometric intuition of the determinant (and I roughly get the idea). How does the intuition of permutations connect, or transfer, into the geometric intuition?

[Note: I have never studied abstract algebra, so answers without using notations in abstract algebra will be much appreciated :)]

-----------------------------------------------------------------------
EDIT: I think I figured out my question 2 (the geometric intuition).... correct me if I am wrong


So again using a {2 by 2 array} as an example:

[Step 1]: Assume again I have the following equations and array

$$(eq1):\ a_{11}x_1+a_{12}x_2=b_1\\(eq2):\ a_{21}x_1+a_{22}x_2=b_2$$ $$A=\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix}$$ with the determinant $\det A$

[Step 2]: I can immediately transform the array into $$\begin{Vmatrix} \det A & 0\\ 0 & \det A\end{Vmatrix}$$ [Step 3]: Because the above array holds the coefficients of $x_1$ and $x_2$, I can write the unknowns down as a vector $$\begin{bmatrix} x_1\\x_2\end{bmatrix}$$ which makes the {$\det A$ array} a linear transformation when multiplied by this vector
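To see the area intuition concretely, one can apply the map to the corners of the unit square and measure the image's signed area with the shoelace formula (a sketch with coefficients I chose arbitrarily):

```python
def shoelace(pts):
    """Signed area of a polygon via the shoelace formula."""
    n = len(pts)
    return 0.5 * sum(pts[i][0] * pts[(i + 1) % n][1]
                     - pts[(i + 1) % n][0] * pts[i][1] for i in range(n))

a11, a12, a21, a22 = 3.0, 1.0, 1.0, 2.0

# Images of the unit square's corners under the map x -> A x
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [(a11 * x + a12 * y, a21 * x + a22 * y) for x, y in square]

print(shoelace(image), a11 * a22 - a12 * a21)  # both 5.0
```

The signed area of the image parallelogram matches $\det A$, which is the "area scaling factor" picture of the determinant.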

  • The determinant of a matrix is the unique number satisfying the properties: (1) the determinant of the identity matrix is $1$, (2) adding a multiple of a row to another row doesn't change the determinant, (3) scaling a row scales the determinant. Geometrically, if the determinant is the volume of a parallelepiped spanned by the rows of the matrix, (1) is that the standard unit (hyper)cube has volume $1$, (2) is that shear transformations don't change volume, and (3) is that stretching scales the volume. It turns out the determinant is multilinear, which gives the permutation-based formula. – Kyle Miller May 20 '21 at 19:31
  • The key property that follows from (2)&(3) is that the determinant of a matrix where one row is a sum of two vectors is the sum of the determinants of the two matrices where that row is replaced by one of the two vectors in the sum. This property lets you write a determinant as a sum of scalar multiples of determinants of permutation matrices. Swapping rows of a determinant multiplies the determinant by $-1$, so you can work out what these are (and it coincides with the sign of the permutation). In case it's useful, some notes I wrote: https://math.berkeley.edu/~kmill/math54fa16/det.pdf – Kyle Miller May 20 '21 at 19:35
  • Thanks for the comments, but what you said seems to predefine what a determinant is and then derive all the properties from it. Is there any way to derive the formula without giving its definition in the first place? @KyleMiller – P'bD_KU7B2 May 20 '21 at 19:46
  • You need to start with some kind of definition, otherwise there's nothing to refer to. Some options include (a) define it by the permutation formula (which lacks any intuition whatsoever but which is obviously well-defined), (b) define it geometrically as the volume of a parallelepiped (which is essentially the row operation approach), (c) define it in terms of alternating multilinear forms (closely related to (b)). The fancy version of (c) is to choose a basis vector for the n-fold exterior power of an n-dimensional vector space, which is 1-dimensional, which requires some abstract algebra! – Kyle Miller May 20 '21 at 19:56
  • I know there are some (mostly older) linear algebra textbooks that start with determinants, and yours appears to be one. But that's a torturous way to learn linear algebra, in my opinion. –  May 20 '21 at 19:58
  • I'd second @KyleMiller's description of determinant by its characterization (=desired/required properties) rather than by a formula, even though in common math education things are constructed more often than characterized. Maybe it's more tangible. Historically, I myself do not know off-hand whether people thought about determinants by characterization or formula. (It may be worth noting that determinants were studied for decades before linear algebra was a thing!) – paul garrett May 20 '21 at 19:59
  • I understand your points, but I mean there's got to be a way to construct it without defining it in the first place... My question is not about defining the determinant, but what exactly is the way to construct its formula through the properties of the equation system. @KyleMiller – P'bD_KU7B2 May 20 '21 at 20:08
  • Maybe Google "Cramer's rule" (for solving linear systems), and see whether you want to characterize "determinants" by making Cramer's rule correct... and then derive the formula from that... ? (This would be a variant of what @KyleMiller had mentioned, I think, but it might be a variant with more appeal to you...) – paul garrett May 20 '21 at 20:32
  • @paulgarrett, Thanks for the suggestion. I did check it before posting the question, because understanding Cramer's rule requires the notion of determinant as a prerequisite (this is also how the book I use proceeds). I guess, to put it a better way: if I have an $n$ by $n$ equation system, how can I find the general property of it (with the notion of permutation), formulate it, and then call it a "determinant", instead of doing it in reverse? – P'bD_KU7B2 May 20 '21 at 20:47
  • Gilbert Strang has an excellent lecture on the determinant that uses the method @KyleMiller suggests, which derives all the useful properties of the determinant. Historically the determinant was found as a means by which to discover whether the linear equations involved have a unique solution or not, and Strang shows why this is the case using the alternating multilinear form with $\det I = 1$ definition. You can find it here. https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/lecture-18-properties-of-determinants/ – CyclotomicField May 20 '21 at 21:04
  • I like the way you discovered the determinant formula in the cases where $n = 2$ or $n = 3$. My answer to your question 1 is that you have to guess the correct formula based on what you learned from cases $n = 2$ and $n = 3$ (and possibly $n = 4$, if necessary) and then prove that the formula you guessed has all the properties you'd hope it has. You can guess the "big sum" formula you wrote at the beginning of your question just by looking at the formula for the case $n = 3$. (And an obsessed mathematician would not hesitate to work out the case $n = 4$ if the pattern weren't yet clear.) – littleO May 23 '21 at 09:04

1 Answer


You seem to have rediscovered the adjugate matrix. I suppose you could try to use it to define the determinant, but I'd worry about whether it is well-defined, i.e. independent of whatever choices you've made while row-reducing. It is basically another way to think of Cramer's rule.

The indisputably* conceptually correct way to introduce determinants is through exterior algebra using the induced map on the highest exterior power. This is much too technical for virtually anyone who's just learning it, unfortunately. So different authors will pick random bits and pieces of the true picture that they think are sufficiently palatable to their audience.

But I can easily give you a flavor for what's going on and why inversions show up naturally, if you're willing to take a bit on faith.

Suppose $\vec{u}, \vec{v}$ are 2D vectors. Let $f(\vec{u}, \vec{v})$ be the area of the parallelogram they determine. Imagine replacing $\vec{u}$ with $t\vec{u}$ for a scalar $t$ which varies from $1$ to $-1$. We have $f(t\vec{u}, \vec{v}) = |t|f(\vec{u}, \vec{v})$. That absolute value sign is a bit strange, though--it prevents the function from being smooth! It feels like maybe when $t$ passes through zero, we should just use a "negative area". This ends up being the correct choice. In 3D, the notion of "orientation" ends up being extremely natural if you do anything with, say, computer graphics. So we effectively just get rid of the absolute value sign and introduce a signed area. More generally, we'd be interested in the signed hypervolume of the $n$-dimensional parallelepiped determined by $n$ vectors in $n$-dimensional space.

In 2D, if you play around with it, you'll find any reasonable signed area function $A(\vec{u}, \vec{v})$ must satisfy at least three properties:

  1. Scaling: $A(c\vec{u}, \vec{v}) = cA(\vec{u}, \vec{v})$
  2. Linearity: $A(\vec{u}_1 + \vec{u}_2, \vec{v}) = A(\vec{u}_1, \vec{v}) + A(\vec{u}_2, \vec{v})$
  3. Alternating: $A(\vec{u}, \vec{v}) = -A(\vec{v}, \vec{u})$

Note that (3) says $A(\vec{u}, \vec{u}) = 0$, which is obvious from the area interpretation (whew!).
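As a quick sanity check that the familiar 2D cross product $A(\vec{u}, \vec{v}) = u_1 v_2 - u_2 v_1$ really satisfies (1)-(3), here is a short Python sketch (random vectors, names mine):

```python
import random

def A(u, v):
    """Signed area of the parallelogram spanned by 2D vectors u and v."""
    return u[0] * v[1] - u[1] * v[0]

random.seed(0)
u, u2, v = [(random.random(), random.random()) for _ in range(3)]
c = 2.5

# 1. Scaling in the first slot
assert abs(A((c * u[0], c * u[1]), v) - c * A(u, v)) < 1e-12
# 2. Additivity in the first slot
assert abs(A((u[0] + u2[0], u[1] + u2[1]), v) - (A(u, v) + A(u2, v))) < 1e-12
# 3. Alternating: swapping arguments flips the sign
assert abs(A(u, v) + A(v, u)) < 1e-12
```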

Ok, what if we had the coordinates of $\vec{u}$ and $\vec{v}$ in terms of the standard basis vectors--what would $A$ be in those coordinates? That is, suppose $\vec{u} = a_{11} \vec{e}_1 + a_{21} \vec{e}_2$, $\vec{v} = a_{12} \vec{e}_1 + a_{22} \vec{e}_2$. Liberally using properties (1)-(3), we compute:

\begin{align*} A(\vec{u}, \vec{v}) &= A(a_{11} \vec{e}_1 + a_{21} \vec{e}_2, a_{12} \vec{e}_1 + a_{22} \vec{e}_2) \\ &= a_{11} a_{12} A(\vec{e}_1, \vec{e}_1) + a_{11} a_{22} A(\vec{e}_1, \vec{e}_2) + a_{21} a_{12} A(\vec{e}_2, \vec{e}_1) + a_{21} a_{22} A(\vec{e}_2, \vec{e}_2) \\ &= (a_{11} a_{22} - a_{12} a_{21}) A(\vec{e}_1, \vec{e}_2) \\ &= a_{11} a_{22} - a_{12} a_{21}, \end{align*} where the last step uses the normalization $A(\vec{e}_1, \vec{e}_2) = 1$ for the unit square.

This is exactly the determinant of the 2x2 matrix listing the coordinates of $\vec{u}$ and $\vec{v}$ in its columns.

You can play the same game with $n \times n$ matrices. You'll see quickly that the resulting expression will be a sum over permutations, and the only question will be what sign to use. The inversion number is simply the number of swaps needed to straighten out the relevant term, so it's got the right parity!
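This parity claim is easy to check by machine: the number of adjacent swaps bubble sort uses to sort a permutation is exactly its inversion count, so in particular the parities agree (a sketch, function names mine):

```python
from itertools import permutations

def inversions(p):
    """Count pairs (i, j) with i < j but p[i] > p[j]."""
    return sum(1 for i in range(len(p))
                 for j in range(i + 1, len(p)) if p[i] > p[j])

def swaps_to_sort(p):
    """Number of adjacent swaps bubble sort performs while sorting p."""
    p = list(p)
    swaps = 0
    for _ in range(len(p)):
        for j in range(len(p) - 1):
            if p[j] > p[j + 1]:
                p[j], p[j + 1] = p[j + 1], p[j]
                swaps += 1
    return swaps

# Adjacent-swap count equals inversion count for every permutation of 4 items
for p in permutations(range(4)):
    assert inversions(p) == swaps_to_sort(p)
```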

Ok, but existence of a function $A$ satisfying (1)-(3) isn't necessarily clear. To prove it rigorously, you reverse the whole thing, first defining inversion numbers and studying their basic properties, then using the Laplace expansion formula to define the determinant, then you show it actually satisfies properties (1)-(3). Or you could do a higher-tech version of the same thing by introducing the exterior algebra. But at some point you're going to have to show that the $n$th exterior product of an $n$-dimensional vector space is $1$-dimensional (and not $0$-dimensional), which will require some sort of construction like this no matter what.

*(Hah!)

  • Thank you for the detailed answer. This is very helpful for a beginner like me to get a little bit of an idea of what I am actually doing when reading the book :) – P'bD_KU7B2 May 20 '21 at 21:26
  • I agree with this. The determinant is best introduced as a way to measure how much a linear map “distorts” the area of a square or volume of a cube. After that it’s easy to understand its role in solving a linear system of equations. – Deane May 20 '21 at 22:36