It will be easier if we first prove that the determinant of a square matrix can be computed by expansion along any row or column; see this answer for the details.
I will use the $\rho$-notation which is defined in the above answer.
Here are some facts about the $\rho$-notation:
- Suppose that $x$ and $y$ are distinct integers. Then $\rho(x, y) = 0$ if and only if $\operatorname{sgn} {(y - x)} = 1$, and $\rho(x, y) = 1$ if and only if $\operatorname{sgn} {(y - x)} = -1$; in other words, $\rho(x, y) = 1$ exactly when $x > y$.
- Suppose that $c_1$, $c_2$, $\dots$, $c_n$ are $n$ distinct positive integers. Put
$\tau (c_1, c_2, \dots, c_n)
= \sum_{1 \leq i < j \leq n}
{\rho (c_i, c_j)}$.
Then
$$
s(c_1, c_2, \dots, c_n) = (-1)^{\tau (c_1, c_2, \dots, c_n)}.
$$
- Suppose that $a = (1, 2, \dots, n)$ is the list consisting of the first $n$ positive integers, and that $b$ is the list obtained from $a$ by removing $k$ distinct positive integers $i_1$, $i_2$, $\dots$, $i_k$, each less than or equal to $n$. Suppose that $x$ is a member of $b$ (which means that $x$ is equal to none of $i_1$, $i_2$, $\dots$, $i_k$). Then the position of $x$ in the list $b$ is
$$
x - \rho(x, i_1) - \rho(x, i_2) - \dots - \rho(x, i_k).
$$
- Suppose that $c_1$, $c_2$, $\dots$, $c_n$ are $n$ distinct positive integers less than or equal to $n$. Then
$$
c_n - \rho(c_n, c_1) - \rho(c_n, c_2) - \dots - \rho(c_n, c_{n-1}) = 1.
$$
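These facts are easy to check by brute force. Here is a small Python sanity check (not part of the proof; the helper names `rho`, `tau`, `s`, and `position` are mine, mirroring the notation above):

```python
from itertools import permutations

def rho(x, y):
    # rho(x, y) = 1 when x > y and 0 when x < y (x, y distinct)
    return 1 if x > y else 0

def tau(c):
    # number of inversions of the list c
    return sum(rho(c[i], c[j])
               for i in range(len(c)) for j in range(i + 1, len(c)))

def s(c):
    # sign of the permutation c, via (-1) ** tau(c)
    return (-1) ** tau(c)

def position(x, removed):
    # claimed position of x in (1, ..., n) after removing `removed`
    return x - sum(rho(x, i) for i in removed)

n = 5

# third fact: position of x in the list with i_1, ..., i_k removed
for removed in [(2, 4), (1, 5), (3,)]:
    b = [x for x in range(1, n + 1) if x not in removed]
    for pos, x in enumerate(b, start=1):
        assert position(x, removed) == pos

# fourth fact: for a permutation c of 1..n, the corrected last entry is 1
for c in permutations(range(1, n + 1)):
    assert c[-1] - sum(rho(c[-1], ci) for ci in c[:-1]) == 1
```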
I will prove only the first theorem; the proof of the second is very similar, and I leave it as an exercise.
Expanding about column $j_1$ yields
$$
\det {(A)}
= \sum_{1 \leq i_1 \leq n}
{(-1)^{i_1 + j_1} [A]_{i_1,j_1} \det {(A(i_1|j_1))}}.
$$
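(A quick way to convince oneself of the starting point numerically: the Python sketch below, with a hypothetical `det_laplace` helper of mine, checks that expanding along any column of a random integer matrix gives the same value.)

```python
from random import randint, seed

def det_laplace(A, col=0):
    # expand along column `col` (0-based); recursive calls expand along column 0
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        minor = [row[:col] + row[col + 1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** (i + col) * A[i][col] * det_laplace(minor)
    return total

seed(2)
n = 4
A = [[randint(-3, 3) for _ in range(n)] for _ in range(n)]
vals = [det_laplace(A, col=j) for j in range(n)]
assert len(set(vals)) == 1  # every column yields the same determinant
```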
We note that column $j_2$ of $A$ corresponds to column $j_2 - \rho(j_2, j_1)$ of $A(i_1|j_1)$, that row $i_2$ of $A$ corresponds to row $i_2 - \rho(i_2, i_1)$ of $A(i_1|j_1)$ and that
$$
i_2 - \rho(i_2, i_1) + j_2 - \rho(j_2, j_1)
= i_2 + j_2 + \rho(i_1, i_2) + \rho(j_1, j_2) - 2,
$$
so
$$
\det {(A(i_1|j_1))}
=
\sum_{\substack{
1 \leq i_2 \leq n \\
i_2 \neq i_1
}}
{(-1)^{i_2 + j_2 + \rho(i_1, i_2) + \rho(j_1, j_2)}
[A]_{i_2,j_2} \det {(A({i_1,i_2}|{j_1,j_2}))}}.
$$
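To make sure the sign bookkeeping here is right, one can test this two-step expansion numerically. In the Python sketch below (the helpers `det` and `minor` are mine, and indices are kept 1-based to match the text), the left-hand side and right-hand side of the formula above are compared on a random integer matrix:

```python
from random import randint, seed

def det(A):
    # plain Laplace expansion along the first row (exact integer arithmetic)
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(n))

def minor(A, rows, cols):
    # A with the given rows and columns deleted (0-based index sets)
    return [[A[i][j] for j in range(len(A)) if j not in cols]
            for i in range(len(A)) if i not in rows]

def rho(x, y):
    return 1 if x > y else 0

seed(3)
n = 5
A = [[randint(-3, 3) for _ in range(n)] for _ in range(n)]

# 1-based indices as in the text
i1, j1, j2 = 2, 3, 5
lhs = det(minor(A, {i1 - 1}, {j1 - 1}))
rhs = sum((-1) ** (i2 + j2 + rho(i1, i2) + rho(j1, j2))
          * A[i2 - 1][j2 - 1] * det(minor(A, {i1 - 1, i2 - 1}, {j1 - 1, j2 - 1}))
          for i2 in range(1, n + 1) if i2 != i1)
assert lhs == rhs
```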
⋯
We note that column $j_k$ of $A$ corresponds to column $j_k - \rho(j_k, j_1) - \dots - \rho(j_k, j_{k-1})$ of $A({i_1,\dots,i_{k-1}}|{j_1,\dots,j_{k-1}})$, that row $i_k$ of $A$ corresponds to row $i_k - \rho(i_k, i_1) - \dots - \rho(i_k, i_{k-1})$ of $A({i_1,\dots,i_{k-1}}|{j_1,\dots,j_{k-1}})$, and that
$$
\begin{aligned}
&
i_k - \rho(i_k, i_1) - \dots - \rho(i_k, i_{k-1})
+ j_k - \rho(j_k, j_1) - \dots - \rho(j_k, j_{k-1})
\\
= {} &
i_k + j_k
+ \rho(i_1, i_k) + \dots + \rho(i_{k-1}, i_k)
+ \rho(j_1, j_k) + \dots + \rho(j_{k-1}, j_k)
- 2(k-1),
\end{aligned}
$$
so
$$
\begin{aligned}
& \det {(A({i_1,\dots,i_{k-1}}|{j_1,\dots,j_{k-1}}))}
\\
= {} &
\sum_{\substack{
1 \leq i_k \leq n \\
i_k \neq i_1, \dots, i_{k-1}
}}
{(-1)^{i_k + j_k
+ \rho(i_1, i_k) + \dots + \rho(i_{k-1}, i_k)
+ \rho(j_1, j_k) + \dots
+ \rho(j_{k-1}, j_k)}}
[A]_{i_k,j_k}
\det {(A({i_1,\dots,i_{k}}|{j_1,\dots,j_{k}}))}.
\end{aligned}
$$
⋯
Continuing in this fashion, we obtain
$$
\begin{aligned}
&
\det {(A({i_1,\dots,i_{n-1}}|{j_1,\dots,j_{n-1}}))}
\\
= {} &
\sum_{\substack{
1 \leq i_n \leq n \\
i_n \neq i_1, \dots, i_{n-1}
}}
{(-1)^{i_n + j_n
+ \rho(i_1, i_n) + \dots + \rho(i_{n-1}, i_n)
+ \rho(j_1, j_n) + \dots + \rho(j_{n-1}, j_n)}
[A]_{i_n,j_n}},
\end{aligned}
$$
in which we used the fact (the last of the $\rho$-facts above, applied to $(i_1, \dots, i_n)$ and to $(j_1, \dots, j_n)$) that
$$
\begin{aligned}
& 1 + 1
\\
= {} &
i_n - \rho(i_n, i_1) - \dots - \rho(i_n, i_{n-1})
+ j_n - \rho(j_n, j_1) - \dots - \rho(j_n, j_{n-1})
\\
= {} &
i_n + j_n
+ \rho(i_1, i_n) + \dots + \rho(i_{n-1}, i_n)
+ \rho(j_1, j_n) + \dots + \rho(j_{n-1}, j_n)
- 2(n-1).
\end{aligned}
$$
After some backward substitutions, we get
$$
\begin{aligned}
& \det {(A)}
\\
= {} &
\sum_{\substack{
1 \leq i_1, i_2, \dots, i_n \leq n \\
i_1, i_2, \dots, i_n\,\text{are distinct}
}}
{(-1)^{i_1 + j_1 + i_2 + j_2 + \dots + i_n + j_n
+ \tau(i_1, i_2, \dots, i_n)
+ \tau(j_1, j_2, \dots, j_n)}}
[A]_{i_1,j_1} [A]_{i_2,j_2} \dots [A]_{i_n,j_n}.
\end{aligned}
$$
Since
$$
i_1 + j_1 + i_2 + j_2 + \dots + i_n + j_n
= 2(1 + 2 + \dots + n)
$$
is even,
$(-1)^{\tau (i_1, i_2, \dots, i_n)} = s(i_1, i_2, \dots, i_n)$
and
$(-1)^{\tau (j_1, j_2, \dots, j_n)} = s(j_1, j_2, \dots, j_n)$,
we have
$$
\det {(A)}
=
\sum_{\substack{
1 \leq i_1, i_2, \dots, i_n \leq n \\
i_1, i_2, \dots, i_n\,\text{are distinct}
}}
{ s(i_1, i_2, \dots, i_n)\,
s(j_1, j_2, \dots, j_n)\,
[A]_{i_1,j_1} [A]_{i_2,j_2} \dots [A]_{i_n,j_n}}.
$$
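This final formula can also be verified numerically. In the Python sketch below (the helper names are mine), `det_generalized` implements the right-hand side for an arbitrary fixed column order and is compared against a plain Laplace expansion:

```python
from itertools import permutations
from random import randint, seed

def sign(p):
    # (-1) raised to the number of inversions of p
    inv = sum(1 for a in range(len(p))
              for b in range(a + 1, len(p)) if p[a] > p[b])
    return (-1) ** inv

def det_laplace(A):
    # plain Laplace expansion along the first row (exact integer arithmetic)
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det_laplace([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(n))

def det_generalized(A, j_perm):
    # sum over distinct (i_1, ..., i_n) of s(i) s(j) [A]_{i_1 j_1} ... [A]_{i_n j_n}
    sj = sign(j_perm)
    total = 0
    for i_perm in permutations(range(len(A))):
        term = sign(i_perm) * sj
        for i, j in zip(i_perm, j_perm):
            term *= A[i][j]
        total += term
    return total

seed(0)
n = 4
A = [[randint(-5, 5) for _ in range(n)] for _ in range(n)]
for j_perm in [(0, 1, 2, 3), (2, 0, 3, 1), (3, 2, 1, 0)]:
    assert det_generalized(A, j_perm) == det_laplace(A)
```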
(Note that if $j_1$, $j_2$, $\dots$, $j_n$ are taken to be $1$, $2$, $\dots$, $n$, respectively, the formula reduces to the very definition of the determinant.)
There is an interesting by-product, which is worth mentioning.
Suppose that $a_1$, $a_2$, $\dots$, $a_n$ are $n$ matrices of size $m \times 1$. Define $[a_1, a_2, \dots, a_n]$ to be the $m \times n$ matrix $A$ which has the property that $[A]_{i,j} = [a_j]_{i,1}$. This piece of notation enables us to express a matrix via its columns.
Corollary.
Suppose that $a_1$, $a_2$, $\dots$, $a_n$ are $n$ matrices of size $n \times 1$. Suppose that $k_1$, $k_2$, $\dots$, $k_n$ are $n$ distinct positive integers less than or equal to $n$. Then
$$
\det {[a_{k_1}, a_{k_2}, \dots, a_{k_n}]}
= s(k_1, k_2, \dots, k_n) \det {[a_1, a_2, \dots, a_n]}.
$$
Proof.
Suppose that
$$
A =[a_1, a_2, \dots, a_n]
$$
and that
$$
B = [a_{k_1}, a_{k_2}, \dots, a_{k_n}].
$$
It is not hard to see that
$$
[B]_{i,j} = [a_{k_j}]_{i,1} = [A]_{i,k_j}.
$$
We apply Theorem 1 to $B$, with $j_1$, $j_2$, $\dots$, $j_n$ taken to be $1$, $2$, $\dots$, $n$, respectively, yielding
$$
\begin{aligned}
& \det {(B)}
\\
= {} &
\sum_{\substack{
1 \leq i_1, i_2, \dots, i_n \leq n \\
i_1, i_2, \dots, i_n\,\text{are distinct}
}}
{s(i_1, i_2, \dots, i_n)\,
s(1, 2, \dots, n)\,
[B]_{i_1,1} [B]_{i_2,2} \dots [B]_{i_n,n}}
\\
= {} &
\sum_{\substack{
1 \leq i_1, i_2, \dots, i_n \leq n \\
i_1, i_2, \dots, i_n\,\text{are distinct}
}}
{s(i_1, i_2, \dots, i_n)\,
[B]_{i_1,1} [B]_{i_2,2} \dots [B]_{i_n,n}}.
\end{aligned}
$$
We apply Theorem 1 to $A$, with $j_1$, $j_2$, $\dots$, $j_n$ taken to be $k_1$, $k_2$, $\dots$, $k_n$, respectively, yielding
$$
\begin{aligned}
& \det {(A)}
\\
= {} &
\sum_{\substack{
1 \leq i_1, i_2, \dots, i_n \leq n \\
i_1, i_2, \dots, i_n\,\text{are distinct}
}}
{s(i_1, i_2, \dots, i_n)\,
s(k_1, k_2, \dots, k_n)\,
[A]_{i_1,k_1} [A]_{i_2,k_2} \dots [A]_{i_n,k_n}}
\\
= {} &
s(k_1, k_2, \dots, k_n)
\sum_{\substack{
1 \leq i_1, i_2, \dots, i_n \leq n \\
i_1, i_2, \dots, i_n\,\text{are distinct}
}}
{s(i_1, i_2, \dots, i_n)\,
[B]_{i_1,1} [B]_{i_2,2} \dots [B]_{i_n,n}}
\\
= {} &
s(k_1, k_2, \dots, k_n)
\det {(B)}.
\end{aligned}
$$
Since $s(k_1, k_2, \dots, k_n)$ is either $1$ or $-1$, its square is $1$, and multiplying both sides by it gives
$$
\det {(B)} = s(k_1, k_2, \dots, k_n) \det {(A)}.
$$
(Note that the corollary still holds when two of $k_1$, $k_2$, $\dots$, $k_n$ are equal, because both sides are then zero by the so-called "alternating property" of determinants.)
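Here is a Python sanity check of the corollary (the helpers `sign`, `det`, and `select_columns` are mine), including the case where a column index repeats:

```python
from itertools import permutations
from random import randint, seed

def sign(p):
    # (-1) raised to the number of inversions of p
    inv = sum(1 for a in range(len(p))
              for b in range(a + 1, len(p)) if p[a] > p[b])
    return (-1) ** inv

def det(A):
    # Leibniz formula; exact for integer entries
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

def select_columns(A, ks):
    # build [a_{k_1}, ..., a_{k_n}] from the columns of A (0-based indices)
    return [[A[i][k] for k in ks] for i in range(len(A))]

seed(1)
n = 4
A = [[randint(-4, 4) for _ in range(n)] for _ in range(n)]

# distinct column indices: the determinant changes by the sign of the permutation
for ks in permutations(range(n)):
    assert det(select_columns(A, ks)) == sign(ks) * det(A)

# repeated column index: the determinant vanishes (alternating property)
assert det(select_columns(A, (0, 0, 2, 3))) == 0
```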
The formulae are usually derived from properties of permutations, but here we have proved them using the Laplace expansion instead. The recursive definition might be more powerful than one would think.