5

By the "volume" of "parallelepiped", I mean the Lebesgue measure of n-Parallelotope.

If I have $$\vec{v_i}=\begin{bmatrix}a_{1i} \\ a_{2i} \\ \vdots \\ a_{ni}\end{bmatrix} \qquad \text{ for } i\in\{1,2,3,\ldots,n\}$$ and $$\mathbf A=\begin{bmatrix}\vec{v_1} & \vec{v_2} & \cdots & \vec{v_n}\end{bmatrix}=\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{bmatrix}$$ Now there are two ways to define the determinant.

Definition 1: If $\mathbf C_{ij}$ is the cofactor, then $$\det(\mathbf A)=\sum_{k=1}^{n}a_{ik}C_{ik}=\sum_{k=1}^{n}a_{kj}C_{kj} \qquad \text{for any } i,j\in\{1,2,3,\ldots,n\}$$ Definition 2: $\det(\mathbf A)$ is the Lebesgue measure of the fundamental n-parallelotope spanned by the column vectors $\vec{v_i}\in\mathbb R^n$.

How do I prove that the two definitions are equivalent? I personally like Definition 2 because I can visualize it, but for Definition 1 we first need to show that the summations give the same value for all $i$ and $j$.

I can use the second definition for $n=2$. The first thing I noted was that column operations do not change the area of the parallelogram, by simple geometric properties. Thus (assuming $a\neq 0$), $$\begin{vmatrix}a & c\\ b & d\end{vmatrix}=\begin{vmatrix}a & c-a\frac{c}{a}\\ b & d-b\frac{c}{a}\end{vmatrix}=\begin{vmatrix}a & 0\\ b & \frac{ad-bc}{a}\end{vmatrix}=\begin{vmatrix}a-0\frac{ab}{ad-bc} & 0\\ b-\frac{ad-bc}{a}\frac{ab}{ad-bc} & \frac{ad-bc}{a}\end{vmatrix}=\begin{vmatrix}a & 0\\ 0 & \frac{ad-bc}{a}\end{vmatrix}$$ This turns it into a rectangle whose area is easy to calculate: $$\begin{vmatrix}a & c\\ b & d\end{vmatrix}=ad-bc$$ But this is the same as what we get from Definition 1, so both definitions agree for $n=2$.

The argument for finding the determinant for $n=2$ by the second definition generalizes easily, but the method of computation feels completely different from Definition 1. For $n=3$, I got $$\begin{vmatrix}a&d&g\\b&e&h\\c&f&i\end{vmatrix}=\begin{vmatrix}\frac{a(ei-hf)-d(bi-ch)-g(ec-bf)}{ei-hf}&0&0\\0&\frac{ei-hf}{i}&0\\0&0&i\end{vmatrix}=a(ei-hf)-d(bi-ch)+g(bf-ec)$$ I can see a bit of a connection with Definition 1 for $i=1$.
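For concreteness, this column-reduction idea can be checked numerically. The following is a minimal sketch (assuming NumPy is available, and that the pivots stay nonzero, as they do for this example matrix; the matrix itself is illustrative, not from the original):

```python
import numpy as np

# A 3x3 example with columns (a,b,c), (d,e,f), (g,h,i).
A = np.array([[2.0, 1.0, 4.0],
              [1.0, 3.0, 0.0],
              [5.0, 2.0, 1.0]])

# Column-reduce to diagonal form: each step adds a multiple of one
# column to another, a shear that does not change the volume.
B = A.copy()
n = B.shape[0]
for j in range(n):
    for k in range(n):
        if k != j:
            B[:, k] -= (B[j, k] / B[j, j]) * B[:, j]

print(np.prod(np.diag(B)))   # volume of the resulting "box": -47.0
print(np.linalg.det(A))      # cofactor-expansion determinant: -47.0
```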

I got to know that Definition 1 is called the Laplace expansion, but the proof written on Wikipedia went over my head. I am in 11th grade and know very little linear algebra (only the material Grant Sanderson covers in his Essence of Linear Algebra playlist). After reading the answer to Determinant of transpose I can also make sense of why row operations do not change the determinant. I would be really happy if someone proved Definition 1 using Definition 2.

mathreadler
    Related: https://math.stackexchange.com/questions/427528/why-determinant-is-volume-of-parallelepiped-in-any-dimensions – Hans Lundmark Jan 31 '18 at 20:39
  • This is definitely not trivial, but https://tartarus.org/gareth/maths/Linear_Algebra/determinants.pdf may help. – Patrick Stevens Jan 31 '18 at 20:52
  • @HansLundmark My question is why the 'computation' of the determinant is related to the volume, not why the determinant is related to the volume. I was able to answer the latter myself, but not the former. –  Jan 31 '18 at 21:20
  • @PatrickStevens If they were defining the determinant as D(a,b,c) assuming D(e1,e2,e3)=1 then how did they use the fact that $\det(A)=\sum_{k=1}^n a_{1k}C_{1k}$. I know that can be proved because they prove D(a,b,c)=$\sum \pm a_i b_j c_k$. But I don't want verification, I want to know if there is something deep relating $\det(A)$ and $\sum_{k=1}^n a_{1k}C_{1k}$ –  Jan 31 '18 at 21:26
  • @DevanshSehta: I'm not sure I understand the difference. Anyway, your second definition is not quite correct, since the determinant is the signed volume (which implies that it has the usual multilinearity and antisymmetry properties, which leads to the “sum over all permutations” formula, which in turn implies the cofactor expansion formulas). – Hans Lundmark Feb 01 '18 at 07:59

3 Answers

4

$\def\vect{\mathbf} \DeclareMathOperator{\Mat}{\rm{Mat}} \newcommand{\vol}{{\rm{vol}}} \newcommand\sign{{\rm{sign}}} $

Since you say you haven't yet studied linear algebra, you may have to take this explanation as a study program for the (near) future rather than something you can take in at one go.

Step 1. Forget both of your definitions of determinant. The determinant is something intrinsically associated with matrices (or with linear transformations) and is determined by certain properties. We regard a determinant alternatively as a function of matrices or as a function of a sequence of $n$ column vectors in $\mathbb R^n$. We go back and forth between $n$-by-$n$ matrices and sequences of column vectors in the obvious way. The defining properties:

(1) $\det(v_1, v_2, \dots, v_n)$ is a multilinear function of the $n$ vector variables. That is, it is linear in each vector variable separately.

(2) $\det(v_1, v_2, \dots, v_n)$ is alternating; that means if you switch any two vector variables, the result changes by a factor of $-1$.

(3) Normalization: $\det(E) = 1$, where $E$ is the identity matrix.

The last property says for the standard unit vectors $\vect{e}_1, \dots, \vect{e}_n$, $\det(\vect{e}_1, \dots, \vect{e}_n) = 1$.


Theorem: There is a unique function $\det$ on $\Mat_n(\mathbb R)$ satisfying the three properties listed above. Moreover \begin{equation} \det(A) = \sum_{\sigma \in S_n} \epsilon(\sigma) a_{1, \sigma(1)} a_{2, \sigma(2) }\cdots a_{n, \sigma(n)}. \tag{S} \end{equation}

In the theorem statement, the sum is over the symmetric group $S_n$, and $\epsilon(\sigma)$ denotes the sign of the permutation $\sigma$.
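Formula (S) can be implemented directly and compared against a library determinant. Here is a minimal sketch assuming NumPy; the function name and the random test matrix are illustrative, not from the original:

```python
import itertools
import numpy as np

def det_perm_sum(A):
    """Determinant via the sum over S_n in formula (S)."""
    n = A.shape[0]
    total = 0.0
    for sigma in itertools.permutations(range(n)):
        # epsilon(sigma) = (-1)^(number of inversions)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if sigma[i] > sigma[j])
        term = (-1.0) ** inversions
        for i in range(n):
            term *= A[i, sigma[i]]
        total += term
    return total

A = np.random.rand(4, 4)
print(det_perm_sum(A), np.linalg.det(A))  # agree up to rounding
```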

Now using only the three defining properties, one can show the following. Let $A$ be an $n$-by-$n$ matrix, and define $A_{i, j}$ to be the matrix obtained by striking out the $i$--th row and $j$--th column of $A$, so $A_{i, j}$ is $(n-1)$-by-$(n-1)$. Let $\mathcal C(A)$ be the matrix whose $(i, j)$ entry is $(-1)^{i+j} \det(A_{i, j})$. Then \begin{equation} A \mathcal C(A)^t = \mathcal C(A)^t A = \det(A) E. \tag{L} \end{equation} This statement encompasses all the Laplace expansions of $\det(A)$ and some orthogonality relations as well. So don't take the Laplace expansion as a definition but as a consequence of the intrinsic definition.
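As a sanity check of (L), one can build $\mathcal C(A)$ entry by entry and verify the identity numerically. A sketch assuming NumPy; `cofactor_matrix` is an illustrative name:

```python
import numpy as np

def cofactor_matrix(A):
    """C(A)[i, j] = (-1)^(i+j) * det(A with row i and column j removed)."""
    n = A.shape[0]
    C = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

A = np.random.rand(4, 4)
C = cofactor_matrix(A)
# A C(A)^t = C(A)^t A = det(A) E, encompassing all Laplace expansions
print(np.allclose(A @ C.T, np.linalg.det(A) * np.eye(4)))  # True
print(np.allclose(C.T @ A, np.linalg.det(A) * np.eye(4)))  # True
```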

Some other well-known properties of determinants that will be used:

  • $\det(AB) = \det(A) \det(B)$

  • $\det(A^t) = \det(A)$

These can be derived using the defining properties or the summation formula (S).

Step 2. We need a working definition of volume and signed volume. We want to define for $1 \le r \le n$ the (non-negative) $r$ dimensional volume of the parallelepiped spanned by $r$ vectors $v_1, \dots, v_r$ in $\mathbb R^n$, denoted $|\vol_r|(v_1, \dots, v_r)$. We start with $|\vol_1|(v_1) = ||v_1||$. Supposing that $r \ge 2$ and that $r-1$ dimensional volume has been defined, we do the following. If $v_1, \dots, v_r$ are linearly dependent, define $|\vol_r|(v_1, \dots, v_r) = 0$. Otherwise, apply the Gram-Schmidt procedure to the sequence of vectors $v_1, \dots, v_r$ to get an orthonormal basis $\vect f_1, \dots, \vect f_r$ of the subspace $M$ spanned by $v_1, \dots, v_r$. Note that the dot product $(v_r, \vect f_r)$ is positive. In fact, it is the length of the projection of $v_r$ onto the orthogonal complement in $M$ of the span of $v_1, \dots, v_{r-1}$ (exercise). Define $$ |\vol_r|(v_1, \dots, v_r) = (v_r, \vect f_r) \cdot |\vol_{r-1}|(v_1, \dots, v_{r-1}) . $$ We write $|\vol|$ for $|\vol_n|$. This is the procedure we know from elementary mathematics: we take the $(r-1)$ dimensional volume of the base and multiply it by the one dimensional altitude of the parallelepiped.

It is not manifest that the result is independent of the order in which the vectors $v_1, \dots, v_r$ are listed. This will emerge later.

Finally we can define the $n$ dimensional signed volume $\vol(v_1, \dots, v_n)$ for a sequence of $n$ vectors as follows. If the vectors are linearly dependent, the answer is $0$. Otherwise,
take $$ \vol(v_1, \dots, v_n) = \sign(\det(v_1, \dots, v_n)) |\vol|(v_1, \dots, v_n). $$
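The recursive definition of $|\vol_r|$ and the signed volume translate directly into code. Here is a minimal sketch assuming NumPy, with an arbitrary $10^{-12}$ tolerance standing in for exact linear dependence:

```python
import numpy as np

def unsigned_vol(vectors):
    """Recursive r-dimensional volume |vol_r|: the (r-1)-volume of the
    base times the altitude (v_r, f_r) from Gram-Schmidt."""
    vectors = [np.asarray(v, dtype=float) for v in vectors]
    if len(vectors) == 1:
        return np.linalg.norm(vectors[0])
    basis = []                          # Gram-Schmidt vectors f_1, ..., f_r
    for v in vectors:
        w = v - sum(np.dot(v, f) * f for f in basis)
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            return 0.0                  # linearly dependent: volume 0
        basis.append(w / norm)
    altitude = np.dot(vectors[-1], basis[-1])
    return altitude * unsigned_vol(vectors[:-1])

def signed_vol(vectors):
    """Signed n-volume: sign(det) times |vol_n|."""
    A = np.column_stack(vectors)
    return np.sign(np.linalg.det(A)) * unsigned_vol(vectors)

vs = [np.random.rand(4) for _ in range(4)]
print(signed_vol(vs), np.linalg.det(np.column_stack(vs)))  # should agree
```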

This is related to the notion of orientation. An ordered basis of $\mathbb R^n$ is said to be positively oriented if $\det(v_1, \dots, v_n)$ is positive and negatively oriented otherwise. So the signed volume is positive if the basis is positively oriented and negative if it is negatively oriented. In dimension $3$, orientation can be described in terms of the familiar right-hand rule.

Step 3. Let's discuss orthogonal matrices a little. A matrix $U$ is orthogonal if $U^t U = U U^t = E$. This holds if and only if $U$ preserves all dot products, $(U u, Uv) = (u, v)$ for all $u, v$; equivalently, if and only if the columns of $U$ form an orthonormal basis of $\mathbb R^n$. An orthogonal matrix has determinant $\pm 1$ because $$\det(U)^2 = \det(U^t) \det (U) = \det(U^t U) = \det(E) = 1.$$

Orthogonal matrices with determinant 1 are called special orthogonal matrices.

Observation: If $U$ is an orthogonal matrix, then for any $r \le n$ and any $v_1, \dots, v_r$, $$ |\vol_r|(Uv_1, \dots, Uv_r) = |\vol_r|(v_1, \dots, v_r). $$ Moreover, for any $v_1, \dots, v_n$, $$ \vol(Uv_1, \dots, Uv_n) = \det(U) \vol(v_1, \dots, v_n). $$ In particular if $U$ is special orthogonal, then $$ \vol(Uv_1, \dots, Uv_n) = \vol(v_1, \dots, v_n). $$

Proof: follows from the definitions because orthogonal matrices preserve inner products, and because $$\det(U v_1, \dots, U v_n) = \det (U) \det(v_1, \dots, v_n).$$
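A quick numerical illustration of the first point, assuming NumPy: an orthogonal $U$ preserves the matrix of all inner products of the $v_i$, which is exactly why every $|\vol_r|$ is unchanged.

```python
import numpy as np

n, r = 5, 3
U, _ = np.linalg.qr(np.random.rand(n, n))   # random orthogonal matrix
if np.linalg.det(U) < 0:
    U[:, 0] = -U[:, 0]          # flip one column: det becomes +1

V = np.random.rand(n, r)        # columns are v_1, ..., v_r
# U preserves every inner product (v_i, v_j), hence every |vol_r|:
print(np.allclose((U @ V).T @ (U @ V), V.T @ V))   # True
```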

Step 4. Let's take some sequence of vectors $(b_1, \dots, b_n)$ in special position and compare $\det(b_1, \dots, b_n)$ and $\vol(b_1, \dots, b_n)$. The special assumption is that $b_1, \dots, b_{n-1}$ have zero last coordinate, so they lie in the span of $\vect e_1, \dots, \vect e_{n-1}$, or equivalently are perpendicular to $\vect{e}_n$. Let $B = (b_1, \dots, b_n)$. The last row of $B$ has non-zero $(n, n)$ entry $b_{n, n}$, and the rest of its entries are zero. $B$ is thus block triangular: $$ B = \begin{bmatrix} B_{n, n} & * \\ 0 & b_{n, n} \end{bmatrix}, $$
where $B_{n,n}$ is $(n-1)$--by--$(n-1)$, $*$ indicates an $(n-1)$--by--$1$ column, and $0$ a $1$--by--$(n-1)$ row of zeros.

In this special situation,
$$ \det B = \det(b_1, \dots, b_n) = b_{n,n} \det (B_{n,n}), $$ and this is the Laplace expansion according to the last column.

Now consider the computation of volume and signed volume. We have $$ |\vol_{n-1}|(b_1, \dots, b_{n-1}) = |\vol_{n-1}|(B_{n, n}), $$ as follows from the definitions. Since the span of $b_1, \dots, b_{n-1}$ is the same as the span of $\vect e_1, \dots, \vect e_{n-1}$, the final Gram-Schmidt vector $\vect f_n$ entering into the computation of $|\vol_n|(b_1, \dots, b_n)$ is necessarily $(\pm 1) \vect e_n$, and $(b_n, \vect f_n)$ is $|b_{n,n}|$. Thus $$ |\vol_n|(b_1, \dots, b_n) = |b_{n, n}| \, |\vol_{n-1}|(B_{n, n}). $$ By induction on dimension, we may assume $$ \det(B_{n, n}) = \vol_{n-1} (B_{n, n} ) \quad \text{and therefore} \quad |\det(B_{n, n})| = |\vol_{n-1} |(B_{n, n} ). $$ Substituting, $$ |\vol_n|(b_1, \dots, b_n) = |b_{n, n}| \, |\vol_{n-1}|(B_{n, n}) = |b_{n, n}| \, |\det(B_{n, n})| = |\det(B)|. $$ But then, \begin{align*} \vol(b_1, \dots, b_n) &= \sign(\det(b_1, \dots, b_n)) |\vol_n|(b_1, \dots, b_n) \\ &= \sign(\det(b_1, \dots, b_n)) |\det(b_1, \dots, b_n)| \\ &= \det(b_1, \dots, b_n). \end{align*}
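The block-triangular determinant identity used here is easy to test numerically (a sketch assuming NumPy; the dimension and random entries are illustrative):

```python
import numpy as np

n = 4
B = np.zeros((n, n))
B[:n-1, :n-1] = np.random.rand(n-1, n-1)   # b_1, ..., b_{n-1}: last coordinate 0
B[:, n-1] = np.random.rand(n)              # b_n is arbitrary
# Block-triangular determinant: det B = b_{n,n} * det(B_{n,n})
print(np.linalg.det(B))
print(B[n-1, n-1] * np.linalg.det(B[:n-1, :n-1]))   # same value
```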

Step 5. Let's reduce the general problem to the special case just considered. Let's start with $A = (a_1, \dots, a_n)$. Take any special orthogonal matrix $U$. Then we already know that $$ \det U A = \det(U a_1, \dots, U a_n) = \det U \det A = \det A, $$ and $$ \text{vol}( U A) = \text{vol}(U a_1, \dots, U a_n) = \text{vol}(a_1, \dots, a_n) = \text{vol}(A). $$ Now it is always possible to find a special orthogonal matrix $U$ such that $(U a_1, \dots, U a_{n-1})$ is a system of vectors in the coordinate hyperplane orthogonal to $\vect{e}_n$, and for such a choice, $\text{vol}( U A) = \det (UA)$ by Step 4. Thus for our original $A = (a_1, \dots, a_n)$, $$ \text{vol}(A) = \text{vol}(UA) = \det(UA) = \det A. $$
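Regarding the existence of such a $U$: one concrete construction (a sketch assuming NumPy; `hyperplane_rotation` is an illustrative name) takes a full QR decomposition of $(a_1, \dots, a_{n-1})$, whose transpose maps the span of those vectors into the span of $\vect e_1, \dots, \vect e_{n-1}$, and then flips one row if needed to make the determinant $+1$:

```python
import numpy as np

def hyperplane_rotation(A):
    """A special orthogonal U with U a_1, ..., U a_{n-1} perpendicular
    to e_n, where the columns of A are a_1, ..., a_n."""
    n = A.shape[0]
    # Full QR: the first n-1 columns of Q are an orthonormal basis of
    # span(a_1, ..., a_{n-1}), so Q^t maps that span into span(e_1, ..., e_{n-1}).
    Q, _ = np.linalg.qr(A[:, :n-1], mode='complete')
    U = Q.T.copy()
    if np.linalg.det(U) < 0:
        U[n-1, :] = -U[n-1, :]   # negating the last row keeps the images
                                 # in the hyperplane and makes det(U) = +1
    return U

A = np.random.rand(4, 4)
U = hyperplane_rotation(A)
print(np.allclose((U @ A)[3, :3], 0))           # True: in the hyperplane
print(np.linalg.det(U @ A), np.linalg.det(A))   # equal: det is preserved
```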

Step 6. So far we have shown that $\text{vol}(A) = \det A$ for any $A = (v_1, \dots, v_n)$, but we haven't emphasized the Laplace expansions. You particularly wanted the Laplace expansions to be naturally interpreted as signed volumes. In some sense, there is nothing to show, because we know that the determinant is calculated by Laplace expansions and is on the other hand equal to volume. But we do get for free a geometric interpretation of the Laplace expansion. Let's take a particular expansion, the one along the first column: $$ \text{vol}(A) = \det(A) = \sum_i a_{i, 1} (-1)^{i+1} \det(A_{i, 1}) = (v_1, C), $$ where $C$ is the vector whose $i$-th coordinate is $(-1)^{i+1} \det(A_{i, 1})$. If we keep $v_2, \dots, v_n$ and replace $v_1$ by any vector $w$ in the hyperplane spanned by $v_2, \dots, v_n$, we get $$ 0 = \det ( w, v_2, \dots, v_n) = (w, C), $$ which means that the vector $C$ is perpendicular to the hyperplane spanned by $v_2, \dots, v_n$. If we keep $v_2, \dots, v_n$ and replace $v_1$ by $C$, we get $$ \det ( C, v_2, \dots, v_n) = (C, C) = ||C||^2 > 0, $$ which means that $(C, v_2, \dots, v_n)$ is positively oriented. Now if we replace $v_1$ by the unit vector $u = C/||C||$, we get $$ \det ( u, v_2, \dots, v_n) = (u, C) = ||C||. $$ But since $u$ is perpendicular to the hyperplane spanned by $( v_2, \dots, v_n)$ and of length $1$, and $(u, v_2, \dots, v_n)$ is positively oriented, we have $$ \text{vol}(u, v_2, \dots, v_n) = \text{vol}_{n-1}(v_2, \dots, v_n), $$ the $(n-1)$-dimensional volume of $(v_2, \dots, v_n)$. Thus the length of $C$ is the $(n-1)$-dimensional volume of $(v_2, \dots, v_n)$.

To summarize, in the Laplace expansion along the first column, the vector $C$ appearing in the expansion $\det (v_1, \dots, v_n) = (v_1, C)$ is perpendicular to the hyperplane spanned by $(v_2, \dots, v_n) $, of length equal to the $n-1$ dimensional volume of $(v_2, \dots, v_n) $, and determines a positively oriented system of vectors $(C, v_2, \dots, v_n)$, and the $n$ dimensional volume $\text{vol}(v_1, \dots, v_n)$ is given by $\text{vol}(v_1, \dots, v_n) = (v_1, C)$.
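In $\mathbb R^3$ this vector $C$ is exactly the cross product $v_2 \times v_3$. A small numerical check of the properties above (a sketch assuming NumPy; `laplace_normal` is an illustrative name):

```python
import numpy as np

def laplace_normal(A):
    """The vector C with det(A) = (v_1, C); its i-th coordinate is
    (-1)^(i+1) det(A_{i,1}) in 1-based indexing."""
    n = A.shape[0]
    C = np.empty(n)
    for i in range(n):                       # i is 0-based here
        minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)
        C[i] = (-1) ** i * np.linalg.det(minor)
    return C

A = np.random.rand(3, 3)
C = laplace_normal(A)
print(np.dot(A[:, 0], C), np.linalg.det(A))   # equal: det(A) = (v_1, C)
print(np.dot(A[:, 1], C), np.dot(A[:, 2], C)) # ~0: C is perpendicular
print(C, np.cross(A[:, 1], A[:, 2]))          # in R^3, C = v_2 x v_3
```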

fred goodman
  • In some sense my very long answer is unsatisfactory because I didn't connect the "volume" of a parallelepiped with its Lebesgue measure. To do so, one has to find out how Lebesgue measure scales under linear transformations, and doing that makes most of answer superfluous. I addressed this in a question and answer here. – fred goodman Feb 03 '18 at 23:49
  • I was reading this again. Could you please elaborate on "Now it is always possible to find a special orthogonal matrix such that ..." (Step 5)? –  Aug 17 '18 at 12:33
3

Theorem: Given an $m$-dimensional parallelepiped $P$, the square of the $m$-volume of $P$ is $\det(AA^T)$, where $A$ is the matrix whose rows are the edge vectors of $P$.

Proof: The proof is by induction on the dimension. It is obviously true for $m=1$. Assume it's true for $m$, and consider an $(m+1)$-dimensional parallelepiped $P$. Let the rows of $A$ be denoted $a_i$, where $i$ runs from $1$ to $m+1$. We can find $b,c\in\mathbb R^{m+1}$ such that $a_1=b+c$, $b$ is orthogonal to the set $S=\{a_2,\ldots, a_{m+1}\}$, and $c$ lies in $\operatorname{span}(S)$. Let $B$ be the matrix formed by replacing the row $a_1$ with the vector $b$. As there are elementary matrices $E_1,\cdots, E_k$ such that $A=E_1\cdots E_k B$, we have that $\det(AA^T)=\det(BB^T)$.

Let $D$ be the matrix obtained by removing the first row of $A$ (equivalently, of $B$). Note that $D$ represents an $m$-dimensional parallelepiped embedded in $\mathbb{R}^{m+1}$, so we can apply the inductive hypothesis to it.

Using basic properties of the determinant that follow from your Definition 1, we have: $$\det(AA^T)=\det(BB^T)=\det\begin{bmatrix} bb^T & bD^T \\ Db^T & DD^T \end{bmatrix}=\det\begin{bmatrix} bb^T & 0 \\ 0 & DD^T \end{bmatrix}=bb^T\det(DD^T)$$

The off-diagonal blocks vanish because $b$ is orthogonal to the rows of $D$ by construction. By induction, $\det(DD^T)$ is the square of the volume of one face of $P$, and by the Pythagorean theorem, $bb^T$ is the square of the length of the perpendicular edge. Therefore their product is the square of the volume of the entire parallelepiped, exactly as desired.
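Numerically, the theorem says the $m$-volume is $\sqrt{\det(AA^T)}$ even when the parallelepiped is embedded in a higher-dimensional space. A minimal sketch assuming NumPy; the shapes and random entries are illustrative:

```python
import numpy as np

def gram_vol(A):
    """m-volume of the parallelepiped whose edges are the rows of A,
    computed as sqrt(det(A A^t))."""
    return np.sqrt(np.linalg.det(A @ A.T))

# A 2-dimensional parallelogram embedded in R^4:
A = np.random.rand(2, 4)
print(gram_vol(A))
# Cross-check in the square case: |det| equals the Gram volume.
B = np.random.rand(3, 3)
print(gram_vol(B), abs(np.linalg.det(B)))   # equal
```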


I've proven that the two quantities are equal, but that doesn't seem to satiate you. I think this is in part because you're a little confused about the logic of the situation. It's not like you asked me to prove $P\rightarrow Q$ and instead I proved $Q\rightarrow P$. You asked me to prove that $P=Q$ and "instead" I proved $Q=P$. Except equality is symmetric, so this is a non-issue. If you have two quantities and can algebraically manipulate one to obtain the other, then they are equal. It doesn't matter which side you manipulated.

You seem to be interested in the key idea behind why cofactor expansion works, without fully expanding the whole $m\times m$ matrix. The idea is that you can replace the first row with its component perpendicular to the rest of the rows, and then when you expand across the first row you wind up with a formula that is the product of the side length (in the modified dimension) with the volume of the face that side is perpendicular to. That is, up to elementary matrix equivalences, cofactor expansion directly represents the usual way we find volumes of parallelepipeds: calculating the volume of the base and multiplying it by the height.

  • This answer proves that unsigned volume is the absolute value of the determinant (that is, it uses definition 1 to prove definition 2). Is there something which proves definition 1 using definition 2? The above theorem is trivial if we start with definition 2. –  Jan 31 '18 at 22:02
  • @DevanshSehta If your objection is that I’m using $1$ to prove $2$ instead of using $2$ to prove $1$, can’t you just read this proof backwards? – Stella Biderman Jan 31 '18 at 22:06
  • @DevanshSehta I can rewrite it in reverse, if that would make it easier for you to follow. But you can just do the induction the other way. – Stella Biderman Jan 31 '18 at 22:16
  • 1
    How can I see $$\begin{vmatrix}a&d&g\\b&e&h\\c&f&i\end{vmatrix}=a\begin{vmatrix}e&h\\f&i\end{vmatrix}-d\begin{vmatrix}b&h\\c&i\end{vmatrix}+g\begin{vmatrix}b&e\\c&f\end{vmatrix}$$ without expanding everything, using that the determinant is volume? –  Jan 31 '18 at 22:22
  • @DevanshSehta I've added a note at the end of my answer that hopefully addresses your concerns. If it does not, can you please make explicit what about my answer fails to satiate you and what would need to be included to make you feel better about it? – Stella Biderman Jan 31 '18 at 22:32
0

Write your matrix $A$ as a product of simpler matrices using elementary row and column operations. The absolute value of the determinant of $A$ corresponds to the volume of the parallelepiped described by $A$ because it does so when $A$ is one of the simpler matrices, and the absolute value of the determinant and volume operations are multiplicative (and the unit matrix has determinant 1 and the unit cube has volume 1).

Some algebraic details: The elementary row operations are exchanging 2 rows, multiplying a row by a nonzero constant, and adding a multiple of a row to a different row. Write them as left multiplication by simple matrices. Similarly for the elementary column operations, except write them as right multiplication by simple matrices. Reduction thus expresses the reduced matrix $B$ as a product $R_1 \cdots R_r A C_1 \cdots C_s$, where each $R_i$ is a row operation matrix and each $C_i$ is a column operation matrix. Multiply by inverse matrices to get $A = R_r^{-1} \cdots R_1^{-1} B C_s^{-1} \cdots C_1^{-1}$. Various normalizations are possible for $B$. Use the canonical one in which it is the unit matrix with all 1's after some point possibly replaced by 0's. If $B$ is the unit matrix, remove it from the expression unless that would leave no terms on the right. Otherwise, replace it by a product of degenerate row operations (multiply some rows one at a time by 0).
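This reduction is essentially Gaussian elimination; tracking the row swaps and pivots computes the determinant as the product of the factors contributed by the simple matrices. A sketch assuming NumPy; `det_by_elimination` is an illustrative name:

```python
import numpy as np

def det_by_elimination(A):
    """Determinant via row reduction: each row exchange negates the
    determinant, shears leave it fixed, and the pivots multiply out."""
    A = A.astype(float).copy()
    n = A.shape[0]
    sign = 1.0
    for j in range(n):
        p = j + np.argmax(np.abs(A[j:, j]))     # partial pivoting
        if A[p, j] == 0:
            return 0.0                          # degenerate: volume 0
        if p != j:
            A[[j, p]] = A[[p, j]]               # row exchange
            sign = -sign
        for i in range(j + 1, n):               # shear rows below the pivot
            A[i] -= (A[i, j] / A[j, j]) * A[j]
    return sign * np.prod(np.diag(A))

A = np.random.rand(5, 5)
print(det_by_elimination(A), np.linalg.det(A))  # agree up to rounding
```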

Each matrix corresponds to a linear transformation. Exchanging 2 rows or columns corresponds to exchanging 2 coordinates. Multiplying a row or column by a (possibly zero) constant corresponds to scaling (or zeroing) one of the coordinates. Adding a multiple of a row or column to a different row or column corresponds to a shear transformation. To reduce special cases, write nontrivial shear matrices as products of 2 coordinate-scaling matrices and 1 shear matrix with multiple 1.
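In two dimensions, for example, a shear with multiple $c \neq 0$ factors as $$\begin{pmatrix}1 & c\\ 0 & 1\end{pmatrix} = \begin{pmatrix}c & 0\\ 0 & 1\end{pmatrix} \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix} \begin{pmatrix}c^{-1} & 0\\ 0 & 1\end{pmatrix},$$ two coordinate scalings sandwiching the unit shear; multiplying out the right-hand side recovers the left.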

Using induction, we only have to show that the two operations are multiplicative when a general matrix is multiplied on the right by one of our simple matrices/transformations. This is very easy for the determinant (using the not-so-good definition of the determinant as a big alternating sum) since our simple matrices are so simple. The absolute value of the determinant is multiplicative since the determinant itself is.

Note that for a scaling matrix with scale factor $c$, the determinant of the matrix is $c$ but the scale factor for the volume is $|c|$. Exchange operations negate the determinant but don't change the volume. We have to take the absolute value of the determinant to make the factors equal at all stages.

In 2 dimensions, "volumes" are areas and are given by the formulas from elementary geometry for the area of a rectangle and a parallelogram, but it is not easy to see that Lebesgue measure is the same as the fuzzy or axiomatic area of elementary geometry, so these formulas shouldn't be used. Instead, prove them directly from properties of Lebesgue measure (approximate regions by unions of small squares of the same size with sides parallel to the axes. Use translation invariance of Lebesgue measure to prove the formula for the area of a square. Think of the squares as paving stones or water molecules). Given these formulas, it is obvious that the volumes scale as claimed for squares. For the unit square, exchanging coordinates doesn't change its image. Scaling one coordinate turns squares into rectangles (or lines when the scale factor is 0). Shear transformations turn squares into rhombuses of the same size. If the image is a line, then it has volume 0 which equals the determinant. (Then the matrix and region are degenerate, and this case can be eliminated earlier, but it is a good exercise to think about approximating the measure 0 sets for this case either by null coverings of the interior or coverings a little larger than the region.)

In higher dimensions, adapt the methods used to prove the formulas in 2 dimensions. Use induction. Initially, the partial product is the identity matrix and the corresponding region is the unit cube. Each stage transforms the region into a more general parallelepiped and approximates it by a union of smaller hypercubes with sides parallel to the axes. The simple transformation for the next stage changes coordinates in at most 2 dimensions, so is no harder to handle than in the 2-dimensional case.

Notes: We used (hyper)cubes to minimize complications. We made them have sides parallel to the axes for the same reason. It is not obvious that a rotated unit cube has volume 1 even in 2 dimensions. It is obvious that a rotating a ball doesn't change its volume (it doesn't change it at all), but approximating regions as unions of balls is not as easy as with cubes -- there would be gaps that would have to be filled in with smaller balls, while with cubes we just have to fill in the interior of the region to nearly the edges. The easiest way that I know of to show that rotations preserve volumes is to show that their matrix has determinant one and then use the correspondence proved here.