
How can I prove $\operatorname{rank}A^TA=\operatorname{rank}A$ for any $A\in M_{m \times n}$?

This is an exercise in my textbook associated with orthogonal projections and the Gram-Schmidt process, but I am unsure how they are relevant.

jaynp

4 Answers


Let $\mathbf{x} \in N(A)$ where $N(A)$ is the null space of $A$.

So, $$\begin{align} A\mathbf{x} &=\mathbf{0} \\\implies A^TA\mathbf{x} &=\mathbf{0} \\\implies \mathbf{x} &\in N(A^TA) \end{align}$$ Hence $N(A) \subseteq N(A^TA)$.

Again let $\mathbf{x} \in N(A^TA)$

So, $$\begin{align} A^TA\mathbf{x} &=\mathbf{0} \\\implies \mathbf{x}^TA^TA\mathbf{x} &=0 \\\implies (A\mathbf{x})^T(A\mathbf{x})&=0 \\\implies \|A\mathbf{x}\|^2&=0 \\\implies A\mathbf{x}&=\mathbf{0}\\\implies \mathbf{x} &\in N(A) \end{align}$$ Hence $N(A^TA) \subseteq N(A)$. (The step $\|A\mathbf{x}\|^2 = 0 \implies A\mathbf{x} = \mathbf{0}$ is where we use that the entries are real.)

Therefore $$\begin{align} N(A^TA) &= N(A)\\ \implies \dim(N(A^TA)) &= \dim(N(A))\\ \implies \text{rank}(A^TA) &= \text{rank}(A),\end{align}$$ where the last step uses the rank–nullity theorem and the fact that $A$ and $A^TA$ both have $n$ columns.
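
As a quick numerical sanity check of this conclusion, here is a minimal NumPy sketch (the matrix sizes and seed are arbitrary choices):

```python
import numpy as np

# Sanity check: rank(A^T A) == rank(A) for a few random real matrices,
# including rank-deficient ones.
rng = np.random.default_rng(0)

for m, n in [(5, 3), (3, 5), (6, 6)]:
    A = rng.standard_normal((m, n))
    A[:, -1] = A[:, 0]                       # force a dependent column
    r_A = np.linalg.matrix_rank(A)
    r_AtA = np.linalg.matrix_rank(A.T @ A)
    assert r_A == r_AtA                      # the two ranks agree
    print((m, n), r_A, r_AtA)
```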

Empiricist
A.D

Let $r$ be the rank of $A \in \mathbb{R}^{m \times n}$. We then have the (thin) SVD of $A$, $$A_{m \times n} = U_{m \times r} \Sigma_{r \times r} V^T_{r \times n},$$ where $U$ and $V$ have orthonormal columns. Since $U^TU = I_r$, this gives $$A^TA = V_{n \times r} \Sigma_{r \times r}^2 V^T_{r \times n},$$ which is precisely an SVD of $A^TA$. From this it is clear that $A^TA$ also has rank $r$. In fact, the singular values of $A^TA$ are the squares of the singular values of $A$.
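
As a numerical illustration, here is a minimal NumPy sketch with an arbitrary low-rank test matrix: the singular values of $A^TA$ come out as the squares of those of $A$, so the number of nonzero ones, i.e. the rank, is the same.

```python
import numpy as np

rng = np.random.default_rng(1)

# A 5x4 matrix of rank 2, built as a product of thin factors.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

s_A = np.linalg.svd(A, compute_uv=False)          # singular values of A
s_AtA = np.linalg.svd(A.T @ A, compute_uv=False)  # singular values of A^T A

assert np.allclose(s_A**2, s_AtA)                 # squares, entrywise
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T @ A))  # 2 2
```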

  • Note that Strang's textbook actually uses the fact that $A^TA$ has $r$ nonzero eigenvalues, i.e. $\operatorname{rank}(A^TA)=\operatorname{rank}(A)$, to decide the size of $\Sigma_{r \times r}$ and to prove the SVD. To avoid a circular argument here, a different proof of the SVD would be required. – Weishi Z Jan 06 '21 at 12:03
  • This proof is not circular: the SVD of $A$ can be proved without using any information about $A^TA$. – Kuo Mar 15 '24 at 16:06

Since elementary operations do not change the rank of a matrix, we have $\text{rank}(A^TA) = \text{rank}(E^TA^TAE)$, where $E$ is a product of elementary matrices chosen so that $AE = [A_1, A_2]$, with $A_1$ a matrix of full column rank satisfying $\text{rank}(A_1) = \text{rank}(A)$.

Thus we can find a matrix $P$ such that $A_1P = A_2$ (every column of $A_2$ lies in the column space of $A_1$), and so $AE = [A_1, A_1P] = A_1[I, P]$.

Thus $\text{rank}(E^TA^TAE) = \text{rank}\big((A_1[I, P])^T(A_1[I, P])\big)$. In this last expression $A_1$ has full column rank (so, over the reals, $A_1^TA_1$ is invertible) and $[I, P]$ has full row rank, hence the rank equals $\text{rank}(A_1) = \text{rank}(A)$. Therefore, over the real field, $\text{rank}(A^TA) = \text{rank}(A)$, completing the proof.
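
To make the last step concrete, here is a minimal NumPy sketch (the blocks $A_1$ and $P$ and their sizes are arbitrary choices): it builds a matrix already in the form $A_1[I, P]$ and checks that $\operatorname{rank}(A^TA) = \operatorname{rank}(A_1) = \operatorname{rank}(A)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# A1: 6x2 with full column rank (random Gaussian columns are independent
# with probability 1); P: an arbitrary 2x3 block.
A1 = rng.standard_normal((6, 2))
P = rng.standard_normal((2, 3))

# A matrix of the form A1 [I, P], which is what AE reduces to above.
A = A1 @ np.hstack([np.eye(2), P])

print(np.linalg.matrix_rank(A1))        # 2
print(np.linalg.matrix_rank(A))         # 2
print(np.linalg.matrix_rank(A.T @ A))   # 2, equal to rank(A)
```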

user26857
  • I cannot decipher what is said here, but it must be wrong since it never uses that the matrices are over $\Bbb R$ (or more generally an ordered field) rather than for instance over $\Bbb C$ where the result is not true. – Marc van Leeuwen Apr 27 '16 at 14:38
  • The last theorem actually implicitly uses that they are over the real field. Thank you for pointing that out. I have added that prerequisite into my answer. – Xiangru Lian Apr 30 '16 at 04:47
  • An alternative simple way to see it: Matrix $A^T$ may be reduced to its reduced row-echelon form, $R$, by $PA^T=R$, where $P$ is the product of a sequence of elementary matrices. So, $A^T=P^{-1}R$ and hence $$\mathrm{rank}(A^TA)=\mathrm{rank}(P^{-1}RR^T(P^{-1})^T)=\mathrm{rank}(RR^T).$$ The result then follows easily from this, since clearly $\mathrm{rank}(RR^T)=\mathrm{rank}(R)=\mathrm{rank}(A).$ – syeh_106 Jan 14 '17 at 02:07
  • @syeh_106 Why is it that $\operatorname{rank}(RR^T)=\operatorname{rank}(R)$? I'm sorry if this is too basic. – JPYamamoto Aug 17 '20 at 19:08
  • @JPYamamoto If $R^T$ is full column rank, this is clearly true: $RR^Tx=0 \Rightarrow x^TRR^Tx=\Vert R^Tx\Vert^2=0 \Rightarrow R^Tx = 0 \Rightarrow x = 0$, i.e. $RR^T$ is still full column rank. Otherwise, $R^T= [R_1^T, 0]$, where $R_1^T$ is full column rank, and it's easily verified that $\mathrm{rank}(RR^T)=\mathrm{rank}(R_1R_1^T)=\mathrm{rank}(R_1)=\mathrm{rank}(R).$ – syeh_106 Aug 19 '20 at 04:33

The question mentions the Gram-Schmidt process, so here's an answer using it.

Pick an orthonormal basis $\{B v_1, \dots, B v_r\}$ of $\operatorname{im} B$, where $r = \operatorname{rank} B$, using the Gram-Schmidt process (applying it to the columns of $B$ and discarding zero vectors; each output is a linear combination of columns of $B$, hence of the form $B v_i$). We claim $\{B^T B v_1, \dots, B^T B v_r\}$ is a basis of $\operatorname{im} B^T B$. It clearly spans $\operatorname{im} B^T B$ (any $B^T B x$ has $Bx \in \operatorname{span}\{Bv_i\}$), so we just need linear independence.

Suppose $\sum_i a_i B^T B v_i = 0$. Then for any $k$, $0 = \langle \sum_i a_i B^T B v_i, v_k \rangle = \sum_i a_i \langle B^T B v_i, v_k \rangle = \sum_i a_i \langle B v_i, B v_k \rangle = a_k$, since the $B v_i$ are orthonormal. Hence $a_k = 0$ for all $k$, so $\operatorname{rank}(B^TB) = \dim \operatorname{im} B^T B = r = \operatorname{rank} B$.
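
As a numerical companion, here is a minimal NumPy sketch; it uses the SVD of $B$ (rather than an explicit Gram-Schmidt run) to produce vectors $v_i$ with $\{Bv_i\}$ orthonormal, then checks that the vectors $B^TBv_i$ are linearly independent.

```python
import numpy as np

rng = np.random.default_rng(3)

# A 5x4 matrix B of rank 2.
B = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
r = np.linalg.matrix_rank(B)

# From B = U diag(S) V^T, the vectors v_i = (i-th right singular vector)/S[i]
# satisfy B v_i = u_i, so {B v_1, ..., B v_r} is an orthonormal basis of im B.
U, S, Vt = np.linalg.svd(B)
V_cols = Vt[:r].T / S[:r]               # columns are v_1, ..., v_r

BV = B @ V_cols
assert np.allclose(BV.T @ BV, np.eye(r))            # {B v_i} is orthonormal

# The claimed basis of im(B^T B): the vectors B^T B v_i are independent,
# so rank(B^T B) = r = rank(B).
BtBV = B.T @ B @ V_cols
assert np.linalg.matrix_rank(BtBV) == r
print(r, np.linalg.matrix_rank(B.T @ B))            # both equal 2
```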

sdcvvc