
For a transformation $A \in \mathbb{R}^{n\times m}$, what exactly is the geometric interpretation of the transformation $A^TA$? If I understand correctly, the entries of $A^TA$ are the inner products of the columns of $A$, but how exactly should I interpret this geometrically as a linear transformation? And why is $A^TA$ often loosely called "squaring the matrix"? How does having the pairwise inner products (which are normally interpreted as projecting one vector onto another) yield something close to a matrix squared?
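For concreteness, here is a minimal NumPy sketch of the inner-product fact (the matrix, sizes, and random seed are just illustrative assumptions, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((n, m))   # an arbitrary example A in R^{n x m}

G = A.T @ A                       # the m x m matrix A^T A
# Entry (i, j) of A^T A is the inner product of columns i and j of A:
for i in range(m):
    for j in range(m):
        assert np.isclose(G[i, j], A[:, i] @ A[:, j])
```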

One thing I've noticed is that, using the SVD of a real matrix $A$, we get $A = UDV^T \Rightarrow A^TA = VD^TU^TUDV^T = VD^TDV^T$, where $U$ and $V$ are orthogonal. But how does changing basis to $V$ and scaling by the squared singular values relate to the concept above?
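A quick numerical sketch of this observation (NumPy assumed; the matrix is an arbitrary example): the eigenvalues of $A^TA$ come out as the squared singular values of $A$, and $VD^TDV^T$ reproduces $A^TA$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # an arbitrary real matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
# A^T A = V D^T D V^T, so its eigenvalues are the squared singular values of A:
assert np.allclose(Vt.T @ np.diag(s**2) @ Vt, A.T @ A)
eigvals = np.linalg.eigvalsh(A.T @ A)              # returned in ascending order
assert np.allclose(np.sort(s**2), eigvals)
```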

  • One consequence of your observations regarding the singular value decomposition is that the eigenvalues of $A^TA$ (which are also its singular values) are the squares of the singular values of $A$. – Ben Grossmann Apr 23 '22 at 21:08
  • The polar decomposition is a really nice way to understand the SVD, thanks for that! This means $A^TA$ is the same as applying $P^2$ and ignoring $U$ (which would map the vectors to a different dimension), and this also matches $VD^TDV^T$ perfectly. One last thing I'm still confused about is the first part of my question: what does all of this have to do with the inner products of our basis vectors? – jonithani123 Apr 23 '22 at 22:55
  • By $A = U D V^T$, I think you are referring to the Singular Value Decomposition (SVD) of $A$. You may edit your question and add this info, along with the detail that $U$ and $V$ are orthogonal matrices. – Dr. Sundar Apr 24 '22 at 09:58
  • An intuitive way to understand the matrix $A^T A$ is as representing the pullback of the Euclidean scalar product by $A$; this pulled-back scalar product is $(x, y) \mapsto \langle A x, A y \rangle$. The SVD expresses the matrix $A$ as a diagonal matrix by correctly choosing orthonormal bases for the source and target, and your observation is intuitively explained by this coordinate-free point of view. – Dabouliplop Apr 24 '22 at 10:00
  • What I mean is that this matrix should probably be thought of as representing a bilinear form and not a linear transformation. – Dabouliplop Apr 24 '22 at 10:02
  • If I recall correctly, SVD is more or less the same thing as ortho-diagonalizing this bilinear form $A^T A$, right? – Dabouliplop Apr 24 '22 at 10:08
  • Can you give me a link, or explain in short form, what "pullback" means in the context of the scalar product? I'm sorry, but I've never really deeply studied mathematics, just some intro courses for my CS degree. – jonithani123 Apr 24 '22 at 10:23
  • Do you know what a bilinear form is? If you have a linear map $f : V→W$ between two vector spaces and if $b(-,-)$ is a bilinear form on $W$, you can "pull it back to $V$" via $f$, and it gives $(x,y) ∈ V^2 ↦ b(f(x),f(y))$. Do you think this is something intuitive to do? For instance, if you look at the unit sphere in $W$, the unit sphere of the pullback of the scalar product will be an ellipsoid in $V$ (and the "principal axes" of this ellipsoid give you the SVD of $f$; see the sketch after these comments). – Dabouliplop Apr 24 '22 at 10:40
  • If $V$ and $W$ are two Euclidean spaces, and if $f$ is injective, this "pullback" of the scalar product of $W$ will again be a scalar product on $V$ (do you see why?). This way, we see for instance that if $f$ is represented by the matrix $A$, then $f$ is injective if and only if $A^T A$ is non-degenerate, i.e., has a non-zero determinant. Visually, non-injectivity corresponds to the ellipsoid being "degenerate" and "very very long" in one direction (so that it's a cylinder instead). – Dabouliplop Apr 24 '22 at 10:41
  • Okay, so the pullback essentially encodes how to get from $f(x), f(y) = Ax, Ay$ back to $x,y$? And with this comment "What I mean is that this matrix should probably be thought of as representing a bilinear form and not a linear transformation", did you mean roughly what I commented under the accepted answer, that it's more reasonable to look at $A^TA$ as encoding information about the transformation instead of looking at it as a transformation itself? Please correct me if anything I'm saying is wrong, I'm definitely not in my domain of expertise ^^ – jonithani123 Apr 24 '22 at 11:01
  • I don't really understand what you mean by "to get from $f(x),f(y) = Ax,Ay$ back to $x,y$"... But yes, I mean that $A^T A$ should not be thought of as encoding a linear transformation. It should be thought of as encoding an object called a bilinear form. – Dabouliplop Apr 24 '22 at 11:09
  • Yeah, I gotta be honest, I don't understand what I meant by that either, but I think I'm starting to grasp the general concept of $A^TA$ or $AA^T$, why it's used, and how this makes sense in relation to SVD, thanks to the different viewpoints in this thread. Thanks so much! – jonithani123 Apr 24 '22 at 11:36
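To make the polar-decomposition and pullback ideas from the comments above concrete, here is a small numerical sketch (assuming `numpy` and `scipy`; the matrices and vectors are arbitrary examples): it checks $A^TA = P^2$ from the polar decomposition $A = UP$, the pullback identity $\langle Ax, Ay\rangle = x^T(A^TA)y$, and that the unit circle is stretched into an ellipse whose longest semi-axis is the largest singular value of $A$:

```python
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))   # 2 x 2 so the ellipse picture is literal

# Polar decomposition A = U P (U orthogonal, P symmetric PSD): A^T A = P^2.
U, P = polar(A)
assert np.allclose(A.T @ A, P @ P)

# Pullback of the Euclidean scalar product by A: b(x, y) = <Ax, Ay> = x^T (A^T A) y.
x, y = rng.standard_normal(2), rng.standard_normal(2)
assert np.isclose((A @ x) @ (A @ y), x @ (A.T @ A) @ y)

# A maps the unit circle to an ellipse; its longest semi-axis is the largest
# singular value of A (the square root of the largest eigenvalue of A^T A).
theta = np.linspace(0.0, 2.0 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle
stretches = np.linalg.norm(A @ circle, axis=0)
sigma_max = np.linalg.svd(A, compute_uv=False)[0]
assert np.isclose(stretches.max(), sigma_max, atol=1e-4)
```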

1 Answer


$A^TA$ is a square matrix. As you said, its entries are the inner products of the columns of $A$.

The determinant of $A^TA$ is therefore the Gram determinant (Gramian) of the columns of $A$, and it is always $\geq 0$. It is zero exactly when the columns of $A$ are linearly dependent. For an intuitive picture, you may think of the Gram determinant as the square of the volume of the parallelepiped (in $\mathbb{R}^n$) spanned by the columns of $A$.

If the vectors are linearly dependent, then at least one of them can be written as a linear combination of the others, and so this volume is zero (this is easy to picture in $\mathbb{R}^3$).

If they are orthogonal, then they are also linearly independent, and this volume is simply the product of the norms of the vectors, so the Gram determinant equals $\|v_1\|^2 \cdots \|v_k\|^2$. This agrees with the fact (a Gram-matrix form of Hadamard's inequality) that $\operatorname{Gram}(v_1, \dots, v_k) \leq \|v_1\|^2 \cdots \|v_k\|^2$, where equality holds under two conditions:

  1. Some $v_i = 0$, so that both sides vanish (which is trivial), or
  2. $v_1, \dots, v_k$ are pairwise orthogonal, the intuitive idea behind which I've provided above.
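As a numerical sanity check of these claims (a sketch, with `numpy` and arbitrary example columns, not part of the original answer): the Gram determinant is nonnegative, is bounded by the product of the squared norms, attains the bound for orthogonal columns, and vanishes for dependent columns:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))   # arbitrary example columns in R^5

gram = A.T @ A
gram_det = np.linalg.det(gram)    # the Gram determinant, always >= 0
volume = np.sqrt(gram_det)        # volume of the spanned parallelepiped

# Bounded above by the product of the squared column norms:
assert gram_det <= np.prod(np.linalg.norm(A, axis=0) ** 2) + 1e-12

# Orthogonal columns attain the bound: scale orthonormal columns by norms 1, 2, 3.
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
B = Q * np.array([1.0, 2.0, 3.0])              # orthogonal columns, norms 1, 2, 3
assert np.isclose(np.linalg.det(B.T @ B), (1.0 * 2.0 * 3.0) ** 2)

# Linearly dependent columns give a zero Gram determinant (zero volume):
C = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])
assert np.isclose(np.linalg.det(C.T @ C), 0.0, atol=1e-8)
```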
– Math boi
  • I think the Gram matrix is the exact keyword I needed; I'll report back as soon as I have an understanding of it, thanks! It's beautiful to see how things come up in different contexts in mathematics. – jonithani123 Apr 24 '22 at 10:14
  • Okay, after reviewing this and also the comments above, I conclude that $A^TA$ holds information about the transformation $A$: how much it stretches space, squared, where the total amount can be found by taking the square root of the determinant. From what I now understand, I think I will mainly look at $A^TA$ as gathering information about $A$, not as a real "transformation of space" itself. Correct me if I'm wrong, and thanks for your help! – jonithani123 Apr 24 '22 at 10:40
  • To be a bit more precise, $\det(A^TA)$ is the square of the volume of the parallelepiped spanned by the columns of $A$. – Ben Grossmann Apr 24 '22 at 14:34