$$\operatorname{svd}(T) = U\,\Sigma\,V^T$$

Here I understand the meaning of each and every term, and why the SVD is important.

But I am failing to interpret this equation through linear-algebra glasses.

When I learnt linear algebra, one thing was common across all sources: view a matrix as a set of basis vectors (or as a transformation matrix), as in

$$T v = \lambda u$$

where $T$ = the transformation matrix, $v$ = some vector to be transformed by $T$,

$u$ = the transformed unit vector, $\lambda$ = the scale of the transformation.

But I am failing to relate this when it comes to the SVD:

$$T = U\,\Sigma\,V^T$$

(Note: I am not looking for an answer about what the SVD means and what each term means. I am looking for an answer precisely about the following confusion.)

To be more precise:

Represent our data as a transformation matrix:

our data = an $m \times n$ matrix $= T$.

Apply the transformation matrix $T$ (that is, our data) to some vector:

$$T \cdot (\text{some vector}) = (\text{new rotated unit vector}) \cdot (\text{scaling factor})$$

We get the same effect as above by applying three different transformations (rotation, scaling, rotation), denoted by

$$T = U\,\Sigma\,V^T \qquad \text{(SVD)}$$

So it means that to any vector $v$ we can apply either $T$ (our data matrix) or the three transformations $U\,\Sigma\,V^T$, and we get the same effect.
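To make this concrete, here is a minimal NumPy sketch of that equivalence; the matrix and vector are random stand-ins, purely for illustration:

```python
import numpy as np

# A random 4x3 matrix standing in for the data matrix T (purely illustrative).
rng = np.random.default_rng(0)
T = rng.normal(size=(4, 3))

# SVD: U (4x3), the singular values s, and V^T (3x3).
U, s, Vt = np.linalg.svd(T, full_matrices=False)
Sigma = np.diag(s)

v = rng.normal(size=3)                    # some vector to be transformed

direct = T @ v                            # apply T directly
composed = U @ (Sigma @ (Vt @ v))         # rotate (V^T), scale (Sigma), rotate (U)

print(np.allclose(direct, composed))      # True: same effect either way
```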

So far so good, as long as we view the above operations only from the transformation perspective.

Now, suddenly, the whole perspective changes. It is no longer a transformation perspective.

According to the new perspective, the matrix $\Sigma$ also has another meaning apart from being a scaling matrix: it suggests which axes have the highest variation, i.e. onto which axes we should project our data.

This is so confusing. We used our data as a transformation matrix $T$ and decomposed it into 3 different matrices. That part is okay, no issues.

Now we are saying that we can project our data (the same data that we used as the transformation matrix) onto one of these decomposed matrices.

I am unable to reconcile these two perspectives, and that is my problem.

LLB
  • You might find the explanation from the wikipedia page on SVD to be helpful. – Ben Grossmann Jul 05 '20 at 08:13
  • I don't understand why exactly the idea of $Tv = \lambda u$ "takes a hit" when one learns about SVD. – Ben Grossmann Jul 05 '20 at 08:14
  • Yes, I have gone through the wiki article @Omnomnomnom.

    The intuition I get from it is: our data is a transformation matrix $T$. When we apply $T$ to the standard basis, it gives a new basis (unit vectors) and lengths (scaling factors). If a scaling factor is small, we ignore that direction, and we take into account only the vectors that correspond to high scaling values, projecting the data onto those vectors only (see the sketch after these comments).

    Thus we can achieve dimensionality reduction.

    – LLB Jul 05 '20 at 09:50
  • Okay, so what exactly is it that is still unclear? Also, as far as visualization goes, did you notice the animation on that page? – Ben Grossmann Jul 05 '20 at 09:57
  • I have added it to description of my question. It is hard to type in comment – LLB Jul 05 '20 at 10:38
  • That's fine. The edit makes things clearer, thank you – Ben Grossmann Jul 05 '20 at 10:51
  • @Omnomnomnom can you please check question again (i have edited my question) – LLB Jul 05 '20 at 10:56
  • Another user on this forum also had the same question. https://math.stackexchange.com/questions/1450097/geometrical-interpretations-of-svd The answer to that question also helped me. – LLB Jul 05 '20 at 15:45
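Here is the sketch referred to in the comment above: keeping only the directions with large singular values gives a low-rank approximation of the data. The matrix and the cut-off $k$ below are made up purely for illustration:

```python
import numpy as np

# Random 6x4 matrix standing in for the data (purely illustrative).
rng = np.random.default_rng(1)
T = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(T, full_matrices=False)

k = 2                                           # keep the k largest singular values
T_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # drop the directions with small ones

# The error of this truncation is exactly the size of the discarded singular values.
print(np.linalg.norm(T - T_k))                  # Frobenius norm of the error
print(np.sqrt(np.sum(s[k:] ** 2)))              # same number: sqrt(s_3^2 + s_4^2)
```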

1 Answer


The essence of your question, as far as I understand it, is this.

Suppose that $T = U\Sigma V^T$ is an SVD of an $m \times n$ data matrix. How does the geometric interpretation of the SVD connect with the statistical interpretation of $\Sigma$, wherein the singular values suggest which axis has the highest variation to project our data?

I will assume that each column of $T$ represents a single data point. To use the rows of $T$ in this way instead, simply transpose $T$ and apply the same analysis.

First of all, we should establish how it is that the transformation $T$ "encodes" the data in question. This relationship is simple: if $e_i$ denotes the $i$th standard basis vector of $\Bbb R^n$ ($i$th column of the size $n$ identity matrix), then the $i$th data-point is $Te_i$. So, $T$ is a transformation that takes the points $e_1,e_2,\dots,e_n$, each of which lies on the unit sphere in $\Bbb R^n$, and maps them to the data points $Te_1,Te_2,\dots,Te_n$.
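A tiny NumPy illustration of this encoding (the $3 \times 5$ matrix is a made-up example): multiplying $T$ by the $i$th standard basis vector simply picks out the $i$th column, i.e. the $i$th data point.

```python
import numpy as np

# Made-up 3x5 data matrix: five data points in R^3, one per column.
rng = np.random.default_rng(2)
T = rng.normal(size=(3, 5))

i = 1
e_i = np.eye(5)[:, i]                     # i-th standard basis vector of R^5

print(np.allclose(T @ e_i, T[:, i]))      # True: T e_i is the i-th data point
```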

The SVD shows what $T$ does to the unit sphere $S^{n-1}$: the vector $v_1$ of $V$ corresponding to the largest singular value $\sigma_1$ is the input direction that $T$ stretches the most, and its image $u_1 = Tv_1/\sigma_1$ is the longest axis of the ellipsoid $T(S^{n-1})$. Because the points $T e_1,\dots, T e_n$ are the outputs corresponding to points on the sphere, they "move along" with the sphere, and so the "spherical" cloud of points is stretched to produce the "ellipsoidal" cloud of points corresponding to our data. The fact that $T$ stretches the direction $v_1$ the most corresponds to the fact that its image $u_1$ is the direction along which we find the most variation in the data cloud.
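As a rough numerical check of this connection (my own example, with the added assumption that the data points are mean-centered, which is when the variance interpretation is exact): the first left singular vector of the centered data matrix coincides with the top eigenvector of the sample covariance, i.e. the direction of greatest variation in the cloud.

```python
import numpy as np

# 200 made-up data points in R^3 (columns), deliberately stretched along the x-axis.
rng = np.random.default_rng(3)
T = rng.normal(size=(3, 200)) * np.array([[5.0], [1.0], [0.3]])

T_c = T - T.mean(axis=1, keepdims=True)          # center the cloud of points

U, s, Vt = np.linalg.svd(T_c, full_matrices=False)
u1 = U[:, 0]                                     # image of v_1: T_c v_1 / sigma_1

cov = np.cov(T_c)                                # 3x3 sample covariance of the cloud
eigvals, eigvecs = np.linalg.eigh(cov)
top_direction = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue

print(np.isclose(abs(u1 @ top_direction), 1.0))  # True: same direction, up to sign
```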

Ben Grossmann
  • I have one more related query @Omnomnomnom.

    See, our data $T$ is an $m \times n$ matrix.

    From the above explanation, it is now very clear how this matrix $T$ is being used as a transformation matrix (or basis-vector matrix) and what it is transforming (the standard basis vectors). The problem is that when I see $T$ as a basis matrix, there is a slight confusion because its size is $m \times n$. A basis matrix should actually be a square matrix, right ($n \times n$), like the standard basis? But it is not.

    So how should one interpret a basis-vector matrix that is non-square?

    (Should I create a new question for this?)

    – LLB Jul 06 '20 at 05:56
  • @user2599739 It is not true that the columns of a transformation matrix will necessarily form a basis, so I don't see the problem – Ben Grossmann Jul 06 '20 at 08:32
  • I have just confirmed this with a scientist. Every matrix (transformation matrix) can be interpreted in terms of basis vectors. When the matrix is square, it is a straightforward case. But when the matrix is $m \times n$ (non-square), it is a composite basis-vector matrix, meaning it corresponds to an $m \times m$ and an $n \times n$ basis-vector matrix. – LLB Jul 06 '20 at 09:31
  • This is also given in the same wiki article under the headings a) "The columns of U and V are orthonormal bases" and b) "Geometric meaning" (a quick numerical check of this is sketched after these comments). – LLB Jul 06 '20 at 09:34
  • @LeenaBora The columns of an invertible transformation matrix form a basis. However, it is not true that the columns of an arbitrary transformation matrix are linearly independent. $U$ and $V$ are the matrices of orthogonal transformations, which are invertible, so their columns indeed form bases. – Ben Grossmann Jul 06 '20 at 10:29
  • Makes sense @Omnomnomnom. Thanks for clarifying it. – LLB Jul 06 '20 at 10:50
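The quick check mentioned above, using an arbitrary random non-square matrix purely for illustration: the columns of $U$ and $V$ returned by the SVD are orthonormal, so $U^T U = I$ and $V^T V = I$, and they therefore form bases of $\Bbb R^m$ and $\Bbb R^n$ respectively.

```python
import numpy as np

# Arbitrary non-square "data" matrix, used only to illustrate the point above.
rng = np.random.default_rng(4)
T = rng.normal(size=(5, 3))

# Full SVD: U is 5x5, Vt is 3x3, both orthogonal matrices.
U, s, Vt = np.linalg.svd(T, full_matrices=True)

# Their columns are orthonormal, hence they form bases of R^m and R^n.
print(np.allclose(U.T @ U, np.eye(5)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True
```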