Asking about linear transformations in the context of Gram-Schmidt is (generally) the wrong question. That is because Gram-Schmidt is about signal representation, not about linear transformations. It applies to general inner product spaces $V$; for simplicity, assume $V=\mathbb{R}^k$ with the standard inner product.
We have:

- Positive integers $k$ and $n$.
- A collection of $n$ vectors in $\mathbb{R}^k$ (not all of which are zero): $\{u_1, …, u_n\}$.
Define $U$ as the span of these vectors, that is, the set of all linear combinations of them:
\begin{align}
U &= Span(\{u_1, ..., u_n\}) \\
&= \left\{u \in \mathbb{R}^k : u = \sum_{i=1}^n x_i u_i \quad \mbox{ for some real numbers $x_1,...,x_n$}\right\}
\end{align}
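As a concrete sanity check of this definition, here is a small numpy sketch (the helper name `in_span` and the tolerance are my own choices, not standard): it tests whether a vector lies in $U$ by solving the least-squares problem $\min_x ||Ax - u||$ and checking that the residual is numerically zero.

```python
import numpy as np

def in_span(vectors, u, tol=1e-10):
    """Test whether u is a linear combination of the given vectors.

    Stacks the vectors as the columns of a k x n matrix A and checks
    whether min_x ||Ax - u|| is numerically zero.
    """
    A = np.column_stack(vectors)
    x, *_ = np.linalg.lstsq(A, u, rcond=None)
    return np.linalg.norm(A @ x - u) < tol

u1 = np.array([1.0, 0.0, 1.0])
u2 = np.array([0.0, 1.0, 1.0])
print(in_span([u1, u2], u1 + 3 * u2))                 # True:  in U
print(in_span([u1, u2], np.array([0.0, 0.0, 1.0])))   # False: not in U
```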
Notice that the span $U$ does not depend on the order in which we list the vectors $\{u_1, ..., u_n\}$: we can permute them however we like without changing $U$.
If we create a real-valued $k\times n$ matrix $A$ with columns equal to the $\{u_1, ..., u_n\}$ vectors, so that the first column of $A$ is the vector $u_1$, the second column of $A$ is the vector $u_2$, and so on, then $U$ is equal to the column space of $A$:
$$ U = Span(\{u_1, ..., u_n\}) = \{ u : u=Ax \mbox{ for some $x \in \mathbb{R}^n$}\}$$
Permuting the columns of $A$ does not change its column space. Now, every matrix $A$ happens to define a linear transformation, but we do not care about that. Permuting the columns of a matrix creates different linear transformations (and we do not care about that either). We only care about the subspace $U$, and about efficiently representing vectors in this subspace.
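A quick numerical illustration of that invariance (the matrices here are made up): two matrices have the same column space exactly when stacking them side by side adds no new directions, i.e. does not increase the rank.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))    # k = 5, n = 3
A_perm = A[:, [2, 0, 1]]           # same columns, permuted order

# Col(A) == Col(A_perm) iff stacking them does not increase the rank:
r = np.linalg.matrix_rank
print(r(np.hstack([A, A_perm])) == r(A) == r(A_perm))  # True
```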
What we can do with this:
Any vector $u \in U$ can be represented by a (possibly non-unique)
$n$-tuple $(x_1, …, x_n) \in \mathbb{R}^n$. Given $(x_1, …, x_n)$, we can obtain $u$ by:
$$ u = \sum_{i=1}^n x_i u_i = Ax$$
The total “energy” in the vector $u$ can be obtained via:
$$ ||u||^2 = \sum_{i=1}^n \sum_{j=1}^n x_i x_j u_i^Tu_j = x^TA^TAx$$
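Both the reconstruction $u = Ax$ and this energy formula read off directly in code; a minimal sketch, with made-up vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))    # columns u_1, u_2, u_3 in R^4
x = np.array([2.0, -1.0, 0.5])     # a coefficient tuple (x_1, x_2, x_3)

u = A @ x                          # u = sum_i x_i u_i
energy = x @ A.T @ A @ x           # x^T A^T A x
print(np.isclose(energy, np.linalg.norm(u) ** 2))  # True
```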
Is the tuple $(x_1, ..., x_n)$ the most efficient way of representing a vector $u \in U$? (Generally no).
Enter Gram-Schmidt
We get a procedure for taking the vectors $\{u_1, …, u_n\}$ in $\mathbb{R}^k$ (not all of which are zero) and producing an orthonormal list of vectors $\{v_1, ..., v_m\}$ (where $m \leq n$) with the property that
$$U=Span(\{u_1, ..., u_n\}) = Span(\{v_1, ..., v_m\})$$
If we form a real-valued $k \times m$ matrix $B$ by stacking the vectors $\{v_1, ..., v_m\}$ as columns, then $U$ is the column space of $B$. Now matrix $B$ is not necessarily the same size as matrix $A$, but both $A$ and $B$ have the same column space (that is the only relationship between $A$ and $B$ that we care about).
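Here is a minimal sketch of classical Gram-Schmidt with the drop step that makes $m \leq n$ possible (the function name `gram_schmidt` and the tolerance are my choices): a (numerically) dependent column leaves a zero residual and is discarded, and the surviving orthonormal vectors are stacked as the columns of $B$.

```python
import numpy as np

def gram_schmidt(A, tol=1e-10):
    """Return B with orthonormal columns spanning the column space of A."""
    basis = []
    for a in A.T:                       # iterate over the columns of A
        v = a.copy()
        for b in basis:
            v -= (b @ a) * b            # subtract the projection onto b
        norm = np.linalg.norm(v)
        if norm > tol:                  # drop (numerically) dependent columns
            basis.append(v / norm)
    return np.column_stack(basis)       # B is k x m with m <= n

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])         # third column = first + second
B = gram_schmidt(A)
print(B.shape)                                    # (3, 2): m = 2 < n = 3
print(np.allclose(B.T @ B, np.eye(B.shape[1])))   # True: orthonormal columns
```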
It follows that every vector $u \in U$ can now be uniquely represented by
a tuple $y=(y_1, ..., y_m) \in \mathbb{R}^m$:
$$ u = \sum_{i=1}^m v_i y_i = By$$
Further, since the columns of $B$ are orthonormal we have $B^TB=I$, so the energy is easy to compute:
$$ ||u||^2 = y^TB^TBy = y^Ty = ||y||^2 = \sum_{i=1}^m y_i^2$$
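Because the columns of $B$ are orthonormal, the unique coefficient tuple is simply $y = B^Tu$ for any $u \in U$. A small self-contained sketch (with a hand-picked orthonormal $B$) verifying both $u = By$ and the energy identity:

```python
import numpy as np

# A hand-picked orthonormal pair of columns in R^3:
B = np.column_stack([np.array([1.0, 1.0, 0.0]) / np.sqrt(2),
                     np.array([0.0, 0.0, 1.0])])
u = B @ np.array([3.0, -4.0])           # some u in U = Col(B)

y = B.T @ u                             # unique coefficients: y = B^T u
print(np.allclose(B @ y, u))                        # True: u = By
print(np.isclose(y @ y, np.linalg.norm(u) ** 2))    # True: ||u||^2 = ||y||^2
```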
Thus, the only reason we care about Gram-Schmidt is that it gives us a nicer representation of vectors in the subspace $U$. At first, that seems to diminish the value of Gram-Schmidt. Not at all: The reason you hear about Gram-Schmidt is that it is very important to give nice representations of things.
Now if we happen to start with a linear transformation $T:V\rightarrow W$ for some vector spaces $V, W$, then it may make sense to try to represent this transformation efficiently: we might find an orthonormal basis for $V$ and another for $W$, represent $T$ by how it maps each basis vector of $V$ to a linear combination of basis vectors of $W$, and define the matrix of $T$ with respect to these bases, and so on.
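For instance, if $V=\mathbb{R}^k$ and $W=\mathbb{R}^j$, $T$ is given by a matrix $M$ in standard coordinates, and orthonormal bases are stacked as the columns of matrices $B$ and $C$, then the matrix of $T$ with respect to those bases is $C^TMB$. A sketch under those assumptions (the bases here are generated randomly via QR, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((2, 3))    # T: R^3 -> R^2 in standard coordinates

# Orthonormal bases for domain and codomain, as the columns of B and C:
B, _ = np.linalg.qr(rng.standard_normal((3, 3)))
C, _ = np.linalg.qr(rng.standard_normal((2, 2)))

T_rep = C.T @ M @ B    # matrix of T with respect to these bases

# Sanity check: map coordinates in basis B to coordinates in basis C,
# then back to standard coordinates; this must agree with applying M.
v = rng.standard_normal(3)
print(np.allclose(C @ (T_rep @ (B.T @ v)), M @ v))  # True
```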