
I'm having trouble understanding the motivations for Gram-Schmidt. I understand that it's an algorithm by which, given a set of vectors (the columns of $\mathbf{A} \in \mathbb{R}^{m \times n}$), one can derive an orthonormal basis that spans the same space. But two questions still bother me:

  1. What are the benefits of using Gram-Schmidt over another method that similarly finds an orthonormal basis? For example, the SVD or the eigendecomposition $\mathbf{AA}^\top = \mathbf{Q\Lambda Q}^{-1}$ can also produce an orthonormal basis for the column space of $\mathbf{A} \in \mathbb{R}^{m \times n}$. Both algorithms have a runtime of $O(mn^2)$. My intuition tells me it has to do with Gram-Schmidt being iterative, and thus beneficial in streaming problems where we don't need all the vectors of an orthonormal basis but only a subset, as in orthogonal matching pursuit (OMP); see the sketch after this list.

  2. How does Gram-Schmidt result in lower computational complexity for some algorithms? For example, if the columns of $\mathbf{A}$ are orthonormal, then in least squares, $$x = (\mathbf{A}^\top\mathbf{A})^{-1}\mathbf{A}^\top b,$$ computing the inverse is an $O(n)$ operation instead of $O(n^3)$, since $\mathbf{A}^\top\mathbf{A} = \mathbf{I}$ and inverting a diagonal matrix with non-zero diagonal entries is straightforward. However, doesn't the cost of changing from a non-orthogonal basis $\mathbf{A}$ to an orthonormal one $\mathbf{A}'$ negate this? For least squares, we would be solving $$\mathbf{MA}x = \mathbf{M}b \implies x = (\mathbf{A}^\top\mathbf{M}^\top\mathbf{MA})^{-1}\mathbf{A}^\top\mathbf{M}^\top\mathbf{M}b,$$ where $\mathbf{MA}$ has orthonormal columns, and $\mathbf{M}$ is the required transformation from $\mathbf{A}$ to $\mathbf{A}'$. Here again, I was thinking that perhaps Gram-Schmidt gives a more efficient change-of-basis operation: there isn't a unique orthonormal basis for the column space of $\mathbf{A}$, so maybe Gram-Schmidt finds a better one than the standard basis composed of the $\mathbf{e}_i$.
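
To make the intuition in question 1 concrete, here is a rough sketch (NumPy; the streaming framing and the function name are just my illustration) of modified Gram-Schmidt producing orthonormal vectors one at a time, which is exactly the access pattern a greedy method like OMP wants:

```python
import numpy as np

def orthonormalize_stream(vectors, tol=1e-12):
    """Modified Gram-Schmidt, consuming one vector at a time.

    Yields a new orthonormal vector for each linearly independent
    input, so the first k basis vectors are available after seeing
    only the first k inputs -- no full decomposition up front.
    """
    basis = []
    for a in vectors:
        v = np.asarray(a, dtype=float).copy()
        for q in basis:                 # remove components along earlier q's
            v -= (q @ v) * q
        norm = np.linalg.norm(v)
        if norm > tol:                  # skip (numerically) dependent inputs
            q = v / norm
            basis.append(q)
            yield q

# Usage: orthonormalize the columns of A incrementally.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = np.column_stack(list(orthonormalize_stream(A.T)))
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: columns are orthonormal
```

By contrast, the one-shot SVD/eigendecomposition route needs all of $\mathbf{A}$ up front before it can emit any basis vector.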

Edit: For the question of a more efficient change of basis, I was reading through these notes, which describe the change of basis as simply a QR factorization, so that in least squares $$\mathbf{A} = \mathbf{QR} \implies x = \mathbf{R}^{-1}\mathbf{Q}^\top b.$$
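
For reference, a minimal numerical check of the QR route against the normal equations (NumPy/SciPy; `solve_triangular` does an $O(n^2)$ back-substitution against the upper-triangular $\mathbf{R}$ rather than forming $\mathbf{R}^{-1}$ explicitly):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))   # tall matrix with full column rank
b = rng.standard_normal(100)

# Normal equations: forms A^T A (which also squares the condition number).
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# QR route: A = QR, then R x = Q^T b is a triangular back-substitution.
Q, R = np.linalg.qr(A)              # reduced QR: Q is 100x5, R is 5x5
x_qr = solve_triangular(R, Q.T @ b, lower=False)

print(np.allclose(x_normal, x_qr))  # True, up to floating-point error
```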

Sentient

1 Answer


The Gram-Schmidt process is not only applicable to Euclidean vector spaces $(V,g)$ in general (a more general setting than just matrices), but it is also important from a topological standpoint: it gives a deformation retract ${\rm GL}(V) \to {\rm O}(V,g)$, which restricts to yet another deformation retract ${\rm GL}^+(V) \to {\rm SO}(V,g)$. See, for example, this answer.
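
One standard way to see this (a sketch of the usual argument, not spelled out in the linked answer): Gram-Schmidt on the columns of $A \in {\rm GL}(V)$ is precisely the factorization $A = QR$ with $Q$ orthogonal and $R$ upper triangular with positive diagonal entries, and the retraction can be taken to be the straight-line homotopy on the triangular factor $$H(A,t) = Q\big((1-t)R + tI\big), \qquad t \in [0,1],$$ which stays in ${\rm GL}(V)$ for all $t$ (the triangular factor keeps a positive diagonal), satisfies $H(A,0) = A$ and $H(A,1) = Q \in {\rm O}(V,g)$, and fixes ${\rm O}(V,g)$ pointwise, since an orthogonal $A$ has $R = I$.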

Ivo Terek
  • Sorry, I haven't taken a class in topology yet. What is the meaning of GL and O, and the importance of orientation? I saw in the Wikipedia article for Gram-Schmidt that it preserves orientation. – Sentient May 03 '19 at 04:47
  • ${\rm GL}(V)$ denotes the group of invertible linear maps $V\to V$, and ${\rm O}(V,g)$ the subgroup of those that preserve $g$. We don't need an orientation on $V$; I was thinking of something else (I've edited this out of the answer). – Ivo Terek May 03 '19 at 04:49