I'm having trouble reasoning about the motivations for Gram-Schmidt. I understand that it's an algorithm that, given the columns of a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, produces an orthonormal basis spanning the same (column) space. But two questions still bother me:
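(To make sure I'm describing the same procedure everyone else means, here is a rough NumPy sketch of modified Gram-Schmidt on the columns of $\mathbf{A}$; the function name and the full-column-rank assumption are mine.)

```python
import numpy as np

def gram_schmidt(A):
    """Orthonormalize the columns of A (assumed to have full column rank)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j].astype(float).copy()
        for i in range(j):
            # remove the component of v along the already-built direction q_i
            v -= (Q[:, i] @ v) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

A = np.random.randn(6, 3)
Q = gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(3)))   # columns are orthonormal
print(np.allclose(Q @ (Q.T @ A), A))     # and they span the same column space as A
```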
What are the benefits of using Gram-Schmidt over another method that also finds an orthonormal basis? For example, the SVD, or the eigendecomposition $\mathbf{AA}^\top = \mathbf{Q\Lambda Q}^{-1}$ (where $\mathbf{Q}$ is orthogonal since $\mathbf{AA}^\top$ is symmetric), can also produce an orthonormal basis for the column space of $\mathbf{A} \in \mathbb{R}^{m \times n}$. Both algorithms have a runtime of $O(mn^2)$. My intuition tells me it has to do with Gram-Schmidt being iterative, and therefore beneficial in streaming problems where we don't need all the vectors of an orthonormal basis, only a subset, as in orthogonal matching pursuit (OMP).
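Here is a toy sketch of what I mean by the streaming intuition, assuming full column rank; the helper `add_to_basis` is just my own illustration of a single Gram-Schmidt step, and the SVD call is only there as the batch comparison.

```python
import numpy as np

def add_to_basis(Q, a):
    """One Gram-Schmidt step: orthonormalize a new vector `a` against the columns of Q."""
    v = a - Q @ (Q.T @ a)
    return np.column_stack([Q, v / np.linalg.norm(v)])

A = np.random.randn(8, 4)

# incremental: the basis grows one column at a time, without revisiting earlier work
Q = A[:, :1] / np.linalg.norm(A[:, 0])
for j in range(1, A.shape[1]):
    Q = add_to_basis(Q, A[:, j])

# batch: an orthonormal basis for the same space via the (thin) SVD, computed all at once
U = np.linalg.svd(A, full_matrices=False)[0]

# both span col(A): the orthogonal projectors onto their spans agree
print(np.allclose(Q @ Q.T, U @ U.T))
```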
How does Gram-Schmidt result in a lower computational complexity for some algorithms? For example, if $\mathbf{A}$ has orthonormal columns, then in least squares, $$x = (\mathbf{A}^\top\mathbf{A})^{-1}\mathbf{A}^\top b,$$ calculating the inverse is an $O(n)$ operation instead of $O(n^3)$, since $\mathbf{A}^\top\mathbf{A}$ is the identity (and, more generally, inverting a diagonal matrix with nonzero entries is straightforward). However, doesn't the computational cost of changing bases from a non-orthogonal basis $\mathbf{A}$ to an orthonormal one $\mathbf{A}'$ negate this? For least squares, we would be solving $$\mathbf{MA}x = \mathbf{M}b \implies x = \big((\mathbf{MA})^\top\mathbf{MA}\big)^{-1}(\mathbf{MA})^\top\mathbf{M}b,$$ where $\mathbf{MA}$ has orthonormal columns and $\mathbf{M}$ is the required transformation from $\mathbf{A}$ to $\mathbf{A}'$. Here again, I was thinking perhaps Gram-Schmidt results in a more efficient change-of-basis operation: there isn't a unique orthonormal basis for the column space of $\mathbf{A}$, so maybe Gram-Schmidt finds a better one than the standard basis composed of the $\mathbf{e}_i$.
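For concreteness, here is a small numerical check of the "inverse becomes trivial" part, using a $\mathbf{Q}$ with orthonormal columns (built via QR only because it's convenient); this is just my own sketch.

```python
import numpy as np

m, n = 10, 4
Q = np.linalg.qr(np.random.randn(m, n))[0]  # any matrix with orthonormal columns
b = np.random.randn(m)

x_normal_eq = np.linalg.solve(Q.T @ Q, Q.T @ b)  # the general normal-equations route
x_direct    = Q.T @ b                            # no inverse needed: Q^T Q = I

print(np.allclose(Q.T @ Q, np.eye(n)))     # True
print(np.allclose(x_normal_eq, x_direct))  # True
```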
Edit: For the question of a more efficient change of basis, I was reading through these notes, which describe the change of basis as simply a QR factorization, so that in least squares $$\mathbf{A} = \mathbf{QR} \implies x = \mathbf{R}^{-1}\mathbf{Q}^\top b.$$
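If I've understood them correctly, the computation looks roughly like this in NumPy/SciPy (my sketch, not code from the notes), with the inverse of $\mathbf{R}$ never formed explicitly:

```python
import numpy as np
from scipy.linalg import solve_triangular

m, n = 12, 5
A = np.random.randn(m, n)
b = np.random.randn(m)

Q, R = np.linalg.qr(A)                        # reduced QR; Gram-Schmidt gives the same Q, R in exact arithmetic
x_qr = solve_triangular(R, Q.T @ b)           # back-substitution on R x = Q^T b: O(n^2), no explicit inverse
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # reference least-squares solution

print(np.allclose(x_qr, x_ls))   # True
```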