Let $X \in \mathbb{R}^{m\times n}$ with $m>n$. We aim to solve $y=X\beta$ in the least squares sense, where $\hat\beta$ is the least squares estimator. The least squares solution $\hat\beta = (X^TX)^{-1}X^Ty$ can be obtained using a QR decomposition of $X$ or an $LU$ decomposition of $X^TX$. The aim is to compare these.
I noticed that we can use a Cholesky decomposition instead of $LU$, since $X^TX$ is symmetric and positive definite (assuming $X$ has full column rank).
Using $LU$ we have:
$\hat\beta = (X^TX)^{-1}X^Ty=(LU)^{-1}X^Ty$. First compute $a=X^Ty$, which costs $O(mn)$ (about $2mn$ flops), then solve $Lb=a$ by forward substitution at cost $\sum_{k=1}^{n} (2k-1)=n^2$, and finally solve $U\hat\beta=b$ by back substitution at the same cost of $\sum_{k=1}^{n} (2k-1)=n^2$.
I didn't count the cost of computing $L^{-1}$ and $U^{-1}$.
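To make the count $\sum_{k=1}^{n}(2k-1)=n^2$ concrete, here is a minimal sketch of the two triangular solves in plain Python with an explicit flop counter. The function names and the small matrices `L`, `U` and vector `a` are made up for illustration; they are not the factors of any particular $X^TX$.

```python
def forward_substitution(L, a):
    """Solve L b = a for lower-triangular L; return (b, flop count)."""
    n = len(a)
    b = [0.0] * n
    flops = 0
    for k in range(n):
        s = a[k]
        for j in range(k):
            s -= L[k][j] * b[j]   # one multiply and one subtract
            flops += 2
        b[k] = s / L[k][k]        # one divide
        flops += 1
    return b, flops               # row k (1-based) costs 2k - 1 flops


def back_substitution(U, b):
    """Solve U x = b for upper-triangular U; return (x, flop count)."""
    n = len(b)
    x = [0.0] * n
    flops = 0
    for k in range(n - 1, -1, -1):
        s = b[k]
        for j in range(k + 1, n):
            s -= U[k][j] * x[j]
            flops += 2
        x[k] = s / U[k][k]
        flops += 1
    return x, flops


# Hypothetical 3 x 3 example (illustrative values only).
L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 1.0, 5.0]]
U = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 4.0],
     [0.0, 0.0, 1.0]]
a = [2.0, 7.0, 21.0]

b, flops_L = forward_substitution(L, a)   # b = [1.0, 2.0, 3.0]
x, flops_U = back_substitution(U, b)      # x = [12.0, -10.0, 3.0]
print(flops_L, flops_U)                   # 9 9, i.e. n^2 each for n = 3
```

Neither solve forms $L^{-1}$ or $U^{-1}$ explicitly; each is just a substitution sweep over the triangular factor.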
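Using $QR$ we have: $\hat\beta = (X^TX)^{-1}X^Ty=((QR)^TQR)^{-1}(QR)^Ty=(R^TR)^{-1}R^TQ^Ty=R^{-1}Q^Ty$, using $Q^TQ=I$. Here we compute $a=Q^Ty$ at cost $O(mn)$ (with the thin $m\times n$ factor $Q$, this is $n$ dot products of length $m$, not $O(n^2)$) and then solve $R\hat\beta=a$ by back substitution at cost $\sum_{k=1}^{n} (2k-1)=n^2$.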
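The QR route can be sketched end to end in plain Python. This is only an illustration under stated assumptions: it builds the thin QR factors with modified Gram-Schmidt (chosen because it is short; library routines typically use Householder reflections instead), it assumes $X$ has full column rank, and the names `thin_qr_mgs` and `lstsq_qr` as well as the data are hypothetical.

```python
import math


def thin_qr_mgs(X):
    """Thin QR of an m x n matrix (list of rows) by modified Gram-Schmidt.

    Returns Q as a list of n columns (each of length m) and R as an
    n x n upper-triangular list of rows. Assumes full column rank.
    """
    m, n = len(X), len(X[0])
    V = [[X[i][j] for i in range(m)] for j in range(n)]  # columns of X
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = math.sqrt(sum(v * v for v in V[j]))
        q = [v / R[j][j] for v in V[j]]
        Q.append(q)
        for k in range(j + 1, n):
            # Remove the component along q from the remaining columns.
            R[j][k] = sum(qi * vi for qi, vi in zip(q, V[k]))
            V[k] = [vi - R[j][k] * qi for vi, qi in zip(V[k], q)]
    return Q, R


def lstsq_qr(X, y):
    """Least squares solution via beta = R^{-1} Q^T y."""
    Q, R = thin_qr_mgs(X)
    n = len(R)
    # a = Q^T y: n dot products of length m, so about 2mn flops.
    a = [sum(qi * yi for qi, yi in zip(q, y)) for q in Q]
    # Back substitution for R beta = a: about n^2 flops.
    beta = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = a[k] - sum(R[k][j] * beta[j] for j in range(k + 1, n))
        beta[k] = s / R[k][k]
    return beta


# Hypothetical data lying exactly on the line y = 1 + 2 t.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta = lstsq_qr(X, y)   # approximately [1.0, 2.0]
```

Once $Q$ and $R$ are available, each new right-hand side $y$ costs one product $Q^Ty$ and a single triangular solve.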
Comparing the decompositions: Given the factors, QR seems better than LU, since it needs one triangular solve instead of two. However, I think computing the QR decomposition of $X$ costs more than forming $X^TX$ and computing its LU decomposition, which is why we might prefer LU. On the other hand, if the decompositions are already given, we should use QR.
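As a rough check of this comparison, the sketch below tabulates the usual leading-order textbook flop estimates: about $mn^2$ to form $X^TX$ (exploiting symmetry), $\tfrac{2}{3}n^3$ for LU or $\tfrac{1}{3}n^3$ for Cholesky, and $2mn^2-\tfrac{2}{3}n^3$ for Householder QR. The function names are hypothetical, lower-order terms are dropped, and the per-solve count for QR assumes an explicit thin $Q$, so the numbers are estimates rather than exact counts.

```python
def normal_equations_flops(m, n, factorization="lu"):
    """Leading-order flops for the normal-equations route.

    Returns (setup, per_solve): setup forms X^T X and factors it,
    per_solve forms X^T y and does two triangular solves.
    """
    form_gram = m * n**2                  # X^T X, using symmetry
    if factorization == "cholesky":
        factor = n**3 / 3
    else:                                 # plain LU
        factor = 2 * n**3 / 3
    per_solve = 2 * m * n + 2 * n**2
    return form_gram + factor, per_solve


def qr_flops(m, n):
    """Leading-order flops for the Householder QR route.

    Returns (setup, per_solve): setup is the factorization,
    per_solve forms Q^T y and does one triangular solve.
    """
    setup = 2 * m * n**2 - 2 * n**3 / 3
    per_solve = 2 * m * n + n**2
    return setup, per_solve


m, n = 1000, 10
lu_setup, lu_solve = normal_equations_flops(m, n, "lu")
chol_setup, chol_solve = normal_equations_flops(m, n, "cholesky")
qr_setup, qr_solve = qr_flops(m, n)
print(lu_setup, chol_setup, qr_setup)   # QR setup is roughly twice as large
print(lu_solve, qr_solve)               # QR saves one n^2 triangular solve
```

Under these estimates, for $m \gg n$ the QR factorization costs about twice as much as the normal-equations setup, while the per-right-hand-side cost differs only by one $n^2$ triangular solve.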
SVD: Is there any advantage to using the SVD instead?