We got the following problem in class:
Let $A\in\mathbb R^{m\times n}$ have independent rows, $b\in\mathbb R^{m}$, $C\in\mathbb R^{p\times n}$ and $d\in\text{im}\left(C\right) \subset \mathbb R^{p}$. Consider the minimization problem $$\min_{x\in \mathbb R^{n}}\frac{1}{2}\left\| Ax - b \right\|_{2}^{2} \quad \text{s.t.} \quad Cx = d. \qquad\qquad \left(\star\right)$$ Prove that a vector $\hat{x}\in\mathbb R^{n}$ solves $\left(\star\right)$ if and only if there exists $z\in\mathbb R^{p}$ s.t. $$\begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}\begin{pmatrix} \hat{x} \\ z \end{pmatrix} = \begin{pmatrix} A^{T}b \\ d \end{pmatrix}.$$ Moreover, prove that the solution is uniquely defined if $\begin{pmatrix} A\\ C \end{pmatrix}$ has linearly independent columns.
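Before getting to my question, here is a quick numerical sanity check of that block system I put together (a minimal sketch assuming numpy; the sizes and random data are just illustrative). It solves the block system and compares against the null-space method, where $x = x_0 + Nt$ with $Cx_0 = d$ and the columns of $N$ spanning $\ker(C)$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 8, 5, 2                  # illustrative sizes, m > n > p

A = rng.standard_normal((m, n))    # full column rank with probability 1
b = rng.standard_normal(m)
C = rng.standard_normal((p, n))    # full row rank, so any d lies in im(C)
d = rng.standard_normal(p)

# Assemble and solve the block (KKT) system from the exercise.
K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])
sol = np.linalg.solve(K, rhs)
x_hat, z = sol[:n], sol[n:]

# Independent check via the null-space method:
# write x = x0 + N t with C x0 = d and the columns of N spanning ker(C),
# then minimize ||A(x0 + N t) - b||^2 over t, which is unconstrained.
x0, *_ = np.linalg.lstsq(C, d, rcond=None)
_, _, Vt = np.linalg.svd(C)
N = Vt[p:].T                       # orthonormal basis of ker(C)
t, *_ = np.linalg.lstsq(A @ N, b - A @ x0, rcond=None)
x_ns = x0 + N @ t

print(np.allclose(C @ x_hat, d))   # True: constraint holds
print(np.allclose(x_hat, x_ns))    # True: both approaches agree
```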
In this post, I would like to ask about the uniqueness part. I first tried assuming there are two solutions $\hat{x}$ and $\tilde{x}$ of $\left(\star\right)$ and showing that $\hat{x} = \tilde{x}$. Maybe that approach works, but after half a page I didn't know how to continue. I found two helpful links, one from MSE and another from some online lecture notes (particularly page 1), and this is how I would do it now:
(i)
In our case, the gradient of the Lagrangian $\mathcal L$ is given by $$\nabla_{x}\mathcal L = A^{T}Ax - A^{T}b - C^{T}\lambda \overset{!}{=} 0 \Leftrightarrow A^{T}Ax = A^{T}b + C^{T}\lambda.$$ Now, I was wondering whether I could just write $x = \left( A^{T}A\right)^{-1}\left(A^{T}b + C^{T}\lambda\right)$? For this to be correct, $A^{T}A$ must be invertible, i.e. $\text{rank}\left( A^{T}A\right) = n$ must hold. But does this already follow from the fact that $A$ has independent rows (a prerequisite) and $\begin{pmatrix} A \\ C \end{pmatrix}$ has linearly independent columns? To me, it does not quite add up yet...
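To probe this numerically (a quick sketch assuming numpy; the dimensions are ones I picked), here is an instance where $A$ has independent rows but $A^{T}A$ is singular, while $\begin{pmatrix} A \\ C \end{pmatrix}$ still has independent columns:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 2, 4, 2                  # m < n on purpose
A = rng.standard_normal((m, n))    # rows independent with probability 1
C = rng.standard_normal((p, n))

print(np.linalg.matrix_rank(A))                   # 2: A has independent rows
print(np.linalg.matrix_rank(A.T @ A))             # 2 < n = 4: A^T A is singular
print(np.linalg.matrix_rank(np.vstack([A, C])))   # 4: stacked matrix has independent columns
```

So independent rows of $A$ alone don't seem to give $\text{rank}\left(A^{T}A\right) = n$ when $m < n$; for that, $A$ would apparently need independent columns.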
I know that I haven't used anywhere that the least-squares problem is (strictly) convex; maybe that's part of the solution?
Edit: Let $$By := \begin{pmatrix}A^{T}A & C^{T} \\ C & 0 \end{pmatrix}y = 0,$$ i.e. $$\begin{pmatrix} A^{T}Ay_1 + C^{T}y_2 \\ Cy_{1} \end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}.$$ Now, from the last component, why does it necessarily follow that $y_2 = 0$? For example, couldn't it be that $p = 1 = n$, so that $C\in\mathbb R^{1\times 1}$, and more specifically $C = 0$? Or does this have to hold for arbitrary $C$?
Now for the first component (under the assumption that $y_2 = 0$): $A^{T}Ay_1 = 0$. I guess, as above, we would say that this has to hold for arbitrary $A^{T}A$, and thus $y_1 = 0$?
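To convince myself which assumptions actually matter here, I ran a small check (again a sketch assuming numpy; the helper `kkt` and the concrete dimensions are mine). With $C = 0$ the block matrix $B$ really is singular, which matches my worry that $y_2 = 0$ cannot follow from $Cy_1 = 0$ alone; with generic data satisfying the assumptions, $B$ has full rank:

```python
import numpy as np

rng = np.random.default_rng(2)

def kkt(A, C):
    """Assemble the block matrix B from the edit above (my own helper)."""
    n, p = A.shape[1], C.shape[0]
    return np.block([[A.T @ A, C.T],
                     [C, np.zeros((p, p))]])

# The degenerate case I worried about: p = n = 1 with C = 0.
A0 = np.array([[2.0]])
C0 = np.array([[0.0]])
print(np.linalg.matrix_rank(kkt(A0, C0)))  # 1 < 2: y = (0, 1)^T lies in the kernel

# Generic case: C has independent rows and (A; C) has independent columns.
m, n, p = 6, 4, 2
A = rng.standard_normal((m, n))
C = rng.standard_normal((p, n))
print(np.linalg.matrix_rank(kkt(A, C)))    # n + p = 6: kernel is trivial
```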