
We got the following problem in class:

Let $A\in\mathbb R^{m\times n}$, $b\in\mathbb R^{m}$, $C\in\mathbb R^{p\times n}$ have independent rows and $d\in\text{im}\left(C\right) \subset \mathbb R^{p}$. Consider the minimization problem $$\min_{x\in \mathbb R^{n}}\frac{1}{2}\left\vert\left\vert Ax - b \right\vert\right\vert_{2}^{2} \quad \text{s.t.} \quad Cx = d. \qquad\qquad \left(\star\right)$$ Prove that a vector $\hat{x}\in\mathbb R^{n}$ solves $\left(\star\right)$ if and only if there exists $z\in\mathbb R^{p}$ s.t. $$\begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}\begin{pmatrix} \hat{x} \\ z \end{pmatrix} = \begin{pmatrix} A^{T}b \\ d \end{pmatrix}.$$ Moreover, prove that the solution is uniquely defined if $\begin{pmatrix} A\\ C \end{pmatrix}$ has linearly independent columns.

In this post, I would like to ask about the uniqueness part. I first tried to prove it by assuming there exist two solutions $\hat{x}$ and $\tilde{x}$ to $\left(\star\right)$ and then showing that $\hat{x} = \tilde{x}$. Maybe that approach works, but after half a page I didn't know how to continue. I found two helpful links, one from MSE and another from some online lecture notes (particularly page 1), and this is how I would do it now:

(i)

In our case, the gradient of the Lagrangian $\mathcal L$ is given by $$\nabla_{x}\mathcal L = A^{T}Ax - A^{T}b - C^{T}\lambda \overset{!}{=} 0\Leftrightarrow A^{T}Ax = A^{T}b + C^{T}\lambda.$$ Now, I was wondering whether I could just write $x = \left( A^{T}A\right)^{-1}\left(A^{T}b + C^{T}\lambda\right)$? For this to be correct, we need $A^{T}A$ to have full rank, i.e. $\text{rank}\left( A^{T}A\right) = n$ must hold. But does this already follow from the fact that $A$ has independent rows (a prerequisite) and $\begin{pmatrix} A \\ C \end{pmatrix}$ has linearly independent columns? To me, it does not quite add up yet.
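In fact, here is a tiny numerical check I put together (a toy example of my own, not from the exercise) which suggests that it does not follow: even when $A$ has independent rows and $\begin{pmatrix} A \\ C \end{pmatrix}$ has independent columns, $A^{T}A$ can be singular.

```python
import numpy as np

# Toy example (my own choice of data): m = p = 1, n = 2
A = np.array([[1.0, 0.0]])   # one nonzero row -> independent rows
C = np.array([[0.0, 1.0]])   # one nonzero row -> independent rows

AtA = A.T @ A                      # = diag(1, 0)
stacked = np.vstack([A, C])        # = 2x2 identity

print(np.linalg.matrix_rank(AtA))      # 1 < n = 2, so A^T A is NOT invertible
print(np.linalg.matrix_rank(stacked))  # 2, so [A; C] has independent columns
```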

I know that I haven't used anywhere that the least-squares problem is a (strictly) convex problem, maybe that's part of the solution?

Edit: Suppose $$By := \begin{pmatrix}A^{T}A & C^{T} \\ C & 0 \end{pmatrix}y = 0;$$ then $$\begin{pmatrix} A^{T}Ay_1 + C^{T}y_2 \\ Cy_{1} \end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}.$$ Now, from the last component, why does it necessarily follow that $y_2 = 0$? For example, couldn't it be that $p = 1 = n$, s.t. $C\in \mathbb R^{1\times 1}$, and then more specifically $C = 0$? Or does this have to hold for arbitrary $C$?

Now for the first component (under the assumption of $y_2 = 0$): $A^{T}Ay_1 = 0$. I guess as above, we say that this has to hold for arbitrary $A^{T}A$, and thus $y_1 = 0$?
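For what it's worth, with the same toy $A$ and $C$ as in my example above (again my own made-up data), the block matrix $B$ does seem to have a trivial kernel, which is what I would want to show in general:

```python
import numpy as np

A = np.array([[1.0, 0.0]])
C = np.array([[0.0, 1.0]])

# Assemble B = [[A^T A, C^T], [C, 0]]; here n = 2, p = 1, so B is 3x3
B = np.block([[A.T @ A, C.T],
              [C, np.zeros((1, 1))]])

# A square matrix has a trivial kernel iff it has full rank
print(np.linalg.matrix_rank(B))  # 3, so By = 0 forces y = 0 in this example
```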

Hermi
  • Why have you ignored the first part of the question? It's important ;) – f10w May 10 '21 at 18:11
  • I didn't ignore it, but I am in this post more concerned about the uniqueness, because I already have the "$\Rightarrow$" direction. – Hermi May 10 '21 at 19:21
  • It doesn't seem you realized that the matrix in the first part plays a role in the second part... – f10w May 10 '21 at 19:25
  • Oh, I didn't realize it, thanks for pointing it out.. Do you think you could elaborate upon this a bit more? :) Because what we are given in the second part is the matrix $\begin{pmatrix}A \\ C\end{pmatrix}$, but the first block column of the matrix from the first part has $A^{T}A$ and $C$, so I am not sure.. – Hermi May 10 '21 at 19:33
  • Let $B$ be the big matrix on the left in the first part; it suffices to show that $By=0$ implies $y=0$, which is quite straightforward. – f10w May 10 '21 at 19:52
  • I edited my original question. But honestly, I don't see yet why we're considering $By = 0$, so I don't see how this is connected to the task at hand (sorry).. :) By the way, it's midnight at my place, I will reply back tomorrow. – Hermi May 10 '21 at 20:25
  • @Khue? I edited my question. :) – Hermi May 11 '21 at 07:08
  • That is to show that the matrix is invertible and thus the linear equation in the first part has a unique solution. I'll post an answer soon. – f10w May 11 '21 at 17:08

1 Answer


First, there seems to be a typo in the question:

Let $A\in\mathbb R^{m\times n}$, $b\in\mathbb R^{m}$, $C\in\mathbb R^{p\times n}$ have independent rows

It should be

Let $A\in\mathbb R^{m\times n}$, $b\in\mathbb R^{m}$, $C\in\mathbb R^{p\times n}$ has independent rows

which means only $C$ needs to have independent rows.

Second, since you wondered whether the convexity was used somewhere: yes, it was used in the first part of the question. The linear system \begin{equation}\label{kkt} \begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}\begin{pmatrix} \hat{x} \\ z \end{pmatrix} = \begin{pmatrix} A^{T}b \\ d \end{pmatrix}\tag{$\star\star$} \end{equation} corresponds to the KKT conditions of the original optimization problem. A solution of this system is also a solution of the original problem precisely because the problem is convex. This is also how to prove the first part. Note that the variable $z$ here plays the role of $-\lambda$, where $\lambda$ is the multiplier in your Lagrangian above.
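(As a side note, not needed for the proof: you can check all of this numerically. The sketch below, with randomly generated data of my own, builds the system $(\star\star)$, solves it, and verifies that the resulting $\hat x$ is feasible and cannot be improved by feasible perturbations.)

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 5, 3, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
C = rng.standard_normal((p, n))   # almost surely has independent rows
d = rng.standard_normal(p)        # C has rank p, so d automatically lies in im(C)

# KKT system (**): [[A^T A, C^T], [C, 0]] [x_hat; z] = [A^T b; d]
K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([A.T @ b, d]))
x_hat, z = sol[:n], sol[n:]

f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2

print(np.allclose(C @ x_hat, d))       # True: x_hat satisfies the constraint
V = np.linalg.svd(C)[2][p:].T          # columns span the null space of C
for _ in range(5):                     # feasible perturbations x_hat + v with Cv = 0
    v = V @ rng.standard_normal(n - p)
    print(f(x_hat + v) >= f(x_hat))    # True: no feasible perturbation does better
```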

Finally, for the second part, it suffices to show that if $\begin{pmatrix} A\\ C \end{pmatrix}$ has linearly independent columns (in addition to $C$ having independent rows), then the matrix $\begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}$ is invertible. Why is that? Because if this matrix is invertible then \eqref{kkt} yields the unique solution \begin{equation} \begin{pmatrix} \hat{x} \\ z \end{pmatrix} = \begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}^{-1} \begin{pmatrix} A^{T}b \\ d \end{pmatrix}. \end{equation}

To prove the invertibility, it suffices to show that if $\begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}\begin{pmatrix} x \\ z \end{pmatrix} = 0$ then $x=z=0$ (i.e., the columns of that matrix are linearly independent). Indeed, consider \begin{align} A^TAx + C^Tz &= 0\tag{1}\label{1}\\ Cx &= 0\tag{2}\label{2}. \end{align} From \eqref{1} we have $x^T(A^TAx + C^Tz) = 0$, which means $\|Ax\|_2^2 + (Cx)^Tz = 0$, implying $\|Ax\|_2^2 = 0$ because $Cx = 0$ according to \eqref{2}. Thus we have $Ax=Cx=0$, or equivalently, $\begin{pmatrix} A\\ C \end{pmatrix}x = 0$, which implies $x=0$ because the columns of $\begin{pmatrix} A\\ C \end{pmatrix}$ are independent. It remains to prove $z=0$, which is true because \eqref{1} yields $C^Tz=0$ (since $Ax=0$ gives $A^TAx=0$) while the columns of $C^T$ are independent. We are done.
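(A quick numerical illustration, not part of the proof, of why the column condition matters: if the columns of $\begin{pmatrix} A\\ C \end{pmatrix}$ are dependent, i.e. there is $x\neq 0$ with $Ax = Cx = 0$, then $(x, 0)$ is a nonzero null vector of the KKT matrix, which is therefore singular. The data below is a degenerate toy example of my own.)

```python
import numpy as np

# The second column of both A and C is zero, so the stacked matrix [A; C]
# has dependent columns (while C still has independent rows).
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
C = np.array([[1.0, 0.0]])

K = np.block([[A.T @ A, C.T],
              [C, np.zeros((1, 1))]])

print(np.linalg.matrix_rank(K))        # 2 < 3: the KKT matrix is singular
print(K @ np.array([0.0, 1.0, 0.0]))   # [0. 0. 0.] -- the null vector (x, 0) with x = (0, 1)
```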

f10w
  • This is an illuminatingly clear answer, thanks a lot! I upvoted it and marked it as accepted. What I was missing is that a square matrix is invertible iff its kernel is trivial (which I found on Wikipedia). With that, it's clear! I would like to follow up on the first part that you wrote, can I open a new question and send the link here, would that be okay? Because with the help of Lagrange multipliers, I get from the least-squares problem to the matrix equation, but I'm honestly stuck on the other way.. :) – Hermi May 11 '21 at 20:32
  • @Hermi Sure. I’ll try to help with that as well. – f10w May 11 '21 at 21:44
  • Great, thanks a lot! Here comes the link: https://math.stackexchange.com/questions/4136920/the-vector-constrained-least-squares-problem. – Hermi May 12 '21 at 22:00
  • Khue, I was just going through your answer again, and I was wondering why the equation $C^T z = 0$ (cf. your last line) implies $z = 0$ if the columns of $C^T$ (or equivalently, the rows of $C$) are linearly independent? This would have made sense to me if $C$ had been a square matrix, but this is not the case for us.. – Hermi Jul 17 '21 at 11:52
  • @Hermi If $Ax = 0$ and the columns of $A$ are linearly independent then $x = 0$. This is a simple fact: If $x = (x_1,x_2,\dots,x_n)$ and $A = (a_1,a_2,\dots,a_n)$ (where $a_i$ is the $i$-th column of the matrix $A$), then $Ax = x_1a_1 + \dots+x_na_n$, which is a linear combination of the columns. A linear combination of linearly-independent vectors is zero if and only if all the coefficients are zeros, by definition. – f10w Jul 17 '21 at 11:57
  • Wow, thank you for the nice (and quick) reply! – Hermi Jul 17 '21 at 12:00