
When penalizing the problem $$\begin{array}{r@{} l@{}} \text{minimize }& {c}^{T} x \\ \text{subject to } & {A} x = {b}, \end{array}$$ one obtains the following optimization problem: $$\text{minimize } c^{T} x + \rho \|Ax-b\|^2, $$ for some $\rho>0.$ Since this function is coercive, it attains a minimum, hence it has a minimizer, say $x^{*}$ -- see this link, for example. What is an expression for this minimizer $x^{*}$?

  • Contrary to what you write, the function is not coercive unless $c\in (\ker A)^\perp$. Otherwise the function is not bounded from below. Simply take any $v \in \ker A$ such that $c^T v <0$, and put $x_n = n v$. Then $c^T x_n + \rho \|Ax_n-b\|^2 = nc^T v + \rho \|b\|^2 \to -\infty$ as $n\to\infty$. – rafexiap Aug 01 '23 at 20:26
  • Hey, Interesting finding! – R. W. Prado Aug 01 '23 at 20:32
  • 1
    For example, take $\rho=1$, $x=(x_1,x_2)^T$, $c=(-1,0)^T$, $A=(1,-1)$, $b=0$. Then $x_1-x_2=0$, but we can still take an arbitrarily large $x_1=x_2$ and make the objective as negative as we like. – rafexiap Aug 01 '23 at 20:34

2 Answers


Let $f(x) = c^Tx + \rho \|Ax - b\|^2$. The gradient of this function is given by $$\nabla f(x) = c - 2\rho A^Tb + 2\rho A^TAx.$$ Setting this to zero and solving, we have \begin{align} 0 = c - 2\rho A^Tb + 2\rho A^TAx & \iff A^TAx = A^Tb - \frac{1}{2\rho}c \\ & \iff x = (A^TA)^{-1}\left (A^T b - \frac{1}{2\rho}c \right ), \end{align} which is the solution assuming that $A^TA$ is invertible (and this is guaranteed when the columns of $A$ are linearly independent).
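This closed form is easy to verify numerically. Below is a minimal NumPy sketch (the data are made up for illustration): a tall random $A$ generically has linearly independent columns, so $A^TA$ is invertible, and the gradient should vanish at the computed $x^*$.

```python
import numpy as np

# Toy data: a tall random matrix generically has full column rank,
# so A^T A is invertible and the closed-form minimizer applies.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
c = rng.standard_normal(3)
rho = 2.0

# x* = (A^T A)^{-1} (A^T b - c / (2 rho)); solve rather than invert.
x_star = np.linalg.solve(A.T @ A, A.T @ b - c / (2 * rho))

# Sanity check: the gradient c + 2 rho A^T (A x - b) vanishes at x*.
grad = c + 2 * rho * A.T @ (A @ x_star - b)
print(np.allclose(grad, 0))
```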

A great resource for computing gradients involving vectors and matrices is this website.

Edit: If $A^TA$ is not invertible there are two possibilities for what can happen, and what you can do about it. Some important background facts to remember are

  • $A^TA$ is invertible if and only if the columns of $A$ are linearly independent (See this post for further discussion). In particular, if $A^TA$ is not invertible, then the matrix $A$ has a non-trivial null-space.

  • If $U \subset \mathbb{R}^d$ is a subspace and $U^\perp = \{x \in \mathbb{R}^d \mid x^Tu = 0 \ \forall \ u \in U \}$ is the orthogonal complement of $U$, then any $x \in \mathbb{R}^d$ can be written in a unique way as $x = u + v$ where $u \in U$ and $v \in U^\perp$.

With these in mind, let $c^\perp$ be the space of vectors that are orthogonal to $c$ and let $N$ be the null-space of $A$. Using $N$ and $c^\perp$ we can address the two possible cases:

  1. $N \subseteq c^\perp$. In this case we can write uniquely any vector $x$ in the form $x = y + z$ where $y \in N$ and $z \in N^\perp$. The objective then becomes $$ c^Tx + \rho\|Ax - b\|^2 = c^T(y + z) + \rho\|A(y+z) - b\|^2 = c^Tz + \rho\|Az - b\|^2 $$ where we used that $y \in N$, which implies $Ay = 0$, and also $y \in c^\perp$, and therefore $c^Ty = 0$. This shows that the only vectors that matter are those that belong to $N^\perp$, and that there is no unique minimizer, since we can add any $y \in N$ without changing the objective.

This is not all that can be done in this case however. We can still find a minimizer by performing a trick which reduces the problem to the case above where $A^TA$ is invertible. Assume that the original problem takes in vectors of $\mathbb{R}^n$ and that $\text{dim}(N^\perp) = n'$. Now introduce a new matrix $B \in \mathbb{R}^{n \times n'}$ such that the range of $B$ equals $N^\perp$. This is most easily done by letting $B$ be the transpose of $n'$ linearly independent rows of $A$.

Using this matrix, solve the new problem $$\min_{z \in \mathbb{R}^{n'}} c^T(Bz) + \rho \|ABz - b\|^2 = \min_{z \in \mathbb{R}^{n'}} (B^Tc)^Tz + \rho \|(AB)z - b\|^2, $$ which is of the same form as the original problem with $c' = B^Tc$ and $A' = AB$. Importantly, in this new problem it must be the case that $(AB)^T(AB) = B^TA^TAB$ is invertible, since $AB$ has a trivial null space ($Bz \in N^\perp$ implies $ABz \neq 0$ unless $z = 0$).

To recover a minimizer $x^*$ in the original problem from a minimizer of the new problem $z^*$, they are related by $x^* = Bz^*$. Interestingly, this $x^*$ is the unique minimizer which is contained in $N^\perp$.

The intuition for why this approach works is that since the null space does not contribute to the objective, we really just want to parameterize the subspace that is orthogonal to the null space. This is done by using $\mathbb{R}^{n'}$ and the matrix $B$ which transforms $\mathbb{R}^{n'}$ into $N^\perp$.
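The reduction above can be sketched in a few lines of NumPy. The matrices below are toy data chosen (not taken from the question) so that $A$ has a non-trivial null space while $c$ lies in the row space of $A$, i.e. $N \subseteq c^\perp$:

```python
import numpy as np

# Toy instance of case 1: rank(A) = 1, null space spanned by (1, -1),
# and c = (1, 1) is orthogonal to (1, -1), so N ⊆ c^⊥.
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
b = np.array([1.0, 3.0])
c = np.array([1.0, 1.0])
rho = 1.0

# B: transpose of a maximal set of linearly independent rows of A,
# so that range(B) = N^⊥ (the row space of A).
B = A[[0], :].T                  # shape (2, 1), here n' = 1

# Reduced problem: minimize (B^T c)^T z + rho * ||(A B) z - b||^2,
# whose normal equations now have an invertible system matrix.
AB = A @ B
z_star = np.linalg.solve(AB.T @ AB, AB.T @ b - (B.T @ c) / (2 * rho))
x_star = B @ z_star              # recovered minimizer, lies in N^⊥

# Sanity check: the gradient of the original objective is orthogonal
# to N^⊥ at x*, i.e. B^T grad = 0.
grad = c + 2 * rho * A.T @ (A @ x_star - b)
print(np.allclose(B.T @ grad, 0))
```

Adding any multiple of $(1,-1)^T$ to `x_star` leaves the objective unchanged, which matches the non-uniqueness discussed above.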

  2. $N \not \subseteq c^\perp$. In this case there is no minimum, because there is a vector $y$ such that $Ay = 0$ and $c^Ty \neq 0$. Multiplying this by a scalar $\alpha$ gives $$c^T(\alpha y) + \rho\|A(\alpha y) - b\|^2 = \alpha c^Ty + \rho\|0 - b\|^2 = \alpha c^Ty + \rho\|b\|^2,$$ and since $c^Ty \neq 0$ we can choose $\alpha$ as large as we want (and either positive or negative, depending on the sign of $c^Ty$) and make the objective arbitrarily negative. Overall, in this case there is no minimizer.
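The unboundedness in case 2 is easy to see numerically, using the concrete counterexample from the comments ($\rho=1$, $c=(-1,0)^T$, $A=(1,-1)$, $b=0$): the direction $y=(1,1)^T$ lies in $\ker A$ but is not orthogonal to $c$, so the objective decreases without bound along $\alpha y$.

```python
import numpy as np

# Counterexample from the comments: y = (1, 1) is in ker A but
# c^T y = -1 != 0, so f(alpha * y) = -alpha is unbounded below.
A = np.array([[1.0, -1.0]])
b = np.array([0.0])
c = np.array([-1.0, 0.0])
rho = 1.0
y = np.array([1.0, 1.0])

def f(x):
    r = A @ x - b
    return c @ x + rho * (r @ r)

for alpha in [1.0, 10.0, 100.0]:
    print(f(alpha * y))          # -1.0, -10.0, -100.0
```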
  • But when $A^{T} A$ is not invertible, what is the solution? Usually, it is only assumed that $A$ has full row rank, not full column rank. – R. W. Prado Aug 01 '23 at 18:59
  • 1
    If $A^TA$ is not invertible, the answer is more subtle, I will post an edited answer in a few minutes discussing this case. – Matt Werenski Aug 01 '23 at 19:07
  • 1
    If $A^TA$ is not invertible, the penalized problem is not necessarily coercive, so there is no solution in the first place if $c \not\in (\ker A)^\perp$. However, when $c\in (\ker A)^\perp = \text{range}(A^T)$ then the system $A^TA x = A^Tb - 0.5\rho^{-1} c$ is (non-uniquely) solvable. – rafexiap Aug 01 '23 at 20:02
  • I saw your comment! And it solves the problem! – R. W. Prado Aug 01 '23 at 20:34
  • 1
    Your answer is accepted, Matt! However, the way I find the answer was a bit different. Anyway, thanks, mate. – R. W. Prado Aug 01 '23 at 20:37
  • Glad to be of service, really the main headache is dealing with the null space, and this is essentially what you are doing in your answer as well just with the SVD – Matt Werenski Aug 01 '23 at 20:44
  • I changed my answer to be less computationally expensive. =) Now, I don't need the SVD anymore. – R. W. Prado Aug 01 '23 at 20:53

Let $f(x) = c^Tx + \rho \|Ax - b\|^2$, for all $x \in \mathbb{R}^n$. This function is convex, hence zeroing the gradient is both necessary and sufficient for optimality. The gradient of this function is given by $$\nabla f(x) = c - 2\rho A^Tb + 2\rho A^TAx.$$ Setting this to zero, we have to find an $x$ such that \begin{align} 0 = c - 2\rho A^Tb + 2\rho A^TAx & \iff A^TAx = A^Tb - \frac{1}{2\rho}c. \end{align}

As commented by others, if $c \not\in (\ker A)^{\perp},$ then we can find a $v \in \ker A$ such that $c^{T} v < 0$. Hence, $f(n v) = n c^{T} v + \rho \|b\|^2,$ which converges to $-\infty$ as $n$ goes to $\infty$. Hence, from now on, to find a solution, we can assume that $c \in (\ker A)^{\perp} = \text{range}(A^{T})$. Thus, $c= A^{T} x^{0}$ for some $x^{0}$ -- one can use least squares to determine whether or not such an $x^{0}$ exists. After this step, we need to find $x$ such that $$A^TAx = A^T\left(b - \frac{1}{2\rho} x^{0}\right), $$ which can be solved via the least squares problem: $$\min_x \left\|A x -\left(b-\dfrac{1}{2\rho} x^{0}\right)\right\|^2.$$
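This two-step recipe can be sketched with NumPy's `lstsq` (the toy data below are chosen, purely for illustration, so that $\ker A$ is non-trivial yet $c \in \text{range}(A^T)$):

```python
import numpy as np

# Toy data: A has rank 1 (non-trivial kernel), but c = (1, 1) lies in
# range(A^T), so a minimizer exists.
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
b = np.array([1.0, 3.0])
c = np.array([1.0, 1.0])
rho = 1.0

# Step 1: find x0 with A^T x0 = c by least squares; a (near-)zero
# residual certifies that c ∈ range(A^T) = (ker A)^⊥.
x0, *_ = np.linalg.lstsq(A.T, c, rcond=None)
assert np.allclose(A.T @ x0, c)

# Step 2: solve min_x ||A x - (b - x0 / (2 rho))||^2; any least-squares
# solution satisfies the normal equations A^T A x = A^T b - c / (2 rho).
x_star, *_ = np.linalg.lstsq(A, b - x0 / (2 * rho), rcond=None)

# Sanity check against the normal equations of the penalized problem.
print(np.allclose(A.T @ A @ x_star, A.T @ b - c / (2 * rho)))
```

Note that `lstsq` returns the minimum-norm least-squares solution, which picks out one particular minimizer; adding any element of $\ker A$ gives another.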