
I'm trying to compute the gradient and Hessian of the following function

$$f(x,y) = \frac{1}{2}\|Ax-By\|^2$$

where $A$ and $B$ are $m \times n$ matrices, $x, y \in \mathbb{R}^n$, and $f: \mathbb{R}^{2n} \to \mathbb{R}$.

I honestly don't have a clue about the best way to proceed. Usually, to find a gradient, I would rewrite the function as sums and differentiate from there, but the square and the two vector arguments have me stumped. I am not looking for a full solution, just a hint on where to start.

Furthermore, am I right in thinking that $\nabla f(x,y)$ is a vector in $\mathbb{R}^{2n}$ consisting of the partial derivatives along $x$ and $y$, and that $\nabla^2 f(x,y)$ is a $2n \times 2n$ matrix?

Thank you in advance.

2 Answers


An answer for the gradient.

Identifying vectors with column vectors (as you do):

$$f(x,y) := \frac{1}{2}\|Ax-By\|^2=\frac{1}{2}(Ax-By)^T(Ax-By)=$$

$$\frac{1}{2}(x^TA^T-y^TB^T)(Ax-By)$$ $$=\frac{1}{2}\left(x^T(A^TA)x-\underbrace{(x^TA^TBy+y^TB^TAx)}_{2x^TA^TBy}+y^T(B^TB)y\right)\tag{1}$$

Let us now apply two classical results:

1) The gradient of $x^TMx$ with respect to $x$, for symmetric $M$ (as $A^TA$ and $B^TB$ are), is $2x^TM$, seen as a row vector. Why? Consider the (Taylor) expansion, where $h$ is a vector increment:

$$\underbrace{(x+h)^TM(x+h)}_{f(x+h)}=\underbrace{x^TMx}_{f(x)}+\underbrace{x^TMh+h^TMx}_{(2x^TM)h=f'(x).h}+\underbrace{h^TMh}_{\text{2nd order term}}$$

2) The gradient of $x^TMy$ with respect to $y$ is the row vector $x^TM$, for a similar reason.

Using these two results, the gradient of (1) is (indeed!) a $2n$-dimensional row vector:

$$(x^T(A^TA)-y^TB^TA,y^T(B^TB)-x^TA^TB)$$
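(Editor's sketch, not part of the original answer.) This gradient is easy to sanity-check numerically against central finite differences; the sizes, seed, and variable names below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)

def f(x, y):
    # f(x, y) = (1/2) ||Ax - By||^2
    r = A @ x - B @ y
    return 0.5 * r @ r

# Closed-form gradient from the answer, transposed into one stacked column:
grad = np.concatenate([A.T @ (A @ x) - A.T @ (B @ y),   # x-block
                       B.T @ (B @ y) - B.T @ (A @ x)])  # y-block

# Central finite differences over the stacked variable z = (x, y)
eps = 1e-6
z = np.concatenate([x, y])
fd = np.empty(2 * n)
for i in range(2 * n):
    e = np.zeros(2 * n)
    e[i] = eps
    fd[i] = (f((z + e)[:n], (z + e)[n:]) - f((z - e)[:n], (z - e)[n:])) / (2 * eps)

print(np.max(np.abs(grad - fd)))  # tiny: the two gradients agree
```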

Remarks :

1) Besides, yes, the Hessian is a $2n \times 2n$ matrix.

2) A different derivation of (1) could have been done by writing:

$$f(x,y) := \frac{1}{2}\|Ax-By\|^2= \frac{1}{2}\begin{pmatrix}x^T \ \ y^T\end{pmatrix}\begin{pmatrix}A^T\\-B^T\end{pmatrix}\begin{pmatrix}A \ \ -B\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}$$

$$= \frac{1}{2}\begin{pmatrix}x^T \ \ y^T\end{pmatrix}\begin{pmatrix}A^TA&-A^TB\\-B^TA&B^TB\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}$$
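(Editor's check, not in the original answer.) The block matrix above is exactly $C^TC$ for $C=[A\;\,{-}B]$, which NumPy confirms directly with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))

# Block matrix read off from the quadratic form (the Hessian of f)
H = np.block([[ A.T @ A, -A.T @ B],
              [-B.T @ A,  B.T @ B]])

# Factored form: C = [A  -B], so f = (1/2) z^T (C^T C) z
C = np.hstack([A, -B])

print(H.shape)                  # (6, 6), i.e. 2n x 2n
print(np.allclose(H, C.T @ C))  # True
```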

Jean Marie
  • I think you forgot to transpose; $(x^T(A^TA)-y^TB^TA,y^T(B^TB)-x^TA^TB)$ is the Jacobian, not the gradient. – Hyperplane Sep 22 '19 at 22:11
  • @Hyperplane A Jacobian is a matrix. A gradient is a row vector that has to be multiplied by a column vector (see the dot in $f'(x).h$, which stands for a dot product). – Jean Marie Sep 22 '19 at 22:15
  • The Jacobian of a function $f\colon \mathbb R^n \to\mathbb R^m$ is by definition the $m\times n$ matrix $(\frac{\partial f_i}{\partial x_j})_{ij}$, which in this case is precisely the $1\times 2n$ matrix, aka row vector, $(x^T(A^TA)-y^TB^TA,y^T(B^TB)-x^TA^TB)$ – Hyperplane Sep 22 '19 at 22:19
  • This is a clash of conventions (cf. https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions ), but to be honest I have never seen anyone call a row vector the gradient. Gradient descent ($x_{k+1} = x_k - \eta\nabla f(x_k)$) also wouldn't make much sense with this definition, as you would add a row vector to a column vector. – Hyperplane Sep 22 '19 at 22:27
  • @Hyperplane Take a look, for example, at https://math.stackexchange.com/q/54355 – Jean Marie Sep 22 '19 at 22:38
  • Thank you so much @JeanMarie! It took me a while to really understand everything, but I got it now and this was a huge help! – trashnumbers Sep 28 '19 at 18:08

Concatenate the two vectors (and matrices) into a single column vector (and matrix):
$$z = \pmatrix{x\\y} \in{\mathbb R}^{2n\times 1},
\qquad C = \Big[\,A\;\;{-}B\,\Big] \in{\mathbb R}^{m\times 2n}$$
Now the function is very simple, as are its derivatives:
$$\eqalign{
f &= \tfrac{1}{2}\|Cz\|^2 = \tfrac{1}{2}z^TC^TCz \\
\frac{\partial f}{\partial z} &= C^TCz \\
\frac{\partial^2 f}{\partial z^2} &= C^TC
}$$
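(Editor's sketch, not part of the original answer.) The concatenated formulation can be verified numerically in a few lines; sizes and seed below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)

C = np.hstack([A, -B])       # C = [A  -B], shape (m, 2n)
z = np.concatenate([x, y])   # z = (x; y), shape (2n,)

# f in the concatenated variable matches the original definition
f_z = 0.5 * np.linalg.norm(C @ z) ** 2
r = A @ x - B @ y
print(np.isclose(f_z, 0.5 * r @ r))  # True

# Gradient and (constant) Hessian from the answer
grad = C.T @ (C @ z)         # shape (2n,)
hess = C.T @ C               # shape (2n, 2n)
print(grad.shape, hess.shape)  # (6,) (6, 6)
```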

greg
  • I am sorry but $\tfrac{1}{2}\|Cz\|^2 = \tfrac{1}{2}(\|Ax\|^2+\|By\|^2)$ isn't the same as $\tfrac{1}{2}\|Ax-By\|^2$... – Jean Marie Sep 23 '19 at 06:26
  • @JeanMarie Thank you. The $C$ matrix had the wrong shape. For some reason, I thought it needed to be block-diagonal. – greg Sep 23 '19 at 14:34