
I'm trying to analyze optimization problems of the form $$\text{Minimize: } \|A - XY\|^{2}_{F}$$ where $A$ is $m \times n$, $X$ is $m \times k$, $Y$ is $k \times n$, $k \le \min(m,n)$, and $A$, $X$, and $Y$ contain only real numbers. It is known that in the general case this problem is not convex, but several variants of it are (Principal Component Analysis can be formulated this way, for example). My question relates to formulating the Hessian matrix of this type of system.

Using the matrix cookbook and several other online references I have calculated the Hessian as follows $$H(f(X,Y)) = \left( \begin{array}{ccc} -2YY^{T} & -2A + 4XY \\ -2A + 4XY & -2X^{T}X \end{array} \right)$$ with $f(X,Y) = ||A - XY||^{2}_{F}$. Here are the steps I used to arrive at this form. $$\frac{\partial f}{\partial X} = -2(A - XY)Y^{T}, \frac{\partial f}{\partial Y} = -2X^{T}(A - XY) \\\frac{\partial^{2} f}{\partial X^{2}} = -2YY^{T}, \frac{\partial^{2} f}{\partial Y^{2}} = -2X^{T}X\\\frac{\partial^{2} f}{\partial X \partial Y} = -2A + 4XY, \frac{\partial^{2} f}{\partial Y \partial X} = -2A + 4XY$$

My first question is whether or not this is correct (I'm moderately confident that the partials in terms of only $X$ or only $Y$ are correct, but much less confident in the others). My second question is, if the above is correct, does that mean the Hessian matrix is undefined when $A$ is not square? That appears to be the case because when $A$ is rectangular, $-2YY^{T}$ and $-2X^{T}X$ are of different dimensions, which makes the Hessian a matrix of size $\left( \begin{array}{ccc} [m \times m] & [m \times n] \\ [m \times n] & [n \times n] \end{array} \right)$, but I'm relatively new to the concepts behind matrix calculus, so I can't tell if I missed something (for instance, is one of the mixed partials supposed to be transposed?).
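As a sanity check, the first-order expressions can be compared against finite differences; the following NumPy sketch (sizes chosen arbitrarily, with a deliberately rectangular $A$) agrees with the gradient formulas above:

```python
import numpy as np

# Finite-difference check of the gradients of f(X, Y) = ||A - XY||_F^2.
rng = np.random.default_rng(0)
m, n, k = 4, 3, 2
A = rng.standard_normal((m, n))
X = rng.standard_normal((m, k))
Y = rng.standard_normal((k, n))

def f(X, Y):
    return np.linalg.norm(A - X @ Y, 'fro') ** 2

# Closed-form gradients as derived in the question
gX = -2 * (A - X @ Y) @ Y.T   # shape (m, k)
gY = -2 * X.T @ (A - X @ Y)   # shape (k, n)

def num_grad(func, M, eps=1e-6):
    """Central-difference gradient of func with respect to the matrix M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (func(M + E) - func(M - E)) / (2 * eps)
    return G

err_X = np.max(np.abs(gX - num_grad(lambda Z: f(Z, Y), X)))
err_Y = np.max(np.abs(gY - num_grad(lambda Z: f(X, Z), Y)))
print(err_X, err_Y)  # both should be tiny
```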

  • apart from some typos, dimension mismatches, and sign errors, the Hessian should be an object that defines a bilinear map. So it should be something like $(mk + kn) \times (mk + kn)$, ordered in whichever way you prefer. – user251257 Mar 01 '16 at 21:25
  • Where do you see the dimension mismatches? As I said I am very new to matrix calculus so I'm not surprised there are some, I'm just not seeing them... – nick.schachter Mar 02 '16 at 01:00
  • You wrote $X$ is $m\times k$, and $Y$ is $n\times k$. In that case $XY$ is not defined for $k\ne n$. Also you wrote something like $k \le \min(X,Y)$. Probably you think of $k\le\min(m,n)$? – user251257 Mar 02 '16 at 01:05
  • Yes, you're right on both counts. When I wrote the problem down in my notes it was $XY^{T}$, hence the mistake with $n\times k$. I will edit that. And yeah, it should have been $k\le min(m,n)$. – nick.schachter Mar 02 '16 at 01:06
  • Btw you know such problems are solved using singular value decomposition, right? – user251257 Mar 02 '16 at 01:09
  • In the case where there are no restrictions on the problem, yes. – nick.schachter Mar 02 '16 at 01:10
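As the comments note, the unrestricted problem is solved by the truncated SVD (the Eckart–Young theorem): keeping the top $k$ singular triplets of $A$ gives the optimal $X$ and $Y$, and the optimal squared error equals the sum of the squared discarded singular values. A minimal NumPy sketch of that approach:

```python
import numpy as np

# Rank-k approximation of A via truncated SVD (Eckart-Young).
rng = np.random.default_rng(1)
m, n, k = 6, 5, 2
A = rng.standard_normal((m, n))

U, sv, Vt = np.linalg.svd(A, full_matrices=False)
X = U[:, :k] * sv[:k]   # m x k (singular values folded into X)
Y = Vt[:k, :]           # k x n

best = np.linalg.norm(A - X @ Y, 'fro') ** 2
# The optimum equals the sum of the squared discarded singular values.
print(best, np.sum(sv[k:] ** 2))
```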

1 Answer


I am afraid that your formula for the Hessian is not right.

I am going to show here a precise component-by-component solution in the particular case of $2 \times 2$ matrices, in disagreement with your general solution.

(I do not claim to provide a general solution in what follows.)

Notational conventions:

$$A=\begin{bmatrix} p & q\\ r & s \end{bmatrix} \ \ \ X=\begin{bmatrix} a & b\\ c & d \end{bmatrix} \ \ \ Y=\begin{bmatrix} e & f\\ g & h \end{bmatrix} $$

The expression to be minimized is the square of the Frobenius norm of $A-XY$, i.e.,

$$F \ = \ (p - a e - b g)^2 + (q - a f - b h)^2 + (r - c e - d g)^2 + (s - c f - d h)^2$$

where the variables are $a,b,c,d,e,f,g,h$, and $p,q,r,s$ are fixed numbers. (The objective is named $F$ here to avoid a clash with the entry $s$ of $A$.)

The upper-left part of the Hessian (the $4 \times 4$ block of second-order partial derivatives with respect to the variables $a,b,c,d$) is obtained through straightforward computations (with Mathematica, I confess):

$$\begin{bmatrix} 2YY^T & 0\\ 0 & 2YY^T \end{bmatrix} \ \ \text{instead of} \ \ -2YY^T$$

The presence of the $0$ blocks is understandable: for example, $\dfrac{\partial^2 F}{\partial a \partial c}$ is clearly zero.

The upper-right block is

$$2\begin{bmatrix} 2 a e + b g - p & 2 a f + b h - q & b e & b f\\ a g & a h & a e + 2 b g - p & a f + 2 b h - q\\ 2 c e + d g - r & 2 c f + d h - s & d e & d f\\ c g & c h & c e + 2 d g - r & c f + 2 d h - s \end{bmatrix}$$

where everything looks scrambled, and rather far from your results (even if one finds something that is similar to $XY-A$).
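The component-by-component computation above can also be reproduced symbolically. A SymPy sketch (using the same letter-for-entry conventions, with the squared Frobenius norm written out entry by entry) confirms both the block-diagonal upper-left block and the upper-right block displayed above:

```python
import sympy as sp

p, q, r, s = sp.symbols('p q r s')          # entries of A
a, b, c, d = sp.symbols('a b c d')          # entries of X
e, f, g, h = sp.symbols('e f g h')          # entries of Y

# Squared Frobenius norm of A - XY, expanded entry by entry
F = ((p - a*e - b*g)**2 + (q - a*f - b*h)**2
     + (r - c*e - d*g)**2 + (s - c*f - d*h)**2)

vx = [a, b, c, d]
vy = [e, f, g, h]
HXX = sp.Matrix([[sp.diff(F, u, v) for v in vx] for u in vx])
HXY = sp.Matrix([[sp.diff(F, u, v) for v in vy] for u in vx])

Y = sp.Matrix([[e, f], [g, h]])
# Upper-left block: two copies of 2*Y*Y^T on the diagonal
ok_left = (HXX - sp.diag(2*Y*Y.T, 2*Y*Y.T)).expand() == sp.zeros(4, 4)

# Upper-right block, as displayed in the answer
given = 2*sp.Matrix([
    [2*a*e + b*g - p, 2*a*f + b*h - q, b*e,             b*f],
    [a*g,             a*h,             a*e + 2*b*g - p, a*f + 2*b*h - q],
    [2*c*e + d*g - r, 2*c*f + d*h - s, d*e,             d*f],
    [c*g,             c*h,             c*e + 2*d*g - r, c*f + 2*d*h - s]])
ok_right = (HXY - given).expand() == sp.zeros(4, 4)
print(ok_left, ok_right)
```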

Question: Do you have any assumptions about the matrices $A$, $X$, $Y$ (symmetric, positive definite, ...)?

Remark: I would have liked to follow your computation and spot a possible mistake, but, honestly, I have not mastered the technique of differentiation with respect to matrices.

Jean Marie
  • To answer your question, there are no assumptions about $A$, $X$, or $Y$ except that they are all real matrices. Obviously this sort of problem becomes much simpler when you know something about the structure of the matrices, but in this case I am assuming we do not.

    As far as the actual calculations go, I was working primarily from the techniques listed here, http://www.math.uwaterloo.ca/~hwolkowi//matrixcookbook.pdf, and here, http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html, which specifically cover how to differentiate the Frobenius norm.

    – nick.schachter Mar 02 '16 at 01:04