I'm working on an optimization project (non-negative matrix factorization), which I know is not convex from this question:
Why is the non-negative matrix factorization problem non-convex?
However, that question addresses only the scalar case, which is not my project's focus.
My question is: how am I supposed to define a gradient and a Hessian matrix in this more general case? Is it possible? Is it still called a Hessian matrix, or is it some sort of tensor (about which I don't know much)?
My function is:
$$ f(W, H) = \left\lVert X - WH \right\rVert_{F}^{2}, $$
which I am minimizing over $W$ and $H$.
This is equivalent to:
$$ f = tr\lbrack (X - WH)(X - WH)^{T} \rbrack = tr\lbrack (X - WH)^{T}(X - WH) \rbrack $$
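As a quick numerical sanity check of this equivalence, here is a minimal NumPy sketch (the sizes $m, n, r$ and the random matrices are just placeholders):

```python
import numpy as np

# random placeholder data; m, n, r are just example sizes
rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
X = rng.standard_normal((m, n))
W = rng.standard_normal((m, r))
H = rng.standard_normal((r, n))

R = X - W @ H
f_frob = np.linalg.norm(R, "fro") ** 2
f_tr = np.trace(R @ R.T)
print(np.allclose(f_frob, f_tr))             # True
print(np.allclose(f_tr, np.trace(R.T @ R)))  # True
```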
I know how to calculate the partial derivatives of this function, and I actually have:
$$ \frac{\partial f}{\partial W} = \frac{\partial tr\lbrack (X \; - \; WH)(X \; - \; WH)^{T} \rbrack}{\partial W} = -2XH^{T} + 2WHH^{T}$$
This is the same result as equation (24) in this document:
http://cal.cs.illinois.edu/~johannes/research/matrix%20calculus.pdf
Applying the same idea, I calculated the other partial derivative:
$$ \frac{\partial f}{\partial H} = \frac{\partial tr\lbrack (X \; - \; WH)(X \; - \; WH)^{T} \rbrack}{\partial H} = -2W^{T}X + 2W^{T}WH$$
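To gain confidence in both formulas, I checked them against central finite differences (a rough sketch; `f` and `num_grad` are helper names I made up):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 4, 2
X = rng.standard_normal((m, n))
W = rng.standard_normal((m, r))
H = rng.standard_normal((r, n))

def f(W, H):
    R = X - W @ H
    return np.trace(R @ R.T)

def num_grad(g, A, eps=1e-6):
    """Central finite differences of scalar g w.r.t. each entry of A."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        Ap, Am = A.copy(), A.copy()
        Ap[idx] += eps
        Am[idx] -= eps
        G[idx] = (g(Ap) - g(Am)) / (2 * eps)
    return G

gW = -2 * X @ H.T + 2 * W @ H @ H.T   # claimed df/dW
gH = -2 * W.T @ X + 2 * W.T @ W @ H   # claimed df/dH
print(np.allclose(gW, num_grad(lambda A: f(A, H), W)))  # True
print(np.allclose(gH, num_grad(lambda A: f(W, A), H)))  # True
```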
If both of these are correct, is it possible to define a vector whose entries are matrices, such that:
$$ \nabla f(W, H) = \left(\frac{\partial f}{\partial W} \;\; \frac{\partial f}{\partial H} \right)$$
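For concreteness, one convention I have seen is to vectorize each block and stack them, so the gradient lives in $\mathbb{R}^{mr + rn}$. A sketch of that assembly, assuming column-major `vec` (NumPy's `order="F"`); `grad_vec` is my own name:

```python
import numpy as np

def grad_vec(X, W, H):
    """Stack vec(df/dW) on top of vec(df/dH) into one long vector."""
    gW = -2 * X @ H.T + 2 * W @ H @ H.T
    gH = -2 * W.T @ X + 2 * W.T @ W @ H
    return np.concatenate([gW.flatten(order="F"), gH.flatten(order="F")])
```

With this flattening, $\nabla f$ becomes an ordinary vector, which seems to be what would make a Hessian well defined.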
And what would be the way to compute the Hessian, if that even makes sense?
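To make the question concrete: under the same stacking convention, I would expect the Hessian to be the ordinary $(mr+rn) \times (mr+rn)$ matrix of second partials with respect to the stacked vector, which can at least be approximated numerically. A rough sketch (all names are mine; the central second differences are only a sanity check, not an efficient method):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 5, 4, 2
X = rng.standard_normal((m, n))
z0 = rng.standard_normal(m * r + r * n)  # stacked [vec(W); vec(H)]

def f_vec(z):
    """f as a function of the stacked vector z = [vec(W); vec(H)]."""
    W = z[: m * r].reshape((m, r), order="F")
    H = z[m * r :].reshape((r, n), order="F")
    R = X - W @ H
    return np.trace(R @ R.T)

def num_hessian(g, z, eps=1e-4):
    """Hessian of scalar g at z via central second differences."""
    d = z.size
    Hess = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            zpp, zpm, zmp, zmm = z.copy(), z.copy(), z.copy(), z.copy()
            zpp[i] += eps; zpp[j] += eps
            zpm[i] += eps; zpm[j] -= eps
            zmp[i] -= eps; zmp[j] += eps
            zmm[i] -= eps; zmm[j] -= eps
            Hess[i, j] = (g(zpp) - g(zpm) - g(zmp) + g(zmm)) / (4 * eps**2)
    return Hess

Hess = num_hessian(f_vec, z0)
print(Hess.shape)                 # (18, 18)
print(np.allclose(Hess, Hess.T))  # True, up to numerical noise
```

If this is right, an ordinary Hessian matrix does exist once the variables are flattened, and viewing the second derivative as a fourth-order tensor in $(W, H)$ would just be the unflattened picture of the same object; but I would like to see how to derive it analytically.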
Please do not mark this as a duplicate. The question Defining the Hessian of a function that takes general matrices as an input has no answer that explains how to compute these; it only contains a counterexample to the OP's method, and an insult.