I'm working on an optimization project (non-negative matrix factorization), which I know is not convex from this question:
Why is the non-negative matrix factorization problem non-convex?
However, that question addresses only the scalar case, which is not my project's focus.
My question is: how am I supposed to define a gradient and a Hessian matrix in this more general case? Is it possible? Is it still called a Hessian matrix, or is it some sort of tensor (about which I don't know much)?
My function is:
$$ f(W, H) = \left\lVert X - WH \right\rVert_{F}^{2}, $$
which I am minimizing over $W$ and $H$.
This is equivalent to:
$$ f = tr\lbrack (X - WH)(X - WH)^{T} \rbrack = tr\lbrack (X - WH)^{T}(X - WH) \rbrack $$
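As a quick numerical sanity check of this equivalence, here is a minimal NumPy sketch (the sizes $m, n, r$ and the random matrices are just placeholders):

```python
import numpy as np

# random placeholder data; m, n, r are just example sizes
rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
X = rng.standard_normal((m, n))
W = rng.standard_normal((m, r))
H = rng.standard_normal((r, n))

R = X - W @ H
f_frob = np.linalg.norm(R, "fro") ** 2
f_tr = np.trace(R @ R.T)
print(np.allclose(f_frob, f_tr))             # True
print(np.allclose(f_tr, np.trace(R.T @ R)))  # True
```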
I know how to calculate the partial derivatives of this function, and I actually have:
$$ \frac{\partial f}{\partial W} = \frac{\partial tr\lbrack (X \; - \; WH)(X \; - \; WH)^{T} \rbrack}{\partial W} = -2XH^{T} + 2WHH^{T}$$
This is the same result as equation (24) in this document:
http://cal.cs.illinois.edu/~johannes/research/matrix%20calculus.pdf
Applying the same idea, I calculated the other partial derivative:
$$ \frac{\partial f}{\partial H} = \frac{\partial tr\lbrack (X \; - \; WH)(X \; - \; WH)^{T} \rbrack}{\partial H} = -2W^{T}X + 2W^{T}WH$$
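To gain confidence in both formulas, I checked them against central finite differences (a rough sketch; `f` and `num_grad` are helper names I made up):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 4, 2
X = rng.standard_normal((m, n))
W = rng.standard_normal((m, r))
H = rng.standard_normal((r, n))

def f(W, H):
    R = X - W @ H
    return np.trace(R @ R.T)

def num_grad(g, A, eps=1e-6):
    """Central finite differences of scalar g w.r.t. each entry of A."""
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        Ap, Am = A.copy(), A.copy()
        Ap[idx] += eps
        Am[idx] -= eps
        G[idx] = (g(Ap) - g(Am)) / (2 * eps)
    return G

gW = -2 * X @ H.T + 2 * W @ H @ H.T   # claimed df/dW
gH = -2 * W.T @ X + 2 * W.T @ W @ H   # claimed df/dH
print(np.allclose(gW, num_grad(lambda A: f(A, H), W)))  # True
print(np.allclose(gH, num_grad(lambda A: f(W, A), H)))  # True
```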
If both of these are correct, is it possible to define a vector whose entries are matrices, such that:
$$ \nabla f(W, H) = \left(\frac{\partial f}{\partial W} \;\; \frac{\partial f}{\partial H} \right)$$
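For concreteness, one convention I have seen is to vectorize each block and stack them, so the gradient lives in $\mathbb{R}^{mr + rn}$. A sketch of that assembly, assuming column-major `vec` (NumPy's `order="F"`); `grad_vec` is my own name:

```python
import numpy as np

def grad_vec(X, W, H):
    """Stack vec(df/dW) on top of vec(df/dH) into one long vector."""
    gW = -2 * X @ H.T + 2 * W @ H @ H.T
    gH = -2 * W.T @ X + 2 * W.T @ W @ H
    return np.concatenate([gW.flatten(order="F"), gH.flatten(order="F")])
```

With this flattening, $\nabla f$ becomes an ordinary vector, which seems to be what would make a Hessian well defined.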
And what would be the way to compute the Hessian, if that even makes sense?
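To make the question concrete: under the same stacking convention, I would expect the Hessian to be the ordinary $(mr+rn) \times (mr+rn)$ matrix of second partials with respect to the stacked vector, which can at least be approximated numerically. A rough sketch (all names are mine; the central second differences are only a sanity check, not an efficient method):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 5, 4, 2
X = rng.standard_normal((m, n))
z0 = rng.standard_normal(m * r + r * n)  # stacked [vec(W); vec(H)]

def f_vec(z):
    """f as a function of the stacked vector z = [vec(W); vec(H)]."""
    W = z[: m * r].reshape((m, r), order="F")
    H = z[m * r :].reshape((r, n), order="F")
    R = X - W @ H
    return np.trace(R @ R.T)

def num_hessian(g, z, eps=1e-4):
    """Hessian of scalar g at z via central second differences."""
    d = z.size
    Hess = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            zpp, zpm, zmp, zmm = z.copy(), z.copy(), z.copy(), z.copy()
            zpp[i] += eps; zpp[j] += eps
            zpm[i] += eps; zpm[j] -= eps
            zmp[i] -= eps; zmp[j] += eps
            zmm[i] -= eps; zmm[j] -= eps
            Hess[i, j] = (g(zpp) - g(zpm) - g(zmp) + g(zmm)) / (4 * eps**2)
    return Hess

Hess = num_hessian(f_vec, z0)
print(Hess.shape)                 # (18, 18)
print(np.allclose(Hess, Hess.T))  # True, up to numerical noise
```

If this is right, an ordinary Hessian matrix does exist once the variables are flattened, and viewing the second derivative as a fourth-order tensor in $(W, H)$ would just be the unflattened picture of the same object; but I would like to see how to derive it analytically.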
Please do not mark this as a duplicate. The question Defining the Hessian of a function that takes general matrices as an input has no answer that explains how to compute these; it only contains a counterexample to the OP's method, and an insult.