
Let $Q, R \in \mathbb{R}^{n\times n}$ with $Q, R \succ 0$, and let $g : \mathbb{R}^{n} \to \mathbb{R}$ be defined by

$$g\left(\boldsymbol{x}\right)=\left(\frac{1}{2}\boldsymbol{x}^{T}Q\boldsymbol{x}\right)\left(\frac{1}{2}\boldsymbol{x}^{T}R\boldsymbol{x}\right)$$

I want to find the gradient and the Hessian of $g\left(\boldsymbol{x}\right)$.


What I tried so far

I tried to find the gradient and the Hessian using the derivative rules and got the following:

[image: my attempted gradient and Hessian calculation]

However, in the 2nd element of the last row of my Hessian calculation I got the term $2R\boldsymbol{x}\cdot Q\boldsymbol{x}$, which is a column vector times another column vector, and that is obviously a mistake. In my calculation I used a product rule for gradients, but I'm not sure whether that rule is valid in matrix calculus. So, how can I calculate the gradient and the Hessian of $g\left(\boldsymbol{x}\right)$?

2 Answers


The individual terms are easy to handle: $$\alpha = \tfrac{1}{2}x^TQx, \qquad \frac{\partial\alpha}{\partial x} = Qx, \qquad \frac{\partial^2\alpha}{\partial x\,\partial x^T} = Q$$ $$\beta = \tfrac{1}{2}x^TRx, \qquad \frac{\partial\beta}{\partial x} = Rx, \qquad \frac{\partial^2\beta}{\partial x\,\partial x^T} = R$$ The calculation for their product is straightforward: $$\eqalign{ \pi &= \alpha\beta \\ \frac{\partial\pi}{\partial x} &= \beta\frac{\partial\alpha}{\partial x} + \alpha\frac{\partial\beta}{\partial x} \\ &= \beta Qx \;+\; \alpha Rx \\ \frac{\partial^2\pi}{\partial x\,\partial x^T} &= \beta\left(\frac{\partial^2\alpha}{\partial x\,\partial x^T}\right) + \left(\frac{\partial\beta}{\partial x}\right)\left(\frac{\partial\alpha}{\partial x^T}\right) + \left(\frac{\partial\alpha}{\partial x}\right)\left(\frac{\partial\beta}{\partial x^T}\right) + \alpha\left(\frac{\partial^2\beta}{\partial x\,\partial x^T}\right) \\ &= \beta Q \;+\; Rxx^TQ \;+\; Qxx^TR \;+\; \alpha R \\ }$$
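
For anyone who wants to sanity-check these formulas numerically, here is a minimal finite-difference sketch in NumPy; the random symmetric positive-definite `Q`, `R`, the step `eps`, and the helper `g` are arbitrary illustrative choices, not part of the answer itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)); Q = A @ A.T + n * np.eye(n)   # symmetric, Q > 0
B = rng.standard_normal((n, n)); R = B @ B.T + n * np.eye(n)   # symmetric, R > 0
x = rng.standard_normal(n)

def g(v):
    # g(v) = (1/2 v^T Q v)(1/2 v^T R v)
    return (0.5 * (v @ Q @ v)) * (0.5 * (v @ R @ v))

alpha, beta = 0.5 * (x @ Q @ x), 0.5 * (x @ R @ x)
grad = beta * (Q @ x) + alpha * (R @ x)                    # beta*Qx + alpha*Rx
# Since Q, R are symmetric, Rx x^T Q = outer(Rx, Qx) and Qx x^T R = outer(Qx, Rx).
hess = beta * Q + np.outer(R @ x, Q @ x) + np.outer(Q @ x, R @ x) + alpha * R

# Central finite differences for comparison.
eps = 1e-4
I = np.eye(n)
grad_fd = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps) for e in I])
hess_fd = np.array([[(g(x + eps * (ei + ej)) - g(x + eps * (ei - ej))
                      - g(x - eps * (ei - ej)) + g(x - eps * (ei + ej))) / (4 * eps**2)
                     for ej in I] for ei in I])

print(np.max(np.abs(grad - grad_fd)))   # small (finite-difference accuracy, ~1e-6)
print(np.max(np.abs(hess - hess_fd)))   # small relative to the entries
```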


Update

This update addresses ordering issues raised in the comments.

Differentials are often the best approach for matrix calculus problems because, unlike gradients, they satisfy a simple product rule: $$\eqalign{ d(A\star B) &= (A+dA)\star(B+dB) \;-\; A\star B \\ &= dA\star B + A\star dB \\ }$$ (the second-order term $dA\star dB$ is dropped), where $A$ and $B$ are each a {scalar, vector, matrix, tensor} and $\star$ is any product which is compatible with $A$ and $B$. This includes the Kronecker, Hadamard/elementwise, Frobenius/trace and dyadic/tensor products, as well as the matrix/dot product.

If the product commutes, you can rearrange the product rule to $$d(A\star B) = B\star dA + A\star dB$$ The Hadamard and Frobenius products always commute. The other products are commutative only in special situations. For example, the Kronecker product commutes if either $A$ or $B$ is a scalar, and the dot product commutes if both $A$ and $B$ are real vectors.
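
As a quick numerical illustration of this product rule, here is a minimal NumPy sketch using the Hadamard product; the matrices and the perturbation scale `t` are arbitrary choices made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
dA, dB = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

for t in (1e-2, 1e-4, 1e-6):
    exact = (A + t * dA) * (B + t * dB) - A * B     # finite change in A ∘ B
    linear = (t * dA) * B + A * (t * dB)            # product-rule prediction
    # The residual is the dropped second-order term t^2 * (dA ∘ dB).
    print(t, np.max(np.abs(exact - linear)))        # shrinks like t**2
```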

The differential and the gradient are related and can be derived from one another, i.e. $$\frac{\partial\alpha}{\partial x} = Qx \quad\iff\quad d\alpha = (Qx)^Tdx = x^TQ\,dx$$ Let's examine one of the terms in the preceding Hessian calculation.
First calculate its differential, and then its gradient. $$\eqalign{ y &= \alpha(Rx) = (Rx)\alpha \qquad \big({\rm the\ scalar\star vector\ product\ commutes}\big) \\ dy &= \alpha(R\,dx) + (Rx)\,d\alpha \\ &= \alpha R\,dx + Rx\,x^TQ\,dx \\ &= (\alpha R+Rx\,x^TQ)\,dx \\ \frac{\partial y}{\partial x} &= \alpha R+Rx\,x^TQ \\ }$$
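
A quick finite-difference check of this Jacobian is sketched below; the random symmetric positive-definite `Q`, `R`, the step `eps`, and the helper `y` are illustrative assumptions, not part of the answer.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); Q = A @ A.T + n * np.eye(n)   # symmetric, Q > 0
B = rng.standard_normal((n, n)); R = B @ B.T + n * np.eye(n)   # symmetric, R > 0
x = rng.standard_normal(n)

def y(v):
    # y(v) = alpha(v) * R v  with  alpha(v) = 1/2 v^T Q v
    return (0.5 * (v @ Q @ v)) * (R @ v)

alpha = 0.5 * (x @ Q @ x)
J = alpha * R + np.outer(R @ x, Q @ x)     # alpha*R + R x x^T Q  (Q symmetric)

# Column j of the Jacobian is dy/dx_j ≈ (y(x + eps e_j) - y(x - eps e_j)) / (2 eps).
eps = 1e-6
J_fd = np.column_stack([(y(x + eps * e) - y(x - eps * e)) / (2 * eps) for e in np.eye(n)])

print(np.max(np.abs(J - J_fd)))            # should be tiny
```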

greg
  • Where can I find the rules for derivatives with respect to $x^T$ (a row vector)? (I know the rules for derivatives with respect to $x$.) – Omer Ben Apr 28 '20 at 14:22
  • (I guess it's not the same as derivatives with respect to $x$.) – Omer Ben Apr 28 '20 at 14:42
  • Also, in the 2nd term, why did you change the order of $\alpha,\beta$? Here is what I did: [link]. It became a matrix plus a scalar, and that's incorrect. – Omer Ben Apr 28 '20 at 15:15
    Here's how to handle the transposed gradients:
    $$\eqalign{ 1)\quad \frac{\partial\beta}{\partial x^T} &= \left(\frac{\partial\beta}{\partial x}\right)^T \\ 2)\quad \frac{\partial^2\beta}{\partial x\,\partial x^T} &= \frac{\partial}{\partial x}\left(\frac{\partial\beta}{\partial x^T}\right) \\ }$$ As for changing the order, since $(\alpha,\beta)$ are scalars they can be moved to any convenient position without changing the value of an expression.
    – greg Apr 28 '20 at 16:39
  • $\alpha,\beta$ are scalars, but $\frac{\partial\alpha}{\partial\boldsymbol{x}},\frac{\partial\beta}{\partial\boldsymbol{x}}$ are vectors, and I don't think you can just change their order. Here is what I did, and I encountered the same issue again: [link](https://ibb.co/KXjY8Bs) – Omer Ben Apr 29 '20 at 15:44
  • Thanks for all the support! Just to be sure, here is what I did: https://ibb.co/F7gYMyq. Originally I used the product rule at (1) and got a dimensional error, but at (2) I used the product rule with the commutative property and got the right answer (without a dimensional error). I think I made a mistake while using the product rule. What am I missing? – Omer Ben May 01 '20 at 10:14

I would suggest the following: $g(x)$ can be rewritten as $$ g(x)=\left(\frac{1}{2}\sum_{i,j=1}^nQ_{ij}x_ix_j\right)\left(\frac{1}{2}\sum_{i,j=1}^nR_{ij}x_ix_j\right) $$ where $Q_{ij}$ and $R_{ij}$ are the entries of the matrices $Q$ and $R$. Now you can proceed as you did before, taking the derivative with respect to $x_k$, $k=1,\ldots,n$: $$ \frac{\partial g}{\partial x_k}=\left(\frac{1}{2}\frac{\partial}{\partial x_k}\sum_{i,j=1}^nQ_{ij}x_ix_j\right)\left(\frac{1}{2}\sum_{i,j=1}^nR_{ij}x_ix_j\right)+\left(\frac{1}{2}\sum_{i,j=1}^nQ_{ij}x_ix_j\right)\left(\frac{1}{2}\frac{\partial}{\partial x_k}\sum_{i,j=1}^nR_{ij}x_ix_j\right). $$

Can you take it from here?
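
For instance, a small symbolic check that this component-wise product rule reproduces the closed-form gradient $\beta\,Qx+\alpha\,Rx$ from the other answer could look like the following SymPy sketch; the choice $n=3$ and the generic symmetric symbols for $Q$ and $R$ are illustrative assumptions.

```python
import sympy as sp

n = 3
x = sp.Matrix(sp.symbols(f'x1:{n + 1}'))
# Generic symmetric Q and R (entries q_ij = q_ji, r_ij = r_ji).
Q = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'q{min(i, j)}{max(i, j)}'))
R = sp.Matrix(n, n, lambda i, j: sp.Symbol(f'r{min(i, j)}{max(i, j)}'))

alpha = sp.Rational(1, 2) * (x.T * Q * x)[0]   # 1/2 sum_ij Q_ij x_i x_j
beta  = sp.Rational(1, 2) * (x.T * R * x)[0]   # 1/2 sum_ij R_ij x_i x_j
g = alpha * beta

k = 0  # check the first component; any k works the same way
lhs = sp.diff(g, x[k])                          # derivative of the double-sum form
rhs = (Q * x)[k] * beta + alpha * (R * x)[k]    # product rule: (Qx)_k * beta + alpha * (Rx)_k
print(sp.simplify(lhs - rhs))                   # 0
```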

hamath
  • Thanks! I did that and the gradient is the same as the previous calculation, but what about the Hessian? Is it supposed to look like this (after the calculations)? $\nabla^{2}g\left(\boldsymbol{x}\right)=\left(\frac{1}{2}\boldsymbol{x}^{T}Q\boldsymbol{x}\right)R+\left(\frac{1}{2}\boldsymbol{x}^{T}R\boldsymbol{x}\right)Q$ – Omer Ben Apr 28 '20 at 12:22