2

How to calculate the gradient of $f(x)=x^TAx+b^Tx$ when $A$ is symmetric and when $A$ is not symmetric?

I would like confirmation of my computation of the gradient of $f$, both when $A$ is a non-symmetric $n \times n$ matrix and when $A$ is symmetric.

Here is my attempt:

$f:R^n \to R$

1) $A$ is not symmetric:

Note first that $x^TAx=x^TA^Tx$, because $x^TAx$ is a scalar and therefore equal to its own transpose.

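$f(x+h)=(x+h)^TA(x+h)+b^T(x+h)$

$=x^TAx+h^TAx+x^TAh+h^TAh+b^Tx+b^Th$

$=f(x)+x^T(A+A^T)h+b^Th+h^TAh$

The term $h^TAh$ is second order in $h$, so the linear part gives $\nabla f(x)^Th=x^T(A+A^T)h+b^Th$, that is, $\nabla f(x)=(A+A^T)x+b$.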

2) $A$ is symmetric, so $A^T=A$:

$\nabla f(x)=2Ax+b$

It would be great if someone could check this; I would be very thankful.

I also have another question: in both cases, how would one set up a gradient descent and a quasi-Newton algorithm? Thank you.

Paul-henri

  • 1
    The skew-symmetric part contributes nothing to the quadratic form, i.e., $$\rm x^{\top} \left( \frac{A - A^{\top}}{2} \right) x = 0$$ Thus, why define a function with a quadratic form $\rm x^{\top} A \, x$ in which $\rm A$ is non-symmetric? – Rodrigo de Azevedo Apr 23 '17 at 10:41
  • Oh thanks for that comment I never thought about that! – Ovi May 11 '20 at 13:45

2 Answers

3

$f(x) = \langle Ax,x\rangle + \langle x, b \rangle = B(x,x) + L(x)$, so $f = B \circ(I,I) + L$, where:

  • $B$ is the bilinear map $(x,y) \mapsto \langle Ax,y\rangle$. It's obviously bounded, so it's differentiable and has $DB(x,y)(h,k) = B(x,k) + B(h,y) = \langle Ax,k\rangle + \langle Ah, y\rangle$.

  • $L$ is the linear map $x \mapsto \langle x, b \rangle$. We have $DL(x) = L$.

Thus,

$$Df(x)h = DB(x,x)(DI(x)h, DI(x)h) + DL(x)h = DB(x,x)(h,h) + Lh = \langle Ax,h\rangle + \langle Ah,x \rangle + \langle h,b\rangle$$

In other words

$$\langle \nabla f(x), h\rangle = \langle Ax,h\rangle + \langle Ah,x \rangle + \langle h,b\rangle$$

We have $\langle Ah, x \rangle = \langle h, A^T x\rangle = \langle A^T x, h\rangle$ and $\langle h,b\rangle = \langle b, h\rangle$. Thus:

$$\langle \nabla f(x), h\rangle = \langle (A + A^T)x + b, h\rangle$$

So $\nabla f(x) = (A+A^T)x + b$.

If $A$ is symmetric, we have $\nabla f(x) = 2A x + b$.
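The formula can be sanity-checked numerically. The following is only a minimal sketch using the Python standard library: the helper names (`f`, `grad_f`, `numerical_grad`) and the particular $A$, $b$, $x$ are made up for illustration. It compares $(A+A^T)x+b$ with a central finite-difference approximation of the gradient for a non-symmetric $A$.

```python
def f(x, A, b):
    """Evaluate f(x) = x^T A x + b^T x for plain Python lists."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return quad + lin


def grad_f(x, A, b):
    """Closed-form gradient (A + A^T) x + b, component by component."""
    n = len(x)
    return [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) + b[k]
            for k in range(n)]


def numerical_grad(x, A, b, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += eps
        xm[k] -= eps
        g.append((f(xp, A, b) - f(xm, A, b)) / (2 * eps))
    return g


# Illustrative data (an assumption, not from the question): A is not symmetric.
A = [[1.0, 2.0], [3.0, 4.0]]
b = [5.0, 6.0]
x = [1.0, -1.0]

exact = grad_f(x, A, b)
approx = numerical_grad(x, A, b)
max_err = max(abs(e - a) for e, a in zip(exact, approx))

print("closed form:       ", exact)
print("finite differences:", approx)
print("max abs difference:", max_err)
```

Because $f$ is quadratic, central differences have no truncation error here, so any remaining difference comes from floating-point rounding only.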

2

We wish to differentiate a scalar with respect to a vector to obtain a vector. Differentiating $x_i A_{ij}x_j + b_i x_i$ (with summation over repeated indices implied) with respect to $x_k$ gives $\delta_{ik}A_{ij}x_j+x_iA_{ij}\delta_{jk}+b_i\delta_{ik}=A_{kj}x_j+x_iA_{ik}+b_k$, which is the $k$th component of $(A+A^T)x+b$. This vector is the sought gradient. If $A=A^T$, we can simplify this to $2Ax+b$.
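As a small worked instance of the index computation (the specific $A$ and $b$ are chosen only for illustration), take $n=2$ with a non-symmetric matrix:

$$A=\begin{pmatrix}1&2\\3&4\end{pmatrix},\qquad b=\begin{pmatrix}5\\6\end{pmatrix},\qquad f(x)=x_1^2+5x_1x_2+4x_2^2+5x_1+6x_2.$$

Differentiating term by term gives

$$\frac{\partial f}{\partial x_1}=2x_1+5x_2+5,\qquad \frac{\partial f}{\partial x_2}=5x_1+8x_2+6,$$

which agrees with

$$(A+A^T)x+b=\begin{pmatrix}2&5\\5&8\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix}+\begin{pmatrix}5\\6\end{pmatrix}.$$

By contrast, $2Ax+b=(2x_1+4x_2+5,\ 6x_1+8x_2+6)^T$ is different, so the simplified formula should only be used when $A$ is symmetric.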

zahbaz
  • 10,441
J.G.
  • 115,835