
Given a differentiable function $$F:M_{n\times n}(\mathbb{R})\to\mathbb{R},$$ how am I to interpret its first-order approximation?

The derivative of a real-valued function of a matrix is a matrix-valued function of a matrix. So what I did was treat this function just as if it were a function $$F:\mathbb{R}^{n\times n}\to\mathbb{R}$$ and formed what I hope is the first-order approximation: $$F(X+H)\approx F(X) + \text{tr}(F'(X)H^T)$$
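As a quick numerical sanity check (my own, not part of the question), take the concrete choice $F(X)=\text{tr}(X^TX)$, whose matrix of partial derivatives is $F'(X)=2X$; the error of the proposed expansion should then be second order in $H$:

```python
import numpy as np

# Check F(X + H) ≈ F(X) + tr(F'(X) H^T) for F(X) = tr(X^T X),
# whose matrix of partial derivatives is F'(X) = 2X.
rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
H = 1e-6 * rng.standard_normal((n, n))  # small perturbation

def F(X):
    return np.trace(X.T @ X)  # squared Frobenius norm

grad = 2 * X  # grad[i, j] = dF / dX[i, j]
approx = F(X) + np.trace(grad @ H.T)
error = abs(F(X + H) - approx)
assert error < 1e-9  # error is O(||H||^2), far below ||H|| ~ 1e-6
```

Here the remainder is exactly $\text{tr}(H^TH)$, so the check is testing that the discarded term is quadratic in $H$.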

Is this correct? Would the second-order approximation then contain a 3-tensor, even though, interpreting $F$ as a function of a vector, the second derivative would only be a matrix (the Hessian)?

What precisely, then, is the difference between the domains $M_{n\times n}(\mathbb{R})$ and $\mathbb{R}^{n\times n}$? They are isomorphic as vector spaces, so should I think of $M_{n\times n}(\mathbb{R})$ as just $\mathbb{R}^{n\times n}$ with some extra multiplicative structure?

Where can I go to learn more about this type of stuff?

Update

OK, so I spent some time looking at Aloizio's answer, and here is my understanding. Aloizio says that the derivative is a continuous linear function from $\mathbb{E}$ to $\mathbb{F}$, and that made me realize that for functions $\mathbb{R}^n\to\mathbb{R}$ the gradient is not the derivative: the derivative is the linear map which computes the dot product of the gradient with a vector. This is confusing, since there seems to be a tendency in mathematics to conflate the matrix of partial derivatives with the derivative itself.

Looking at the determinant as an example, we calculate $$\det(A+H) = \det(A) + \text{tr}(Adj_AH) + \epsilon(A,H),$$ where $Adj_A$ is the adjugate (classical adjoint) matrix of $A$. And thus the derivative of the determinant at $A$ is $$\text{tr}(Adj_A\;\cdot\;),$$ which is of course continuous and linear.
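This expansion (Jacobi's formula) can also be checked numerically; the sketch below assumes $A$ is invertible, so that $Adj_A = \det(A)\,A^{-1}$:

```python
import numpy as np

# Check det(A + H) = det(A) + tr(adj(A) H) + o(||H||),
# using adj(A) = det(A) * inv(A) for invertible A.
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 3 * np.eye(n)  # shift to keep A invertible
H = 1e-7 * rng.standard_normal((n, n))

adj_A = np.linalg.det(A) * np.linalg.inv(A)
approx = np.linalg.det(A) + np.trace(adj_A @ H)
error = abs(np.linalg.det(A + H) - approx)
assert error < 1e-8  # remainder is O(||H||^2)
```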

Now consider a general differentiable function $M_{n\times n} (\mathbb{R})\to\mathbb{R}$. We see that $\text{tr}(Adj_A\;\cdot\;)$ is just a row-stacked version of the dot product, and thus the distinction between $M_{n\times n} (\mathbb{R})\to\mathbb{R}$ and $\mathbb{R}^{n\times n}\to\mathbb{R}$ appears rather cosmetic; we should expect every first derivative to take the form $\text{tr}(B\;\cdot\;)$, where $B^T$ is the matrix of partial derivatives (so that $\text{tr}(BH)=\sum_{i,j}B_{ji}H_{ij}$ agrees with the $\text{tr}(F'(X)H^T)$ form above). I also imagine there's some way to write the second derivative as a function involving a 3-tensor, but I don't know anything about tensors, so I couldn't say for sure.
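One can verify directly that the matrix of partial derivatives of $\det$ is $Adj_A^T$ (the cofactor matrix), so that $\text{tr}(Adj_A\,\cdot\,)$ really is the row-stacked dot product with the partials. A sketch, using central finite differences (my choice, not from the original):

```python
import numpy as np

# Build the matrix of partial derivatives G of det at A by finite
# differences and check G = adj(A)^T, i.e. tr(adj(A) H) = sum_ij G_ij H_ij.
rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n)) + 3 * np.eye(n)  # keep A invertible
eps = 1e-6

G = np.zeros((n, n))  # G[i, j] = d det / d A[i, j]
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        G[i, j] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2 * eps)

adj_A = np.linalg.det(A) * np.linalg.inv(A)
assert np.allclose(G, adj_A.T, atol=1e-6)
```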


1 Answer


The derivative of a real valued function is a linear functional which can be interpreted as a matrix.

But many times this interpretation is only confusing, and not practical.

The derivative of a function $f: \mathbb{E} \rightarrow \mathbb{F}$ from a Banach space to another at a point $x$ is a continuous linear function $Df_x : \mathbb{E} \rightarrow \mathbb{F}$ for which:

$$f(x+h)=f(x)+(Df_x)(h)+\epsilon(h)$$

where $\frac{||\epsilon(h)||}{||h||}\rightarrow 0$ as $h \rightarrow 0$.

For example, take the function $f: \mathbb{H} \rightarrow \mathbb{R}$, where $\mathbb{H}$ is a Hilbert space, that maps $x \mapsto \langle x, x \rangle$. We have:

$$f(x+h)=\langle x + h, x +h\rangle=\langle x, x \rangle + 2\langle x, h \rangle + \langle h, h \rangle$$

which implies $Df_x=\langle 2x , ~\cdot ~\rangle$.
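In $\mathbb{R}^n$ with the standard inner product (a concrete choice on my part), this is easy to confirm: the remainder $f(x+h)-f(x)-\langle 2x,h\rangle$ is exactly $\langle h,h\rangle$, hence quadratic in $h$:

```python
import numpy as np

# For f(x) = <x, x>, check Df_x = <2x, ·>: the remainder is <h, h> = O(||h||^2).
rng = np.random.default_rng(3)
x = rng.standard_normal(5)
h = 1e-6 * rng.standard_normal(5)

f = lambda v: v @ v
remainder = f(x + h) - f(x) - 2 * x @ h
assert abs(remainder) < 1e-10  # quadratic in h, far below ||h|| ~ 1e-6
```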

Now, closer to your case: consider the function:

$$f: M_{n \times n} \rightarrow M_{n \times n}$$

$$ A \mapsto A^2 $$

Then,

$$f(A+H)=(A+H)^2=A^2+AH+HA+H^2$$

which implies $Df_A=A. ~\cdot + \cdot~. A$. (This just means it is the linear function that takes $H$ to $A.H+H.A$.)
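A numerical check of this (a sketch in NumPy, my addition): the remainder $f(A+H)-f(A)-(AH+HA)$ is exactly $H^2$, so it vanishes to second order:

```python
import numpy as np

# For f(A) = A^2, check Df_A(H) = AH + HA: the remainder equals H^2.
rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n))
H = 1e-6 * rng.standard_normal((n, n))

remainder = (A + H) @ (A + H) - A @ A - (A @ H + H @ A)
assert np.linalg.norm(remainder - H @ H) < 1e-12  # identity holds exactly
assert np.linalg.norm(remainder) < 1e-10          # and is O(||H||^2)
```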

Let's stretch this a bit... then we have $Df: M_{n \times n} \rightarrow L(M_{n \times n}; M_{n \times n})$, $A \mapsto A. ~\cdot + \cdot~. A$. We wish to calculate $D_BDf$.

$Df_{B+H}=(B+H). ~\cdot + \cdot ~.(B+H) =(B. ~\cdot + \cdot ~.B) + (H. ~\cdot + \cdot ~.H)$

Therefore, $D_BDf$ is the map taking $H_1$ to the linear map $H_2 \mapsto H_1.H_2+H_2.H_1$. We then have that $DDf$ is constant (independent of $B$), as we should expect, since in the real-variable case $f(x)=x^2$ has constant second derivative. Note that if we were to evaluate $DDDf$, we would get the $0$ function.
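Since $Df_{B+H_1}-Df_B$ is already linear in $H_1$ with no remainder, the second derivative can be read off exactly; a sketch confirming both the formula and its independence of the base point $B$:

```python
import numpy as np

# For f(A) = A^2, Df_A(H) = AH + HA. The difference Df_{B+H1} - Df_B,
# applied to H2, is H1 H2 + H2 H1 exactly — no remainder, no dependence on B.
rng = np.random.default_rng(5)
n = 3
B = rng.standard_normal((n, n))
H1 = rng.standard_normal((n, n))
H2 = rng.standard_normal((n, n))

Df = lambda A, H: A @ H + H @ A
diff = Df(B + H1, H2) - Df(B, H2)
assert np.allclose(diff, H1 @ H2 + H2 @ H1)  # the constant bilinear map
```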

Try to compute the derivatives of $\langle Ax, x \rangle$ and $\det$ for practice.


Update:

First of all, the matrix of partial derivatives is just the representation of the derivative in the canonical basis. Therefore, it is natural to see this representation: it presents a nice way to do calculations with the derivative.

Now, to address your question about tensors...

Suppose you have a differentiable function $f: \mathbb{R}^n \rightarrow \mathbb{R}$.

The derivative at a point $x$ is a linear map from $\mathbb{R}^n \rightarrow \mathbb{R}$. Therefore, $Df: \mathbb{R}^n \rightarrow L(\mathbb{R}^n; \mathbb{R})$.

The first derivative $Df_x$ at a point $x$ is therefore simply a linear functional.

Now, the second derivative will then be $\displaystyle DDf: \mathbb{R}^n \rightarrow L(\mathbb{R}^n; L(\mathbb{R}^n;\mathbb{R}))$.

Therefore, the second derivative $D_xDf$ at a point $x$ is an element of $L(\mathbb{R}^n; L(\mathbb{R}^n;\mathbb{R}))$.

But an element of such a set has a natural interpretation as an element of $L(\mathbb{R}^n, \mathbb{R}^n; \mathbb{R})$ (the set of bilinear functionals).

We associate to an element $T$ of $L(\mathbb{R}^n; L(\mathbb{R}^n;\mathbb{R}))$ the bilinear functional $T'$ as follows:

$$T'(h_1,h_2)=(Th_1)h_2$$
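To make the identification concrete (my illustration, using the earlier example $f(x)=\langle x,x\rangle$, whose second derivative is $Th_1 = \langle 2h_1,\cdot\rangle$): the associated bilinear functional is $T'(h_1,h_2)=2\langle h_1,h_2\rangle$, i.e. contraction with the matrix $2I$ in the canonical basis:

```python
import numpy as np

# Identify T in L(R^n; L(R^n; R)) with the bilinear functional
# T'(h1, h2) = (T h1)(h2). Here T h1 = <2 h1, ·>, so T' is h1^T (2I) h2.
rng = np.random.default_rng(6)
n = 5
h1 = rng.standard_normal(n)
h2 = rng.standard_normal(n)

T = lambda v: (lambda w: 2 * v @ w)          # T h1 is a linear functional
bilinear = lambda v, w: v @ (2 * np.eye(n)) @ w  # its matrix representation

assert np.isclose(T(h1)(h2), bilinear(h1, h2))
```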

Analogously, $D_xDDf$ can be seen as a $3$-linear functional and so on.

Now, a multilinear functional from a product of vector spaces can be identified with a linear functional from the tensor product of those spaces.

Therefore, $D_xD^{m-1}f$ can be identified with an element of $\left(\mathbb{R}^n \otimes ... \otimes \mathbb{R}^n\right)^*$ ($m$ times).

For finite-dimensional vector spaces, the dual of the tensor product is the tensor product of the dual, therefore:

$D_xD^{m-1}f$ can be seen as an element of $\left({\mathbb{R}^n}^* \otimes ... \otimes {\mathbb{R}^n}^*\right)$ ($m$ times).
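Computationally, this identification is just tensor contraction: in the canonical basis an $m$-linear functional is represented by an $m$-index array, and evaluating it contracts each index against one argument. A sketch for $m=3$ with an arbitrary (hypothetical) 3-tensor:

```python
import numpy as np

# A 3-linear functional on R^n represented by a 3-tensor T in the
# canonical basis: (h1, h2, h3) -> sum_{ijk} T_ijk h1_i h2_j h3_k.
rng = np.random.default_rng(7)
n = 3
T = rng.standard_normal((n, n, n))
h1, h2, h3 = (rng.standard_normal(n) for _ in range(3))

trilinear = lambda a, b, c: np.einsum('ijk,i,j,k->', T, a, b, c)

# Linearity in the first slot: T(2u + v, ·, ·) = 2 T(u, ·, ·) + T(v, ·, ·).
u, v = rng.standard_normal(n), rng.standard_normal(n)
lhs = trilinear(2 * u + v, h2, h3)
rhs = 2 * trilinear(u, h2, h3) + trilinear(v, h2, h3)
assert np.isclose(lhs, rhs)
```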