
Let's say I have an $m\times m$ matrix function $A=(a_{ij})$, where each $a_{ij}:\mathbb R^n\to\mathbb R$ is a scalar function. Let's say I also have a vector-valued function $f:\mathbb R^n\to\mathbb R^m$. Then we can define another vector-valued function $g:\mathbb R^n\to\mathbb R^m$ by $g=Af$, where, for each $x\in\mathbb R^n$, $(Af)(x)$ is the product of the matrix $A(x)$ with the vector $f(x)$.

Is there any relation between the Jacobian of $g$, $J(g)$, and $A$ and $f$?

I ask for a relation in the general case, but the question arose while working with the Jacobian of $f=(f_1,\ldots,f_m)$ itself, $J(f)$ being the matrix $A$ in this scenario. The notes I was reading said that if $g=J(f)f$ then we would have

$$J(g) = J(f)J(f)^{\text{T}}+\sum_{i=1}^mH(f_i)f_i$$

where $H(f_i)$ would be the Hessian of $f_i:\mathbb R^n\to\mathbb R$. I've been trying to derive this myself, and I think the transpose written there is wrong; maybe it should be applied to the Hessians instead?
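Here is a quick way to test the claim on a concrete example (a sketch with sympy; the particular $f$ is made up, and I'm using the convention from my notes that $J(f)$ is the $n\times m$ matrix whose columns are the gradients $\nabla f_j$, which forces $m=n$ here):

```python
import sympy as sp

# Made-up test map f: R^2 -> R^2 (m = n = 2, so J(f)*f is defined).
x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])
f = sp.Matrix([x1**2 * x2, sp.sin(x1) + x2**3])

# Convention assumed here: J(f) has entries J(f)[i, j] = d f_j / d x_i,
# i.e. gradients as columns. sympy's .jacobian uses rows, so transpose.
Jf = f.jacobian(x).T

g = Jf * f                    # g = J(f) f
Jg = g.jacobian(x).T          # same convention for J(g)

# Right-hand side of the formula in the notes.
hess_term = sum((sp.hessian(f[i], (x1, x2)) * f[i] for i in range(2)),
                sp.zeros(2, 2))
rhs = Jf * Jf.T + hess_term

print(sp.simplify(Jg - rhs))  # zero matrix iff the formula holds as written
```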

Any thoughts on this?

  • We have \begin{align} (Af)(x + y) &= A(x + y)f(x + y) \\ &= (A(x) + DA(x)y + o(y))(f(x) + Df(x)y + o(y)) \\ &= A(x)f(x) + A(x)Df(x)y + (DA(x)y)f(x) + o(y). \end{align} Thus $$D(Af)(x)y = A(x)Df(x)y + (DA(x)y)f(x).$$ – Mason May 29 '21 at 19:16
  • What happens with the term $(DA(x)y)(Df(x)y)$? – Darsen May 29 '21 at 19:46
  • That term is $o(y)$ since $$\lVert (DA(x)y)Df(x)y \rVert \leq \lVert DA(x) \rVert \lVert Df(x) \rVert \lVert y \rVert^2.$$ – Mason May 29 '21 at 23:13
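Mason's expansion can also be sanity-checked numerically (my own sketch with numpy; the $A$ and $f$ below are made-up test functions): the residual $g(x+y)-g(x)-\big[A(x)Df(x)y+(DA(x)y)f(x)\big]$ should be on the order of $\lVert y\rVert^2$.

```python
import numpy as np

# Made-up test data: A maps R^2 to 2x2 matrices, f maps R^2 to R^2.
def A(x):
    return np.array([[x[0]*x[1], np.sin(x[0])],
                     [x[1]**2,   np.cos(x[1])]])

def f(x):
    return np.array([x[0]**2, x[0]*x[1]])

def g(x):
    return A(x) @ f(x)

def partials(F, x, h=1e-6):
    """Forward-difference partial derivatives dF/dx_j of a vector- or
    matrix-valued F, returned as a list indexed by j."""
    return [(F(x + h*np.eye(len(x))[j]) - F(x)) / h for j in range(len(x))]

x = np.array([0.7, -1.3])
y = 1e-4 * np.array([1.0, 2.0])                # small displacement

Df = np.column_stack(partials(f, x))           # entry (i, j) = df_i/dx_j
DA_y = sum(y[j] * dA for j, dA in enumerate(partials(A, x)))  # DA(x)y, a matrix
linear_part = A(x) @ (Df @ y) + DA_y @ f(x)

print(np.linalg.norm(g(x + y) - g(x) - linear_part))  # tiny: o(y) remainder
```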

1 Answer


I would suggest you not think only in terms of matrices because this very quickly gets unwieldy. We still have a product rule in this case, namely for any $x,\xi\in\Bbb{R}^n$, \begin{align} Dg_x[\xi]&= (DA_x[\xi])\cdot f(x) + A(x)\cdot (Df_x[\xi]). \end{align} The meaning of this is that for each point $x$ in the domain of the functions,

  • $Dg_x\in L(\Bbb{R}^n,\Bbb{R}^m)$ is a linear transformation (hence by a choice of basis can be represented as an $m\times n$ matrix called the Jacobian matrix $Jg_x$; but I would highly suggest you avoid matrices whenever possible). So, it eats a vector $\xi\in\Bbb{R}^n$ and spits out a vector $Dg_x[\xi]\in\Bbb{R}^m$.
  • $DA_x\in L(\Bbb{R}^n, M_{m\times m}(\Bbb{R}))$ is a linear transformation which eats a vector $\xi\in\Bbb{R}^n$ and spits out a matrix $DA_x[\xi]\in M_{m\times m}(\Bbb{R})$ (see the short numerical sketch after this list). Hence, in the first term of my equation above I was able to multiply this matrix by the vector $f(x)\in\Bbb{R}^m$. The fact that $DA_x$ is a linear transformation between $\Bbb{R}^n$ and $M_{m\times m}(\Bbb{R})$ means that it is rather unwieldy to assign a matrix representation to it; in particular, if you want to "vectorize" $M_{m\times m}(\Bbb{R})$, you have to choose an ordering of the elements when you identify it with $\Bbb{R}^{m^2}$, and there are so many possible conventions here. So, this is why one always encounters so many formulae when dealing with derivatives of matrices: it all stems from the desire to express as a matrix something which shouldn't be expressed as a matrix.
  • $Df_x\in L(\Bbb{R}^n,\Bbb{R}^m)$ is a linear transformation which eats a vector $\xi\in\Bbb{R}^n$ and spits out a vector $Df_x[\xi]\in\Bbb{R}^m$.
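To make the second bullet concrete, here is a minimal numerical sketch (the $A$ below is a made-up test function): the directional derivative $DA_x[\xi]$ is approximated by a difference quotient, and it really is a matrix, which can then act on a vector such as $f(x)$.

```python
import numpy as np

# A made-up matrix-valued map A: R^2 -> M_{2x2}(R).
def A(x):
    return np.array([[x[0]*x[1], np.sin(x[0])],
                     [x[1]**2,   np.cos(x[1])]])

x  = np.array([0.5, 2.0])
xi = np.array([3.0, -1.0])
t  = 1e-7

DA_xi = (A(x + t*xi) - A(x)) / t      # approximates DA_x[xi]
print(DA_xi.shape)                    # (2, 2): it "spits out a matrix"...
print(DA_xi @ np.array([1.0, 1.0]))   # ...which can then multiply a vector
```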

I would suggest you take a look at this answer of mine for a general product rule.


Anyway, if you wish to be an odd-ball and express the first equation with a bunch of indices, then we have for all $i\in\{1,\dots, m\},j\in\{1,\dots, n\}$, \begin{align} \frac{\partial g_i}{\partial x^j}(x)&=\sum_{k=1}^m\frac{\partial A_{ik}}{\partial x^j}(x)\cdot f_k(x) + \sum_{k=1}^mA_{ik}(x)\cdot \frac{\partial f_k}{\partial x^j}(x). \end{align} So, the fact that $\frac{\partial A_{ik}}{\partial x^j}$ has three indices is already an indication that matrix notation is not suitable for the task at hand.
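For what it's worth, this entrywise formula is easy to check symbolically (a sketch with sympy; the entries of $A$ and $f$ below are made-up functions with $m=2$, $n=2$):

```python
import sympy as sp

# Made-up A and f to test the entrywise product rule.
x1, x2 = sp.symbols('x1 x2')
xs = [x1, x2]
A = sp.Matrix([[x1*x2, sp.exp(x2)],
               [sp.sin(x1), x1 + x2**2]])
f = sp.Matrix([x1**3, sp.cos(x1*x2)])
g = A * f

for i in range(2):
    for j in range(2):
        lhs = sp.diff(g[i], xs[j])
        rhs = sum(sp.diff(A[i, k], xs[j]) * f[k] + A[i, k] * sp.diff(f[k], xs[j])
                  for k in range(2))
        assert sp.simplify(lhs - rhs) == 0

print("entrywise product rule verified")
```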

  • This really helps; thanks! One question, though: if $A$ is $Df$, are there Hessians involved in the expression of $Dg$? Specifically in $(DA_x[\xi])\cdot f(x)$. – Darsen May 29 '21 at 20:42
  • @Darsen yes if $A=Df$ then $DA=D^2f$ is what I would call the Hessian. For the meaning, just unwind the definitions. We have $A:U\subset\Bbb{R}^n\to L(\Bbb{R}^n,\Bbb{R}^m)$ so $DA$ is a mapping $U\to L(\Bbb{R}^n, L(\Bbb{R}^n,\Bbb{R}^m))$, but this latter space is canonically isomorphic to the space of bilinear maps $L^2(\Bbb{R}^n;\Bbb{R}^m)$. So, for each $x$ in the domain, $DA_x=D^2f_x$ can be thought of as a bilinear mapping $\Bbb{R}^n\times\Bbb{R}^n\to\Bbb{R}^m$. So, we would write that term as $D^2f_x[\xi,f(x)]$, which is just the Hessian term. – peek-a-boo May 29 '21 at 20:52
  • So in this case we would have $\begin{align} \frac{\partial g_i}{\partial x^j}(x)&=\sum_{k=1}^m\frac{\partial^2f_i}{\partial x^j\partial x^k}(x)\cdot f_k(x) + \sum_{k=1}^m\frac{\partial f_i}{\partial x^k}(x)\cdot \frac{\partial f_k}{\partial x^j}(x), \end{align}$ but I'm struggling to see how this would relate to the Hessians, since $D^2f_x=(D^2f_{1x},\ldots,D^2f_{mx})$. I think it would be $Dg_x=\big(D^2f_{1x}\cdot f(x)\mid\cdots\mid D^2f_{mx}\cdot f(x)\big)^\text{T}$, although I think it is a little bit strange. But is it correct? – Darsen May 29 '21 at 23:18
  • @Darsen I just realized some of your dimensions don't match up. You have to assume $m=n$, otherwise the product $(Jf)\cdot f$ isn't even defined. Now, with this, the first term is $\sum_{k=1}^n\frac{\partial^2f_i}{\partial x^j\partial x^k}(x)f_k(x)=D^2(f_i)_x[e_j,f(x)]$, i.e. the Hessian of $f_i$ evaluated on the vectors $(e_j,f(x))$. In matrix notation one might write this as $(e_j)^t\cdot (Hf_i)_x\cdot f(x)$ (or as $(f(x))^t\cdot (Hf_i)_x\cdot e_j$; it doesn't matter since the Hessian is symmetric). What you wrote is wrong since you're ignoring the second term; other than that I guess it's fine. – peek-a-boo May 30 '21 at 00:14
  • You're right, I forgot about the second term. As to the dimensions, I don't see the problem: $f(x)$ is an $m\times 1$ vector and $Jf_x$ is an $n\times m$ matrix, so what's wrong with the dimensions? – Darsen May 30 '21 at 02:07
  • @Darsen the problem is I'm used to thinking of the Jacobian matrix of $f:\Bbb{R}^n\to\Bbb{R}^m$ as an $m\times n$ matrix (because that's how several textbooks introduce it: Spivak, Munkres, Loomis Sternberg, Cartan, Dieudonne, Friedberg Insel Spence etc); i.e. the transpose of what you have :) You see, this is just another illustration of why it is completely absurd to resort to matrices when there is no need to. When dealing with linear transformations $\Bbb{R}^n\to\Bbb{R}^m$, one never runs into such "type" errors/issues simply because of how one chooses to display a certain array of numbers. – peek-a-boo May 30 '21 at 02:09
  • No problem. I actually agree with you, but the expression I wrote came up in some notes on convex analysis and that's why I asked. Just to be sure, so there's nothing wrong with $Jf_x f(x)$? – Darsen May 30 '21 at 02:30
  • @Darsen nope, with your conventions everything is fine. – peek-a-boo May 30 '21 at 02:32
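To close the loop on this thread, the $A=Df$ special case from the comments can be checked symbolically as well (a sketch with sympy; the $f$ below is a made-up example with $m=n=2$, using the $m\times n$ Jacobian convention from the answer's comments, $Jf_{ij}=\partial f_i/\partial x^j$):

```python
import sympy as sp

# Made-up f with m = n = 2; check dg_i/dx_j = e_j^T H(f_i) f(x) + (Jf Jf)_{ij}.
x1, x2 = sp.symbols('x1 x2')
xs = sp.Matrix([x1, x2])
f = sp.Matrix([x1*x2**2, sp.exp(x1) - x2])

Jf = f.jacobian(xs)              # m x n convention: Jf[i, j] = d f_i / d x_j
g = Jf * f
Jg = g.jacobian(xs)

H = [sp.hessian(f[i], (x1, x2)) for i in range(2)]
rhs = sp.Matrix(2, 2, lambda i, j: (H[i] * f)[j] + (Jf * Jf)[i, j])

print(sp.simplify(Jg - rhs))     # zero matrix: matches the comment derivation
```

(With the transposed $n\times m$ convention from the question, the same computation reproduces the notes' formula $J(g)=J(f)J(f)^{\text{T}}+\sum_i H(f_i)f_i$, which is the check at the top of the page.)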