
I wish to compute $$ D_x \left( A(x)^{-1} v(x) \right) $$ where $x$ is a vector, $A$ is a matrix-valued function such that $A(x)$ is always invertible, and $v(x)$ is a vector-valued function.

Looking here, I know that $$ \left( D_x \left( A(x)^{-1} v(x) \right)\right)_{ij}=\sum_k \left( \frac{\partial A(x)_{ik}^{-1}}{\partial x_{j}} v(x)_k + A(x)^{-1}_{ik} \frac{\partial v(x)_k}{\partial x_j}\right). $$ The second term in the sum is easy, and is just $(A(x)^{-1} D_x v(x))_{ij}$ where $D_x v(x)$ is the Jacobian of $v$. For the first term, I do not know how to compute $$ \frac{\partial A(x)_{ik}^{-1}}{\partial x_{j}}. $$ I tried to use the result given here, which says that if $x$ is a scalar variable then $$ \frac{\partial A(x)^{-1}}{\partial x}=-A(x)^{-1} \frac{\partial A(x)}{\partial x}A(x)^{-1}, $$ but I cannot extend it to vector variable $x$ and to the three-dimensional tensor $$ \left( D_xA(x)^{-1}\right)_{ijk} = \frac{\partial A(x)_{ik}^{-1}}{\partial x_{j}}. $$

G. Gare
    If you consider the mapping $f$ from invertible matrices ($GL(n)$) to invertible matrices given by $f(A) = A^{-1}$, then $df_A(B) = -A^{-1}BA^{-1}$ for any matrix $B$. Your general case follows from the chain rule. Just compose $f$ with $g\colon\Bbb R^n\to GL(n)$. $D(f\circ g)(x) = Df(g(x))Dg(x) = -A^{-1}BA^{-1}$ where $A=g(x)$ and $B = Dg(x)$. – Ted Shifrin Oct 06 '22 at 20:20
  • Is it still the case for $B$ being a three-dimensional tensor? – G. Gare Oct 07 '22 at 07:29

2 Answers


To reduce clutter, I will use the following notation for the derivative: $$ D^h[f(x)] $$ where $x$ is implicitly the variable of differentiation and $h$ is the point at which the linear map $D[f(x)]$ is evaluated. We will also use the notation $$ D^h[f]_y $$ to denote the derivative of $f$ at the point $y$, with the linear map $D[f]_y$ evaluated at $h$. We have the equality $$ D^h[f(x)] = D^h[f]_x. $$ Additionally, an expression like $$ \dot D[f(\dot x)g(x)] $$ means that only $\dot x$ is being differentiated and the undotted $x$ is held constant; in more verbose notation, $$ \dot D[f(\dot x)g(x)] = \Bigl[D_y[f(y)g(x)]\Bigr]_{y=x}. $$


Applications of the product and chain rules give $$\begin{aligned} D^h[A(x)^{-1}v(x)] &= \dot D^h[A(\dot x)^{-1}v(x)] + \dot D^h[A(x)^{-1}v(\dot x)] \\ &= D^h[A(x)^{-1}]v(x) + A(x)^{-1}D^h[v(x)]. \end{aligned}$$ Now let $I(X) = X^{-1}$ be the matrix inversion map so that $A(x)^{-1} = (I\circ A)(x)$. We can then use the chain rule to write $$ D[I\circ A]_x = D[I]_{A(x)}\circ D[A]_x, $$ and we need only determine $D[I]$. From the defining equation of $I$, $$\begin{aligned} I(X)X = 1 &\implies \dot D^H[I(\dot X)X] + \dot D^H[I(X)\dot X] = 0 \\ &\implies D^H[I(X)]X + I(X)H = 0 \\ &\implies D^H[I(X)] = -X^{-1}HX^{-1}. \end{aligned}$$ Thus $$ D^h[A(x)^{-1}] = D^h[I\circ A]_x = -A(x)^{-1}D^h[A(x)]A(x)^{-1}. $$ Finally, we have $$ D^h[A(x)^{-1}v(x)] = -A(x)^{-1}D^h[A(x)]A(x)^{-1}v(x) + A(x)^{-1}D^h[v(x)]. $$
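If it helps, the final formula can be checked numerically. The sketch below (using NumPy, with made-up smooth test functions $A$ and $v$ whose derivatives are known in closed form) assembles the Jacobian column by column from the formula and compares it against central finite differences of $f(x) = A(x)^{-1}v(x)$:

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)
C = rng.standard_normal((n, n, n))
W = rng.standard_normal((n, n))

def A(x):
    # (n+1)*I plus a bounded perturbation keeps A(x) invertible
    return (n + 1) * np.eye(n) + np.tanh(C @ x)

def v(x):
    return np.sin(W @ x)

x0 = rng.standard_normal(n)
Ainv = np.linalg.inv(A(x0))

# Exact derivatives of the test functions:
# DA[:, :, j] = dA/dx_j at x0, Dv[i, j] = dv_i/dx_j at x0
DA = (1 - np.tanh(C @ x0) ** 2)[:, :, None] * C
Dv = np.cos(W @ x0)[:, None] * W

# Formula: column j of the Jacobian is -A^{-1} (dA/dx_j) A^{-1} v + A^{-1} dv/dx_j
y = Ainv @ v(x0)
J_formula = np.column_stack([-Ainv @ DA[:, :, j] @ y for j in range(n)]) + Ainv @ Dv

# Central finite differences of f(x) = A(x)^{-1} v(x)
f = lambda x: np.linalg.solve(A(x), v(x))
eps = 1e-6
J_fd = np.column_stack(
    [(f(x0 + eps * np.eye(n)[:, j]) - f(x0 - eps * np.eye(n)[:, j])) / (2 * eps)
     for j in range(n)]
)
assert np.allclose(J_formula, J_fd, atol=1e-6)
```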

  • Thanks for your answer. The notation you use is the most confusing I have ever seen, but the final result is valuable. Why do you write $f(\dot x)$? You differentiate $f$, not its argument. – G. Gare Oct 07 '22 at 07:26
  • @G.Gare Honestly, I'm experimenting with notation a bit; this is the first time I've used a superscript like $D^h$. Normally you would write something like $D[f]_x(h)$ instead, but that's one of the reasons I've always hated total differential notation, especially in this case where we're working with matrix-valued functions. As for why $f(\dot x)$... – Nicholas Todoroff Oct 07 '22 at 15:18
  • ...there's the situation where $f$ has multiple arguments; for example $$D[f(x,x)] = \dot D[f(\dot x, x)] + \dot D[f(x,\dot x)]$$ or $$D[f(x, g(x))] = \dot D[f(\dot x, g(x))] + \dot D[f(x,g(\dot x))].$$ In the case where we're omitting $x$ though I would put the dot over $f$ like $$D[fg] = \dot D[\dot fg] + \dot D[f\dot g].$$ – Nicholas Todoroff Oct 07 '22 at 15:22

Actually, you can express your derivative in a more compact form.

Since

$v(x)=A(x)(A(x)^{-1}v(x))$ (Eq. 1)

differentiating (Eq. 1) w.r.t. $x$ gives:

$\frac{d}{dx}(v(x))=(\frac{d}{dx}A(x))A(x)^{-1}v(x)+A(x)\frac{d}{dx}(A(x)^{-1}v(x))$ (Eq. 2)

Thus, solving (Eq. 2) for $d(A(x)^{-1}v(x))/dx$ yields:

$\frac{d}{dx}(A(x)^{-1}v(x))=A(x)^{-1}[\frac{d}{dx}v(x)-(\frac{d}{dx}A(x))(A(x)^{-1}v(x))] $ (Eq.3)

Notice that Eq. (3) expresses the derivative of $A(x)^{-1}v(x)$ only as a function of the derivatives of $A(x)$ and $v(x)$, which are known in your problem. Also, $dv(x)/dx$ is an $n\times n$ matrix and $dA(x)/dx$ is an $n\times n\times n$ tensor. If you are not familiar with matrix derivatives with respect to a vector, have a look at some matrix calculus source, e.g. Click here
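Eq. (3) is straightforward to implement as written. Here is a sketch in NumPy (the helper name `jacobian_Ainv_v` and the argument layout are hypothetical; all inputs are assumed to be evaluated at a single point $x_0$):

```python
import numpy as np

def jacobian_Ainv_v(A0, v0, DA0, Dv0):
    """Jacobian of A(x)^{-1} v(x) at a point, via Eq. (3).

    A0  : (n, n)    A(x0)
    v0  : (n,)      v(x0)
    DA0 : (n, n, n) DA0[:, :, j] = dA/dx_j at x0
    Dv0 : (n, n)    Jacobian of v at x0
    """
    y = np.linalg.solve(A0, v0)             # y = A^{-1} v
    DA_y = np.einsum('ikj,k->ij', DA0, y)   # column j is (dA/dx_j) y
    return np.linalg.solve(A0, Dv0 - DA_y)  # A^{-1} [dv/dx - (dA/dx)(A^{-1} v)]
```

Using `np.linalg.solve` rather than forming $A^{-1}$ explicitly is the usual numerically safer choice.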

Example:

Let $v(x)=\begin{bmatrix} v_1(x)\\ v_2(x) \end{bmatrix}$ and $x=[x_1\,x_2\,x_3]$. Then:

$\frac{dv(x)}{dx}=\begin{bmatrix} \frac{\partial v_1(x)}{\partial x_1}&\frac{\partial v_1(x)}{\partial x_2}&\frac{\partial v_1(x)}{\partial x_3}\\ \frac{\partial v_2(x)}{\partial x_1}&\frac{\partial v_2(x)}{\partial x_2}&\frac{\partial v_2(x)}{\partial x_3} \end{bmatrix}$
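This Jacobian layout can also be reproduced numerically with central finite differences; the concrete $v$ below is made up for illustration:

```python
import numpy as np

def jacobian_fd(v, x0, eps=1e-6):
    # Central finite differences; column j is the partial derivative w.r.t. x_j
    m = len(x0)
    cols = []
    for j in range(m):
        e = np.zeros(m)
        e[j] = eps
        cols.append((v(x0 + e) - v(x0 - e)) / (2 * eps))
    return np.column_stack(cols)

# Hypothetical example: v(x) = [x1*x2, x2*x3] with x in R^3
v = lambda x: np.array([x[0] * x[1], x[1] * x[2]])
x0 = np.array([1.0, 2.0, 3.0])
J = jacobian_fd(v, x0)
# Analytic Jacobian [[x2, x1, 0], [0, x3, x2]] = [[2, 1, 0], [0, 3, 2]]
```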

Gino
  • Good idea, but on the right-hand side of your final expression you still have $dv(x) / dx$ which is defined with $dy(x) / dx$ so you get stuck. – G. Gare Oct 06 '22 at 11:16
  • Nope. In my final expression you need to replace y(x) with $inv(A)*v$ on the left-hand side, because both A(x) and v(x) are known in your problem! – Gino Oct 06 '22 at 13:20
  • I've added equation 3 where I've made such a substitution for you in my answer. Please, check it. Hope it is clear now! – Gino Oct 06 '22 at 13:26
  • As I wrote above, in Eq. 3 you have $dv(x)/dx$ at the right-hand side. If you see what $dv(x)/dx$ is given by in your second equation (just below "Differentiating (Eq. 1)") you will notice that you have $dy(x)/dx$. Ultimately, your last equation is useless. – G. Gare Oct 06 '22 at 15:19
  • If you don't like $y(x)$, remove it from my answer. Forget about Eq. (1). So, start from Eq. (2), which would result from differentiating the trivial identity: v = A (A^(-1) v). Eq. (3) is not useless, it is just the closed-form solution to your problem. You could just implement it and forget all the above derivation. – Gino Oct 06 '22 at 16:03
  • Eq. (3) is symbolic, it is not an iterative formulation. Let me know if you understand. Otherwise, I will derive everything again without using the auxiliary (known) variable y. – Gino Oct 06 '22 at 16:10
  • I have re-written the solution. Hope it is clear. BTW, you can use this approach to also prove what you've found online: A A^(-1)=I_n, which gives (dA/dx) A^(-1) + A (dA^(-1)/dx) = 0, then (dA^(-1)/dx) = - A^(-1) (dA/dx) A^(-1) – Gino Oct 06 '22 at 16:53
  • the problem is not that I don't understand the substitution $y(x) = A(x)^{-1}v(x)$, that's clear. The problem is that the derivative of $d v(x) / dx$ (that you need to know in the last line of your computation) is defined through the derivative $d(A(x)^{-1}v(x)) / dx$ which is exactly the quantity I need to compute. Please read my comments carefully before writing I don't understand. – G. Gare Oct 07 '22 at 07:23
  • Hi Gare, sorry. I didn't mean to be rude. I am assuming that you know the vector v(x), so you can just compute dv(x)/dx. BTW, you have received another answer that results in the same formula I have developed. Best wishes. – Gino Oct 07 '22 at 07:34
  • Gino, you do know $v(x)$, but you don't know $dv(x) / dx$. I don't know how to break it down to you in a simpler way than above. Just think it through 25 seconds. The other answer goes a step forward when it computes the derivative of a matrix inverse, that you never do. – G. Gare Oct 07 '22 at 07:35
  • I have added an example for you in my answer about how to calculate dv/dx. Also, in the other answer you have the same quantity, so I hope it helps. The other answer doesn't provide any step forward, it is just a different way, but the final results are identical. Goodbye – Gino Oct 07 '22 at 07:48