
Why does the Jacobian matrix

$$ J = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \dots & \frac{\partial f_n}{\partial x_n} \end{pmatrix} $$

work, and where does it come from?

I came across this matrix in a multivariable calculus context where it was used to do multivariate substitution. It seems so arbitrary to me and I don't understand where it comes from. Can anyone give some insight or intuition about this?

John Doe

4 Answers


For functions of one variable, the derivative is the slope of the tangent line to the graph of $f(x)$.

The tangent line to the curve $y=f(x)$ at $x=a$ is given by,

$h(x)=f(a)+f'(a)(x-a)$

As $x \rightarrow a$, $f(x)$ approaches the tangent line $h(x)$. On a computer algebra system like Mathematica, if you zoom in at the point $x=a$, $f(x)$ looks more and more like the tangent line $h(x)$. $f'(a)$ is its slope.

In fact, a function is said to be differentiable at $a$ precisely when

$$\lim_{x \rightarrow a} \frac{f(x)-h(x)}{x-a}=0$$
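
For a quick numerical sanity check of this limit (a minimal sketch, taking $f(x)=\sin x$ and $a=1$ purely as illustrative choices), the ratio $(f(x)-h(x))/(x-a)$ indeed shrinks as $x$ approaches $a$:

```python
import numpy as np

# Illustrative choice: f(x) = sin(x), expanded around a = 1.
a = 1.0
fa, fpa = np.sin(a), np.cos(a)            # f(a) and f'(a)

def h(x):
    # tangent line at x = a
    return fa + fpa * (x - a)

for dx in [1e-1, 1e-2, 1e-3, 1e-4]:
    x = a + dx
    print(dx, (np.sin(x) - h(x)) / (x - a))   # ratio -> 0 as x -> a
```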

The Jacobian matrix plays the role of the derivative of a vector-valued function $\mathbf{f}$,

$$\mathbf{f}=(f_1(x_1,\ldots,x_n),f_2(x_1,\ldots,x_n),\ldots,f_m(x_1,\ldots,x_n))$$

of $n$ input variables and $m$ output variables.

For concreteness, assume $m=1$ and $n=2$, that is, a real-valued function of two variables.

Analogous to the single-variable case, the tangent plane to the surface $z=f(x,y)$ at the point $(a,b)$ is given by

$h(x,y)=f(a,b)+Df(a,b)\cdot (x-a,y-b)$

where $Df(a,b)=\left[\frac{\partial f}{\partial x}(a,b),\ \frac{\partial f}{\partial y}(a,b)\right]$ is the Jacobian matrix (here a $1\times 2$ row vector).

As $(x,y) \rightarrow (a,b)$, the surface $f(x,y)$ approaches the tangent plane $h(x,y)$. On Mathematica, if you zoom in at the point $(a,b)$, $f(x,y)$ looks more and more like the tangent plane. $\partial f/\partial x$ is the rate at which the function value increases for a small bump $\Delta x$ in $x$, and $\partial f/\partial y$ is the rate for a small bump $\Delta y$ in $y$.
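
As a small numerical sketch of this tangent-plane approximation (the function $f(x,y)=x^2y$ and the point $(a,b)=(1,2)$ below are my own illustrative choices), the error between $f$ and $h$ shrinks faster than the distance to $(a,b)$:

```python
import numpy as np

# Illustrative choice: f(x, y) = x**2 * y, expanded around (a, b) = (1, 2).
def f(x, y):
    return x**2 * y

a, b = 1.0, 2.0
fx, fy = 2 * a * b, a**2                  # partial derivatives at (a, b)

def h(x, y):
    # tangent plane at (a, b)
    return f(a, b) + fx * (x - a) + fy * (y - b)

for d in [1e-1, 1e-2, 1e-3]:
    x, y = a + d, b - d
    err = abs(f(x, y) - h(x, y))
    print(d, err / np.hypot(x - a, y - b))   # ratio -> 0 as (x, y) -> (a, b)
```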

Along similar lines, a vector-valued function is said to be differentiable at $\mathbf{a}$ precisely when

$$\lim_{\mathbf{x} \rightarrow \mathbf{a}} \frac{\lVert\mathbf{f}(\mathbf{x})-\mathbf{h}(\mathbf{x})\rVert}{\lVert\mathbf{x}-\mathbf{a}\rVert}=0$$

where $\mathbf{h}(\mathbf{x})=\mathbf{f}(\mathbf{a})+J\mathbf{f}(\mathbf{a})\,(\mathbf{x}-\mathbf{a})$ is the affine approximation built from the Jacobian matrix $J\mathbf{f}(\mathbf{a})$.

Quasar

The Jacobian matrix by itself is not the fundamental concept; it is simply a useful computational tool (actually sometimes it's useful, sometimes it completely obscures the "big picture"). What is important is the notion of differentiability; see this answer for some additional heuristic and motivating remarks.

In general, the definition of differentiability is as follows:

Let $V,W$ be normed vector spaces over a field (either $\Bbb{R}$ or $\Bbb{C}$), and let $A\subseteq V$ be a non-empty open set. Let $f:A\to W$ be a function, and let $\alpha\in A$ be a point. We say $f$ is differentiable at $\alpha$ if there exists a (continuous) linear transformation $T:V\to W$ such that \begin{align} \lim_{h\to 0}\dfrac{\lVert f(\alpha+h) - f(\alpha) - T(h)\rVert_W}{\lVert h \rVert_V} &= 0 \tag{$*$} \end{align} In this case, we can prove $T$ is unique, and so we denote it using any of the symbols $Df(\alpha), Df_{\alpha}, df(\alpha), df_{\alpha}$ or really any other notation which reminds you of a derivative (depends on the author).

Now, the derivative $Df_{\alpha}$ (which is by definition a linear transformation $V\to W$) is the fundamental object. Recall from basic linear algebra that if $V$ and $W$ are finite-dimensional, say $\dim V = n$ and $\dim W = m$, then given any linear transformation $T:V\to W$ and a choice of basis $\beta$ for $V$ and basis $\gamma$ for $W$, we obtain a certain $m\times n$ matrix $[T]_{\beta}^{\gamma}$.

Partial derivatives come into the picture as a calculational tool if you assume $V=\Bbb{R}^n$ and $W=\Bbb{R}^m$. In this case, we choose the standard ordered basis $\sigma_n = \{e_1, \dots, e_n\}$ on $V= \Bbb{R}^n$ and $\sigma_m = \{e_1, \dots, e_m\}$ on $W=\Bbb{R}^m$. Then the matrix representation $[Df_{\alpha}]_{\sigma_n}^{\sigma_m}$ will be a certain $m\times n$ matrix, called the Jacobian matrix, and it is usually denoted as $f'(\alpha)$ or $Df_{\alpha}$ (sometimes people blur the distinction between a linear transformation and its matrix representation), or even $Jf_{\alpha}$ or something like $J_{f}(\alpha)$ (I like the $f'(\alpha)$ notation because it agrees with the single-variable case $V=W=\Bbb{R}$, i.e. $n=m=1$).

It turns out that the Jacobian matrix $f'(\alpha)$ is exactly the matrix of partial derivatives: \begin{align} f'(\alpha):= [Df_{\alpha}]_{\sigma_n}^{\sigma_m} &= \begin{pmatrix} \partial_1 f_1(\alpha) & \dots & \partial_n f_1(\alpha) \\ \vdots & \ddots & \vdots \\ \partial_1f_m(\alpha) & \dots & \partial_nf_m(\alpha) \end{pmatrix} \end{align}

So, if we want to evaluate the derivative on a vector $h$, i.e. compute $Df_{\alpha}(h) \in \Bbb{R}^m$, all we have to do is basic linear algebra: \begin{align} Df_{\alpha}(h) &= Df_{\alpha}\left(\sum_{j=1}^n h_j e_j\right) =\sum_{j=1}^n h_jDf_{\alpha}(e_j) = \sum_{j=1}^n \sum_{i=1}^m h_j \partial_j f_i(\alpha) \, e_i \end{align}
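
As a hedged numerical sketch of the two previous points (the map $f(x,y)=(x^2y,\ \sin x,\ x+y)$ and the point $\alpha=(1,2)$ are my own illustrative choices, not anything from the definitions above), one can assemble the Jacobian column by column from partial derivatives and check that $Df_{\alpha}(h)$, computed as an ordinary matrix-vector product, matches $f(\alpha+h)-f(\alpha)$ for small $h$:

```python
import numpy as np

# Illustrative map f: R^2 -> R^3 and base point alpha.
def f(p):
    x, y = p
    return np.array([x**2 * y, np.sin(x), x + y])

alpha = np.array([1.0, 2.0])

# Jacobian assembled column by column from partial derivatives
# (central differences in place of exact partials).
eps = 1e-6
J = np.column_stack([
    (f(alpha + eps * e) - f(alpha - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
print(J)                          # rows = outputs f_i, columns = inputs x_j

# The derivative acts on a vector h by matrix multiplication.
h = np.array([1e-3, -2e-3])
print(J @ h)                      # Df_alpha(h)
print(f(alpha + h) - f(alpha))    # agrees with J @ h to first order
```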


So far we've been talking about differential calculus. In your question you mentioned substitution, so I guess you mean in the context of integration? Well, the determinant of the derivative of your change of variables (Jacobian determinant for short) comes up as the "fudge factor" which takes into account how volumes of regions get distorted when you change from one set of coordinates to another. In this answer I briefly outline the heuristics of why this works.
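
To make that fudge factor concrete (a minimal sketch using polar coordinates as my own example, not one the answer works out), the change of variables $(x,y)=(r\cos t,\ r\sin t)$ has Jacobian determinant $r$, which is exactly the extra factor in $dx\,dy = r\,dr\,dt$; integrating it over $[0,1]\times[0,2\pi]$ recovers the area of the unit disk:

```python
import numpy as np

# Change of variables (x, y) = (r cos t, r sin t).  Its Jacobian matrix is
# [[cos t, -r sin t], [sin t, r cos t]], whose determinant is r, so the area
# element transforms as dx dy = r dr dt.
def polar_jacobian_det(r, t):
    J = np.array([[np.cos(t), -r * np.sin(t)],
                  [np.sin(t),  r * np.cos(t)]])
    return np.linalg.det(J)

print(polar_jacobian_det(2.0, 0.7))        # ~2.0, i.e. equal to r

# Area of the unit disk computed in polar coordinates *with* the factor r.
n = 400
dr = 1.0 / n
r_mid = (np.arange(n) + 0.5) * dr          # midpoints of the r-grid
area = np.sum(r_mid) * dr * (2 * np.pi)    # integral of r dr dt over [0,1] x [0,2*pi]
print(area, np.pi)                         # both approximately 3.14159
```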

peek-a-boo

Let $f\colon\mathbb{R}^m\rightarrow\mathbb{R}^n$ be a function, $x\in\mathbb{R}^m$ a point and pick a vector $v\in\mathbb{R}^m$. The difference quotient $[f(x+tv)-f(x)]/t$ measures the rate of change in the value of $f$ as the input changes by travelling $t$ units in the $v$ direction. If the function $f$ is continuously differentiable, then the limit of this quotient exists as $t\rightarrow0$ and is denoted $\partial_vf(x)$. It is the "instantaneous rate of change of $f$ in the direction $v$" (this generalizes the interpretation of the classical derivative for functions of one variable). If $Jf(x)$ is the Jacobian of $f$ at $x$, then $Jf(x)\cdot v=\partial_vf(x)$, so the Jacobian matrix contains all the information about the instantaneous rates of change of $f$ in all possible directions.
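
A quick numerical sketch of the identity $Jf(x)\cdot v=\partial_vf(x)$ (the specific map $f(x,y)=(xy,\ x+y^2)$, point, and direction below are my own illustrative choices): the difference quotient converges to the Jacobian-vector product as $t\rightarrow0$.

```python
import numpy as np

# Illustrative map f: R^2 -> R^2, f(x, y) = (x*y, x + y**2).
def f(p):
    x, y = p
    return np.array([x * y, x + y**2])

def Jf(p):
    x, y = p
    # analytic Jacobian: rows are output components, columns are inputs
    return np.array([[y,   x],
                     [1.0, 2 * y]])

x = np.array([1.0, 2.0])
v = np.array([0.3, -0.5])

jvp = Jf(x) @ v                            # Jf(x) . v
for t in [1e-2, 1e-4, 1e-6]:
    fd = (f(x + t * v) - f(x)) / t         # difference quotient
    print(t, fd, jvp)                      # fd -> jvp as t -> 0
```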

Thorgott

The Jacobian matrix is a listing of all the function's derivatives relative to the standard basis. It tells you how fast the function changes in each of its various dimensions, as the input coordinates change.

It plays the same role that the derivative does in single variable calculus.

nomen
  • But why does listing all of its derivatives in a matrix tell us that? Is there any proof for this? – John Doe Aug 24 '20 at 18:50
  • I'm not sure what you want proved. A derivative tells us a rate of change. A specific derivative tells us a rate of change in a direction. All the derivatives tell us all the rates of change. – nomen Aug 24 '20 at 18:53
  • I get that, but for example why is the top-left term $\frac{\partial f_1}{\partial x_1}$ and not for example $\frac{\partial f_n}{\partial x_n}$. I don't understand why the matrix is set up the way it is. – John Doe Aug 24 '20 at 18:57
  • It could be set up in any arbitrary way, but all you're really asking/doing is changing the matrix's basis. It contains exactly the same information as the Jacobian, and the Jacobian could be recovered by undoing the change of basis (which is a linear isomorphism). – nomen Aug 24 '20 at 19:09
  • As you may be aware, $\Delta y$ is roughly equal to $f'(x)\cdot \Delta x$. If instead $f$ is a function that maps an input vector $(x_1,x_2,\ldots,x_n)$ to an output vector $(z_1,z_2,\ldots,z_m)$, and you want to know the change in the output vector $\Delta \mathbf{z}$ for a small bump $\Delta \mathbf{x}$ in the input vector, you can multiply the Jacobian matrix $D$ by the increment vector $\Delta \mathbf{x}$. As $D$ is a matrix, the matrix-column vector product $\Delta \mathbf{z} = D \cdot \Delta \mathbf{x}$ is well defined. – Quasar Aug 24 '20 at 19:41
  • @JohnDoe To motivate why it's a matrix, consider that a transformation $\varphi$ is linear if $\varphi(cx)=c\varphi(x)$ and $\varphi (x+y)=\varphi (x) + \varphi (y)$. The derivative is one such transformation, and so we can use a matrix to represent it. – CyclotomicField Aug 24 '20 at 19:59
  • @CyclotomicField: that's a nice point. :-) – nomen Aug 24 '20 at 20:08