
From some lecture notes I am trying to puzzle through ....

"... the derivative or Jacobian of a smooth map $f: \mathbb{R}^m \rightarrow \mathbb{R}^n$ at a point $x$ is a linear map $Df: \mathbb{R}^m \rightarrow \mathbb{R}^n$. In terms of partial derivatives, $Df_x(X) = (\sum_j\partial_{x_j}f_1 \cdot X_j, \sum_j \partial_{x_j}f_2\cdot X_j, ...)$ ... "

I'm so confused I'm not even sure where to begin. First, shouldn't the derivative be a map $Df:\mathbb{R}^m\rightarrow \mathbb{R}^m\times\mathbb{R}^n$? Second, I am familiar with 3D integral calculus, and the only Jacobian I heard discussed there doesn't look like this at all, except, of course, that they both involve partial derivatives. Also, I don't even know what $f_1 \cdot X_j$ means.

Thanks.

Asaf Karagila
  • @Asaf Karagila I now understand that "jacobian" isn't an "official" topic tag. But wouldn't it be a good one to have? – bob.sacamento Aug 31 '14 at 18:48
  • I'm a big believer in having a discussion before adding new tags (and am in general opposed to adding new tags, unless strong evidence in favor is presented). If you think that it would make a good addition (and it might be), you should head to the [meta] site, and start a discussion thread about this. Preferably bring evidence to support that this would be a good tag (e.g. the fact that many questions ask about this topic, and that it will be hard to locate this information via other reasonable means). Then add the tag. – Asaf Karagila Aug 31 '14 at 18:52

2 Answers

6

The best way to think about the derivative is: \begin{equation*} \tag{$\spadesuit$}f(x + \Delta x) \approx f(x) + f'(x) \Delta x. \end{equation*} The approximation is good when $\Delta x$ is small. This equation expresses the fact that $f$ is "locally linear" at $x$.

How can we make sense of ($\spadesuit$) when $f:\mathbb R^n \to \mathbb R^m$? \begin{equation*} f(\underbrace{x}_{n \times 1} + \underbrace{\Delta x}_{n \times 1}) \approx \underbrace{f(x)}_{m \times 1} + \underbrace{f'(x)}_{?} \underbrace{\Delta x}_{n \times 1}. \end{equation*}

It appears that $f'(x)$ should be something that, when multiplied by an $n \times 1$ column vector, returns an $m \times 1$ column vector. In other words, $f'(x)$ should be an $m \times n$ matrix.

If we prefer to think in terms of linear transformations rather than matrices, we can write \begin{equation*} f(x + \Delta x) \approx f(x) + Df(x) \Delta x. \end{equation*} Here $Df(x)$ is a linear transformation that takes $\Delta x$ as input, and returns $f'(x) \Delta x$ as output. This equation is what it means to be "locally linear" in the multivariable case.
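If it helps, here's a quick numerical sanity check of this local-linearity equation (my own example map, not from the answer: $f(x,y) = (x^2 y, \sin y)$, with its $2 \times 2$ matrix of partials worked out by hand):

```python
import numpy as np

# Hypothetical example map f: R^2 -> R^2 (chosen for illustration).
def f(v):
    x, y = v
    return np.array([x**2 * y, np.sin(y)])

# Its derivative Df(x): the 2x2 matrix of partial derivatives at v.
def Df(v):
    x, y = v
    return np.array([[2*x*y, x**2],
                     [0.0,   np.cos(y)]])

x  = np.array([1.0, 2.0])
dx = np.array([1e-4, -2e-4])      # a small displacement

exact  = f(x + dx)
linear = f(x) + Df(x) @ dx        # the local linear approximation

# The error shrinks like |dx|^2, so for |dx| ~ 1e-4 it is ~ 1e-8.
print(np.max(np.abs(exact - linear)))
```

Shrinking `dx` by a factor of 10 shrinks the error by a factor of about 100, which is exactly what "the approximation is good when $\Delta x$ is small" means quantitatively.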

Taking this as our starting point, it's not too hard to show that \begin{equation*} f'(x) = \begin{bmatrix} \frac{\partial f_1(x)}{\partial x_1} & \cdots & \frac{\partial f_1(x)}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m(x)}{\partial x_1} & \cdots & \frac{\partial f_m(x)}{\partial x_n} \end{bmatrix}. \end{equation*} (The functions $f_i$ are the component functions of $f$.)

If \begin{equation*} X = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix}, \end{equation*} then \begin{equation*} f'(x) X = \begin{bmatrix} \sum_{j=1}^n \frac{\partial f_1(x)}{\partial x_j} X_j \\ \vdots \\ \sum_{j=1}^n \frac{\partial f_m(x)}{\partial x_j} X_j \end{bmatrix}, \end{equation*} as you can see just by doing the matrix-vector multiplication. This is the equation given in your question.
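To see the matrix of partials and this matrix-vector product concretely, here is a small sketch (the map $f$ and the finite-difference scheme are my own illustrative choices):

```python
import numpy as np

def f(v):
    # Hypothetical f: R^3 -> R^2, so f'(x) is a 2x3 matrix.
    x1, x2, x3 = v
    return np.array([x1 * x2 + x3, np.exp(x1) - x2 * x3])

def jacobian_fd(f, x, h=1e-6):
    """Approximate the m x n matrix of partials df_i/dx_j by central differences."""
    m, n = f(x).size, x.size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

x = np.array([0.5, -1.0, 2.0])
X = np.array([1.0, 2.0, 3.0])
J = jacobian_fd(f, x)

# J @ X computes exactly the componentwise sums  sum_j (df_i/dx_j) X_j.
by_hand = np.array([sum(J[i, j] * X[j] for j in range(3)) for i in range(2)])
print(np.allclose(J @ X, by_hand))  # True
```

The hand-written double loop and NumPy's `@` agree, which is the point: the displayed formula *is* matrix-vector multiplication, written out.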

littleO
  • 51,938
  • OK, so as I understand it, my problems have involved notation and nomenclature. I was expecting the writer to say that the derivative was actually the matrix you have in your answer. To my mind, that matrix represents a mapping from R^m to R^(mxn). (I wrote that wrongly in my question. Oh well.) That is, to my mind, the matrix is the result of a mapping. But he is saying the matrix is a map (or a representation of one). And the "inputs" to the map are the X's, not to be confused with the x's, the X's being sort of "delta x's". How's that? – bob.sacamento Aug 31 '14 at 18:43
  • A key point is that an $m \times n$ matrix represents a linear transformation from $\mathbb R^n$ to $\mathbb R^m$. Not a mapping from $\mathbb R^m$ to $\mathbb R^{m \times n}$. The matrix is not the result of this mapping. The matrix represents a linear transformation that takes a vector as input and returns a vector as output. You are right when you say that $X$ is an input to this linear transformation. – littleO Aug 31 '14 at 19:02
1

If $f: \Bbb R^m \to \Bbb R^n$ is a differentiable mapping then at every $x \in \Bbb R^m$ its derivative $Df_x$ is a linear mapping from $\Bbb R^m$ to $\Bbb R^n$, therefore $Df_x \in \operatorname{Lin}(\Bbb R^m, \Bbb R^n)$. The entries of the matrix representing $Df_x$ are the partial derivatives of the component functions of $f$; this matrix is the Jacobian matrix. When $m=n$ the matrix is square, and its determinant is the Jacobian determinant you met in multivariable integration.

Consider the implications of your question: if $Df$ were a mapping from $\Bbb R^m$ to $\Bbb R^m \times \Bbb R^n$ then for $m=n=1$ the derivative of a real function of one variable would be a point in $\Bbb R^2$.

Applying $Df_x$ to a vector $X \in \Bbb R^m$, you obtain a vector in $\Bbb R^n$ whose entries are the products of the rows of $Df_x$ with the vector $X$. In symbols,

$$\begin{bmatrix} \partial_1 f_1 & \cdots & \partial_m f_1 \\ \vdots & \ddots & \vdots \\ \partial_1 f_n & \cdots & \partial_m f_n \end{bmatrix} \begin{bmatrix} X_1 \\ \vdots \\ X_m \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^m \partial_j f_1 X_j \\ \vdots \\ \sum_{j=1}^m \partial_j f_n X_j \end{bmatrix}.$$
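A concrete numerical instance of this rows-times-$X$ picture (the matrix entries below are hypothetical numbers standing in for the partials at some point $x$, with $m=2$ inputs and $n=3$ outputs):

```python
import numpy as np

# Df_x is n x m = 3 x 2: each row holds the partials of one component f_i.
Df_x = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])
X = np.array([10.0, -1.0])        # a vector in R^2

result = Df_x @ X                 # a vector in R^3
rows = np.array([row @ X for row in Df_x])  # entry i = row i dotted with X

print(result)                     # [ 8. 26. 44.]
print(np.allclose(result, rows))  # True
```

Entry by entry: $1\cdot 10 + 2\cdot(-1) = 8$, $3\cdot 10 + 4\cdot(-1) = 26$, $5\cdot 10 + 6\cdot(-1) = 44$, matching the sums $\sum_j \partial_j f_i\, X_j$ in the display above.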

Mark Fantini
  • 5,523
  • I wrote incorrectly in my question. I should have said a derivative looks to me like a map from R^m to R^(mxn), kind of like a gradient of z(x,y) is a map from R^2 to R^(2x1). Or that's the way I was looking at it. Thanks for your answer. – bob.sacamento Aug 31 '14 at 18:47