I have been studying multivariable calculus for a while now, and in particular differentiation in space (or in higher dimensions). I have seen related posts, but one question remains: I can't understand the concept of the linear transformation that we use to define the Fréchet derivative. In single-variable calculus the derivative is the best linear approximation of the function, so I guess this extends to the multivariable case, but we can't use a number for this (why?) and instead we use a matrix. Can someone clear this up for me in plain English?
2 Answers
The point is that for a function $f : \mathbb{R} \to \mathbb{R}$, $f'(a)$ defines a linear transformation, just like $Df({\bf a})$ does for a function $f : \mathbb{R}^n \to \mathbb{R}^m$.
In single variable calculus, we are taught that the derivative of $f(x)$ at a point $x = a$ is a real number $f'(a)$ which represents the slope of the tangent line to the graph of $f(x)$ at the point $x = a$. The equation of this tangent line is $y = f'(a)(x-a) + f(a)$; this is the best linear approximation of $f(x)$ near $x = a$, not the derivative itself.
If we do the change of variables $x^* = x - a$, $y^* = y - f(a)$, the tangent line becomes $y^* = f'(a)x^*$; this is a linear function, which is just a linear transformation $\mathbb{R} \to \mathbb{R}$, and the standard matrix (i.e., with respect to the canonical basis of $\mathbb{R}$) of this linear transformation is the $1\times 1$ matrix $[f'(a)]$.
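As a quick sanity check of the single-variable picture (my own illustration, not part of the answer: the function $f(x) = x^2$ and the point $a = 3$ are arbitrary choices), the tangent line really does track $f$ near $a$:

```python
# Single-variable case (illustrative example): f(x) = x^2 at a = 3, so f'(3) = 6.
# The tangent line is y = 6(x - 3) + 9; after the change of variables
# x* = x - 3, y* = y - 9 it becomes y* = 6 x*, i.e. the linear map R -> R
# whose 1x1 standard matrix is [6].
def f(x):
    return x**2

a, fprime_a = 3.0, 6.0

x = 3.01                               # a point near a
tangent = fprime_a * (x - a) + f(a)    # best linear approximation at a
print(f(x), tangent)                   # 9.0601 vs 9.06 -- very close near a
```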
In higher dimensions, we start with $f : \mathbb{R}^n \to \mathbb{R}^m$ and at a point ${\bf a} \in \mathbb{R}^n$ we have the derivative $Df({\bf a})$ which is an $m\times n$ matrix $Df({\bf a}) = \left[\frac{\partial f_i}{\partial x_j}({\bf a})\right]$ which is sometimes called the Jacobian of $f$ at ${\bf a}$. Then the best linear approximation of $f({\bf x})$ near ${\bf x} = {\bf a}$ is ${\bf y} = Df({\bf a})({\bf x}-{\bf a}) + f({\bf a})$.
If we do the change of variables ${\bf x}^* = {\bf x} - {\bf a}$, ${\bf y}^* = {\bf y} - f({\bf a})$, this approximation becomes ${\bf y}^* = Df({\bf a}){\bf x}^*$; this is a linear transformation $\mathbb{R}^n \to \mathbb{R}^m$, and the standard matrix of this linear transformation is the $m\times n$ matrix $Df({\bf a})$.
So the derivative in single variable calculus is just a special case of the derivative in multivariable calculus; just set $m = n = 1$.
As for your question, 'why can't we use a number for the best linear approximation for a function $\mathbb{R}^n \to \mathbb{R}^m$?', note that the approximating function must be $\mathbb{R}^n \to \mathbb{R}^m$, and because it is linear, it must be of the form ${\bf y} = A{\bf x} + {\bf b}$ where $A$ is an $m \times n$ matrix and ${\bf b} \in \mathbb{R}^m$. By enforcing the condition that the linear approximation must agree with the function at ${\bf x} = {\bf a}$, we find that the linear approximation must be of the form ${\bf y} = A({\bf x} - {\bf a}) + f({\bf a})$. So the only thing left to determine is the $m\times n$ matrix $A$, not a single number as in single variable calculus.
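To see the multivariable formula in action, here is a small numerical sketch; the particular function $f(x,y) = (x^2 y, \sin y)$, the point ${\bf a}$, and the use of NumPy are my own choices for illustration, not something taken from the answer.

```python
import numpy as np

# A concrete f : R^2 -> R^2, chosen only for illustration.
def f(v):
    x, y = v
    return np.array([x**2 * y, np.sin(y)])

# Its Jacobian Df(a): the 2x2 matrix of partial derivatives df_i/dx_j at a.
def Df(v):
    x, y = v
    return np.array([[2 * x * y, x**2],
                     [0.0,       np.cos(y)]])

a = np.array([1.0, 2.0])
h = np.array([1e-3, -2e-3])           # a small displacement from a

exact  = f(a + h)
linear = f(a) + Df(a) @ h             # y = Df(a)(x - a) + f(a) with x = a + h
print(np.abs(exact - linear))         # error is O(|h|^2), far smaller than |h|
```

The error shrinks quadratically as $|{\bf h}| \to 0$, which is exactly what "best linear approximation" means in the Fréchet sense.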

- Thanks a lot, I finally understood it. I was totally confused about the linear transformations, but this great explanation made it all clear. +1 – Dec 30 '13 at 05:56
- About the derivative of $f: \mathbb{R}^n \to \mathbb{R}^m$: you haven't explained what $Df({\bf a})$ is, you just said that the best linear approximation is $Df({\bf a})({\bf x}-{\bf a})+f({\bf a})$. @MichaelAlbanese – PNT May 29 '21 at 14:52
- @Yassir: I said what it was in the sentence before the one you referred to: "we have the derivative $Df({\bf a})$ which is an $m\times n$ matrix $Df({\bf a}) = \left[\frac{\partial f_i}{\partial x_j}({\bf a})\right]$". – Michael Albanese Aug 16 '21 at 01:30
- Thanks, is there any book as a reference? – frhack Oct 18 '22 at 06:11
- @frhack: I don't know any book that has this specific point of view, but most multivariable calculus textbooks should mention the relationship between the Jacobian and the best linear approximation. Maybe the book mentioned in the other answer is a good place to start. – Michael Albanese Oct 19 '22 at 01:59
I have little to add to Michael's excellent answer. However, Dieudonné said it best; the following is the introduction to his chapter on differentiation in Modern Analysis, Chapter VIII.
> The subject matter of this Chapter is nothing else but the elementary theorems of Calculus, which however are presented in a way which will probably be new to most students. That presentation, which throughout adheres strictly to our general "geometric" outlook on Analysis, aims at keeping as close as possible to the fundamental idea of Calculus, namely the "local" approximation of functions by linear functions. In the classical teaching of Calculus, the idea is immediately obscured by the accidental fact that, on a one-dimensional vector space, there is a one-to-one correspondence between linear forms and numbers, and therefore the derivative at a point is defined as a number instead of a linear form. This slavish subservience to the shibboleth of numerical interpretation at any cost becomes much worse when dealing with functions of several variables...
In other words, the confusion you face follows from thinking of the derivative wrongly in calculus I. "Wrong" in the sense that the idea does not generalize to higher dimensions directly. The derivative of a function from $\mathbb{R}^n \rightarrow \mathbb{R}^m$ is not another function from $\mathbb{R}^n \rightarrow \mathbb{R}^m$. Instead, it's a linear transformation, or if you prefer the Jacobian viewpoint, a matrix of functions.
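To illustrate that last sentence (with a function of my own choosing, $f(x,y) = (x^2 y, \sin y)$, not one taken from the answer): the derivative assigns to each point a matrix, i.e. a linear transformation, rather than another point of $\mathbb{R}^m$.

```python
import numpy as np

# Df is not another map R^2 -> R^2: it sends each point a to a 2x2 matrix,
# i.e. to a linear transformation R^2 -> R^2.  Here f(x, y) = (x^2 y, sin y).
def Df(a):
    x, y = a
    return np.array([[2 * x * y, x**2],
                     [0.0,       np.cos(y)]])

A = Df(np.array([1.0, 2.0]))   # the linear map (as a matrix) at the point (1, 2)
v = np.array([0.5, -0.5])
print(A @ v)                   # that linear map applied to a vector of R^2
```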

- Thanks for answering and for your time. My flawed reasoning came from exactly this point: I was trying to extend the single-variable definition of the derivative to the multivariable case, but it doesn't work, since the relationship goes the other way around. Conclusion: in single-variable calculus they tell you the truth, but not the whole truth. – Dec 30 '13 at 08:06
- "The derivative of a function from $\mathbb{R}^n \rightarrow \mathbb{R}^m$ is not another function from $\mathbb{R}^n \rightarrow \mathbb{R}^m$." I don't understand this. Of course it is such a function; specifically, a linear one. – A_P Aug 24 '19 at 21:25
- @A_P It is not. However, the derivative of a function $\mathbb{R}^n\to\mathbb{R}^m$ at a point $x\in\mathbb{R}^n$ is a linear map $\mathbb{R}^n\to\mathbb{R}^m$. – Alex Mar 05 '23 at 11:35
- @Alex Indeed, I should have said that the derivative at a point is a linear transformation. The analogue of the derivative function from one-dimensional calculus is a linear-transformation-valued map on some subset of $\mathbb{R}^n$. In order to express the derivative as a function on $\mathbb{R}^n$, there needs to be a bijective correspondence between points in $\mathbb{R}^n$ and linear transformations on $\mathbb{R}^n$. This is possible, but not for all functions; rather, just for a particularly special class of functions, e.g. functions on $\mathbb{R}^2$ which are identified as complex differentiable. – James S. Cook Mar 08 '23 at 03:07