How do I make sense of the total derivative in the limit case of $\Bbb R \to \Bbb R$ functions?

Question

In my notes it is stated as a proposition that the total derivative of a linear map $T: V \to W$ at every point $v \in V$ is $T$ itself: $DT(v)=T$. It also says that in the particular case of $\Bbb R \to \Bbb R$ functions it is just the ordinary single-variable calculus derivative (see also, for instance, Wikipedia: "when f is a function of a single variable, the total derivative is the same as the ordinary derivative of the function" https://en.wikipedia.org/wiki/Total_derivative).

But how can that be?

From the given proposition I can deduce that all derivatives of order $k \ge 2$ are equal to the map T itself: $D^kT=T$, because every time I get the same linear map. Right? Although this is a little odd when I think about it: in plain words it says that derivatives of a linear map are the same linear map

But then take $T:\Bbb R \to \Bbb R$, $T(x)=ax$. From single-variable calculus, $DT(x)=a$ and all derivatives of order $k\ge 2$ are $0$: $D^kT=0$ for $k \ge 2$. What is going on? Why do the two (apparently) not match?

First question: do you understand the definition of derivative that for a function $f:A\subset V\to W$ ($V,W$ being vector spaces and $A$ open) and a point $a\in A$, that the (Frechet) derivative $Df_a$ is *by definition* a linear map $Df_a:V\to W$? If you don’t even agree with/understand this definition, we can’t proceed with anything else. Next comment: if $T:V\to W$ is linear, then for each $v\in V$, $DT_v:V\to W$ is a linear map (by definition) and an easy theorem tells us it is equal to the linear map $T$. So, $DT:V\to \text{Hom}(V,W)$ is the constant map $v\mapsto T$. So, $D(DT)=0$. — peek-a-boo, Jan 07 '24 at 13:05
I have several answers (which you can go to my profile and look up) talking about derivatives, including the formal definition, and the motivation, and how this new definition relates to our old familiar one, and other examples, and how this relates to the idea of $df=\frac{\partial f}{\partial x^i}\,dx^i$ and its precise meaning etc etc, but I’m not sure which of these to even point you to, since I’m not really sure where your confusion lies (though at the moment I’m suspecting the root lies in understanding the definition itself). — peek-a-boo, Jan 07 '24 at 13:13
@peek-a-boo I agree with the initial definition, if I just take it literally, but I don't see that in the limiting case I described it matches straightforwardly with the ordinary one. Like the derivative of f(x)=3x is Df(x)=3, which I can see as a linear map where the matrix is 3x3, but differentiating again should give me 0 if I am to treat it as the ordinary derivative. The theorem says it should give me the same linear map, but for that I would have to write Df(x)=3x, and then I get the absurd 3x=3. — some_math_guy, Jan 07 '24 at 13:19
@peek-a-boo This means I cannot interpret the total derivative as a rate of change? I know that if I read it as the best linear approximation it makes sense to always get the same linear map. But the ordinary derivative is a rate of change, and the second-order ordinary derivative is the rate of change of the rate of change, which is 0 for a linear function, but not if I use the total derivative definition. This means I don't agree when they say that this total derivative is the same as the ordinary one in the single-variable calc case. — some_math_guy, Jan 07 '24 at 13:23
what do you mean the matrix is $3\times 3$? It is a $1\times 1$ matrix. And yes, the correct interpretation of derivatives is as best linear approximations. As a special case you can recover the rate of change interpretation (but this is rather special when you restrict to one-dimensional domains). The higher-dimensional case contains the usual one as a special case. Also, why are you writing $Df(x)=3x$? That’s not at all correct. It is $Df_x(h)=3h$ for all $h\in\Bbb{R}$, i.e $Df_x=f$. I think you’re getting confused by the notation. We DO NOT get to conclude $3x=3$. — peek-a-boo, Jan 07 '24 at 13:35
@peek-a-boo Is it correct to write $Df=3$ in the same example i.e without specifying the point? I think I have seen it, but to me it would be as wrong as writing $g=x^2$, instead of $g(x)=x^2$ for example. Maybe another abuse of notation? — some_math_guy, Jan 07 '24 at 13:57
$3\text{id}_{\Bbb{R}}$ would be better, but still is an abuse of notation because a-priori $Df$ has ‘two slots’ $Df_a(h)$, with $a$ being the ‘point’ and $h$ being the ‘displacement from $a$’, so when you write $Df=3\text{id}_{\Bbb{R}}$, it is a-priori not clear (for a beginner) which slot is to be held fixed. But yes with practice everything goes. Anyway, see my posted answer. — peek-a-boo, Jan 07 '24 at 13:59

score 0 · Answer 1 · answered Jan 07 '24 at 13:19

(1) Total derivatives are defined at a point: for a fixed $x \in V$, the total derivative $DF_x : V \to W$ of $F : V \to W$ at $x$ is "the best linear approximation to $F$ near $x$" in the sense that $$ F(x+h) \approx F(x) + DF_x(h). \tag{$*$} $$ Giving a rigorous definition for $\approx$ gives a rigorous definition of $DF_x$.

When $F$ is linear, then for all $x$ we have $DF_x(h) = F(h)$.

(2) When $F : \mathbb R \to \mathbb R$, then $x, h \in \mathbb R$ and $$ DF_x(h) = \frac{\mathrm dF(x)}{\mathrm dx}h. $$ Notice the similarity the following familiar equation shares with ($*$): $$ F(x + h) \approx F(x) + \frac{\mathrm dF(x)}{\mathrm dx}h. $$
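A quick numerical sanity check of ($*$) in the one-variable case may help. The sketch below is only an illustration: the choice $F=\sin$ and the names `F`, `dF`, `DF`, `ratio` are assumptions made for this example, not part of the answer above. It treats $DF_x$ as the map $h\mapsto F'(x)h$ and prints the approximation error divided by $|h|$, which should shrink as $h$ gets smaller.

```python
import math

def F(x):
    # Hypothetical example function; any differentiable F: R -> R would do.
    return math.sin(x)

def dF(x):
    # Ordinary single-variable derivative of F, computed by hand.
    return math.cos(x)

def DF(x):
    # Total derivative at x: the linear map h -> F'(x) * h.
    return lambda h: dF(x) * h

def ratio(x, h):
    # |F(x+h) - F(x) - DF_x(h)| / |h|, the quantity that should tend to 0.
    return abs(F(x + h) - F(x) - DF(x)(h)) / abs(h)

for h in (1e-1, 1e-2, 1e-3):
    print(h, ratio(1.0, h))
```

Note that `DF(x)` returns a function of `h`, while `dF(x)` returns a number; the two are related by multiplication, which is the content of (2).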

score 0 · Answer 2 · 2024-01-07T13:45:45.767

Let us just look at what a total derivative is: The wikipedia entry you mention (which works in $\mathbb R^d$'s, which is just fine) states: Let $f\colon U\to\mathbb R^m$ be a function defined on an open subset $U\subseteq\mathbb R^n$. We say it's totally differentiable at $a\in U$ if there exists a linear map $\mathrm df_a\colon\mathbb R^n\to\mathbb R^m$ such that $$\lim_{x\to a}\frac{\|f(x)-f(a)-\mathrm df_a(x-a)\|}{\|x-a\|}=0.$$ In this case, we call $\mathrm df_a$ the total differential of $f$ at $a$.
Now it is a direct check that for linear $f$ we can choose $\mathrm df_a=f$ for every $a\in U$ (by linearity).
Hence, writing your map as $T\colon\mathbb R\to\mathbb R,x\mapsto cx$, its total derivative at $a\in\mathbb R$ is $\mathrm dT_a\colon\mathbb R\to\mathbb R,x\mapsto cx$ and not $\mathrm dT_a\colon\mathbb R\to\mathbb R,x\mapsto c$, as you suggest (which is not even linear unless $c=0$).
So the total differential and the single-variable calculus derivative do not directly agree in the sense that they give you the same object (linear map vs. real number), but the total differential of a linear map $\mathbb R\to\mathbb R$ is just given by multiplication with the single-variable calculus derivative of that map (without specification of a point since it's constant).
However, this is "close" (and unambiguous) enough to justify the quote "when f is a function of a single variable, the total derivative is the same as the ordinary derivative of the function", at least in the opinion of some people.
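The difference between the two candidates can also be seen numerically by plugging each into the quotient from the definition. This is a rough sketch under assumed values (slope $c=3$, base point $a=2$); the function names are made up for the illustration.

```python
c = 3.0  # example slope of the linear map T(x) = c*x (assumed value)

def T(x):
    return c * x

def candidate_linear(h):
    # h -> c*h: multiplication by the single-variable derivative.
    return c * h

def candidate_constant(h):
    # h -> c: the ordinary derivative misread as a map; not linear.
    return c

def quotient(L, a, h):
    # |T(a+h) - T(a) - L(h)| / |h| from the definition of total differentiability.
    return abs(T(a + h) - T(a) - L(h)) / abs(h)

a = 2.0
for h in (1e-1, 1e-2, 1e-3):
    print(h, quotient(candidate_linear, a, h), quotient(candidate_constant, a, h))
```

For the linear candidate the quotient stays at zero up to floating-point rounding, while for the constant candidate it grows like $c/|h|$, so only the former can be the total differential.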

This is precisely what I suspected, but I couldn't find a single source that actually says they are not the same object and that "when f is a function of a single variable, the total derivative is the same as the ordinary derivative of the function" is an abuse of language. It looks like everyone takes it for granted. Going along with the definitions and taking everything rigorously results in the absurdity I posted. — some_math_guy, Jan 07 '24 at 13:52

peek-a-boo · Accepted Answer · 2024-01-07T14:13:19.777

Here’s our definition.

Definition.

Let $V,W$ be finite-dimensional normed vector spaces, $A\subset V$ open, $f:A\to W$ a given function and $a\in A$ a given point. We say $f$ is Frechet differentiable at $a$ if there exists a linear transformation $T:V\to W$ such that \begin{align} \lim\limits_{h\to 0}\frac{\|f(a+h)-f(a)-T(h)\|_W}{\|h\|_V}&=0. \end{align} The linear map $T$ appearing in the definition above is unique. This therefore gives us the right to denote this linear transformation as $Df_a$ (or $df_a$ or $Df(a)$ or $df(a)$, but I don’t like each of these for some small reason or other).

If $f$ is differentiable at each point $a\in A$, then we simply say $f$ is differentiable on $A$.

Now, I’m going to make a bunch of statements (all true). You tell me which one you have issue with (also, notice that I’m careful to keep a distinction between $Df_a$ as a linear map vs $f’(a)$ as a matrix representation… something which people often don’t maintain).

1. If $f:A\subset V\to W$ is differentiable on $A$, then for each $a\in A$, $Df_a:V\to W$ is a linear map, i.e $Df_a\in \text{Hom}(V,W)$.
2. In the special case $V=\Bbb{R}^n,W=\Bbb{R}^m$, we have that $Df_a:\Bbb{R}^n\to\Bbb{R}^m$ is a linear map, i.e $Df_a\in\text{Hom}(\Bbb{R}^n,\Bbb{R}^m)$.
3. Continuing from 2, if we choose the standard basis $\sigma_n=\{e_1,\dots, e_n\}$ on the domain, and $\sigma_m=\{e_1,\dots, e_m\}$ on the target, then we can assign an $m\times n$ matrix representation $[Df_a]_{\sigma_n}^{\sigma_m}$. It is common to denote this matrix as $f'(a)$. So, $f'(a)$ is by definition the $m\times n$ matrix representation of $Df_a$, relative to the standard ordered bases of $\Bbb{R}^n,\Bbb{R}^m$.
4. Specializing to $m=n=1$, $Df_a:\Bbb{R}\to\Bbb{R}$ is a linear map, and its matrix representation $f'(a)$ is a $1\times 1$ matrix, i.e simply $f'(a)\in\Bbb{R}$ is a real number.

Now, let us fix a linear map $T:V\to W$.

5. For each point $a\in V$, $T$ is differentiable at $a$, and $DT_a=T$, i.e for all $h\in V$, we have $DT_a(h)=T(h)$.
6. $DT$ is a function from $V\to \text{Hom}(V,W)$.
7. $DT:V\to\text{Hom}(V,W)$ is a constant function with constant value $T$, i.e $DT_a=T$ for all $a\in V$.
8. $DT:V\to\text{Hom}(V,W)$ is a constant function so its derivative is zero identically, i.e $D^2T=D(DT):V\to\text{Hom}(V,\text{Hom}(V,W))$ is the zero function.
9. For all $k\geq 2$, $D^kT=0$ identically (only thing to be mindful of is that they all have different target spaces).
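
To spell out why the constant map $DT$ has zero derivative, one can test the candidate $0\in\text{Hom}(V,\text{Hom}(V,W))$ directly in the definition, with $DT$ in place of $f$. This is only a sketch; it assumes some norm on $\text{Hom}(V,W)$ has been fixed (for instance the operator norm, a choice the text above does not specify). For any $a\in V$, \begin{align} \lim\limits_{h\to 0}\frac{\|DT_{a+h}-DT_a-0(h)\|_{\text{Hom}(V,W)}}{\|h\|_V}&=\lim\limits_{h\to 0}\frac{\|T-T-0\|_{\text{Hom}(V,W)}}{\|h\|_V}=\lim\limits_{h\to 0}\frac{0}{\|h\|_V}=0. \end{align} By uniqueness of the linear map in the definition, $D(DT)_a=0$ for every $a\in V$.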

Consider now the function $f:\Bbb{R}\to\Bbb{R}$ given by $f(x)=3x$.

10. $f$ is a linear function.
11. The matrix representation (relative to the basis $\{1\}$ on $\Bbb{R}$) of $f$ as a linear map is $(3)$, i.e the $1\times 1$ matrix with single entry $3$, i.e $[f]= (3)$.
12. $f'(x)=3$ for all $x\in\Bbb{R}$.
13. For all $x\in\Bbb{R}$, we thus have $Df_x=f$, i.e for all $x\in\Bbb{R}$ and all $h\in\Bbb{R}$, we have $Df_x(h)=f(h)=3h$.
14. Taking the matrix representation of $Df_x=f$ from statement 13, we get $[Df_x]=[f]$, and thus $f'(x)=3$. So, statements 12 and 13 are consistent with each other.
15. $f''(x)=0$ for all $x\in\Bbb{R}$ because $f':\Bbb{R}\to\Bbb{R}$ is a constant function (with value $3$).
16. Since $Df:\Bbb{R}\to\text{Hom}(\Bbb{R},\Bbb{R})$, $x\mapsto f$ (or even more explicitly $Df_x=3\text{id}_{\Bbb{R}}$ for all $x$) is a constant mapping, its derivative $D(Df)$ is identically zero (as a mapping $\Bbb{R}\to\text{Hom}(\Bbb{R},\text{Hom}(\Bbb{R},\Bbb{R}))$).
17. Statements 15 and 16 are completely consistent with each other.
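
The statements about $f(x)=3x$ can be mirrored in a small Python sketch that keeps the linear map $Df_x$ and the number $f'(x)$ as separate objects. The names `Df`, `matrix_of`, `f_prime` and `second_difference_quotient` are made up for this illustration, and the last one is only a finite-difference stand-in for the second derivative, not a proof.

```python
def f(x):
    return 3.0 * x

def Df(x):
    # Frechet derivative at x: the linear map h -> 3*h (it does not depend on x).
    return lambda h: 3.0 * h

def matrix_of(L):
    # 1x1 matrix of a linear map R -> R relative to the basis {1}: its entry is L(1).
    return L(1.0)

def f_prime(x):
    # Ordinary derivative, obtained as the matrix representation of Df(x).
    return matrix_of(Df(x))

def second_difference_quotient(x, h):
    # Difference quotient of x -> f'(x); it vanishes because x -> Df(x) is constant.
    return (f_prime(x + h) - f_prime(x)) / h

for x in (-1.0, 0.0, 2.5):
    print(x, Df(x)(2.0) == f(2.0), f_prime(x), second_difference_quotient(x, 1e-3))
```

Here `Df(x)` agrees with `f` as a function of the displacement, `f_prime(x)` is the single number 3, and the difference quotient of `f_prime` is zero at every point, which is the code-level version of the point-versus-displacement distinction.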

Finally, "How does the idea of a differential dx work if derivatives are not fractions?" might serve as a helpful side-answer.

I observe in statement 1 that the domain of the derivative $Df_a$ is $V$, even if $f$ was only defined on a subset $A \subseteq V$. Is this to be accepted as a definition? — some_math_guy, Jan 07 '24 at 14:15
About 14: [Df_x]=[f]=f'(x)=3, right? Since you said at the beginning that f'(x) was going to denote the matrix representation. On the other hand, 12 and 13 being consistent implies for me that whenever I read "the total derivative is the same as the ordinary derivative of the function" what is meant is "the total derivative is the same as the matrix representation of the ordinary derivative seen as a linear map". — some_math_guy, Jan 07 '24 at 14:52
@some_math_guy yes for the first part. And yes (12), (13) are consistent/equivalent or whatever you want to call it, and however you wish to express this string of equalities in words. if it were me I’d phrase it as I always have (especially in statement 3): “$f’(x)$ is the matrix representation of the linear map (the Frechet derivative) $Df_x$”. — peek-a-boo, Jan 07 '24 at 15:00
If you have some minutes, could you take a look at: https://math.stackexchange.com/questions/4840403/prove-that-fx-x2-is-smooth-using-the-definition-of-total-derivative. It is about the total derivative again and I have notational issues with the second total derivative. — some_math_guy, Jan 07 '24 at 16:10
@some_math_guy everything you need is covered in chapter 3 of Loomis and Sternberg’s Advanced Calculus; this is where I learnt the majority of my differential calculus in Banach spaces (this and Henri Cartan’s Differential Calculus). — peek-a-boo, Jan 07 '24 at 16:15

score 0 · Answer 4 · answered Jan 07 '24 at 14:03

You have to be very careful what it is you are taking the derivative of, since there are many maps at play here.

The originally given map $f:\mathbb R\to\mathbb R$.
Its derivative at a point $x\in\mathbb R$, $\mathrm Df(x)$, is also a map $\mathbb R\to\mathbb R$. Note that $x$ is not the argument of the map. There are multiple derivatives, each at a different point, and $x$ just specifies which of these derivatives we are talking about, each of which is itself a map.
The derivative map $\mathrm Df:\mathbb R\to\mathrm{Map}(\mathbb R,\mathbb R)$, which takes a point $x$ as its argument and sends it to the derivative $\mathrm Df(x)$ at that point. As such, it maps numbers to maps. Note that the $x$ in $\mathrm Df(x)$ is the argument of the derivative map, but not of the derivative. These are different maps!

Now the following holds:

If $f$ is linear, then $\mathrm Df(x)=f$ for all $x$.
As a corollary: The derivative of $\mathrm Df(x)$ is equal to $\mathrm Df(x)$, since it is always linear.
The derivative of the derivative map $\mathrm Df$ is $0$ if $f$ is linear, since then $\mathrm Df$ is constant (it's always the same linear map!).

Comparing this to the 1d case to which you are used: Let's say $f(x)=3x$, which is linear. Then $f'(x)=3$, or $\mathrm Df(x)=(h\mapsto 3h)$. Now the derivative of the map $h\mapsto 3h$ is again $3$ or $h\mapsto 3h$. However, the derivative of the derivative map $f'$, which maps $x\mapsto 3$, is $0$.
