Here’s our definition.
Definition.
Let $V,W$ be finite-dimensional normed vector spaces, $A\subset V$ open, $f:A\to W$ a given function and $a\in A$ a given point. We say $f$ is Fréchet differentiable at $a$ if there exists a linear transformation $T:V\to W$ such that
\begin{align}
\lim\limits_{h\to 0}\frac{\|f(a+h)-f(a)-T(h)\|_W}{\|h\|_V}&=0.
\end{align}
The linear map $T$ appearing in the definition above is unique. This therefore gives us the right to denote this linear transformation as $Df_a$ (or $df_a$ or $Df(a)$ or $df(a)$, but I don’t like each of these for some small reason or other).
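For completeness, here is a sketch of why $T$ is unique. Suppose $S$ and $T$ are two linear maps that both satisfy the limit condition (the name $S$ is introduced only for this argument). By the triangle inequality,
\begin{align}
\frac{\|T(h)-S(h)\|_W}{\|h\|_V}&\leq \frac{\|f(a+h)-f(a)-T(h)\|_W}{\|h\|_V}+\frac{\|f(a+h)-f(a)-S(h)\|_W}{\|h\|_V}\to 0\quad\text{as }h\to 0.
\end{align}
Now fix $v\neq 0$ and take $h=tv$ with $t\neq 0$, $t\to 0$ (possible because $A$ is open, so $a+tv\in A$ for all small $t$). By linearity, the left-hand side equals $\|T(v)-S(v)\|_W/\|v\|_V$ for every such $t$, so this constant must be $0$, i.e. $T(v)=S(v)$.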
If $f$ is differentiable at each point $a\in A$, then we simply say $f$ is differentiable on $A$.
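To see the definition in action, here is a small worked example; the function $g:\Bbb{R}\to\Bbb{R}$, $g(x)=x^2$, is chosen purely for illustration, with $V=W=\Bbb{R}$ and the absolute value as the norm. Taking the candidate linear map $T(h)=2ah$, we get
\begin{align}
\frac{|g(a+h)-g(a)-2ah|}{|h|}&=\frac{|h^2|}{|h|}=|h|\to 0\quad\text{as }h\to 0.
\end{align}
So $Dg_a$ is the linear map $h\mapsto 2ah$, while the familiar number $2a$ from single-variable calculus is the single entry of its $1\times 1$ matrix.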
Now, I’m going to make a bunch of statements (all true). You tell me which one you take issue with (also, notice that I’m careful to keep a distinction between $Df_a$ as a linear map and $f'(a)$ as a matrix representation… something which people often don’t maintain).
1. If $f:A\subset V\to W$ is differentiable on $A$, then for each $a\in A$, $Df_a:V\to W$ is a linear map, i.e. $Df_a\in \text{Hom}(V,W)$.
2. In the special case $V=\Bbb{R}^n,W=\Bbb{R}^m$, we have that $Df_a:\Bbb{R}^n\to\Bbb{R}^m$ is a linear map, i.e. $Df_a\in\text{Hom}(\Bbb{R}^n,\Bbb{R}^m)$.
3. Continuing from statement 2, if we choose the standard basis $\sigma_n=\{e_1,\dots, e_n\}$ on the domain, and $\sigma_m=\{e_1,\dots, e_m\}$ on the target, then we can assign an $m\times n$ matrix representation $[Df_a]_{\sigma_n}^{\sigma_m}$. It is common to denote this matrix as $f'(a)$. So, $f'(a)$ is by definition the $m\times n$ matrix representation of $Df_a$, relative to the standard ordered bases of $\Bbb{R}^n,\Bbb{R}^m$.
4. Specializing to $m=n=1$, $Df_a:\Bbb{R}\to\Bbb{R}$ is a linear map, and its matrix representation $f'(a)$ is a $1\times 1$ matrix, i.e. simply $f'(a)\in\Bbb{R}$ is a real number.
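To make the matrix representation concrete, here is a small worked example; the map $g:\Bbb{R}^2\to\Bbb{R}^2$, $g(x,y)=(xy,\,x+y)$, is chosen purely for illustration, and the Euclidean norm is assumed. For $a=(a_1,a_2)$ and $h=(h_1,h_2)$,
\begin{align}
g(a+h)-g(a)&=(a_2h_1+a_1h_2+h_1h_2,\; h_1+h_2).
\end{align}
The part that is linear in $h$ gives the candidate $Dg_a(h)=(a_2h_1+a_1h_2,\; h_1+h_2)$, and the remainder $(h_1h_2,0)$ has norm $|h_1h_2|\leq\tfrac{1}{2}\|h\|^2$, so the quotient in the definition is at most $\tfrac{1}{2}\|h\|\to 0$. Relative to the standard bases,
\begin{align}
g'(a)&=\begin{pmatrix} a_2 & a_1\\ 1 & 1\end{pmatrix},
\end{align}
which is the usual matrix of partial derivatives of $g$ at $a$. Note that $Dg_a$ is the map $h\mapsto g'(a)\,h$, not the matrix itself.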
Now, let us fix a linear map $T:V\to W$.
5. For each point $a\in V$, $T$ is differentiable at $a$, and $DT_a=T$, i.e. for all $h\in V$, we have $DT_a(h)=T(h)$.
6. $DT$ is a function from $V\to \text{Hom}(V,W)$.
7. $DT:V\to\text{Hom}(V,W)$ is a constant function with constant value $T$, i.e. $DT_a=T$ for all $a\in V$.
8. $DT:V\to\text{Hom}(V,W)$ is a constant function so its derivative is zero identically, i.e. $D^2T=D(DT):V\to\text{Hom}(V,\text{Hom}(V,W))$ is the zero function.
9. For all $k\geq 2$, $D^kT=0$ identically (only thing to be mindful of is that they all have different target spaces).
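The claims $DT_a=T$ and $D(DT)=0$ can both be checked directly against the definition. As a sketch, assume $\text{Hom}(V,W)$ carries some norm (say the operator norm; the choice does not matter in finite dimensions), and write $0$ for the zero map $V\to\text{Hom}(V,W)$. Then for every $a\in V$ and every $h\neq 0$,
\begin{align}
\frac{\|T(a+h)-T(a)-T(h)\|_W}{\|h\|_V}&=\frac{\|0\|_W}{\|h\|_V}=0,\\
\frac{\|DT_{a+h}-DT_a-0(h)\|_{\text{Hom}(V,W)}}{\|h\|_V}&=\frac{\|T-T\|_{\text{Hom}(V,W)}}{\|h\|_V}=0.
\end{align}
In both cases the numerator vanishes identically, so the limit is trivially $0$. The first line uses linearity of $T$, the second uses only that $a\mapsto DT_a$ is constant.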
Consider now the function $f:\Bbb{R}\to\Bbb{R}$ given by $f(x)=3x$.
10. $f$ is a linear function.
11. The matrix representation (relative to the basis $\{1\}$ on $\Bbb{R}$) of $f$ as a linear map is $(3)$, i.e. the $1\times 1$ matrix with single entry $3$, i.e. $[f]= (3)$.
12. $f'(x)=3$ for all $x\in\Bbb{R}$.
13. For all $x\in\Bbb{R}$, we thus have $Df_x=f$, i.e. for all $x\in\Bbb{R}$ and all $h\in\Bbb{R}$, we have $Df_x(h)=f(h)=3h$.
14. Taking the matrix representation of $Df_x=f$ from statement 13, we get $[Df_x]=[f]$, and thus $f'(x)=3$. So, statements 12 and 13 are consistent with each other.
15. $f''(x)=0$ for all $x\in\Bbb{R}$ because $f':\Bbb{R}\to\Bbb{R}$ is a constant function (with value $3$).
16. Since $Df:\Bbb{R}\to\text{Hom}(\Bbb{R},\Bbb{R})$, $x\mapsto f$ (or even more explicitly $Df_x=3\,\text{id}_{\Bbb{R}}$ for all $x$) is a constant mapping, its derivative $D(Df)$ is identically zero (as a mapping $\Bbb{R}\to\text{Hom}(\Bbb{R},\text{Hom}(\Bbb{R},\Bbb{R}))$).
17. Statements 15 and 16 are completely consistent with each other.
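One way to see why the claims about $f''$ and $D(Df)$ agree is to spell out how the two notations are related in one dimension. As a sketch, for any twice-differentiable $g:\Bbb{R}\to\Bbb{R}$ (the name $g$ is used only for illustration), and using the identification $\text{Hom}(\Bbb{R},\Bbb{R})\cong\Bbb{R}$, $T\mapsto T(1)$, which is just the $1\times 1$ matrix representation, we have for all $x,h,k\in\Bbb{R}$
\begin{align}
Dg_x(h)&=g'(x)\,h,\\
D(Dg)_x(h)(k)&=g''(x)\,h\,k.
\end{align}
Taking $g=f$ with $f(x)=3x$ gives $Df_x(h)=3h$ and $D(Df)_x(h)(k)=0\cdot h\cdot k=0$. So $f'(x)=3$ and $f''(x)=0$ are the matrix entries of $Df_x$ and $D(Df)_x$ respectively, not competing statements.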
Finally, "How does the idea of a differential $dx$ work if derivatives are not fractions?" might serve as a helpful side-answer.