Why is the definition of derivative what it is?

Question

In our lectures, we've been taught the following:

We say that $f:\mathbb{R}^3\to\mathbb{R}$ is differentiable at a point $X$ iff there exists $\alpha\in\mathbb{R}^3$ such that $$\epsilon (H)=\frac{f(X+H)-f(X)-\alpha\cdot H}{\|H\|}\to0$$ as $\|H\|\to0$, and the derivative is then $\alpha$.

But I can't understand why this should work. What is the intuition behind setting up $\epsilon(H)$ like this? Why the dot product ($\alpha\cdot H$)? What does $\alpha$ represent physically on the curve?

Please help, thanks.

The $\alpha$ is the linear transformation that best approximates $f$ in a small enough region. See Wikipedia page on Fréchet derivative. — edm, Oct 18 '16 at 09:48
Note there's no “curve” when $f\colon \mathbb{R}^3 \to \mathbb{R}$. The graph of $f$ is a three-dimensional subset of $\mathbb{R}^4$. Intuitive notions of derivative aren't as reliable in higher dimensions. — Matthew Leingang, Oct 18 '16 at 09:58
Although many mathematical concepts originated from physics, many lose their physical meaning when they become abstract and general. There are many instances when you cannot interpret a mathematical concept physically. — edm, Oct 18 '16 at 10:21
My answer here attempts to shed some light on the definition of the derivative when $f:\mathbb R^n \to \mathbb R^m$. — littleO, Oct 18 '16 at 10:46
@edm - perhaps, but this is not one of them. As artic tern has noted, $\alpha$ is the gradient of $f$. It points in the direction in which $f$ is increasing fastest, and its magnitude is the rate of increase of $f$ in that direction. More generally, any concept that truly deserves the name "derivative" will have a strong geometric (i.e. "physical") interpretation, as the point of a derivative is to give a best linear approximation of a mapping, which is a geometric concept. — Paul Sinclair, Oct 18 '16 at 14:23

score 4 · Answer 1 · answered Oct 18 '16 at 10:34

Say $\mathbf{v}$ is a unit vector and $f:\mathbb{R}^n\to\mathbb{R}$ a scalar function.

The directional derivative of $f$ at $\mathbf{x}$ in the direction of $\mathbf{v}$ is

$$ D_{\mathbf{v}}f(\mathbf{x}) = \lim_{h\to0} \frac{f(\mathbf{x}+h\mathbf{v})-f(\mathbf{x})}{h}. $$

If we interpret $f(\mathbf{x}+h\mathbf{v})$ as a function of $h$ with $\mathbf{x},\mathbf{v}$ fixed, this is

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}h}f(\mathbf{x}+h\mathbf{v}) &= \frac{\partial f}{\partial x_1}\frac{\partial (x_1+hv_1)}{\partial h}+\cdots+\frac{\partial f}{\partial x_n}\frac{\partial(x_n+hv_n)}{\partial h} \\[5pt] &= \frac{\partial f}{\partial x_1}v_1+\cdots+\frac{\partial f}{\partial x_n}v_n \end{aligned}$$

at $h=0$ (so all the partials $\partial f/\partial x_i$ are evaluated at $\mathbf{x}$) by the multivariable chain rule.

This is just the dot product $D_{\mathbf{v}}f(\mathbf{x})=\nabla f(\mathbf{x})\cdot \mathbf{v}$ where $\nabla f$ is the gradient.
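As a quick numerical sanity check of this identity, here is a minimal sketch in plain Python. The sample function `f`, its hand-computed gradient `grad_f`, the base point and the direction are all hypothetical choices made for illustration; none of them come from the question.

```python
import math

def f(x, y, z):
    # Hypothetical sample function R^3 -> R, chosen only for illustration.
    return x * x * y + math.sin(z)

def grad_f(x, y, z):
    # Gradient of the sample function, worked out by hand.
    return (2 * x * y, x * x, math.cos(z))

def directional_quotient(point, v, h):
    # Difference quotient (f(x + h v) - f(x)) / h from the definition above.
    shifted = tuple(p + h * vi for p, vi in zip(point, v))
    return (f(*shifted) - f(*point)) / h

point = (1.0, 2.0, 0.5)
raw = (1.0, -2.0, 2.0)
length = math.sqrt(sum(c * c for c in raw))
v = tuple(c / length for c in raw)  # normalise to a unit vector

# Right-hand side of the identity: gradient dotted with v.
dot = sum(g * vi for g, vi in zip(grad_f(*point), v))

for h in (1e-1, 1e-3, 1e-5):
    print(h, directional_quotient(point, v, h), dot)
```

For this particular function the printed quotient should approach the dot product as `h` shrinks, which is what the chain-rule computation predicts.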

Rearranging, this may be written as

$$ \frac{f(\mathbf{x}+h\mathbf{v})-f(\mathbf{x})-\nabla f(\mathbf{x})\cdot(h\mathbf{v})}{h}\to0 \quad \textrm{as }h\to0. $$

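With the substitution $\mathbf{h}=h\mathbf{v}$ (so that $\|\mathbf{h}\|=|h|$), and asking that the limit hold however $\mathbf{h}$ approaches $\mathbf{0}$ rather than only along one fixed direction $\mathbf{v}$, this becomes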

$$\frac{f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})-\nabla f(\mathbf{x})\cdot\mathbf{h}}{\|\mathbf{h}\|}\to0 \quad \textrm{as }\|\mathbf{h}\|\to0. $$

The derivative of $f$ at $\mathbf{x}$ in this case is a vector $\nabla f(\mathbf{x})\in\mathbb{R}^n$ depending on $\mathbf{x}$.
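To see the role of $\alpha$ in the quotient $\epsilon(H)$ from the question, the sketch below evaluates it numerically for a hypothetical sample function, once with $\alpha$ set to the gradient and once with a deliberately wrong vector. The function, the point and the direction are assumptions made purely for illustration.

```python
import math

def f(x, y, z):
    # Hypothetical sample function R^3 -> R, chosen only for illustration.
    return x * x * y + math.sin(z)

def epsilon(point, H, alpha):
    # (f(X + H) - f(X) - alpha . H) / ||H||, as in the definition.
    shifted = tuple(p + h for p, h in zip(point, H))
    linear = sum(a * h for a, h in zip(alpha, H))
    norm_H = math.sqrt(sum(h * h for h in H))
    return (f(*shifted) - f(*point) - linear) / norm_H

point = (1.0, 2.0, 0.5)
# Gradient of the sample function at `point`, worked out by hand.
gradient = (2 * point[0] * point[1], point[0] ** 2, math.cos(point[2]))
# Same vector with the last component zeroed out: not the gradient.
wrong_alpha = (gradient[0], gradient[1], 0.0)
direction = (1.0, -2.0, 2.0)

for t in (1e-1, 1e-2, 1e-3):
    H = tuple(t * d for d in direction)
    print(t, epsilon(point, H, gradient), epsilon(point, H, wrong_alpha))
```

With the gradient, the quotient shrinks roughly in proportion to $\|H\|$; with the wrong vector it settles near a nonzero constant. Dividing by $\|H\|$ is what makes the test sensitive enough to single out one $\alpha$.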

More generally, one can do the same thing for vector-valued functions $f:\mathbb{R}^n\to\mathbb{R}^m$, in which case a linear function $Df:\mathbb{R}^n\to\mathbb{R}^m$ is applied to $\mathbf{h}$ in place of the dot product with a vector. (This is a generalization since any linear function $\mathbb{R}^n\to\mathbb{R}$ is just a dot product with some vector.)

score 1 · Answer 2 · answered Oct 18 '16 at 10:40

As stated in my comment, $\alpha$ is a linear transformation that approximates the function $f$ well in a small region. I can only show you why the definition makes sense; I cannot tell you its physical meaning.

You can find the following arguments in Spivak's Calculus on Manifolds.

In the single-variable case, we define the derivative at a point $x$ by $$L:=\lim\limits_{h\to0}\frac{f(x+h)-f(x)}{h}$$ if such a number $L$ exists. The number $L$ is called the derivative of $f$ at $x$. Equivalently, we have $$\lim\limits_{h\to0}\left(\frac{f(x+h)-f(x)}{h}-L\right)=0,$$ or $$\lim\limits_{h\to0}\frac{f(x+h)-f(x)-Lh}{h}=0.$$ To generalise this expression to several variables, we take norms in both the numerator and the denominator (so that division makes sense), and require that $L$ be a linear transformation. The function $f$ is differentiable at $x$ if there exists a linear transformation $L$ such that $$\lim\limits_{h\to0}\frac{\lVert f(x+h)-f(x)-Lh\rVert}{\lVert h\rVert}=0.$$ The linear transformation $L$ is then the derivative of $f$ at $x$.
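The normed quotient can be checked numerically for a vector-valued map. The following is a minimal sketch in plain Python, assuming a hypothetical map $f:\mathbb{R}^2\to\mathbb{R}^2$ whose matrix of partial derivatives (the Jacobian, worked out by hand) plays the role of $L$. The names `jacobian` and `remainder_ratio` are made up for this example.

```python
import math

def f(x, y):
    # Hypothetical sample map R^2 -> R^2, chosen only for illustration.
    return (x * y, x + math.sin(y))

def jacobian(x, y):
    # Matrix of partial derivatives of the sample map, worked out by hand.
    return ((y, x), (1.0, math.cos(y)))

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(point, h):
    # ||f(x + h) - f(x) - L h|| / ||h|| with L taken to be the Jacobian.
    J = jacobian(*point)
    Lh = tuple(sum(J[i][j] * h[j] for j in range(len(h))) for i in range(len(J)))
    shifted = tuple(p + hj for p, hj in zip(point, h))
    diff = tuple(a - b - c for a, b, c in zip(f(*shifted), f(*point), Lh))
    return norm(diff) / norm(h)

point = (1.0, 2.0)
for t in (1e-1, 1e-2, 1e-3):
    print(t, remainder_ratio(point, (t, -t)))
```

For this map the ratio shrinks along with $\lVert h\rVert$, consistent with the Jacobian being the linear transformation $L$ in the definition.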
