
I am told that a function $f:\mathbb{R}^{n}\to \mathbb{R}$ is differentiable at a point $\mathbf{x}=(x_1,x_2,...,x_n)$ if $\Delta f$ is of form

$$\sum_{i=1}^{n}\frac{\partial f}{\partial x_i}\Delta x_i+\sum_{i=1}^{n} \epsilon_i \Delta x_i$$

Where

$$\lim_{\mathbf{\Delta x}\to \mathbf{0}}\epsilon_{i}=0$$ for every $1\leq i\leq n$.

It seems arduous to me to use such a definition. Why don't mathematicians define differentiability by the existence of partial derivatives, which seems natural?

Now, I am aware that I can define things how I want to, but my goal is to understand why it is commonly defined this way. Are there some underlying properties that make this definition superior?

pjq42
  • The existence of partial derivatives doesn't make the function as nice as differentiability does in one dimension. Bear in mind that you are only probing two directions out of infinitely many; the function could be discontinuous, for example. – plop Aug 21 '22 at 12:23
  • Or it could be continuous, but without a tangent plane to the graph. – plop Aug 21 '22 at 12:24
  • By the way, if the second summation with the $\epsilon_i$'s is the one that looks strange, know that the concept doesn't need to be presented this way. It is equivalent, but all that matters is that the remainder, when divided by the size of $\Delta x$, tends to zero as $\Delta x$ tends to zero. – plop Aug 21 '22 at 12:28
  • Also, with fixed $x_0$ and $x$, consider the function (of one variable $t$) $$g(t) = f(x_0 + tx).$$ This is the restriction of $f$ to the line through $x_0$ in the direction of $x$. As a function of $t$, it should have a derivative at $t=0$. By definition, the derivative of $g$ at $0$ should be such that for small $t$ we have $$g(t) = g(0) + g'(0)t + o(t),$$ i.e. on the right-hand side we should have a linear function plus something small. Now, using the chain rule, try to compute the derivative of $g$; you get – Salcio Aug 21 '22 at 12:35
  • The key idea of calculus is to take a nonlinear function (difficult) and approximate it locally by a linear function (easy). To say that a function $f: \mathbb R^n \to \mathbb R^m$ is differentiable at a point $x \in \mathbb R^n$ means that there exists a linear function $L$ such that $f(x + \Delta x) \approx f(x) + L(\Delta x)$ for any "small" vector $\Delta x \in \mathbb R^n$. But what does "small" mean? To make this precise, the approximation error $e(\Delta x) = f(x + \Delta x) - f(x) - L(\Delta x)$ is required to satisfy $e(\Delta x) / | \Delta x | \to 0$ as $\Delta x \to 0$. – littleO Aug 21 '22 at 12:39
  • Of course, not every function can be approximated locally by a linear function. In calculus, we study functions which have the special property that they can be approximated locally by linear functions. These are called "differentiable" functions. – littleO Aug 21 '22 at 12:41
  • @Salcio I get $g'(t)=\frac{df}{d(x_{0}+tx)}\cdot x$ but I can't divide by a vector... – pjq42 Aug 21 '22 at 12:48
  • The answer given isn't bad by any means, but I would recommend waiting a little longer to accept an answer as there are many different perspectives/reasons for this. – Mark S. Aug 21 '22 at 13:15
  • @MarkS. I see. I am satisfied with the multitude of comments, but I do agree that your suggestion could help seek out more answers. – pjq42 Aug 21 '22 at 13:24
  • Let $f(x,y) = \frac{x y}{x^2+y^2}$ when either $x$ or $y$ is non-zero and let $f(0,0)=0$. The partial derivatives of $f$ exist at all points, but $f$ is not even continuous at $(0,0)$. (This is about the simplest function with this property!) – irchans Aug 21 '22 at 15:47
  • This is related to my question https://math.stackexchange.com/q/4510962. – Kritiker der Elche Aug 21 '22 at 22:02
  • @pjq42 you do not divide by a vector, you divide by $t$, which is a scalar. – Salcio Aug 22 '22 at 00:20

4 Answers

5

The reason is that a point $P$ in the domain can be approached along more than one path.

We'll try to understand this using a function of two variables, since that case is easy to visualise and gives good intuition.

For a function of two variables, the two partial derivatives are obtained by intersecting the surface of the graph with vertical planes parallel to the $xz$- and $yz$-planes; each intersection is a curve, called a trace, through $P$.

The partial derivatives are the slopes of the tangents to these curves at the point $P$. Now it would be natural to expect the following: if you were to turn either of those vertical planes slightly about $P$, keeping it vertical but no longer parallel to a coordinate plane, the tangent to the new trace at $P$ would be only slightly different from what it was before. Right?

Obviously, this is not guaranteed by the existence of just the two partial derivatives. For the function to be differentiable, it should not change abruptly as you move through a small neighbourhood of the point, in any direction. Hence that definition.
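To see this "turning the plane" picture fail concretely, here is a small Python sketch (not part of the original answer) using the standard example $f(x,y)=xy/(x^2+y^2)$, $f(0,0)=0$, mentioned elsewhere on this page. It computes the difference quotient of the trace at the origin in the direction making angle $\theta$ with the $x$-axis: along the axes the slope is $0$, but tilt the plane even slightly and the quotient blows up.

```python
import math

def f(x, y):
    """Standard counterexample: partials exist at the origin, yet f is not differentiable there."""
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / (x**2 + y**2)

def trace_slope(theta, t):
    """Difference quotient of f at the origin along the direction (cos theta, sin theta)."""
    return (f(t * math.cos(theta), t * math.sin(theta)) - f(0.0, 0.0)) / t

# Along the x-axis (theta = 0) the quotient is 0 for every t:
print(trace_slope(0.0, 1e-3))            # 0.0
# Tilt the plane to 45 degrees: f is constantly 1/2 on that line,
# so the quotient is (1/2)/t, which diverges as t -> 0:
print(trace_slope(math.pi / 4, 1e-3))    # ~500
print(trace_slope(math.pi / 4, 1e-6))    # ~500000
```

So the tangent line does not vary continuously with the direction of the cutting plane, which is exactly the abrupt behaviour described above.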

Makes sense?

  • Abruptly? What do you mean by that? That it's discontinuous? Not differentiable by the first definition I gave? – pjq42 Aug 21 '22 at 12:38
  • @pjq42 Yes. That the function doesn't jump suddenly when we move slightly away from the point in question, along any path. Differentiability guarantees this good behaviour. – Abhishek A Udupa Aug 21 '22 at 12:50
  • I have retracted the check mark not because the answer is bad, but because I seek more answers. If there aren't any better ones after a few days, I will put it back. :) – pjq42 Aug 21 '22 at 13:25
4

The definition of differentiability found in most multivariable calculus textbooks goes something like this:

A function $f:\mathbb R^n\to\mathbb R^m$ is differentiable at $\mathbf{x}\in\mathbb R^n$ if there is a linear transformation $\lambda_{\mathbf{x}}:\mathbb R^n\to\mathbb R^m$ such that $$ \lim_{\mathbf{h}\to\mathbf{0}}\frac{f(\mathbf{x}+\mathbf{h})-f(\mathbf{x})-\lambda_{\mathbf{x}}(\mathbf{h })}{||\mathbf{h}||}=\mathbf{0} \, . $$ Such a transformation, if it exists, must be unique, and so it makes sense to define $f'(\mathbf x)=\lambda_\mathbf{x}$ at every point where $f$ is differentiable.

This definition does not mention partial derivatives at all, nor does the concept of the partial derivative have to be introduced to motivate it. It is in fact a theorem that if $f$ is differentiable at $\mathbf{x}\in\mathbb R^n$, then the partial derivatives with respect to each of its arguments must exist at that point. The converse is not at all true: consider the function $f:\mathbb R^2\to\mathbb R$ given by $$ f(x,y)=\begin{cases} \dfrac{xy}{x^2+y^2} & \text{if }(x,y)\neq(0,0) \, , \\ 0 & \text{if }(x,y)=(0,0) \, . \end{cases} $$ This function is not even continuous at $(0,0)$, and so it hardly deserves to be called "differentiable" there under any sensible definition of differentiability. This is in spite of the fact that $\frac{\partial f}{\partial x}(0,0)$ and $\frac{\partial f}{\partial y}(0,0)$ both exist and are equal to $0$.
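Both claims about this counterexample are easy to verify numerically. A quick Python sanity check (an illustrative sketch, not part of the original answer): the difference quotients defining the partials at the origin are identically zero, while along the line $y=x$ the function sits at the constant value $1/2$, so it cannot be continuous at the origin.

```python
def f(x, y):
    # The counterexample above: f(x, y) = xy / (x^2 + y^2), with f(0, 0) = 0.
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / (x**2 + y**2)

# Partial-derivative difference quotients at the origin vanish for every t,
# so df/dx(0,0) = df/dy(0,0) = 0:
for t in (0.1, 0.01, 0.001):
    assert f(t, 0.0) / t == 0.0   # quotient for df/dx
    assert f(0.0, t) / t == 0.0   # quotient for df/dy

# Yet f is constantly 1/2 on the line y = x (away from the origin),
# so f(x, y) does not tend to f(0, 0) = 0 as (x, y) -> (0, 0):
for t in (0.1, 0.01, 0.001):
    assert abs(f(t, t) - 0.5) < 1e-12
```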

So, the real question is: why does the definition of differentiability given above make sense? Many authors have already spent a great deal of time answering this question, and so I won't dwell on it for long, but the key idea is that, just as in the one-dimensional case, the approximation $f(\mathbf{x}+\mathbf{h})\approx f(\mathbf{x})+f'(\mathbf{x})\mathbf{h}$ ought to be "very good" for "small" $\mathbf{h}$. The formal definition is motivated by the desire to give a precise meaning to the words "very good" and "small".

Joe
  • You are missing the norm in the numerator of the definition. – jjagmath Aug 21 '22 at 16:45
  • @jjagmath This is irrelevant. Convergence in a normed linear space is defined via the norm. That is, $\phi(x) \to a$ iff $\lVert \phi(x) - a \rVert \to 0$. – Paul Frost Aug 21 '22 at 17:15
2

I will give you a single, very general definition that covers the derivative in vastly different settings. The thing you need to understand is the Fréchet derivative.

Let $V$ and $W$ be normed vector spaces, and let $U \subset V$ be an open subset of $V$. A function $f: U \to W$ is called Fréchet differentiable at $x \in U$ if there exists a bounded linear operator $A: V \to W$ such that

$$ \lim_{ || h || \to 0 } \frac{||f(x+h) - f(x) - Ah||_W} {||h||_V}=0$$

Most probably, you will not know the definitions of these terms:

  1. Normed vector space: Think of it as a vector space in which it makes sense to talk about lengths of vectors.
  2. Bounded linear operator: A linear map $A$ for which the ratio $\lVert Av \rVert_W / \lVert v \rVert_V$ is bounded above by some constant $M$. Note that the notions of length in the domain and the codomain need not be the same.

What's the motivation here?

In the single-variable case, we have the Taylor expansion

$$ f(x+h) = f(x) + f'(x) h + o(h)$$

The idea is that the coefficient of $h$ in the expansion is the derivative. Now, we can rearrange this as:

$$ \frac{ f(x+h) - f(x) - f'(x) h}{h} = o(1)$$

As $ h \to 0 $, the left-hand side goes to zero; this is exactly what it means for $f'(x)$ to be well defined.
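For a concrete single-variable check (a sketch using $f = \sin$, chosen here purely for illustration), the quotient above really does shrink to zero as $h$ shrinks:

```python
import math

def quotient(f, fprime, x, h):
    """(f(x+h) - f(x) - f'(x) h) / h, which should be o(1) as h -> 0."""
    return (f(x + h) - f(x) - fprime(x) * h) / h

x = 1.0
for h in (1e-1, 1e-2, 1e-3):
    print(h, quotient(math.sin, math.cos, x, h))
# The printed quotients shrink roughly linearly with h
# (they are about -sin(x)/2 * h for small h).
```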

Now, similarly, in the multivariable case we want to know how the function changes if we add a small displacement $\epsilon v$ in the domain. For that we can again use the same idea:

$$ f(x+\epsilon v) = f(x) + \epsilon A v + o(\epsilon)$$

Now rearrange, divide by $\epsilon$, and we get the same concept.
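The Fréchet condition can be checked numerically for a concrete case. Below is a Python sketch (the function $f(x,y)=x^2+y$ and its candidate derivative $A(h_1,h_2)=2x\,h_1+h_2$ are my own illustrative choices, not from the answer): the error divided by $\lVert h \rVert$ shrinks in proportion to $\lVert h \rVert$, so the limit in the definition is $0$.

```python
import math

def f(x, y):
    return x**2 + y          # illustrative example function

def A(x, y, h1, h2):
    return 2 * x * h1 + h2   # candidate Frechet derivative of f at (x, y)

def frechet_ratio(x, y, h1, h2):
    """|f((x,y) + h) - f(x,y) - A h| / ||h||, which must tend to 0 as h -> 0."""
    err = f(x + h1, y + h2) - f(x, y) - A(x, y, h1, h2)
    return abs(err) / math.hypot(h1, h2)

for s in (1e-1, 1e-2, 1e-3):
    print(s, frechet_ratio(1.0, 2.0, s, s))
# The error is exactly h1^2, so each ratio is about s / sqrt(2):
# shrinking ||h|| by a factor of 10 shrinks the ratio by a factor of 10.
```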

1

As you say, you can define things how you want. But the question is whether the "existence of partial derivatives" makes sense as the only requirement. The partial derivatives are defined by $$\frac{\partial f}{\partial x_i}(\xi) = \lim_{t \to 0} \frac{f(\xi + te_i) - f(\xi)}{t} $$ where $e_i$ is the standard $i$-th basis vector of $\mathbb R^n$.

  1. Requiring the existence of the partial derivatives means that you only consider the $n$ directions given by the basis vectors $e_i$. This is a natural choice, but it is an arbitrary one. All other directions are out of focus; nothing is required concerning the general directional derivatives $$\frac{\partial f}{\partial v}(\xi) = \lim_{t \to 0} \frac{f(\xi + tv) - f(\xi)}{t} $$ with $v \in \mathbb R^n$. In fact, there are examples in which the partial derivatives $\frac{\partial f}{\partial x_i}(\xi)$ exist, yet $\frac{\partial f}{\partial v}(\xi)$ fails to exist for every $v$ that is not a multiple of some $e_i$. Note that $v = 0$ gives $\frac{\partial f}{\partial v}(\xi) = 0$, which is uninteresting but always formally defined.

  2. Requiring the existence of all directional derivatives means that we get a function $$Df\mid_\xi : \mathbb R^n \to \mathbb R, Df \mid_\xi(v) = \frac{\partial f}{\partial v}(\xi) .$$ What can be said about this function? It is certainly compatible with scalar multiplication since for $w = \lambda v$ with $\lambda \ne 0$ we have $$\frac{\partial f}{\partial w}(\xi) = \lim_{t \to 0} \frac{f(\xi + t \lambda v) - f(\xi)}{t} = \lambda \lim_{t \to 0} \frac{f(\xi + \lambda t v) - f(\xi)}{\lambda t} = \lambda \lim_{s \to 0} \frac{f(\xi + s v) - f(\xi)}{s} = \lambda \frac{\partial f}{\partial v}(\xi) .$$ Note that the equation is trivial for $\lambda = 0$.
    The simplest functions $L : \mathbb R^n \to \mathbb R$ with the property $L(\lambda v) = \lambda L(v)$ are the linear maps. Thus it is an obvious approach to require that $Df\mid_\xi$ is a linear map in order that $f$ can be called differentiable at $\xi$. This has the benefit that the $\frac{\partial f}{\partial v}(\xi)$ are uniquely determined by the partial derivatives $\frac{\partial f}{\partial x_i}(\xi)$. In fact, we can write $v = \sum_{i=1}^n v_i e_i$ and get $$\frac{\partial f}{\partial v}(\xi) = \sum_{i=1}^n v_i \frac{\partial f}{\partial x_i}(\xi) .$$

  3. Even if we require the existence of all directional derivatives and the linearity of $Df\mid_\xi$, we still only consider what happens if we approach $\xi$ on lines through $\xi$. It is much more interesting to see what happens if we approach $\xi$ in an arbitrary way. Of course it does not make sense to require that $$\lim_{\Delta x \to 0} \frac{f(\xi + \Delta x ) - f(\xi)}{\lVert \Delta x \rVert}$$ exists because then all directional derivatives would have the same value.
    What we can do is this: Writing $\Delta x = \sum_{i=1}^n \Delta x_i e_i$, we regard $\sum_{i=1}^n \Delta x_i \frac{\partial f}{\partial x_i}(\xi)$ as an approximation of $f(\xi + \Delta x ) - f(\xi)$ and require that the relative error $$\epsilon(\xi, \Delta x) = \frac{f(\xi + \Delta x ) - f(\xi) - \sum_{i=1}^n \Delta x_i \frac{\partial f}{\partial x_i}(\xi)}{\lVert \Delta x \rVert} \tag{1}$$ goes to $0$ as $\Delta x$ goes to $0$. In other words, we require $$f(\xi + \Delta x ) - f(\xi) = \sum_{i=1}^n \Delta x_i \frac{\partial f}{\partial x_i}(\xi) + \epsilon(\xi, \Delta x)\lVert \Delta x \rVert$$ with $\lim_{\Delta x \to 0} \epsilon(\xi, \Delta x) = \lim_{\lVert \Delta x \rVert \to 0} \epsilon(\xi, \Delta x) = 0$. Now it seems natural to understand $\lVert - \rVert$ as the Euclidean norm $\lVert x \rVert_2 = \sqrt{\sum_{i=1}^n x_i^2}$, but let us take the taxicab norm $\lVert x \rVert = \sum_{i=1}^n \lvert x_i \rvert$ instead. We have $$\lVert x \rVert_2 \le \lVert x \rVert \le n \lVert x \rVert_2$$ which means that in $(1)$ it is irrelevant whether we consider the Euclidean or the taxicab norm: $\lim_{\lVert \Delta x \rVert \to 0} \epsilon(\xi, \Delta x) = 0$ is true for both or for none of the norms.
    Understanding $\lVert - \rVert$ as the taxicab norm, we get $$ \epsilon(\xi, \Delta x)\lVert \Delta x \rVert = \sum_{i=1}^n \epsilon_i(\xi, \Delta x)\Delta x_i$$ where $\epsilon_i(\xi, \Delta x) = \epsilon(\xi, \Delta x)$ for $\Delta x_i \ge 0$ and $\epsilon_i(\xi, \Delta x) = -\epsilon(\xi, \Delta x)$ for $\Delta x_i < 0$.
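Both points above, that the relative error $(1)$ tends to $0$ for a differentiable function and that the Euclidean and taxicab norms are interchangeable in that limit, can be checked numerically. The sketch below uses the smooth example $f(x,y)=\sin(x)\,y$, which is my own illustrative choice:

```python
import math

def f(x, y):
    return math.sin(x) * y                  # smooth illustrative example

def grad_f(x, y):
    return (math.cos(x) * y, math.sin(x))   # exact partial derivatives

def rel_error(x, y, dx, dy, norm):
    """The quantity epsilon(xi, Delta x) from (1), w.r.t. the given norm."""
    gx, gy = grad_f(x, y)
    num = f(x + dx, y + dy) - f(x, y) - (gx * dx + gy * dy)
    return num / norm(dx, dy)

euclid = lambda dx, dy: math.hypot(dx, dy)
taxicab = lambda dx, dy: abs(dx) + abs(dy)

for s in (1e-1, 1e-2, 1e-3):
    print(s, rel_error(1.0, 2.0, s, s, euclid), rel_error(1.0, 2.0, s, s, taxicab))
# Both relative errors tend to 0, and their ratio stays bounded
# (here it equals (|dx| + |dy|) / ||(dx, dy)||_2 = sqrt(2) for dx = dy),
# which is exactly why the choice between the two norms is irrelevant.
```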

This results in the definition in your question, and I hope I have explained why one uses this definition. Finally, what is its relation to the Fréchet differentiability considered in the other answers? If a function is differentiable in your sense, then $$L : \mathbb R^n \to \mathbb R, L(v) = \sum_{i=1}^n v_i \frac{\partial f}{\partial x_i}(\xi) \tag{2}$$ is a linear map and $$\lim_{\Delta x \to 0} \frac{f(\xi + \Delta x) -f(\xi) - L(\Delta x)}{\lVert \Delta x \rVert} = 0 . \tag{3}$$ This is Fréchet differentiability. Conversely, if $f$ is Fréchet differentiable, then clearly all directional derivatives exist, in particular all partial derivatives, and by linearity we see that $L$ is given by $(2)$. This shows that $f$ is differentiable in your sense.

Paul Frost