6

On Wikipedia, it says

When $f$ is a function from an open subset of $\mathbb{R}^n$ to $\mathbb{R}^m$, then the directional derivative of $f$ in a chosen direction is the best linear approximation to $f$ at that point and in that direction.

I just want to check: are linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ defined as functions of the form $f(x) = ax+b$, where $a$ is a scalar and $b$ is a vector?

Also, it seems like functions of the form above just enlarge/shrink and shift. Is this correct? I thought that if anything was going to be a counterexample, it would be an off-center circle: under the transformation $x \mapsto 2x$, I thought an off-center circle might map to an ellipse, but this doesn't seem to be the case. For example, if $(x, y)$ satisfies $(x-2)^2 + (y-2)^2 = 1$, then multiplying both sides by $2^2$ gives $(2x-4)^2 + (2y-4)^2 = 4$; so $(X, Y) = (2x, 2y)$ satisfies $(X-4)^2 + (Y-4)^2 = 4$, which is still a circle, now with center $(4, 4)$ and radius $2$, as expected.
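The circle computation above can also be checked numerically. This is only a rough sketch: the helper names `on_circle` and `scale` and the choice of 12 sample points are made up for illustration.

```python
import math

def on_circle(point, center, radius, tol=1e-9):
    """Return True if `point` lies on the circle with the given center and radius."""
    distance = math.hypot(point[0] - center[0], point[1] - center[1])
    return abs(distance - radius) <= tol

def scale(point, factor):
    """Apply the map x -> factor * x to a point in the plane."""
    return tuple(factor * c for c in point)

# Sample points on the circle (x-2)^2 + (y-2)^2 = 1 (12 samples, an arbitrary choice).
original = [(2 + math.cos(t), 2 + math.sin(t))
            for t in (2 * math.pi * k / 12 for k in range(12))]

# Their images under x -> 2x.
images = [scale(p, 2) for p in original]

all_on_original = all(on_circle(p, (2, 2), 1) for p in original)
all_on_image = all(on_circle(q, (4, 4), 2) for q in images)
print(all_on_original, all_on_image)
```

Both checks should come out true, which matches the algebra: a uniform scaling doubles the center and the radius but keeps the shape a circle.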

Ovi
  • 23,737

5 Answers

8

A linear function in this context is a map $f: \mathbb{R}^n \to \mathbb{R}^m$ such that the following conditions hold:

  1. $f(x+y)=f(x)+f(y)$ for every $x,y \in \mathbb{R}^n$
  2. $f(\lambda x)=\lambda f(x)$ for every $x \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$.

It can be shown that every such function has the form $f(x)=Ax$ where $A \in \mathbb{R}^{m \times n}$ is an $m \times n$ matrix. If $f$ has the form $f(x)=Ax + b$ for some $b\in \mathbb{R}^m$, then it is called an affine linear function.
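To see the difference between the two notions concretely, here is a small sketch that tests conditions 1 and 2 on one choice of vectors. The matrix `A`, the offset `b`, and the test vectors are hypothetical example data, and passing on one sample is of course not a proof of linearity.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def add(u, v):
    """Componentwise sum of two vectors."""
    return [ui + vi for ui, vi in zip(u, v)]

def scale(lam, u):
    """Scalar multiple of a vector."""
    return [lam * ui for ui in u]

# Hypothetical example data: a 2x3 matrix A and an offset b in R^2.
A = [[1, 2, 7],
     [5, 3, 7]]
b = [1, -1]

def linear(x):
    """The map x -> Ax."""
    return matvec(A, x)

def affine(x):
    """The map x -> Ax + b."""
    return add(matvec(A, x), b)

x, y, lam = [1, 0, 2], [3, -1, 4], 5

linear_additive = linear(add(x, y)) == add(linear(x), linear(y))
linear_homogeneous = linear(scale(lam, x)) == scale(lam, linear(x))
affine_additive = affine(add(x, y)) == add(affine(x), affine(y))
affine_homogeneous = affine(scale(lam, x)) == scale(lam, affine(x))
print(linear_additive, linear_homogeneous, affine_additive, affine_homogeneous)
# prints: True True False False
```

With a nonzero offset, both conditions fail for this sample, which is why $x \mapsto Ax + b$ is called affine rather than linear.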

This generalises the notion of a linear map $f: \mathbb{R} \to \mathbb{R}$ of the form $f(x)=ax+b$, where $a,b$ are real numbers, which is probably what you had in mind. An affine linear map is a linear map if and only if $b=0$. Note that your example is a special affine linear map from $\mathbb{R}^n$ to $\mathbb{R}^n$ (the dimensions have to match).

An example of a linear function from $\mathbb{R}^3$ to $\mathbb{R}^3$ would be $$f(x,y,z) = \begin{pmatrix} 1 & 2 & 7\\ 5& 3 & 7\\ 3& 8& 2 \end{pmatrix} \begin{pmatrix} x\\ y\\ z \end{pmatrix}.$$

Your example in the case of $\mathbb{R}^3$ is of the form

$$f(x,y,z) = \begin{pmatrix} a& 0 & 0\\ 0& a & 0\\ 0& 0& a \end{pmatrix} \begin{pmatrix} x\\ y\\ z \end{pmatrix} + \begin{pmatrix} b_x\\ b_y\\ b_z \end{pmatrix},$$ for some $a \in \mathbb{R}$ and $(b_x, b_y, b_z) \in \mathbb{R}^3$.

For a function $f: \mathbb{R}^m \to \mathbb{R}^n$ that is differentiable at a point $x_0 \in \mathbb{R}^m$, we want to approximate $f$ by an affine linear map; that is, locally around $x_0$ we have $$f(x) \approx A(x-x_0) + f(x_0),$$ where $A \in \mathbb{R}^{n \times m}$. The offset $f(x_0)$ ensures that the approximation takes the value $f(x_0)$ at the point $x_0$, and the matrix $A$ describes how the function changes linearly around $x_0$. The idea is that linear maps are really easy to handle using the tools of linear algebra.
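As a rough numerical illustration of this approximation, the sketch below uses a made-up map $f(x, y) = (x^2 y,\; x + y^2)$ and its hand-computed matrix of partial derivatives as $A$. The function, the base point $(1, 2)$, and the step sizes are all assumptions chosen for the example.

```python
import math

def f(p):
    """Hypothetical example map from R^2 to R^2."""
    x, y = p
    return (x * x * y, x + y * y)

def jacobian(p):
    """Matrix of partial derivatives of f at p, computed by hand."""
    x, y = p
    return ((2 * x * y, x * x),
            (1.0, 2 * y))

def affine_approx(p, p0):
    """Evaluate A (p - p0) + f(p0), with A the Jacobian of f at p0."""
    A, base = jacobian(p0), f(p0)
    h = (p[0] - p0[0], p[1] - p0[1])
    return tuple(base[i] + A[i][0] * h[0] + A[i][1] * h[1] for i in range(2))

p0 = (1.0, 2.0)
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    p = (p0[0] + t, p0[1] + t)
    exact, approx = f(p), affine_approx(p, p0)
    error = math.hypot(exact[0] - approx[0], exact[1] - approx[1])
    # Compare the approximation error with the distance from p0.
    ratios.append(error / math.hypot(t, t))
print(ratios)
```

The ratio of the error to the distance from $x_0$ shrinks as the step shrinks, which is the sense in which the affine map is a good local approximation.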

Jannik Pitt
  • 1,980
  • Ah THANK YOU for that edit; so we are after all looking for a function with non-zero "$y-$ intercept", but my mistake was making the coefficient of $x$ a scalar, not a matrix. – Ovi Dec 14 '18 at 22:09
  • @Ovi Yeah, but usually the matrix $A$ is called the linear approximation (or derivative) of the function. – Jannik Pitt Dec 14 '18 at 22:12
3

I just want to check that linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$, are defined as functions of the form $f(x)=ax+b$ where a is a scalar and b is a vector?

No. In fact, a linear function is one with the properties that $f(x + y) = f(x) + f(y)$ and $f(ax) = af(x)$ for any $x, y$ in whatever vector space it's defined on and any $a$ in the scalar field of that vector space. For maps from $\mathbb{R}^n$ to $\mathbb{R}^m$, these are precisely the functions of the form $f(x) = Ax$ for some matrix $A$.

Also, it seems like functions of the form above just enlarge/shrink and shift. Is this correct?

No, because of the above. For an example involving a circle, take $n = 2$, $m = 2$ and $A = \begin{pmatrix} 2 & 0\\ 0 & 1 \end{pmatrix}$. This turns the unit circle into an ellipse. More generally, note that $n$ and $m$ do not have to be the same. For example, there's the linear map $$f: \mathbb{R}^3\to\mathbb{R}, \quad \begin{pmatrix} x\\ y\\ z \end{pmatrix} \mapsto x+y+z,$$ which collapses everything down to a diagonal line (but not in the most "natural" way).
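The ellipse claim is easy to sanity-check numerically. In this sketch the helper `apply` and the 16 sample points are illustrative choices; the matrix is the one from the example above.

```python
import math

# The matrix from the example above: stretch x by 2, leave y alone.
A = ((2, 0),
     (0, 1))

def apply(A, p):
    """Multiply the 2x2 matrix A by the point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1])

unit_circle = [(math.cos(t), math.sin(t))
               for t in (2 * math.pi * k / 16 for k in range(16))]
images = [apply(A, p) for p in unit_circle]

# Every image satisfies X^2/4 + Y^2 = 1, the equation of an ellipse.
on_ellipse = all(abs(X * X / 4 + Y * Y - 1) < 1e-12 for X, Y in images)

# The images are not all equally far from the origin,
# so they do not lie on a circle centered there.
radii = [math.hypot(X, Y) for X, Y in images]
same_radius = max(radii) - min(radii) < 1e-12
print(on_ellipse, same_radius)
```

So a matrix that scales the axes by different amounts changes the shape, unlike the scalar multiple in the question.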

user3482749
  • 6,660
  • But when we talk in terms of linear approximations, don't we want non-zero "$y$ intercepts"? I can't picture higher dimensions, but in functions from $\mathbb{R} \to \mathbb{R}$, when we talk of linear approximations, we talk of tangent lines, which are generally of the form $ax+b$, not just $ax$. – Ovi Dec 14 '18 at 21:57
  • 1
    This is a matter of terminology. In the terminology of Wikipedia, the directional derivative is the matrix in question (or, rather, the associated linear map), which is actually linear. In the $\mathbb{R}\to\mathbb{R}$ case, that's the $a$ in your question (or the map $x \mapsto ax$). – user3482749 Dec 15 '18 at 10:33
3

This is in general not the form of a linear function. A function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ is linear if the following two equalities hold for all $\alpha\in\mathbb{R}$ and $x, y\in \mathbb{R}^n$:

$i)$ $f(x + y) = f(x) + f(y)$

$ii)$ $f(\alpha x) = \alpha f(x)$.

It turns out that all such functions are of the form $f(x) = Ax$ for some matrix $A\in\mathbb{R}^{m\times n}$ (that is, a matrix with $m$ rows, $n$ columns).

One key difference with your proposed form is that linear functions always go through the origin, that is $f(0) = 0$, where $0$ is the zero vector (rather than the scalar). This is not the case if $b\neq 0$ in your proposed form. For $f: \mathbb{R}^2 \rightarrow \mathbb{R}$ you should think of a plane through the origin as the graph, rather than a line.
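To spell out why a linear function must satisfy $f(0) = 0$: taking $\alpha = 0$ in $ii)$, for any $x \in \mathbb{R}^n$,

$$f(0) = f(0 \cdot x) = 0 \cdot f(x) = 0,$$

where the first $0$ is the zero vector of $\mathbb{R}^n$ and the last is the zero vector of $\mathbb{R}^m$. By contrast, a map of the form $g(x) = Ax + b$ has $g(0) = b$, so it can only be linear when $b = 0$.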

Dasherman
  • 4,206
  • 2
    I'm aware that this is the definition of a linear function in linear algebra. But when we talk in terms of linear approximations, don't we want non-zero "$y$ intercepts"? I can't picture higher dimensions, but in functions from $\mathbb{R} \to \mathbb{R}$, when we talk of linear approximations, we talk of tangent lines, which are generally of the form $ax+b$, not just $ax$. – Ovi Dec 14 '18 at 21:55
  • 1
    The way I think of it is that for the differential we translate the point to the origin and then we have a linear approximation through the origin. Note also that in higher dimensions, for example $\mathbb{R}^2 \rightarrow \mathbb{R}$, we have a tangent plane rather than a single tangent line. It is this plane that is the linear approximation. – Dasherman Dec 14 '18 at 22:00
  • As @Jannik Pitt notes, we can also view it as an affine linear approximation, which is just a translated linear function (or in this case, translated linear approximation), so that instead of passing through the origin, it passes through the point $(x, f(x))$, $x$ being the point at which we calculate the differential. – Dasherman Dec 14 '18 at 22:03
  • Thanks for the responses; I can't fully understand your second comment because I haven't done any examples or even looked at definitions, so I don't exactly know at what point we translate to the origin. But I'll actually read my book now and check back after I internalize the definitions. – Ovi Dec 14 '18 at 22:10
1

In single variable calculus the best linear approximation to a function $f$ at a point $p$ is $$ g(x) = f(p) + f'(p)(x-p). $$ You can see why that's close to $f(x)$ when $x$ is close to $p$ by looking at the definition of the derivative, and thinking about the tangent line.

In several variables $p$ and $x$ will be vectors. That formula will still be correct if you change "$f'(p)$" to "the directional derivative of $f$ at $p$ in the direction from $p$ to $x$".
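One way to write this out, assuming $f$ is differentiable at $p$ and reading "the direction from $p$ to $x$" as the (unnormalized) vector $v = x - p$, is

$$ g(x) = f(p) + D_{v}f(p), \qquad D_{v}f(p) = \lim_{t \to 0} \frac{f(p + tv) - f(p)}{t}, \qquad v = x - p. $$

The symbol $D_v f(p)$ is notation introduced here for the directional derivative along $v$. In one variable it reduces to $f'(p)(x - p)$, which recovers the formula above.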

As the other answers say, most of what you "want to check" isn't right.

Ethan Bolker
  • 95,224
1

In general, the derivative is the best local linear approximation to a function at a point. A function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ that is differentiable at $x=x_0$ is locally approximated there by a vector space homomorphism $Df_{x_0} \in {\cal L}(\mathbb{R}^n, \mathbb{R}^m)$, and it is in this sense that you must understand "linear".

In the direction $v \in \mathbb{R}^n$, the directional derivative is simply $Df_{x_0}(v)$ because the derivative contains all information about all local rates of change in all directions.
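A quick numerical check of this statement, as a sketch only: the map `f`, the point `x0`, and the direction `v` are made-up example data, and the matrix of partial derivatives is computed by hand and used as $Df_{x_0}$.

```python
def f(p):
    """Hypothetical example map from R^2 to R^2."""
    x, y = p
    return (x * x * y, x + y * y)

def jacobian(p):
    """Matrix of partial derivatives of f at p, computed by hand."""
    x, y = p
    return ((2 * x * y, x * x),
            (1.0, 2 * y))

def derivative_applied(p0, v):
    """Df_{x0}(v): the Jacobian at p0 multiplied by the direction vector v."""
    A = jacobian(p0)
    return tuple(A[i][0] * v[0] + A[i][1] * v[1] for i in range(2))

def difference_quotient(p0, v, t=1e-6):
    """Central-difference estimate of d/dt f(p0 + t v) at t = 0."""
    fwd = f((p0[0] + t * v[0], p0[1] + t * v[1]))
    bwd = f((p0[0] - t * v[0], p0[1] - t * v[1]))
    return tuple((a - b) / (2 * t) for a, b in zip(fwd, bwd))

x0, v = (1.0, 2.0), (3.0, -1.0)
exact = derivative_applied(x0, v)
estimate = difference_quotient(x0, v)
gap = max(abs(e - q) for e, q in zip(exact, estimate))
print(exact, estimate, gap)
```

The rate of change measured along the line $t \mapsto x_0 + tv$ agrees, up to rounding, with applying the derivative matrix to $v$.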

Basically what happens is that you attach a copy of $\mathbb{R}^{m+n}$ to $x_0$, and you approximate the curvy graph of $f$ by the flat (linear) graph of $Df_{x_0}$. This is called the tangent space to the graph of $f$ at $x=x_0$. If you balance a piece of cardboard on a beach ball, you have a good model for this. The origin is where the cardboard touches the ball, which is why you don't get an additive constant.

If you draw a line on your piece of cardboard through the point where it touches, you get a model for the directional derivative in the direction of that line. Rotate your cardboard tangent plane around that point, and you get different directional derivatives.

Matthias
  • 424