0

The gradient of a scalar function $f\colon \mathbb{R}^n \to \mathbb{R}$ is a vector-valued function $\nabla f\colon \mathbb{R}^n \to \mathbb{R}^n$. Since applying a function can't increase information ($\nabla f$ can't contain information not in $f$), the $n$ dimensions in $\nabla f$ must not be independent -- they must be a relatively "diffuse" (or "redundant") representation of (a subset of) the information in $f$'s single dimension. Is this an accurate understanding?

If so, what is the pattern of dependency among the dimensions in $\nabla f$? That is, what constraints exist among them?

Several answers suggest that $\nabla$ "mixes in" information from the initial space $\mathbb{R}^n$, but I don't see precisely how that information (just a flat Euclidean topology, right?) is represented in $\nabla f$.

One answer points out that $\nabla f$ localizes information that is nonlocal in $f$ -- I get that, but it seems more like a rearrangement of information within the dimensions of $\nabla f$ than a constraint across them -- after all, when $n=1$, no additional dimensions are required for the representation of $\nabla f$.

Moderators: if these edits aren't clear enough to remove the "hold" status, please comment on what is unclear. Three answers have so far accurately interpreted what I was asking; I just haven't fully understood them yet, and through comments and edits I am trying to prompt refinements that are more complete and/or easier to understand.

  • Because the gradient, a.k.a. the Jacobian, also has information about the target space. – Gary. Aug 12 '15 at 22:27
  • What do you mean by "create dimensions out of nowhere" and "increase information"? The gradient requires "more dimensions" than what? – littleO Aug 12 '15 at 23:00
  • @littleO - Jf is n dimensional, even though it contains exactly the same information as the single dimensional f. – user1441998 Aug 12 '15 at 23:05
  • The gradient isn't a function from $\mathbb{R}^n$ to itself, it operates on functions. That is where the "extra information" comes from. – vonbrand Aug 14 '15 at 01:02
  • @vonbrand - i think you mean $\nabla \colon (\mathbb{R}^n \to \mathbb{R}) \to (\mathbb{R}^n \to \mathbb{R}^n)$, but this doesn't help me understand... it's not actually adding information, right? – user1441998 Aug 14 '15 at 02:00

3 Answers

3

Perhaps the following fact is what you are after:

Not every map $\mathbb R^n \rightarrow \mathbb R^n$ is the gradient of a function $\mathbb R^n\rightarrow\mathbb R$.

That is, even though the gradient "creates dimensions", as you put it, it is also constrained in its form, which sort of cancels out the new dimensions. For instance, in two dimensions, suppose we have a function $f(x,y)$ whose gradient is $(a(x,y),b(x,y))$. It can quickly be seen that $a_y(x,y)-b_x(x,y)=0$, given that $f_{xy}=f_{yx}$ (where the subscripts indicate partial derivatives). Thus, for instance, $(y,0)$ is not the gradient of any function, since, if it were, the mixed partial derivatives wouldn't match. This can be generalized by considering exact differential forms and the exterior derivative, but I'll just leave those keywords here in case you wish to investigate further.
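
If it helps to see this constraint concretely, here is a small SymPy sketch (the particular $f$ below is my own arbitrary choice for illustration, not anything from the discussion) checking that $a_y-b_x$ vanishes for a genuine gradient and fails to vanish for $(y,0)$:

```python
# Symbolic check of the constraint a_y - b_x = 0 (sketch; the particular
# f below is an arbitrary choice for illustration).
import sympy as sp

x, y = sp.symbols('x y')

f = x**2 * y + sp.sin(y)                  # some scalar function R^2 -> R
a, b = sp.diff(f, x), sp.diff(f, y)       # components of its gradient

# For a genuine gradient the mixed partials agree, so a_y - b_x vanishes.
print(sp.simplify(sp.diff(a, y) - sp.diff(b, x)))    # prints 0

# The field (y, 0) fails the test, so it is not the gradient of anything.
a2, b2 = y, sp.Integer(0)
print(sp.simplify(sp.diff(a2, y) - sp.diff(b2, x)))  # prints 1
```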

From this, we notice that, even though the gradient has two components, knowing one component already tells us almost everything we need to know about the other: if we know $a$, then we know that $$b(x,y)=\left(\int_{0}^x a_y(t,y)\,dt\right)+c(y)$$ for some function $c$, which reduces the amount of "missing" information from an entire function $b\colon\mathbb R^2\rightarrow \mathbb R$ to merely a function $c\colon\mathbb R\rightarrow \mathbb R$. So, even though the gradient has a more complicated representation than the original function, not everything in that representation is independent, so there's no new information added.
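
A quick SymPy sketch of this reconstruction (again with an arbitrary illustrative $f$ of my own choosing): given only $a$, integrating $a_y$ recovers $b$ up to a function of $y$ alone.

```python
# Sketch of the reconstruction above: knowing the component a alone
# determines b up to a function c of y only.
import sympy as sp

x, y, t = sp.symbols('x y t')

f = x**2 * y + sp.sin(y)
a, b = sp.diff(f, x), sp.diff(f, y)       # a = 2*x*y, b = x**2 + cos(y)

# Integrate a_y in the first argument from 0 to x, as in the formula above.
recovered = sp.integrate(sp.diff(a, y).subs(x, t), (t, 0, x))

c = sp.simplify(b - recovered)            # the leftover piece, c(y)
assert sp.simplify(recovered + c - b) == 0
print(recovered, c, c.free_symbols)       # x**2  cos(y)  {y}
```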

Milo Brandt
  • 60,888
  • Great answer! +1 – wltrup Aug 12 '15 at 23:16
  • i need more time than i have right this second to study this, but it sounds good -- it's what i was trying to get at by "not full rank"... – user1441998 Aug 13 '15 at 00:53
  • this is the kind of answer i was looking for, but what are the other constraints among the dimensions -- this can't be all of them, right? – user1441998 Aug 14 '15 at 00:34
  • @user1441998 It actually is all of them (essentially this follows from the Poincaré lemma). One might note that knowing the quantity $\frac{xa(x,y)+yb(x,y)}{\sqrt{x^2+y^2}}$ at every point except the origin (i.e. the radial derivative of $f$, away from the origin) allows one to reconstruct $f$ up to a constant so long as it is continuous - so there is a map $\mathbb R^2\rightarrow \mathbb R$ which tells us everything the gradient can. – Milo Brandt Aug 14 '15 at 01:45
  • well i don't see how we have gotten rid of all the redundancy then -- you show we need a (R^2 -> R) and c (R -> R), but that's more than was in f (R^2 -> R)... – user1441998 Aug 14 '15 at 01:58
  • @user1441998 Well, that's more of a subtlety of infinite-dimensional spaces than anything else - having $a$ and $c$ is clearly not redundant, since we can reconstruct an $f$ inducing both. The thing is that the space of functions $\mathbb R^2\rightarrow\mathbb R$ has proper subspaces isomorphic to itself, so one function in the space can contain as much information as multiple objects $\mathbb R^2\rightarrow\mathbb R$. (cont.) – Milo Brandt Aug 14 '15 at 02:12
  • For instance, consider defining an operator $D$ on functions $\mathbb Z\rightarrow\mathbb R$ such that $Df=(g,h)$ where $g(x)=f(2x)$ and $h(x)=f(2x+1)$. It spits out two functions $\mathbb Z\rightarrow\mathbb R$ (representing even and odd terms), and these functions are completely independent of one another and unconstrained in themselves, but it’s not creating more information. – Milo Brandt Aug 14 '15 at 02:12
  • this reasoning seems to depend on the lower cardinality of Z than R. – user1441998 Aug 14 '15 at 20:30
  • @user1441998 It doesn't. We're just using that two copies of $\mathbb Z$ embed into $\mathbb Z$. You could try $Df=(g,h)$ where $g(x)=f(e^x)$ and $h(x)=f(-e^x)$, which is the same idea with maps $\mathbb R\rightarrow\mathbb R$ (or we could do this with any infinite set) – Milo Brandt Aug 14 '15 at 22:29
  • hrm ok. i have the intuition that there is some maximum information capacity in R -> R that is more than can be squeezed into an "already full" R^2 -> R. is there a field establishing "how many" R^n -> R can be packed into an R^m -> R for n<m, or that it cannot be done for m>n, etc? incidentally, i thought of a simple way to pack them into one for the 2-d gradient example -- since it is an angle and a magnitude, angle has finite range, and magnitude is positive, you can just sum them without losing information... – user1441998 Aug 14 '15 at 23:54
  • in Z, you can diagonalize to pack any Z^n -> R into just one Z -> R. i thought that didn't hold for R, but your e^x example makes it look possible... – user1441998 Aug 14 '15 at 23:56
  • @user1441998 Are you aware of the fact that $\mathbb R^2$ has the same cardinality as $\mathbb R$? Because once you have a bijection between them, you can be pretty sure the vector spaces are also just as large (also, at present, we're using $\mathbb R\rightarrow\mathbb R^2$ not $\mathbb R^2\rightarrow\mathbb R$ - our domain is the same) – Milo Brandt Aug 15 '15 at 00:28
  • i wasn't aware of that, in fact i thought it wasn't true (was mistaken that the aleph number n was the cardinality of R^n). i will look into it, is there a proof on wikipedia or anything? so there's no relationship between dimension and information capacity? that seems crazy. – user1441998 Aug 15 '15 at 03:05
  • @user1441998 It is very counterintuitive when one first encounters it, but it is true. You might do well to read this answer, which gives a proof, or explore the notion of cardinal arithmetic if you want to get a little more involved. Dimension can measure "information capacity" - but the dimension of the space of functions $\mathbb R\rightarrow\mathbb R$ equals the dimension of the space of functions $\mathbb R\rightarrow\mathbb R^2$ - so they can carry the same information. – Milo Brandt Aug 15 '15 at 03:17
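
For what it's worth, here is a minimal sketch (my own illustration, not code from the thread) of the interleaving operator $D$ described in the comments above: splitting a sequence $f\colon\mathbb Z\to\mathbb R$ into even and odd parts produces two functions, yet creates no information, since $f$ can be reconstructed exactly.

```python
# Minimal sketch of the interleaving operator D from the comments above:
# splitting f: Z -> R into even- and odd-indexed parts gives two functions,
# but no information is created, since f can be reconstructed exactly.
def D(f):
    g = lambda x: f(2 * x)        # even-indexed terms
    h = lambda x: f(2 * x + 1)    # odd-indexed terms
    return g, h

def D_inverse(g, h):
    # Re-interleave the two halves back into a single function on Z.
    return lambda n: g(n // 2) if n % 2 == 0 else h((n - 1) // 2)

f = lambda n: n**3 - 7 * n        # an arbitrary sequence Z -> R
g, h = D(f)
f_back = D_inverse(g, h)
assert all(f(n) == f_back(n) for n in range(-100, 100))
```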
1

The dimensions are already there. It's $\mathbb{R}^n$ after all. The fact that your function is mapping it onto $\mathbb{R}$ doesn't change the fact that there's plenty of information from every one of the $n$ directions. In fact, that's exactly what the gradient is revealing to you. It's not creating any dimensions nor is it creating any more information; it's just revealing the structure of how your function changes along the $n$ axes. So, in a sense, it's the other way around. It's not that the gradient is creating dimensions or information; rather, it's that your scalar function is compressing or hiding data, in a manner of speaking.

wltrup
  • 3,983
  • i agree there's information in each of the n directions, but that information is already present in f, right? i don't see how f can be "hiding" anything, other than maybe the (flat Euclidean) topology of the input space. i agree f seems to be a "compression" of Jf, but then isn't Jf somehow not full rank (ie, not truly n dimensional, but still rank 1)? – user1441998 Aug 12 '15 at 22:54
  • "that information is already present in f, right?" - Yes and no. Yes, in that you can compute the gradient from $f$ but no in that the information is not local (in the sense of being isolated to a single point). If all you had was the value of $f$ at a single point, you couldn't find its gradient. The fact that you have a scalar field (a scalar value at every point in $f$'s domain, or at least in some neighbourhood of a point) says a lot about $f$ and much of that information is contained in the Jacobian. – wltrup Aug 12 '15 at 23:03
  • regarding locality, i get that, but this seems more like a rearrangement of information within the dimensions of ∇f than a constraint across them -- after all, when n=1, no additional dimensions are required for the representation of ∇f. – user1441998 Aug 14 '15 at 00:35
0

The gradient, or more generally the Jacobian, of a map $f\colon \mathbb R^n \rightarrow \mathbb R^m$, $f\colon(x_1, x_2,\dots,x_n) \mapsto (f_1(x_1,\dots,x_n), f_2(x_1,\dots,x_n),\dots,f_m(x_1,\dots,x_n))$, is the $m \times n$ matrix $Jf$ with entries $Jf(i,j) := \frac{\partial f_i}{\partial x_j}$.

In that sense, it contains information from both the initial space and the target space.
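
As a concrete illustration (the map used below is an arbitrary choice of mine, not from the answer), SymPy can compute this matrix directly for a map $\mathbb R^2\to\mathbb R^2$:

```python
# Minimal sketch: computing the Jacobian of an (arbitrarily chosen) map
# R^2 -> R^2 with SymPy.
import sympy as sp

x, y = sp.symbols('x y')
F = sp.Matrix([x**2 * y, sp.sin(x) + y])    # the components f_1, f_2
J = F.jacobian([x, y])                      # the 2 x 2 matrix of partials
print(J)    # Matrix([[2*x*y, x**2], [cos(x), 1]])
```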

Gary.
  • 2,432