
This question is about the basic chain rule (I think of it when I read about the calculus of variations, as used in defining distance on a manifold via the usual Riemannian metrics) and is related to another (temporarily deleted) post: https://math.stackexchange.com/q/3769640/577710. I cite it here for my reference, as a reminder of the original question.

The context of the question is as follows: Riemannian metrics seem to be defined as a kind of inner product (a 2-tensor), so that we can take inner products and norms of tangent vectors, in particular of vectors along a curve segment with its two endpoints $p, q$ fixed; the length of such a curve is then used to define the distance between any two points $p, q$ in $M$.

When we calculate the length of the shortest curve $\gamma$ between $p$ and $q$ in $\mathbb{R}^2$, say $\gamma(t)=(t, f(t))$, using the usual metric, $L_\gamma=\int \sqrt{\gamma_1'(t)^2+\gamma_2'(t)^2}\,dt =\int \sqrt{1+(f'(t))^2}\,dt$, we may define $F(t, f(t), f'(t))=1+(f'(t))^2$.
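As a quick sanity check of this arclength formula, here is a small SymPy sketch (my own illustration, assuming SymPy is available; the straight line $f(t)=t$ from $(0,0)$ to $(1,1)$ is just an assumed example), which recovers the Euclidean distance $\sqrt{2}$:

```python
# A small SymPy check of the arclength formula above (my own illustration;
# the straight line f(t) = t between (0, 0) and (1, 1) is just an assumed example).
import sympy as sp

t = sp.symbols('t', real=True)
f = t  # f(t) = t, so gamma(t) = (t, f(t)) is the segment from (0, 0) to (1, 1)

integrand = sp.sqrt(1 + sp.diff(f, t)**2)   # sqrt(1 + f'(t)^2)
L = sp.integrate(integrand, (t, 0, 1))      # arclength of the segment

print(L)  # sqrt(2), the Euclidean distance between the endpoints
```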


My question is,

  1. In my eyes, the three 'independent' variables of $F$ are obviously not independent, so why do we define $F$ this way, instead of defining $F$ with fewer variables? Is it, for example, just for convenience of calculation?
  2. Even though the variables are not independent, can we still use the chain rule to calculate $dF/dt$, i.e. $$\frac{dF}{dt}=\frac{\partial F}{\partial t}+\frac{\partial F}{\partial f}\frac{df}{dt}+\frac{\partial F}{\partial f'}\frac{d(f')}{dt}?$$

If we think further, the second question can be broken down into two more fundamental aspects.

2-1. This practice seems common whenever we decompose a function into a composition of functions. For example, $r=1$ is the radius of the unit circle; we can decompose $r$ into $r=\sqrt{x^2+y^2}$ with $x=\cos \theta, y =\sin \theta$, where $r(x,y)$ is a function of two 'dependent' variables. Using the chain rule we get $$\frac{dr}{d\theta}=\frac{\partial r}{\partial x}\frac{dx}{d\theta}+\frac{\partial r}{\partial y}\frac{dy}{d\theta}=-\cos \theta\sin\theta+\cos \theta\sin\theta=0.$$ So one aspect of the second question may be restated as follows: can we always decompose a function into the composition of a function of 'dependent' variables with some other functions and still use the chain rule?
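Here is a short SymPy sketch of this example (my own check, assuming SymPy is available), confirming that the chain-rule computation through the 'dependent' variables $x, y$ gives $dr/d\theta = 0$:

```python
# SymPy check (my own) of the polar example: compose r(x, y) = sqrt(x^2 + y^2)
# with x = cos(theta), y = sin(theta) and apply the chain rule.
import sympy as sp

theta = sp.symbols('theta', real=True)
x, y = sp.symbols('x y', real=True)

r = sp.sqrt(x**2 + y**2)
x_of_theta = sp.cos(theta)
y_of_theta = sp.sin(theta)

# dr/dtheta = (dr/dx)(dx/dtheta) + (dr/dy)(dy/dtheta), evaluated on the circle
dr_dtheta = (r.diff(x) * x_of_theta.diff(theta)
             + r.diff(y) * y_of_theta.diff(theta)).subs({x: x_of_theta, y: y_of_theta})

print(sp.simplify(dr_dtheta))  # 0, matching the hand computation
```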

2-2. We notice that $F$ here is decomposed into $f$ and $f'$, which are obviously more 'dependent' than ordinary 'dependent' variables like $x$ and $y$ above. This causes some complications, which I will illustrate with an example.

Let $h=x^2+2x$, $u=x^2$, $v=2x$, so that $u'=v$. Then there is obviously not a single way to write $h$ as a function of $u$ and $v$ (similarly, there can be more than one way to write $F$ as a function of $t, f(t), f'(t)$), whether as (1) algebraic expressions in $u, v$ or as (2) differential and integral relations in $u, v$, e.g. $$h=u+v,\quad h=v^2/4+v,\quad h=\Big(\int v\Big)+v,\quad h=u+u',\quad h=v^2/4+u'.$$

Such non-uniqueness of the decomposition makes me wonder: can we still use the chain rule and get the same result? Given $h, u, v$, how do we know how to write $h$ as a function of $u, v$? Does case (2) cause more complicated issues than case (1)? And could anyone name specific fields dealing with these issues, if there are any?
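For what it's worth, here is a small SymPy sketch (my own check, assuming SymPy is available) that differentiates two of the algebraic decompositions in (1) through the chain rule; both reproduce the direct derivative $dh/dx = 2x+2$:

```python
# SymPy check (my own sketch) that two different algebraic decompositions of
# h = x^2 + 2x in terms of u = x^2 and v = 2x give the same dh/dx via the chain rule.
import sympy as sp

x = sp.symbols('x', real=True)
u_of_x, v_of_x = x**2, 2*x           # u = x^2, v = 2x

u, v = sp.symbols('u v', real=True)
candidates = [u + v, v**2/4 + v]     # two ways of writing h as a function of (u, v)

def dh_dx(h_uv):
    # chain rule: dh/dx = (dh/du)(du/dx) + (dh/dv)(dv/dx), then substitute u(x), v(x)
    total = h_uv.diff(u) * u_of_x.diff(x) + h_uv.diff(v) * v_of_x.diff(x)
    return sp.simplify(total.subs({u: u_of_x, v: v_of_x}))

print([dh_dx(h) for h in candidates], sp.diff(x**2 + 2*x, x))
# [2*x + 2, 2*x + 2]  2*x + 2  -- both decompositions agree with the direct derivative
```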

  • i. Your third paragraph seems to be missing some ^2s on $\gamma_i'(t)$, and at least one square root. Perhaps you meant to say that $L^2$ equals the given expression. ii. It appears that English is not your first language. I find reading what you wrote very difficult, perhaps because many of your sentences are long. Consider splitting into shorter sentences to aid clarity. – John Hughes Jul 27 '20 at 11:44

1 Answer


Let me go to your first example, but I'm going to rewrite it:

Define $$ F: \Bbb R^3 \to \Bbb R : (u, v, w) \mapsto 1 + w^2. $$ While it's conventional to denote the partial derivatives of $F$ with symbols like $$ \frac{\partial F}{\partial u}, $$ etc., this can lead to considerable confusion, esp. when we let $G(u,v,w) = F(v, w, u)$, for instance. I propose for now to write the derivatives of $F$ with respect to the "slots" in which arguments appear, so that the thing written above is now written $$ D_1 F, $$ i.e., $D_1 F$ denotes the derivative of $F$ with respect to its first argument, regardless of the temporary variable used to name that first argument when $F$ was defined. Clear?

When we do this, the chain rule is no longer quite as pretty. But at least in one case, it retains some of its niceness. If $g_1, g_2, g_3 : \Bbb R \to \Bbb R$, and we define $$ H(t) = F(g_1(t), g_2(t), g_3(t)), $$ then the chain rule becomes $$ H'(t) = D_1 F(g_1(t), g_2(t), g_3(t)) g_1'(t) + D_2 F(g_1(t), g_2(t), g_3(t)) g_2'(t) + D_3 F(g_1(t), g_2(t), g_3(t)) g_3'(t). $$
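To see this slot-based rule produced mechanically, here is a small SymPy sketch (my own, assuming SymPy is available; $F$ and the $g_i$ are left completely generic):

```python
# SymPy sketch (my own): with F, g1, g2, g3 left completely generic, diff
# produces exactly the slot-based chain rule described above.
import sympy as sp

t = sp.symbols('t')
F = sp.Function('F')
g1, g2, g3 = sp.Function('g1'), sp.Function('g2'), sp.Function('g3')

H = F(g1(t), g2(t), g3(t))
print(H.diff(t))
# A sum of three terms: g_i'(t) times the derivative of F with respect to its
# i-th slot, evaluated at (g1(t), g2(t), g3(t)) -- i.e. D_i F(g1, g2, g3) g_i'.
```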

Now in the particular case you're looking at, we have the function $F$; it's a function defined on all of 3-space, and has nothing to do with the function $f$. Let's go ahead and compute its derivatives: $$ D_1 F(u,v,w) = 0\\ D_2 F(u,v,w) = 0\\ D_3 F(u, v, w) = 2w. $$ Not so bad, right?

If we define $$ H(t) = F(t, f(t), f'(t)) $$

(notice that I'm using a new name here, because $H$ is a function of a single variable, while $F$ is a function of three variables), then we can use the chain rule to compute \begin{align} H'(t) &= D_1 F(t, f(t), f'(t)) \cdot 1 + D_2 F(t, f(t), f'(t))\, f'(t) + D_3 F(t, f(t), f'(t))\, (f')'(t)\\ &= 0 \cdot 1 + 0 \cdot f'(t) + 2 f'(t)\, (f')'(t)\\ &= 2f'(t)\, f''(t). \end{align}
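As a sanity check, here is a short SymPy sketch of the same computation (my own, assuming SymPy is available; $f$ is left as an undetermined function):

```python
# SymPy check (my own) of the computation above: with F(u, v, w) = 1 + w^2 and
# H(t) = F(t, f(t), f'(t)), the derivative H'(t) should be 2 f'(t) f''(t).
import sympy as sp

t = sp.symbols('t', real=True)
f = sp.Function('f')

u, v, w = sp.symbols('u v w', real=True)
F = 1 + w**2                                   # the three-slot function from above

H = F.subs({u: t, v: f(t), w: f(t).diff(t)})   # H(t) = F(t, f(t), f'(t))
print(sp.simplify(H.diff(t)))
# 2*Derivative(f(t), t)*Derivative(f(t), (t, 2)), i.e. 2 f'(t) f''(t)
```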

Now if you compare this simple computation to the confusion you describe in the "My question is" section, you'll see a couple of things.

  1. You've used the letter $F$ to denote two different things: a function of three variables, and a function of one variable. Sadly, this is very common, and eventually, with practice, you get used to it. But for beginners, it's just a nightmare. So when I encounter things like this, I rewrite them more clearly, even if it involves more writing.

  2. The author may have chosen to write the function $F$ with three arguments because later in the exposition there will be a need to make parallel constructions: things involving some other function of three variables where each of the three variables enters into the formula, not just the third one. If I'm guessing correctly, you're looking at a Calculus of Variations explanation, and the author is explaining how to minimize arclength. But what if the thing you wanted to minimize involved not only the derivative of $f$, but $f$ itself? Then your formula for $F$ would involve both $v$ and $w$.

I don't believe I've answered all your questions, but perhaps I've helped you to get onto the right track.

John Hughes
  • I see. 1. So, to make things clear, we may define partial derivatives according to the positions of arguments, i.e. according to the original arguments of the function $F$ (before we decompose $F$ with other functions); for if we define partial derivatives of $F$ according to the 'forms' of the arguments, we get ourselves into an extra (and somewhat unnecessary) problem: inferring $F$ from a function $H$ composed of $F, t, f, f'$, where $H, f, f'$ are known but $F$ is unknown, and we get various forms of (actually different) $F$s, among which we can't easily pick one. – Charlie Chang Jul 27 '20 at 12:46
  • 2. A symbol can be used to denote either a variable or a function (a pair of variables satisfying a certain relation); the two are actually very different concepts, as different as elements of different dimensions, yet we often mix up the two usages. If we are aware that a function is a pair of variables, or denote it as such, then we see that (in my previous notation) the variables $F, f, f'$ and the functions $(t,f'), (t,f), ((t,f,f'), F), (t,F)$ are different. (It's clearer to change the notation of these functions to 'position'-based arguments instead of using $f, f', F$ directly.) – Charlie Chang Jul 27 '20 at 12:46
  • Item 1. I think that "before we decompose $F$..." is the wrong way to think: $F$ is a function, plain and simple. What you apply it to is a separate matter. I don't understand what you're saying about $H$ in the last sentence, but I assume that this misunderstanding comes from the same "decomposition" mindset. Item 2: I think you've got the right idea here. Physicists often write things like $z(x, y)$ and $z(r, \theta)$ for "the same $z$", but if you ask them what $z(3, 0)$ denotes (is it $x = 3, y = 0$, or $r = 3, \theta = 0$), they tend to call you nasty names. :) – John Hughes Jul 27 '20 at 12:53
  • I should've written 'compose' instead of 'decompose' (meaning composition of functions). But never mind; I agree that 'decomposing' a function into a composition of other functions, the reverse process of composition, causes confusion. – Charlie Chang Jul 27 '20 at 13:13
  • @JohnHughes Not sure if it is intentional, but $z(3, 0)$ corresponds to the same point in either Cartesian or polar coordinates. – Arctic Char Jul 28 '20 at 07:49
  • @ArcticChar: you're exactly right, and while I wrote it, I thought to myself "I have to be sure these points are different!" Sigh. Rather than delete and re-enter my comment (because it's a day later), I'll just say "replace that $z = (3,0)$ with $z = (0, 3)$, please". – John Hughes Jul 28 '20 at 10:02