
This question is about the basic chain rule (I think of it when I read about the calculus of variations, as used in defining distance on a manifold via the usual Riemannian metrics) and is related to another (temporarily deleted) post: https://math.stackexchange.com/q/3769640/577710. I cite it here for my reference, as a reminder of the original question.

The context of the question is as follows: Riemannian metrics seem to be defined as a kind of inner product (a 2-tensor), so that we can take inner products and norms of tangent vectors, in particular of vectors along a curve segment with its two endpoints $p, q$ fixed; the length of such a curve is then used to define the distance between any two points $p, q$ in $M$.

When we calculate the length of the shortest curve $\gamma$ between $p$ and $q$ in $\mathbb{R}^2$, say $\gamma(t)=(t, f(t))$, using the usual metric, $L_\gamma=\int \sqrt{\gamma_1'(t)^2+\gamma_2'(t)^2}\,dt =\int \sqrt{1+(f'(t))^2}\,dt$, we may define $F(t, f(t), f'(t))=1+(f'(t))^2$.
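As a quick sanity check of this arclength formula, here is a small SymPy sketch (my own illustration, assuming SymPy is available; the straight line $f(t)=t$ from $(0,0)$ to $(1,1)$ is just an assumed example), which recovers the Euclidean distance $\sqrt{2}$:

```python
# A small SymPy check of the arclength formula above (my own illustration;
# the straight line f(t) = t between (0, 0) and (1, 1) is just an assumed example).
import sympy as sp

t = sp.symbols('t', real=True)
f = t  # f(t) = t, so gamma(t) = (t, f(t)) is the segment from (0, 0) to (1, 1)

integrand = sp.sqrt(1 + sp.diff(f, t)**2)   # sqrt(1 + f'(t)^2)
L = sp.integrate(integrand, (t, 0, 1))      # arclength of the segment

print(L)  # sqrt(2), the Euclidean distance between the endpoints
```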


My question is,

  1. In my eyes, the three 'independent' variables of $F$ are obviously not independent, so why do we define $F$ this way, instead of defining $F$ with fewer variables? Is it, for example, just for convenience of calculation?
  2. Even though the variables are not independent, can we still use the chain rule to calculate $dF/dt$, i.e. $$\frac{dF}{dt}=\frac{\partial F}{\partial t}+\frac{\partial F}{\partial f}\frac{df}{dt}+\frac{\partial F}{\partial f'}\frac{d(f')}{dt}?$$

If we think further, the second question can be broken down into two more fundamental aspects.

2-1. This practice seems common whenever we decompose a function into a composition of functions. For example, $r=1$ is the radius of the unit circle; we can decompose $r$ into $r=\sqrt{x^2+y^2}$ with $x=\cos \theta, y =\sin \theta$, where $r(x,y)$ is a function of two 'dependent' variables. Using the chain rule we get $$\frac{dr}{d\theta}=\frac{\partial r}{\partial x}\frac{dx}{d\theta}+\frac{\partial r}{\partial y}\frac{dy}{d\theta}=-\cos \theta\sin\theta+\cos \theta\sin\theta=0.$$ So one aspect of the second question may be restated as follows: can we always decompose a function into the composition of a function of 'dependent' variables with some other functions and still use the chain rule?
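Here is a short SymPy sketch of this example (my own check, assuming SymPy is available), confirming that the chain-rule computation through the 'dependent' variables $x, y$ gives $dr/d\theta = 0$:

```python
# SymPy check (my own) of the polar example: compose r(x, y) = sqrt(x^2 + y^2)
# with x = cos(theta), y = sin(theta) and apply the chain rule.
import sympy as sp

theta = sp.symbols('theta', real=True)
x, y = sp.symbols('x y', real=True)

r = sp.sqrt(x**2 + y**2)
x_of_theta = sp.cos(theta)
y_of_theta = sp.sin(theta)

# dr/dtheta = (dr/dx)(dx/dtheta) + (dr/dy)(dy/dtheta), evaluated on the circle
dr_dtheta = (r.diff(x) * x_of_theta.diff(theta)
             + r.diff(y) * y_of_theta.diff(theta)).subs({x: x_of_theta, y: y_of_theta})

print(sp.simplify(dr_dtheta))  # 0, matching the hand computation
```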

2-2. We notice that $F$ here is decomposed into $f$ and $f'$, which are obviously more 'dependent' than ordinary 'dependent' variables like $x$ and $y$ above. This causes some complications, which I will illustrate with an example.

Let $h=x^2+2x$, $u=x^2$, $v=2x$, so that $u'=v$. Then there is obviously not a single way to write $h$ as a function of $u$ and $v$ (similarly, there can be more than one way to write $F$ as a function of $t, f(t), f'(t)$), whether as (1) algebraic expressions in $u, v$ or as (2) differential and integral relations in $u, v$, e.g. $$h=u+v,\quad h=v^2/4+v,\quad h=\Big(\int v\Big)+v,\quad h=u+u',\quad h=v^2/4+u'.$$

Such non-uniqueness of the decomposition makes me wonder: can we still use the chain rule and get the same result? Given $h, u, v$, how do we know how to write $h$ as a function of $u, v$? Does case (2) cause more complicated issues than case (1)? And could anyone name specific fields dealing with these issues, if there are any?
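For what it's worth, here is a small SymPy sketch (my own check, assuming SymPy is available) that differentiates two of the algebraic decompositions in (1) through the chain rule; both reproduce the direct derivative $dh/dx = 2x+2$:

```python
# SymPy check (my own sketch) that two different algebraic decompositions of
# h = x^2 + 2x in terms of u = x^2 and v = 2x give the same dh/dx via the chain rule.
import sympy as sp

x = sp.symbols('x', real=True)
u_of_x, v_of_x = x**2, 2*x           # u = x^2, v = 2x

u, v = sp.symbols('u v', real=True)
candidates = [u + v, v**2/4 + v]     # two ways of writing h as a function of (u, v)

def dh_dx(h_uv):
    # chain rule: dh/dx = (dh/du)(du/dx) + (dh/dv)(dv/dx), then substitute u(x), v(x)
    total = h_uv.diff(u) * u_of_x.diff(x) + h_uv.diff(v) * v_of_x.diff(x)
    return sp.simplify(total.subs({u: u_of_x, v: v_of_x}))

print([dh_dx(h) for h in candidates], sp.diff(x**2 + 2*x, x))
# [2*x + 2, 2*x + 2]  2*x + 2  -- both decompositions agree with the direct derivative
```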

  • i. Your third paragraph seems to be missing some ^2s on $\gamma_i'(t)$, and at least one square root. Perhaps you meant to say that $L^2$ equals the given expression. ii. It appears that English is not your first language. I find reading what you wrote very difficult, perhaps because many of your sentences are long. Consider splitting into shorter sentences to aid clarity. – John Hughes Jul 27 '20 at 11:44

1 Answer


Let me go to your first example, but I'm going to rewrite it:

Define $$ F: \Bbb R^3 \to \Bbb R : (u, v, w) \mapsto 1 + w^2. $$ While it's conventional to denote the partial derivatives of $F$ with symbols like $$ \frac{\partial F}{\partial u}, $$ etc., this can lead to considerable confusion, esp. when we let $G(u,v,w) = F(v, w, u)$, for instance. I propose for now to write the derivatives of $F$ with respect to the "slots" in which arguments appear, so that the thing written above is now written $$ D_1 F, $$ i.e., $D_1 F$ denotes the derivative of $F$ with respect to its first argument, regardless of the temporary variable used to name that first argument when $F$ was defined. Clear?

When we do this, the chain rule is no longer quite as pretty. But at least in one case, it retains some of its niceness. If $g_1, g_2, g_3 : \Bbb R \to \Bbb R$, and we define $$ H(t) = F(g_1(t), g_2(t), g_3(t)), $$ then the chain rule becomes $$ H'(t) = D_1 F(g_1(t), g_2(t), g_3(t)) g_1'(t) + D_2 F(g_1(t), g_2(t), g_3(t)) g_2'(t) + D_3 F(g_1(t), g_2(t), g_3(t)) g_3'(t). $$
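To see this slot-based rule produced mechanically, here is a small SymPy sketch (my own, assuming SymPy is available; $F$ and the $g_i$ are left completely generic):

```python
# SymPy sketch (my own): with F, g1, g2, g3 left completely generic, diff
# produces exactly the slot-based chain rule described above.
import sympy as sp

t = sp.symbols('t')
F = sp.Function('F')
g1, g2, g3 = sp.Function('g1'), sp.Function('g2'), sp.Function('g3')

H = F(g1(t), g2(t), g3(t))
print(H.diff(t))
# A sum of three terms: g_i'(t) times the derivative of F with respect to its
# i-th slot, evaluated at (g1(t), g2(t), g3(t)) -- i.e. D_i F(g1, g2, g3) g_i'.
```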

Now in the particular case you're looking at, we have the function $F$; it's a function defined on all of 3-space, and has nothing to do with the function $f$. Let's go ahead and compute its derivatives: $$ D_1 F(u,v,w) = 0\\ D_2 F(u,v,w) = 0\\ D_3 F(u, v, w) = 2w. $$ Not so bad, right?

If we define $$ H(t) = F(t, f(t), f'(t)) $$

(notice that I'm using a new name here, because $H$ is a function of a single variable, while $F$ is a function of three variables), then we can use the chain rule to compute \begin{align} H'(t) &= D_1 F(t, f(t), f'(t)) \cdot 1 + D_2 F(t, f(t), f'(t))\, f'(t) + D_3 F(t, f(t), f'(t))\, (f')'(t)\\ &= 0 \cdot 1 + 0 \cdot f'(t) + 2 f'(t)\, (f')'(t)\\ &= 2f'(t)\, f''(t). \end{align}
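As a sanity check, here is a short SymPy sketch of the same computation (my own, assuming SymPy is available; $f$ is left as an undetermined function):

```python
# SymPy check (my own) of the computation above: with F(u, v, w) = 1 + w^2 and
# H(t) = F(t, f(t), f'(t)), the derivative H'(t) should be 2 f'(t) f''(t).
import sympy as sp

t = sp.symbols('t', real=True)
f = sp.Function('f')

u, v, w = sp.symbols('u v w', real=True)
F = 1 + w**2                                   # the three-slot function from above

H = F.subs({u: t, v: f(t), w: f(t).diff(t)})   # H(t) = F(t, f(t), f'(t))
print(sp.simplify(H.diff(t)))
# 2*Derivative(f(t), t)*Derivative(f(t), (t, 2)), i.e. 2 f'(t) f''(t)
```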

Now if you compare this simple computation to the confusion you describe in the "My question is" section, you'll see a couple of things.

  1. You've used the letter $F$ to denote two different things: a function of three variables, and a function of one variable. Sadly, this is very common, and eventually, with practice, you get used to it. But for beginners, it's just a nightmare. So when I encounter things like this, I rewrite them more clearly, even if it involves more writing.

  2. The author may have chosen to write the function $F$ with three arguments because later in the exposition there will be a need to make parallel constructions: things involving some other function of three variables where each of the three variables enters into the formula, not just the third one. If I'm guessing correctly, you're looking at a Calculus of Variations explanation, and the author is explaining how to minimize arclength. But what if the thing you wanted to minimize involved not only the derivative of $f$, but $f$ itself? Then your formula for $F$ would involve both $v$ and $w$.

I don't believe I've answered all your questions, but perhaps I've helped you to get onto the right track.

John Hughes
  • I see. 1. So, to make things clear, we may define partial derivatives according to the positions of arguments, i.e. according to the original arguments of the function $F$ (before we decompose $F$ with other functions); for if we define partial derivatives of $F$ according to the 'forms' of the arguments, we get ourselves into an extra (and somewhat unnecessary) problem: inferring $F$ from a function $H$ composed of $F, t, f, f'$, where $H, f, f'$ are known but $F$ is unknown, and we get various forms of (actually different) $F$s, among which we can't easily pick one. – Charlie Chang Jul 27 '20 at 12:46
  • 2. A symbol can be used to denote either a variable or a function (a pair of variables satisfying a certain relation); the two are actually very different concepts, as different as elements of different dimensions, yet we often mix up the two usages. If we are aware that a function is a pair of variables, or denote it as such, then we see that (in my previous notation) the variables $F, f, f'$ and the functions $(t,f'), (t,f), ((t,f,f'), F), (t,F)$ are different. (It's clearer to change the notation of these functions to 'position'-based arguments instead of using $f, f', F$ directly.) – Charlie Chang Jul 27 '20 at 12:46
  • Item 1. I think that "before we decompose $F$..." is the wrong way to think: $F$ is a function, plain and simple. What you apply it to is a separate matter. I don't understand what you're saying about $H$ in the last sentence, but I assume that this misunderstanding comes from the same "decomposition" mindset. Item 2: I think you've got the right idea here. Physicists often write things like $z(x, y)$ and $z(r, \theta)$ for "the same $z$", but if you ask them what $z(3, 0)$ denotes (is it $x = 3, y = 0$, or $r = 3, \theta = 0$), they tend to call you nasty names. :) – John Hughes Jul 27 '20 at 12:53
  • I should've written 'compose' instead of 'decompose' (meaning composition of functions). But never mind; I agree that 'decomposing' a function into a composition of other functions, the reverse process of composition, causes confusion. – Charlie Chang Jul 27 '20 at 13:13
  • @JohnHughes Not sure if it is intentional, but $z(3, 0)$ corresponds to the same point in either Cartesian or polar coordinates. – Arctic Char Jul 28 '20 at 07:49
  • @ArcticChar: you're exactly right, and while I wrote it, I thought to myself "I have to be sure these points are different!" Sigh. Rather than delete and re-enter my comment (because it's a day later), I'll just say "replace that $z = (3,0)$ with $z = (0, 3)$, please". – John Hughes Jul 28 '20 at 10:02