How can $y$ and $y'$ be independent in variational calculus?

Question

In variational calculus, functionals are written as \begin{eqnarray} F = \int f(x,y,y') dx \end{eqnarray}

Here $F$ depends on the choice of $y$ and $y'$. But for smooth functions, specifying $y$ also specifies $y'$, so how can they be treated as independent?

I have also read that a functional can be regarded as a function of infinitely many variables, so that the derivatives can be treated as further independent variables, but I find this reasoning hard to digest. Can someone please explain why we don't just write \begin{eqnarray} F = \int f(x,y) dx \end{eqnarray}

and also, why \begin{eqnarray} \frac{\partial}{\partial y}y' &=& 0 \\ \frac{\partial}{\partial y'}y &=& 0 \end{eqnarray}

$ F $ is a functional, $ f $ is not. $ f $ takes in vectors only. So this form restricts the class of functionals you can consider. — Ian, Mar 25 '15 at 11:26
There are a couple of questions about this on physics.SE: http://physics.stackexchange.com/questions/885/why-does-calculus-of-variations-work and http://physics.stackexchange.com/questions/119992/what-do-the-derivatives-in-these-hamilton-equations-mean . But a question on math.SE is a good idea to get some more mathematical answers. — Jack M, Mar 27 '15 at 15:26
@JackM, thanks for the links, as you said some mathematical explanation will be of great value. — chatur, Mar 29 '15 at 18:48
Imo this is one of the cases where the traditional notation conflating the function $x \mapsto y(x)$ and the current value $y(x)$ is most disastrous. They're both just written $y$. So it would sure seem plausible that $f$ takes the function as input, right? But no, it actually only takes the current value. So $f$ needs to be told the current values $y$ and $y'$ separately. — echinodermata, Apr 04 '15 at 17:55
One of the ways that I think about it is: "Is there any reason that the value of a function and the value of its derivative at a particular point should be related?" — Michael Burr, Apr 05 '15 at 13:35
Does this answer your question? Why is $\frac{\operatorname dy'}{\operatorname dy}$ zero, since $y'$ depends on $y$? — Hans Lundmark, Sep 30 '21 at 20:23

score 12 · Answer 1 · answered Mar 29 '15 at 19:15

We are given a function $f$ of three variables: $$(u,v,w)\mapsto f(u,v,w)\ .$$ When functions $$x\mapsto u:=\phi(x), \quad x\mapsto v:=\psi(x),\quad x\mapsto w:=\chi(x)\tag{1}$$ are supplied then a pullback $$\Phi(x)=f\bigl(\phi(x),\psi(x),\chi(x)\bigr)$$ results that can be integrated over $x$ from $a$ to $b$: $$F:=\int_a^b\Phi(x)\>dx\ .$$ The value of the quantity $F$ depends on the functions $\phi$, $\psi$, $\chi$ in $(1)$; therefore $F$ is called a functional, and one is inclined to write $F(\phi,\psi,\chi)$ instead of just $F$.

Now in the case of variational calculus the three functions $\phi$, $\psi$, $\chi$ are $x\mapsto x$, $x\mapsto y(x)$, $x\mapsto y'(x)$ for a single function $y:\>[a,b]\to{\mathbb R}$. Therefore the functional $F$ in question only depends on this $y(\cdot)$. It is therefore allowed to write $$F(y):=\int_a^b f\bigl(x,y(x),y'(x)\bigr)\>dx\ .\tag{2}$$ When arguing about this functional $F$ we look at increments $F(y+\epsilon u)-F(y)$ where $\epsilon u$ is a small variation of $y$. We then have to differentiate $$F(y+\epsilon u):=\int_a^b f\bigl(x,y(x)+\epsilon u(x),y'(x)+\epsilon u'(x)\bigr)\>dx$$ with respect to $\epsilon$, and by the chain rule this involves computing partial derivatives of $f$ with respect to the second and third variables. It is pure laziness that these partial derivatives are denoted by ${\partial f\over\partial y}$ and ${\partial f\over \partial y'}$ instead of $f_{.2}$ and $f_{.3}$.
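The chain-rule step can be checked numerically. The following is only a sketch under assumed choices that are not part of the answer: the integrand $f(x,y,p)=y^2+p^2$, the curve $y(x)=x^2$, the variation $u(x)=\sin(\pi x)$ on $[0,1]$, and a simple midpoint rule. It compares a finite-difference derivative of $F(y+\epsilon u)$ at $\epsilon=0$ with $\int_0^1 \bigl(f_{.2}\,u+f_{.3}\,u'\bigr)\,dx$, where $f_{.2}$ and $f_{.3}$ are taken slot by slot.

```python
import math

# Illustrative integrand (an assumption, not from the answer): f(x, y, p) = y^2 + p^2.
def f(x, y, p):
    return y * y + p * p

def f2(x, y, p):
    """Partial derivative of f in its second slot, with the third slot held fixed."""
    return 2.0 * y

def f3(x, y, p):
    """Partial derivative of f in its third slot, with the second slot held fixed."""
    return 2.0 * p

def integrate(g, a=0.0, b=1.0, n=2000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Assumed curve y and variation u, with their derivatives written out by hand.
y = lambda x: x * x
dy = lambda x: 2.0 * x
u = lambda x: math.sin(math.pi * x)
du = lambda x: math.pi * math.cos(math.pi * x)

def F_eps(eps):
    """F(y + eps*u): f receives plain numbers at each sample point x."""
    return integrate(lambda x: f(x, y(x) + eps * u(x), dy(x) + eps * du(x)))

eps = 1e-4
lhs = (F_eps(eps) - F_eps(-eps)) / (2.0 * eps)  # d/d(eps) F(y + eps*u) at eps = 0
rhs = integrate(lambda x: f2(x, y(x), dy(x)) * u(x) + f3(x, y(x), dy(x)) * du(x))
print(lhs, rhs)
```

The two numbers should agree up to rounding, even though `f2` and `f3` were computed as if the second and third arguments of `f` had nothing to do with each other.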

One question though. I get the argument for not considering $y'(x)$ for $J$ to depend on it, but the first paragraph convinces me to see that $J$ depends on $\phi$, where $\phi(x) = x$. — rainman, Sep 03 '21 at 22:05

texasflood · Answer 2 · 2015-04-09T21:13:11.493

As far as I understand, and I may be wrong, your confusion is just about a notational technicality.

While $y'$ is determined by the function $y$, the function $f$ itself only takes in a vector of real numbers and maps it to a real number. Therefore, having $x$ and $y$ as inputs isn't enough. Some problems require $y'$ in the integral. As a simple example, say we want to find the shortest curve that connects two points $a$ and $b$. Obviously the answer is a straight line, but let's do this as a variational calculus problem.

We have $$ds=\sqrt{dx^2+dy^2}=dx\sqrt{1+y'^2}$$ Therefore: $$F=\int ds=\int_0^a(1+y'^2)^{1/2}dx$$

As you can see, $y'$ appears in the integral. "But wait!" I hear you exclaim, "$y'$ is just a function of $y$, so we can just say $f(y,x)=(1+y'^2)^{1/2}$ and everything is fine!"

The real point here is that $f$ does not take the whole function $y(x)$ as an input. If it did, $f$ would "know" what $y'$ was, and you could write $f(x,y)$ as you said. But this is not the case: $f$ takes a vector in $\mathbb{R}^3$ and gives out a scalar in $\mathbb{R}$. So for rigorous notation, you must say $f(y,y',x)$ and not $f(y,x)$.
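The distinction can be made concrete in code. This is a minimal sketch, assuming the arc-length integrand from the example and a midpoint rule on $[0,1]$; the names `f` and `F` mirror the notation here, but the implementation is only illustrative. `f` accepts three floats, while `F` is the object that accepts whole functions.

```python
import math

def f(y, p, x):
    """Arc-length integrand: three plain numbers in, one number out."""
    return math.sqrt(1.0 + p * p)

def F(y, dy, a=1.0, n=1000):
    """Functional: receives the functions y and y' and integrates f along them."""
    h = a / n
    return h * sum(f(y(x), dy(x), x) for x in ((i + 0.5) * h for i in range(n)))

# f accepts any triple of numbers; nothing ties the second slot to the first.
flat = f(2.0, 0.0, 0.5)   # same y-value, slope 0
steep = f(2.0, 3.0, 0.5)  # same y-value, slope 3
print(flat, steep)

# Only when F feeds f with y(x) and y'(x) taken from one curve are the slots linked.
line_len = F(lambda x: x, lambda x: 1.0)  # y(x) = x on [0, 1]
print(line_len)
```

Calling `f(2.0, 0.0, 0.5)` and `f(2.0, 3.0, 0.5)` gives different results for the same value of $y$, which is why a signature like `f(y, x)` would not carry enough information.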

The fact that $f$ doesn't "know" the whole function $y(x)$ and can only take in a vector in $\mathbb{R}^3$ might sound problematic, but it isn't. The standard technique for dealing with a problem in this form is to take the directional derivative of $F$ in an arbitrary direction $z$, let's say $DF[z]$, then the solution is a $y(x)$ that makes $DF[z]=0$ for all possible $z$.

$$DF[z]=\lim_{\epsilon\rightarrow 0}\left[\frac{d}{d\epsilon}\int_0^a f(y+\epsilon z, y'+\epsilon z', x) dx\right]\tag{1}$$

You can think of dealing with a calculus of variations problem in the following inelegant heuristic way:

1. Go through all possible functions $y(x)$.
2. See which ones lead to $DF[z]=0$ for any and every $z$. These are the solutions to the problem.
3. (Optional) Apply boundary conditions to get a unique solution.

Therefore, $f$ doesn't need to "know" the whole function $y$: we explicitly give it that information when we do the work of computing $f(y+\epsilon z, y'+\epsilon z', x)$. If we did as you suggested and used $f(y,x)$, then for any particular input to $f$, we would only have the value $y$. Can you work out the value $y'(x)$ if you only know the value of $y(x)$ at $x$ and nowhere else? No, of course not.

In the example above, at every point $x$ between $0$ and $a$, $f$ takes $y'$ at that point and pops out a scalar (unfortunately this is a degenerate example: $f$ depends only on $y'$ and not on $y$ or $x$). Integrating this gives the length of the curve $y(x)$. This integral is the functional $F$, and all you need to do is find $DF[z]$ and apply the three steps above.
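As a rough numerical illustration of the condition $DF[z]=0$ for the arc-length functional, the sketch below estimates $DF[z]$ by a central difference in $\epsilon$. The interval $[0,1]$, the test direction $z(x)=\sin(\pi x)$, and the two candidate curves $y(x)=x$ and $y(x)=x^2$ are assumptions made for the example. It checks a single direction $z$ only, so it illustrates the condition rather than proving anything.

```python
import math

def length(dy, a=1.0, n=2000):
    """F[y] = integral over [0, a] of sqrt(1 + y'(x)^2), by the midpoint rule.
    Only y' is needed because this integrand ignores y and x."""
    h = a / n
    return h * sum(math.sqrt(1.0 + dy((i + 0.5) * h) ** 2) for i in range(n))

def DF(dy, dz, eps=1e-5):
    """Central-difference estimate of d/d(eps) F[y + eps*z] at eps = 0."""
    plus = length(lambda x: dy(x) + eps * dz(x))
    minus = length(lambda x: dy(x) - eps * dz(x))
    return (plus - minus) / (2.0 * eps)

# z(x) = sin(pi x) vanishes at x = 0 and x = 1, so y + eps*z keeps the endpoints fixed.
dz = lambda x: math.pi * math.cos(math.pi * x)

DF_line = DF(lambda x: 1.0, dz)          # candidate y(x) = x
DF_parabola = DF(lambda x: 2.0 * x, dz)  # candidate y(x) = x^2, same endpoints
print(DF_line, DF_parabola)
```

For the straight line the estimate is zero up to rounding, while for the parabola it is clearly negative, meaning the length can still be reduced by moving in the direction $z$.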

To your other question: $\frac{\partial y'}{\partial y}=0$ and vice versa is essentially a notational convention in variational calculus. Of course, $y'$ isn't independent of $y$ as a function. But the partial derivatives we take when working out $d/d\epsilon$ are those of the function $f:\mathbb{R}^3\rightarrow\mathbb{R}$. We want to find out how a small change $\epsilon z$ in the function $y$ changes $F$, and obviously this also makes a small change $\epsilon z'$ in $y'$. Now look at $(1)$, specifically $f(y+\epsilon z, y'+\epsilon z', x)$. We deal with this via a multivariable Taylor series, as $\epsilon$ is small. When we apply the Taylor series to $f$, $y$ and $y'$ are just separate inputs to the function $f$: they are real numbers, not functions. So in this context, $y$ and $y'$ are independent.
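The Taylor-series step can be tried out with plain numbers. In this sketch the integrand $f(y,p,x)=y^2p+x$, the sample point, and the values standing in for $z$ and $z'$ are all made up for illustration. Each partial derivative is estimated by moving one slot while holding the other fixed, which is all that "$\partial y'/\partial y = 0$" means here.

```python
def f(y, p, x):
    """Hypothetical integrand used only for this illustration: f = y^2 * p + x."""
    return y * y * p + x

def partial_y(y, p, x, h=1e-6):
    """Central difference in the first slot only; the second slot p is held fixed."""
    return (f(y + h, p, x) - f(y - h, p, x)) / (2.0 * h)

def partial_p(y, p, x, h=1e-6):
    """Central difference in the second slot only; the first slot y is held fixed."""
    return (f(y, p + h, x) - f(y, p - h, x)) / (2.0 * h)

# Sample values of y(x), y'(x), x, z(x), z'(x) at one point: just five real numbers.
y0, p0, x0 = 2.0, 3.0, 0.5
z0, zp0 = 0.7, -1.2
eps = 1e-3

fy = partial_y(y0, p0, x0)  # analytically 2*y*p = 12
fp = partial_p(y0, p0, x0)  # analytically y^2 = 4

actual = f(y0 + eps * z0, p0 + eps * zp0, x0) - f(y0, p0, x0)
first_order = eps * (fy * z0 + fp * zp0)
print(actual, first_order)
```

The actual change and the first-order prediction differ only at order $\epsilon^2$, even though the two partial derivatives were computed without any reference to how $y'$ is obtained from $y$.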

The point is, and I really want to emphasise this, we just want to know how changes to the first and second inputs ($y$ and $y'$) of $f$ cause a change in $f$. So we don't care that $y$ and $y'$ are dependent. We see how $f$ is perturbed by the perturbation in $y$ through its first and second inputs, and we can work out from this that a stationary point must occur where the resulting first-order change in $F$ is zero, i.e. $DF[z]=0$.

I hope this is clear, I think I fully understand and empathise with your concern about the notation but I can also understand why $f(y,x)$ is definitely not the correct notation for some problems, I hope I have made it clear what I mean. If not just ask for clarification. And hopefully everything I've said is accurate, I'm pretty sure it is, but if not let me know.

The three instances of $x+\epsilon z$ should read $x$ instead, no? — epimorphic, Apr 04 '15 at 23:32

Ian · Answer 3 · 2015-04-05T21:55:16.033

First of all, $F$ is a functional while $f$ is not. $f$ depends only on finite dimensional vectors. So when you write

$$F[y]=\int_0^1 f(x,y(x),y'(x)) dx$$

you are defining $F$ to be a functional of a rather particular form. For instance it is impossible to choose $f$ such that $F[y]=y(0)$.

Now you are correct that you cannot alter $y'$ without also somehow altering $y$, and can't do very much to $y$ without altering $y'$. However, it is possible to change $y'$ in a "large" fashion while only altering $y$ in a "small" fashion, and functionals of the form above can "detect" when you have done this. (Precisely speaking, the map $y \mapsto y'$ is not bounded with respect to the uniform norm.)

For instance, consider the sequence $y_n(x)=\frac{1}{n \pi} \sin(n \pi x)$, $y_0(x)=0$. $y_n$ converges uniformly to $y_0$. Yet $y'_0=0$ while $y'_n=\cos(n \pi x)$: $y'_0$ and $y'_n$ are quite far apart. So the fact that $f$ can depend explicitly on $y'$ means that we can have functionals like

$$F[y]=\int_0^1 y'(x)^2 dx$$

which can see the difference between $y_0$ and $y_n$ for $n>0$ (you will find that $F[y_0]=0$ while $F[y_n]=\frac{1}{2}$ for $n>0$).

More rigorously, if $F$ has the form we started with and $y_n \to y$ uniformly, then we may not have that $F[y_n] \to F[y]$. That convergence would be guaranteed if $f$ were a continuous function depending only on $x$ and $y(x)$.
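The example with $y_n(x)=\frac{1}{n\pi}\sin(n\pi x)$ can be reproduced numerically. The sketch below samples on a midpoint grid (the grid size and the chosen values of $n$ are assumptions) and reports, for each $n$, an approximation of $\sup|y_n|$ alongside $F[y_n]=\int_0^1 y_n'(x)^2\,dx$.

```python
import math

def y_n(n, x):
    return math.sin(n * math.pi * x) / (n * math.pi)

def dy_n(n, x):
    return math.cos(n * math.pi * x)

def sup_norm(n, samples=20000):
    """Approximate the maximum of |y_n| on [0, 1] by sampling a midpoint grid."""
    return max(abs(y_n(n, (i + 0.5) / samples)) for i in range(samples))

def F(n, samples=20000):
    """Midpoint-rule approximation of the integral of y_n'(x)^2 over [0, 1]."""
    h = 1.0 / samples
    return h * sum(dy_n(n, (i + 0.5) * h) ** 2 for i in range(samples))

results = {n: (sup_norm(n), F(n)) for n in (1, 10, 100)}
for n, (size, value) in results.items():
    print(n, size, value)
```

The first column of numbers shrinks like $1/(n\pi)$ while the second stays at $\frac{1}{2}$, so $y_n$ approaches $y_0$ uniformly without $F[y_n]$ approaching $F[y_0]=0$.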
