As far as I understand (and I may be wrong), your confusion is just about a notational technicality.
While $y'$ is determined by the function $y$, from the point of view of the function $f$ it only takes in a vector of real numbers and maps it to a real number, so having $x$ and $y$ as inputs isn't enough. Some problems require $y'$ in the integral. Let's take a simple problem: say we want to find the shortest curve $y(x)$ that connects two points, one at $x=0$ and one at $x=a$. Obviously the answer is a straight line, but let's do this as a variational calculus problem.

We have
$$ds=\sqrt{dx^2+dy^2}=dx\sqrt{1+y'^2}$$
Therefore:
$$F=\int ds=\int_0^a(1+y'^2)^{1/2}dx$$
As you can see, $y'$ appears in the integral. "But wait!" I hear you exclaim, "$y'$ is just a function of $y$, so we can just say $f(y,x)=(1+y'^2)^{1/2}$ and everything is fine!"
The real point here is that $f$ does not take the whole function $y(x)$ as an input. If it did, $f$ would "know" what $y'$ was, and you could write $f(x,y)$ as you said. But this is not the case: $f$ takes a vector in $\mathbb{R}^3$ and gives out a scalar in $\mathbb{R}$. So for rigorous notation, you must write $f(y,y',x)$ and not $f(y,x)$.
The fact that $f$ doesn't "know" the whole function $y(x)$ and can only take in a vector in $\mathbb{R}^3$ might sound problematic, but it isn't. The standard technique for dealing with a problem in this form is to take the directional derivative of $F$ in an arbitrary direction $z$, call it $DF[z]$. The solution is then a $y(x)$ that makes $DF[z]=0$ for all possible $z$.
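If it helps to see this in code, here is a minimal sketch of the same idea for the arc-length example. The names `f` and `F`, the midpoint rule, and the test curve $y=2x$ on $[0,1]$ are my own illustrative choices, not anything standard. The thing to notice is that `f` receives three plain numbers, and it is `F` that feeds it $y(x)$ and $y'(x)$ point by point.

```python
import math

def f(y, yp, x):
    """Arc-length integrand: three real numbers in, one real number out.
    f never sees the function y(x), only the numbers handed to it."""
    return math.sqrt(1.0 + yp * yp)

def F(y_func, yp_func, a, n=1000):
    """Functional F[y]: midpoint-rule approximation of the integral of f over [0, a].
    This is where the values y(x) and y'(x) are supplied to f."""
    h = a / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += f(y_func(x), yp_func(x), x) * h
    return total

# Hypothetical test curve: y = 2x on [0, 1], so y' = 2 everywhere.
length = F(lambda x: 2.0 * x, lambda x: 2.0, a=1.0)
print(length, math.sqrt(5.0))  # numerical length next to the exact value
```

Calling `f(0.0, 2.0, 0.0)` and `f(123.0, 2.0, -7.0)` gives the same number, which is the "degenerate" feature of this example: only the second slot matters here.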
$$DF[z]=\lim_{\epsilon\rightarrow 0}\left[\frac{d}{d\epsilon}\int_0^a f(y+\epsilon z, y'+\epsilon z', x) dx\right]\tag{1}$$
You can think of dealing with a calculus of variations problem in the following inelegant heuristic way:
- Go through all possible functions $y(x)$
- See which ones give $DF[z]=0$ for every $z$. These are the solutions to the problem.
- (Optional) Apply boundary conditions to get a unique solution
Therefore, $f$ doesn't need to "know" the whole function $y$: we explicitly give it that information when we compute $f(y+\epsilon z, y'+\epsilon z', x)$.
If we did as you suggested and used $f(y,x)$, then for any particular input to $f$, we would only have the value $y$. Can you work out the value of $y'(x)$ if you know $y$ only at the single point $x$ and nowhere else? No, of course not.
In the example above, at every point $x$ between $0$ and $a$, $f$ takes $y'$ at that point and pops out a scalar (unfortunately this is a degenerate example in which $f$ depends only on $y'$ and not on $y$ or $x$). Integrating this scalar gives the length of the curve $y(x)$. This integral is the functional $F$, and all you need to do is find $DF[z]$ and apply the three steps above.
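The heuristic above can be tried numerically. This is only a rough sketch under my own assumptions: $a=1$, a central finite difference in $\epsilon$ instead of the exact derivative, one particular direction $z(x)=\sin(\pi x)$ that vanishes at both endpoints, and two hypothetical candidate curves joining $(0,0)$ to $(1,1)$. Checking a single $z$ doesn't prove anything, of course; it just illustrates what "$DF[z]=0$" means.

```python
import math

def f(y, yp, x):
    # Arc-length integrand: a plain function of three real numbers.
    return math.sqrt(1.0 + yp * yp)

def F(y, yp, a=1.0, n=2000):
    # Midpoint-rule approximation of the integral of f(y(x), y'(x), x) over [0, a].
    h = a / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += f(y(x), yp(x), x) * h
    return total

def DF(y, yp, z, zp, eps=1e-6):
    # Central-difference estimate of d/d(eps) F[y + eps*z] at eps = 0.
    # Note that y and y' are perturbed together: by eps*z and eps*z'.
    def shifted(sign):
        return F(lambda x: y(x) + sign * eps * z(x),
                 lambda x: yp(x) + sign * eps * zp(x))
    return (shifted(1.0) - shifted(-1.0)) / (2.0 * eps)

def z(x):
    return math.sin(math.pi * x)            # direction, zero at x = 0 and x = 1

def zp(x):
    return math.pi * math.cos(math.pi * x)  # its derivative z'

df_line = DF(lambda x: x, lambda x: 1.0, z, zp)              # straight line y = x
df_parabola = DF(lambda x: x * x, lambda x: 2.0 * x, z, zp)  # parabola y = x^2

print(df_line)      # close to zero, up to rounding
print(df_parabola)  # clearly non-zero (negative)
```

The straight line passes the check for this $z$ and the parabola fails it, which is what you'd expect: nudging the parabola towards the line shortens it.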
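For this example the steps can be carried out by hand. A sketch, assuming (my assumption, but the usual one) that the perturbation $z$ vanishes at both endpoints so that the endpoints stay fixed. Differentiating $(1+(y'+\epsilon z')^2)^{1/2}$ with respect to $\epsilon$, setting $\epsilon=0$, and integrating by parts gives

$$DF[z]=\int_0^a\frac{y'}{\sqrt{1+y'^2}}\,z'\,dx=\left[\frac{y'}{\sqrt{1+y'^2}}\,z\right]_0^a-\int_0^a\frac{d}{dx}\left(\frac{y'}{\sqrt{1+y'^2}}\right)z\,dx$$

The boundary term is zero because $z(0)=z(a)=0$. For the remaining integral to vanish for every such $z$ we need

$$\frac{d}{dx}\left(\frac{y'}{\sqrt{1+y'^2}}\right)=0$$

so $y'/\sqrt{1+y'^2}$ is constant, hence $y'$ is constant, and $y(x)$ is a straight line as expected.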
To your other question: why $\frac{\partial y'}{\partial y}=0$ (and vice versa) is more or less a notational convention in variational calculus. Of course, $y'$ isn't independent of $y$ as a function. But the partial derivatives we take when working out $d/d\epsilon$ are partial derivatives of the function $f:\mathbb{R}^3\rightarrow\mathbb{R}$. We want to find out how a small change $\epsilon z$ in the function $y$ changes $F$, and obviously this also makes a small change $\epsilon z'$ in $y'$. Now look at $(1)$, specifically $f(y+\epsilon z, y'+\epsilon z', x)$. We deal with this via a multivariable Taylor series, as $\epsilon$ is small. When we apply the Taylor series to $f$, $y$ and $y'$ are just separate inputs to the function $f$: they are real numbers, not functions. So in this context, $y$ and $y'$ are independent.
The point is, and I really want to emphasise this, we just want to know how changes to the first and second inputs ($y$ and $y'$) of $f$ cause a change in $f$. So we don't care that $y$ and $y'$ are dependent. We see how $f$ is perturbed by the perturbation in $y$ through its first and second inputs, and from this we can work out that the stationary point must occur when the resulting first-order change in $F$ is zero, i.e. $DF[z]=0$.
I hope this is clear. I fully understand and empathise with your concern about the notation, but I can also see why $f(y,x)$ is not the correct notation for some problems. If anything is unclear, just ask for clarification. Hopefully everything I've said is accurate (I'm pretty sure it is), but if not, let me know.
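To make the Taylor step concrete, here is the first-order expansion written out. This is a sketch that assumes $f$ is smooth enough to expand; the partial derivatives are taken with respect to the first and second slots of $f$ and then evaluated at the numbers $(y(x), y'(x), x)$:

$$f(y+\epsilon z,\,y'+\epsilon z',\,x)=f(y,y',x)+\epsilon\left(\frac{\partial f}{\partial y}\,z+\frac{\partial f}{\partial y'}\,z'\right)+O(\epsilon^2)$$

Putting this into $(1)$, the $\epsilon$-independent term drops out under $d/d\epsilon$ and the $O(\epsilon^2)$ term drops out in the limit, leaving

$$DF[z]=\int_0^a\left(\frac{\partial f}{\partial y}\,z+\frac{\partial f}{\partial y'}\,z'\right)dx$$

Nothing in this expansion ever asks for $\frac{\partial y'}{\partial y}$. The dependence between $y$ and $y'$ enters only through the fact that the two slots are perturbed by $\epsilon z$ and $\epsilon z'$ together.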