OK, to continue from the discussion in the comments. I think the confusion is that the authors are using the language of matrix calculus (which is just compressed notation for taking derivatives with respect to the elements of matrices), combined with Lagrange multipliers, to derive PCA from what some people would call an "intuitive cost function." However, I think the authors of what you are reading have been pretty hand-wavy, and what they wrote does not actually make much sense as stated. Anyway...
So, there are really a couple of different questions that could be separated out here. A few of them are better handled on their own, so I'll link to other SO answers in those cases.
Optimization problem
It seems like this part is pretty clear to you. We've set up an optimization problem: find $P$ to maximize the trace of $C_Y = P^TCP$, the covariance of the projected data,
\[f(P) = \operatorname{tr}(P^T C P)\]
subject to the constraint that the columns of $P$ be orthonormal vectors, or in other words subject to\[P^TP=I.\]
Here $C=\frac{1}{m}X^TX$ is the empirical covariance of $X$ (usually after centering!).
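If it helps to make this concrete, here is a small numerical sketch (my own illustration, not from the text you're reading; it assumes numpy, rows of $X$ as observations, and made-up sizes `m`, `n`, `r`): build the centered covariance and evaluate the trace objective at a random orthonormal $P$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 500, 5, 2                        # observations, features, number of components

X = rng.normal(size=(m, n)) @ rng.normal(size=(n, n))   # correlated toy data
Xc = X - X.mean(axis=0)                                  # center the columns
C = Xc.T @ Xc / m                                        # empirical covariance, C = (1/m) X^T X

# A random n x r matrix with orthonormal columns (P^T P = I), via QR
P, _ = np.linalg.qr(rng.normal(size=(n, r)))

objective = np.trace(P.T @ C @ P)          # f(P) = tr(P^T C P), the variance captured by P
print(objective)
```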
Lagrangian
As written, the Lagrangian $f(P)$ can't be right -- you can see this by noticing that $P^TP-I$ is a matrix, so the right-hand side isn't even a scalar, and a Lagrangian has to be a scalar. We can try to fix it, but I want to argue that this is actually hard to do -- if you look at this answer:
you'll see that it's not so simple to solve the problem, at least for $r>1$. I think that whoever wrote what you are working with was going for more of a qualitative understanding, and they seem to have ignored some of the complications for the sake of intuition, but this might be what was making things confusing.
In the $r=1$ case, it's not too hard. Our constraint just becomes $P^TP=1$, i.e. $P$ is really just a unit column vector. Then we get the Lagrangian
\[L(P,\lambda) = \operatorname{tr}(P^T C P) - \lambda (P^TP - 1).\]
This is not so hard to solve, and it gives the first principal component -- I'll show that in a second, but first I just want to note that extending this to more components is hard. The complications of doing that are addressed in the answer I linked to above, but to get a feel for it, think about what our constraints are: we need all of the unit-length constraints $P_i^T P_i=1$ for $i=1,\dots,r$ and all of the orthogonality constraints $P_i^TP_j=0$ for $i\neq j$, with one multiplier per constraint. So now we have many more dual variables than were present in what you were given.
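To see numerically what the $r>1$ claim amounts to, here is another sketch of my own (again assuming numpy, with a random symmetric positive semi-definite `C` standing in for the covariance): among random orthonormal $P$'s, none beats the basis of the top-$r$ eigenvectors, whose objective value is the sum of the $r$ largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 2

# Random symmetric PSD matrix standing in for the covariance
A = rng.normal(size=(n, n))
C = A @ A.T / n

# Top-r eigenvectors of C (eigh returns eigenvalues in ascending order)
eigvals, eigvecs = np.linalg.eigh(C)
P_eig = eigvecs[:, -r:]

best_random = -np.inf
for _ in range(2000):
    P, _ = np.linalg.qr(rng.normal(size=(n, r)))     # random orthonormal columns
    best_random = max(best_random, np.trace(P.T @ C @ P))

print("best random P :", best_random)
print("eigenvector P :", np.trace(P_eig.T @ C @ P_eig))
print("sum of top-r eigenvalues:", eigvals[-r:].sum())
```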
Anyway, back to $r=1$. To solve for $P$, take the derivative with respect to the vector $P$ and set it equal to $0$, using the vector analogues of the matrix calculus identities that you were given:
\[\frac{\partial L}{\partial P} = \frac{\partial \operatorname{tr}(P^T C P)}{\partial P} - \lambda \frac{\partial P^TP}{\partial P}.\]
Note that this is basically what you had above but with a sign change, since the Lagrangian should really be written the way I have it here, with the $-$ sign in front of $\lambda$. The vector partial derivatives here are just a different notation for gradients, so think of them that way if they are confusing. The identities you wrote down hold and can help us solve this:
The gradient $\frac{\partial\operatorname{tr}(P^TCP)}{\partial P}$, with $P$ a column vector (so the trace is just the scalar $P^TCP$) and $C$ symmetric, is $2CP$.
Similarly, the derivative of the dot product $P^TP=\sum_j P_j^2$ with respect to $P_i$ is $2P_i$ (the factor of 2 because $P_i$ shows up in both factors), so the gradient with respect to the whole vector is $2P$.
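If you'd rather not take those two identities on faith, a quick finite-difference check confirms them (my own sketch, assuming numpy; `num_grad` is just a helper I made up for the check):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))
C = A @ A.T                   # symmetric C, as a covariance matrix would be
P = rng.normal(size=n)        # a generic vector (the identities don't need unit length)
eps = 1e-6

def num_grad(f, P):
    """Central finite-difference gradient of a scalar function f at P."""
    g = np.zeros_like(P)
    for i in range(len(P)):
        e = np.zeros_like(P)
        e[i] = eps
        g[i] = (f(P + e) - f(P - e)) / (2 * eps)
    return g

# d/dP [P^T C P] = 2 C P  (for symmetric C)
print(np.allclose(num_grad(lambda v: v @ C @ v, P), 2 * C @ P, atol=1e-4))
# d/dP [P^T P] = 2 P
print(np.allclose(num_grad(lambda v: v @ v, P), 2 * P, atol=1e-4))
```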
Plugging in, we get
\[\frac{\partial L}{\partial P}= 2CP - 2\lambda P\]
Setting this equal to 0 to find the critical point, we get that $CP=\lambda P$, or in other words $P$ is an eigenvector of $C$ with eigenvalue $\lambda$.
Now we have to decide which eigenvalue/eigenvector pair to take, since the critical-point condition alone doesn't tell us. Plugging $CP=\lambda P$ back into the objective and using $P^TP=1$ gives $\operatorname{tr}(P^TCP)=\lambda P^TP=\lambda$, so to maximize the captured variance we take the largest eigenvalue, and the first principal component is the corresponding unit eigenvector.
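To close the loop on the $r=1$ case numerically (once more, just my own sketch with numpy): the top eigenvector of $C$ satisfies $CP=\lambda P$, plugging it into the objective returns exactly that largest eigenvalue, and no other unit vector does better.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.normal(size=(n, n))
C = A @ A.T / n                              # symmetric PSD stand-in for the covariance

eigvals, eigvecs = np.linalg.eigh(C)         # eigenvalues in ascending order
lam, P = eigvals[-1], eigvecs[:, -1]         # largest eigenvalue and its unit eigenvector

print(np.allclose(C @ P, lam * P))           # the stationarity condition C P = lambda P
print(np.isclose(P @ C @ P, lam))            # objective value at P is the eigenvalue itself

# A random unit vector captures no more variance than the top eigenvector
v = rng.normal(size=n)
v /= np.linalg.norm(v)
print(v @ C @ v <= lam + 1e-12)
```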
I hope this helped with some intuition, but as I said earlier, understanding the full case with $r>1$ takes more work.