Let us focus on finite-dimensional Hilbert spaces $H$. Suppose we have a self-adjoint linear map $T:H\to H$. In order to prove the existence of an orthonormal basis of eigenvectors, one starts off by proving that there is one eigenvalue-eigenvector pair $(\lambda,v)$ (and one can normalize $v$). Then, one shows that $T$ restricts to a linear map on the orthogonal complement $\{v\}^{\perp}\to\{v\}^{\perp}$ and the restriction is still self-adjoint. Then, one can continue inductively, and the process will terminate in a finite number of steps (since $H$ is finite-dimensional), which means we have found an orthonormal basis of eigenvectors, and hence we’re done.
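For completeness, here is a short sketch of why the inductive step works. Suppose $T(v)=\lambda v$ and $w\in\{v\}^{\perp}$, i.e. $\langle w,v\rangle=0$. Then, using self-adjointness,
\begin{align}
\langle T(w),v\rangle=\langle w,T(v)\rangle=\langle w,\lambda v\rangle=0,
\end{align}
since the last inner product is a scalar multiple of $\langle w,v\rangle=0$ (whichever slot is taken to be conjugate-linear). So $T(w)\in\{v\}^{\perp}$ again. The restriction is self-adjoint simply because the identity $\langle T(w),w'\rangle=\langle w,T(w')\rangle$ holds for all vectors in $H$, and in particular for all $w,w'\in\{v\}^{\perp}$.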
So, it all boils down to proving the existence of a single eigenvalue-eigenvector pair. Now, consider the quantity
\begin{align}
\lambda:=\sup\limits_{\|y\|=1}\langle T(y),y\rangle=\sup\limits_{y\in H\setminus\{0\}}\frac{\langle T(y),y\rangle}{\|y\|^2}.
\end{align}
Keep in mind that since $T$ is self-adjoint, $\langle T(y),y\rangle=\langle y,T(y)\rangle=\overline{\langle T(y),y\rangle}$, so we’re taking the supremum of real-valued quantities. Next, since $T$ is a linear map on a finite-dimensional space, it is automatically a bounded operator (equivalently, a continuous linear map). Hence, $y\mapsto \langle T(y),y\rangle$ is a continuous function on the unit sphere $S$ of $H$, which is compact (by finite-dimensionality!). By the extreme value theorem, this supremum is therefore finite (i.e. $\lambda\in\Bbb{R}$ is well-defined) and is in fact attained by some vector $v\in S$. We now claim that the number $\lambda$ defined above is an eigenvalue of $T$ and that this vector $v$ is a corresponding eigenvector.
One way to prove this is by differential calculus. Consider the function $f:H\setminus\{0\}\to \Bbb{R}$ defined as $f(y)=\frac{\langle T(y),y\rangle}{\|y\|^2}$. We have argued above that $\lambda$ is the maximum value of $f$ and that the unit vector $v$ is a maximum point. Since $H\setminus\{0\}$ is open and $f$ is differentiable there (in the real sense), basic differential calculus tells us that $Df_v=0$ (“the derivative at a maximum point vanishes”), i.e. $Df_v(z)=0$ for all $z\in H$. This is where the Wiki page seems to get tripped up with the computations. By the chain rule, $Df_v(z)$ equals the perhaps more familiar directional derivative $\frac{d}{dt}\bigg|_{t=0}f(v+tz)$. I guess people are more comfortable with directional derivatives, which is why Wiki decided to write things in that manner, so here goes with the computation (keep in mind that $\|v\|=1$): for all $z\in H$,
\begin{align}
0&=\frac{d}{dt}\bigg|_{t=0}f(v+tz)\\
&=\frac{d}{dt}\bigg|_{t=0}\frac{\langle T(v+tz),v+tz\rangle}{\|v+tz\|^2}\\
&=\frac{\|v+tz\|^2\left[\langle T(z),v+tz\rangle + \langle T(v+tz), z\rangle\right] - \langle T(v+tz),v+tz\rangle \cdot 2\text{Re}(\langle v+tz,z\rangle)}{\|v+tz\|^4}\bigg|_{t=0}\\
&= \langle T(z),v\rangle+\langle T(v),z\rangle-2\lambda\text{Re}\left(\langle v,z\rangle\right)\tag{$*$}\\
&=2\text{Re}\left(\langle T(v),z\rangle\right)
-2\lambda\text{Re}\left(\langle v,z\rangle\right)\tag{$T$ self adjoint}\\
&=2\text{Re}\left\langle T(v)-\lambda v,z\right\rangle.
\end{align}
I have used the quotient rule and the ‘product rule’ (which is valid since the inner product is real-bilinear, being sesquilinear in the complex case; see here for a general product rule).
Once again, note that in $(*)$ I’ve used $\|v\|=1$ and $\lambda=\langle T(v),v\rangle$ (which holds because $v$ attains the supremum), and in the last line I used that $\lambda$ is real, hence I was able to move it inside the real part. Notice that this equality holds for all $z$, so by applying it to $iz$, we get that the imaginary part also vanishes (of course on a real vector space we don’t need this extra step). Hence, $\langle T(v)-\lambda v,z\rangle=0$ for all $z\in H$; taking $z=T(v)-\lambda v$ gives $\|T(v)-\lambda v\|^2=0$, and thus $T(v)=\lambda v$. Since $v\neq 0$, this proves that $v$ is an eigenvector of $T$ with eigenvalue $\lambda$.
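If you'd like to see the statement numerically, here is a minimal sanity check in Python with NumPy. It is only a sketch: it does not carry out the compactness argument, but instead uses `np.linalg.eigh` to produce the top eigenpair and then checks that this pair behaves as the proof predicts. The helper names (`rayleigh_quotient`, `random_hermitian`, `directional_derivative`), the matrix size, and the random seed are all my own illustrative choices.

```python
import numpy as np

def rayleigh_quotient(T, y):
    """f(y) = <T y, y> / ||y||^2; real for Hermitian T, so we keep the real part."""
    # np.vdot conjugates its first argument, so np.vdot(y, T @ y) = sum (T y)_i * conj(y_i)
    return float(np.real(np.vdot(y, T @ y)) / np.real(np.vdot(y, y)))

def random_hermitian(n, rng):
    """A random self-adjoint (Hermitian) n x n matrix."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (A + A.conj().T) / 2

def directional_derivative(T, v, z, h=1e-5):
    """Central finite-difference approximation of d/dt f(v + t z) at t = 0."""
    return (rayleigh_quotient(T, v + h * z) - rayleigh_quotient(T, v - h * z)) / (2 * h)

rng = np.random.default_rng(0)
n = 5
T = random_hermitian(n, rng)

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors as columns
eigenvalues, eigenvectors = np.linalg.eigh(T)
lam = eigenvalues[-1]      # candidate for sup of <T y, y> over the unit sphere
v = eigenvectors[:, -1]    # unit vector attaining it

# T v - lam v should vanish up to rounding
residual = np.linalg.norm(T @ v - lam * v)

# No sampled vector should have a Rayleigh quotient exceeding lam
samples = rng.standard_normal((1000, n)) + 1j * rng.standard_normal((1000, n))
max_sampled = max(rayleigh_quotient(T, y) for y in samples)

# The directional derivative at the maximizer should vanish in any direction z
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
slope = directional_derivative(T, v, z)

print(residual, lam - max_sampled, slope)
```

The three printed numbers should be, respectively, tiny, nonnegative, and tiny, matching $T(v)=\lambda v$, $\lambda=\sup f$, and $Df_v(z)=0$.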
See Loomis and Sternberg’s Advanced Calculus for a slight variant of the proof.
Once $v$ and $\lambda$ are defined using the extreme value theorem, they prove (page 258) that $(\lambda,v)$ is an eigenvalue-eigenvector pair slightly more algebraically, using a Cauchy-Schwarz-type trick (the necessary variant of Cauchy-Schwarz is proven on page 249, Theorem 1.1).
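For orientation, here is a sketch of how an argument of this type can go. This is a reconstruction under the assumption that the relevant variant is the Cauchy-Schwarz inequality for positive semidefinite forms; it is not necessarily identical to the book's presentation. Define $B(x,y):=\langle \lambda x-T(x),y\rangle$. Since $T$ is self-adjoint and $\lambda$ is real, $B$ is (Hermitian-)symmetric, and by the definition of $\lambda$ as a supremum we have $B(x,x)=\lambda\|x\|^2-\langle T(x),x\rangle\geq 0$ for all $x\in H$. Cauchy-Schwarz for such forms then gives, for all $y\in H$,
\begin{align}
|B(v,y)|^2\leq B(v,v)\,B(y,y)=0,
\end{align}
because $B(v,v)=\lambda\|v\|^2-\langle T(v),v\rangle=\lambda-\lambda=0$. Hence $\langle \lambda v-T(v),y\rangle=0$ for all $y$, and taking $y=\lambda v-T(v)$ yields $T(v)=\lambda v$ without any differentiation.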