It looks like he is minimizing the right-hand side of equation 1 over $y$, by taking its gradient with respect to $y$ and setting it to zero:
$$
\nabla_y\left(\nabla f(x)^T (y-x) + \frac{\mu}{2} \| y - x \|^2\right)=
\nabla f(x)
+ \mu(y-x)=0,
$$
where the final "$=0$" comes from the minimization, and the terms that depend only on $x$ drop out under $\nabla_y$, so $$\nabla f(x)=-\mu(y - x)$$ at the minimum, i.e. $y - x = -\frac{1}{\mu}\nabla f(x)$. Since equation 1 holds for every $y$, it holds in particular at $y = x^*$, and its right-hand side there is no smaller than its minimum over $y$; so inserting $y - x = -\frac{1}{\mu}\nabla f(x)$ back into his equation 1, you get
$$
\begin{align}
f(x^*) &\geq f(x) - \frac{1}{\mu}\nabla f(x)^T \nabla f(x) + \frac{\mu}{2}\cdot\frac{1}{\mu^2}\|\nabla f(x)\|^2\\
&= f(x) - \frac{1}{2\mu}\|\nabla f(x)\|^2
\end{align}
$$
(Just in case people are having difficulty understanding this answer: the top equation is the gradient (the vector of partial derivatives) with respect to the $y$ variable, so a term like $f(x)$ differentiates away, $\nabla_y f(x) = 0$, because it has no dependence on $y$. That equation is a vector equation, so $y-x$ and $0$ both denote vectors. The next equation is also a vector equation; I then substitute $y-x$ with the vector $-\nabla f(x)/\mu$, and the final inequality is a scalar one.)
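
If a numerical sanity check helps, here is a small sketch (my own illustration, not part of the answer being discussed) that tests the final bound $f(x^*) \geq f(x) - \frac{1}{2\mu}\|\nabla f(x)\|^2$ on a strongly convex quadratic, where $\mu$ is taken to be the smallest eigenvalue of the Hessian:

```python
import numpy as np

# f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite is
# mu-strongly convex with mu = smallest eigenvalue of A, and its
# minimizer is x* = A^{-1} b.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # symmetric positive definite
b = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad_f(x):
    return A @ x - b

mu = np.linalg.eigvalsh(A).min()   # strong convexity constant
x_star = np.linalg.solve(A, b)     # exact minimizer

# Check f(x*) >= f(x) - ||grad f(x)||^2 / (2 mu) at random points.
for _ in range(1000):
    x = rng.standard_normal(n)
    bound = f(x) - grad_f(x) @ grad_f(x) / (2 * mu)
    assert f(x_star) >= bound - 1e-9   # tolerance for rounding error
print("bound held at all sampled points")
```

For $A = \mu I$ the bound holds with equality at every $x$, which is a quick way to see that it cannot be improved without further assumptions.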