Some references for the rate of $L^p$ approximation in terms of Sobolev norms are given
here.
Some insightful remarks on the rate of convergence of mollified
functions are found here.
Below I give a proof of the estimate (not along the lines of your interpretation, which misses some important points).
Let $\varphi$ be as in the paper you are reading. For $h>0$ let $\varphi_h(x)=h^{-1}\varphi(x/h)$. That the support of $\widehat{\varphi_h}$ is contained in $\{\xi: |\xi|\le (\pi+\varepsilon)/h\}$ is not of much help. What really matters
is that $\widehat {\varphi_h}=1$ on $\{\xi: |\xi|\le c/h\}$ where $c=\pi-\varepsilon$. Since the Fourier transform of $f*\varphi_h$ agrees with $\widehat f$ when
$|\xi|\le c/h$, the transform of $f-f*\varphi_h$ vanishes for such $\xi$.
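To spell out this step as a worked identity (a sketch, assuming the Fourier transform is normalized so that $\widehat{f*g}=\widehat f\,\widehat g$, which is what the statement about $f*\varphi_h$ implicitly uses): the scaling $\varphi_h(x)=h^{-1}\varphi(x/h)$ gives
$$\widehat{\varphi_h}(\xi)=\widehat\varphi(h\xi),\qquad \widehat{f-f*\varphi_h}(\xi)=\bigl(1-\widehat\varphi(h\xi)\bigr)\,\widehat f(\xi),$$
and the factor $1-\widehat\varphi(h\xi)$ vanishes whenever $|h\xi|\le c$, that is, whenever $|\xi|\le c/h$.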
As usual, the case $p=2$ is easier to deal with. Write $f_1=f-f*\varphi_h$, so that $\widehat{f_1}=(1-\widehat{\varphi_h})\,\widehat f$, and let $M$ be the supremum of $|\widehat \varphi|$ (which can be
arranged to be $1$). Then $|\widehat{f_1}|\le (1+M)|\widehat f|$, and by Plancherel's identity (up to a normalization constant)
$$
\begin{split}
\|f-f*\varphi_h\|_{L^2}^2 & = \int_{|\xi|\ge c/h} |\widehat{f_1}(\xi)|^2 d\xi
\\ & \le (h/c)^{2k} \int_{\mathbb R} |\xi|^{2k} |\widehat{f_1}(\xi)|^2 d\xi
\\ &\le (h/c)^{2k} (1+M)^2 \int_{\mathbb R} |\xi|^{2k} |\widehat f(\xi)|^2 d\xi
\\ &= (h/c)^{2k} (1+M)^2 |f|_{W^{k,2}}^2
\end{split}
\tag1$$
The idea for the general case $1<p<\infty$ is about the same: $f-f*\varphi_h$ has only high frequencies of
$f$ (those above $c/h$), which are magnified in $f^{(k)}$ by factors of at least $(c/h)^k$. But to relate the Fourier transform to the $L^p$ norm, we need the Littlewood-Paley decomposition of $f$:
$$f=\sum_{j\in \mathbb Z} f_j$$
where $ f_j =f*\varphi_{2^{-j}}-f*\varphi_{2^{-j+1}}$.
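As a quick check of this decomposition (a sketch, assuming $\varphi$ is a Schwartz function with $\int\varphi=1$ and $f\in L^p$ with $1<p<\infty$; these hypotheses are not spelled out above), the partial sums telescope:
$$\sum_{j=-N}^{N} f_j = f*\varphi_{2^{-N}}-f*\varphi_{2^{N+1}}.$$
As $N\to\infty$ the first term tends to $f$ and the second to $0$ in $L^p$. Moreover, since $\widehat{\varphi_{2^{-j}}}(\xi)=\widehat\varphi(2^{-j}\xi)$ equals $1$ for $|\xi|\le c\,2^{j}$ and vanishes for $|\xi|>(\pi+\varepsilon)2^{j}$, the transform $\widehat{f_j}$ vanishes outside the set
$$c\,2^{j-1}\le |\xi|\le (\pi+\varepsilon)\,2^{j}.$$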
The important point here is that $\widehat{f_j}$ is supported in a roughly dyadic annulus of size $2^{j}$. These frequencies get magnified by about $2^{jk}$ in the $k$th derivative. The Littlewood-Paley theorem says that $\|f\|_{L^p}$ is comparable to the
$L^p$ norm of the square function of $f$:
$$\|f\|_{L^p}\approx \left\|\left(\sum_{j\in\mathbb Z} |f_j|^2\right)^{1/2}\right\|_{L^p} \tag{LP}$$
Use (LP) for $f^{(k)}$ and for $f-f*\varphi_h$:
$$
\|f^{(k)}\|_{L^p} \approx \left\|\left(\sum_{j\in\mathbb Z} 2^{2jk}|f_j|^2\right)^{1/2} \right\|_{L^p}
\gtrsim (c/h)^k \left\|\left(\sum_{2^j\ge c/h} |f_j|^2\right)^{1/2} \right\|_{L^p}
\tag2$$
and
$$
\|f-f*\varphi_h\|_{L^p} \lesssim \left\|\left(\sum_{2^j\ge c/h} |f_j|^2\right)^{1/2} \right\|_{L^p}
\tag3$$
because the low frequencies ($2^j<c/h$) cancel out in $f-f*\varphi_h$. Comparing (2) and (3), we conclude that $\|f-f*\varphi_h\|_{L^p}\lesssim h^k \|f^{(k)}\|_{L^p}$.
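For completeness, the comparison can be written as a single chain (a sketch; the implied constants depend on $p$, $k$ and $\varphi$, but not on $h$ or $f$). With the shorthand $S_h=\left(\sum_{2^j\ge c/h}|f_j|^2\right)^{1/2}$, introduced only for this display,
$$\|f-f*\varphi_h\|_{L^p}\lesssim \|S_h\|_{L^p}\lesssim (h/c)^{k}\,\|f^{(k)}\|_{L^p},$$
where the first inequality is (3) and the second is (2) divided by $(c/h)^k$.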