Ok, let's start out by looking at the hydrogenic atom. The wave function $\Psi(r, \theta, \varphi)$ can be split into a radial part $R_{n,\ell}(r)$ and an angular part $Y_{\ell,m} (\theta, \varphi )$, so that $\Psi(r, \theta, \varphi) = R_{n,\ell}(r) Y_{\ell,m} (\theta, \varphi )$, where $n$, $\ell$ and $m$ are the Principal, Azimuthal and Magnetic quantum number, respectively, and the functions $R_{n,\ell}(r)$ are the solutions of the radial Schroedinger equation
\begin{equation}
\bigg( \frac{ - \hbar^{2} }{ 2 m_{\mathrm{e}} } \frac{ \mathrm{d}^{2} }{ \mathrm{d} r^{2} } + \frac{ \hbar^{2} }{ 2 m_{\mathrm{e}} } \frac{ \ell (\ell + 1) }{ r^{2} } - \frac{ Z e^{2} }{ 2 m_{\mathrm{e}} r } - E \bigg) r R_{n,\ell}(r) = 0
\end{equation}
with the nuclear charge $Z$ and the mass of an electron $m_{\mathrm{e}}$.
The probability of finding the electron in the hydrogen-like atom, with the
distance $r$ from the nucleus between $r$ and $r + \mathrm{d}r$, with angle $\theta$ between $\theta$ and $\theta + \mathrm{d}\theta$, and with the angle $\varphi$ between $\varphi$ and $\varphi + \mathrm{d}\varphi$ is
\begin{align}
| \Psi(r, \theta, \varphi) | \, \mathrm{d} \tau = \left[R_{n,\ell}(r) \right]^{2} \left|Y_{\ell,m} (\theta, \varphi ) \right|^{2} r^{2} \sin \theta \, \mathrm{d}r \, \mathrm{d}\theta \, \mathrm{d}\varphi \ .
\end{align}
To find the probability $D_{n, \ell}(r) \, \mathrm{d}r$ that the electron is between $r$ and $r + \mathrm{d}r$ regardless of the direction, you integrate over the angles $\theta$ and $\varphi$ to obtain
\begin{align}
D_{n, \ell}(r) \, \mathrm{d}r &= r^{2} \left[R_{n,\ell}(r) \right]^{2} \, \mathrm{d}r \underbrace{\int_{0}^{\pi} \! \! \int_{0}^{2\pi} \left|Y_{\ell,m} (\theta, \varphi ) \right|^{2} \sin \theta \, \mathrm{d}\theta \, \mathrm{d}\varphi}_{= \, 1} \\
&= r^{2} \left[R_{n,\ell}(r) \right]^{2} \, \mathrm{d}r \ .
\end{align}
Since the spherical harmonics are normalized, the value of the double integral
is unity.
The radial distribution function $D_{n, \ell}(r)$ is the probability density for the
electron being in a spherical shell with inner radius r and outer radius $r + \mathrm{d}r$.
For the $2\ce{s}$ and $2\ce{p}$ states, these functions are
\begin{align}
D_{2, 0}(r) &= \frac{1}{8} \left( \frac{Z}{a_{0}} \right)^{3} r^{2} \left( 2 - \frac{Z r}{a_{0}} \right)^{2} \mathrm{e}^{-Zr/a_{0}} \\
D_{2, 1}(r) &= \frac{1}{24} \left( \frac{Z}{a_{0}} \right)^{5} r^{4} \mathrm{e}^{-Zr/a_{0}} \ ,
\end{align}
where $a_{0}$ is the Bohr radius.
The graph of those two function can be seen here:

The most probable value $r_{\mathrm{mp}}$ of $r$ is found by setting the derivative of $D_{n, \ell}(r)$ with respect to $r$ equal to zero. From the graph above it can be seen that $r_{\mathrm{mp}}(2\ce{p})$ is smaller than $r_{\mathrm{mp}}(2\ce{s})$ (and if you calculate it you can check that for yourself).
But $r_{\mathrm{mp}}$ is not the quantity you are actually interested in. Because although $r_{\mathrm{mp}}$ gives you the most probable distance of the electron from the nucleus you want to know the average distance $\langle r \rangle$ between the electron and the nucleus as that is the distance you will get when you measure $r$ repeatedly and average over your results.
The radial distribution functions may be used to calculate the expectation values
of functions of the radial variable r. You can get them via
\begin{align}
\langle r \rangle_{n, \ell} &= \int_{0}^{\infty} r D_{n, \ell}(r) \, \mathrm{d} r \ .
\end{align}
If you evaluate this integral for the $2\ce{s}$ and $2\ce{p}$ orbitals you get
\begin{align}
\langle r \rangle_{2\ce{s}} = \langle r \rangle_{2, 0} &= \frac{6 a_{0}}{ Z } \\
\langle r \rangle_{2\ce{p}} = \langle r \rangle_{2, 1} &= \frac{5 a_{0}}{ Z } \ .
\end{align}
You find that the $2\ce{s}$ electrons are on average further away from the nucleus than the $2\ce{p}$ electrons.
So, for the hydrogenic atom your claim that $2\ce{p}$ electrons are further away from the nucleus than the $2\ce{s}$ electrons is wrong. But in real atoms you have more than one electron and those electrons will influence each other. One important effect here is, that the inner electrons (or electron density) screens nuclear charge from the outer electrons. Thus the outer electrons feel a lower effective nuclear charge $Z^{\mathrm{eff}}$ than the inner electrons. The extend of this screening is distance dependent, so $Z^{\mathrm{eff}}
= Z^{\mathrm{eff}} (r)$.
Now, if you look back at the formula for $\langle r \rangle_{n, \ell}$ things will certainly change since $D_{n, \ell}(r)$ will have a different form and the additional $r$-dependence of $Z^{\mathrm{eff}}$ will make things more difficult. Assuming that the general form of $D_{n, \ell}(r)$ will be similar in the real atom, it can be expected that the parts of $D_{n, \ell}(r)$ for small $r$ now have a higher weight in the integral of $\langle r \rangle_{n, \ell}$ than before since these parts of the electron density suffer less from the screening effect. Since $2\ce{s}$ electrons have a small portion of the electron density that lies very close to the nucleus ("penetrates deep into the core region") and $2\ce{p}$ electrons don't this will lead to the observation that $2\ce{p}$ electrons are on average further away from the nucleus than the $2\ce{s}$ electrons.
A more involved explanation of the effects at work in multi-electron atoms can be found at this answer of mine. The important bit starts at the paragraph "Why do states with the same $n$ but lower $\ell$ values have lower energy eigenvalues?"