Here is some intuition for why a function which is "infinity at 0" and 0 elsewhere should act like the delta function, that is, as an identity for the convolution operator.
The fundamental theorem of calculus says that $\frac{d}{dx} \displaystyle \int_a^x f(t)dt = f(x)$ (for continuous $f$). Writing $F(x) = \displaystyle \int_a^x f(t)dt$ and approximating the derivative by the symmetric difference quotient $\frac{F(x+h)-F(x-h)}{2h}$, this says that for small $h$, $f(x) \approx \displaystyle \int_{x-h}^{x+h} \frac{1}{2h} f(t)dt$. Let $g_h(u) = \frac{1}{2h}$ on the interval $(-h,h)$ and 0 elsewhere. Since $g_h(x-t) = \frac{1}{2h}$ exactly when $t \in (x-h,x+h)$, pure algebraic manipulation gives $f(x) \approx \displaystyle \int_{-\infty}^\infty g_h(x-t)f(t)dt$. So the fundamental theorem of calculus can very naturally be rephrased in terms of convolution with a rectangular bump function.

Differentiation under the integral sign immediately gives the differentiation formula for convolutions, $(f \ast g)' = f \ast g'$, and thus that the convolution of two functions is at least as smooth as either factor. So finding good smooth approximations to the rectangular bump functions $g_h$ automatically gives us smooth approximations to any integrable function we like, just by convolving against these "smooth mollifiers".
If we let $h \to 0$, the functions $g_h$ do not converge to any ordinary function (their height $\frac{1}{2h}$ blows up near 0 while they vanish everywhere else), but the fundamental theorem says that convolving against the $g_h$ gives you something close to what you started with. The delta function is something like the limit of these $g_h$.
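If you want to see this numerically, here is a minimal sketch. It assumes a smooth test function ($\cos$) and approximates the integral with a plain midpoint rule; the names `box_average` and `errors` are just illustrative, not from any library.

```python
import math

def box_average(f, x, h, n=1000):
    """Approximate (f * g_h)(x) = (1/(2h)) * integral of f over [x-h, x+h]
    using the midpoint rule with n subintervals."""
    width = 2 * h / n
    total = 0.0
    for k in range(n):
        t = x - h + (k + 0.5) * width  # midpoint of the k-th subinterval
        total += f(t)
    return total * width / (2 * h)

def errors(f, x, hs):
    """Distance between the box average and f(x) for each window half-width h."""
    return [abs(box_average(f, x, h) - f(x)) for h in hs]

# Assumed test case: f = cos at x = 0.5, with shrinking windows.
hs = [1.0, 0.1, 0.01]
errs = errors(math.cos, 0.5, hs)
for h, e in zip(hs, errs):
    print(f"h = {h:<5} |(f * g_h)(x) - f(x)| = {e:.2e}")
```

For $f = \cos$ the box average can be computed exactly as $\cos(x)\frac{\sin h}{h}$, so the error should shrink roughly like $h^2$ as the window narrows.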
The definition on Wikipedia plays more nicely with Fourier analysis. You choose $g_h$ so that its Fourier transform is 1 on something like $[-1/h,1/h]$ and 0 elsewhere. Then by the convolution theorem, the Fourier transform of $f \ast g_h$ will just be the Fourier transform of $f$ truncated so that it has support in $[-1/h,1/h]$. This is a great sequence of mollifiers to choose, with much nicer properties than the $g_h$ I chose above, but it requires more analysis than just the fundamental theorem to make sense of.
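A rough discrete analogue of this truncation can be sketched as follows. This is only an illustration under assumptions of my own: it uses a periodic signal of 64 samples and a naive discrete Fourier transform rather than the continuous transform, and the helper names (`dft`, `idft`, `truncate_spectrum`, `max_error`) are hypothetical.

```python
import cmath
import math

def dft(xs):
    """Naive discrete Fourier transform (O(n^2), fine for small n)."""
    n = len(xs)
    return [sum(xs[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]

def idft(Xs):
    """Inverse of dft above."""
    n = len(Xs)
    return [sum(Xs[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]

def truncate_spectrum(xs, cutoff):
    """Zero every frequency with |k| > cutoff, then transform back.
    Index k and index n - k represent frequencies +k and -k."""
    n = len(xs)
    Xs = dft(xs)
    kept = [X if min(k, n - k) <= cutoff else 0j for k, X in enumerate(Xs)]
    return [z.real for z in idft(kept)]

# Assumed test signal: a Gaussian bump sampled at 64 points.
n = 64
samples = [math.exp(-((j - n / 2) / 4.0) ** 2) for j in range(n)]

def max_error(cutoff):
    """Largest pointwise gap between the signal and its band-limited version."""
    approx = truncate_spectrum(samples, cutoff)
    return max(abs(a - s) for a, s in zip(approx, samples))

for cutoff in (2, 8, 16, 32):
    print(f"cutoff = {cutoff:<3} max error = {max_error(cutoff):.2e}")
```

Raising the cutoff plays the role of letting $h \to 0$: the more of the spectrum you keep, the closer the truncated version is to the original signal.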