The Dirac distribution really is a function – specifically, a functional
$$
\tilde\delta : (\mathbb{R} \to \mathbb{R}) \to \mathbb{R}, \qquad \tilde\delta(f) := f(0).
$$
That definition is perfectly simple and uncontroversial.
The funny thing is, nobody's actually using it this way! For a reason I find strange, physicists and also many mathematicians actually seem more suspicious about such a simple, but “higher-order” function than about a function on the real axis itself, even if it requires “infinite function values” to work.
What's actually going on with the standard definition is this: the functions $\mathbb{R} \to \mathbb{R}$ form a vector space. If you narrow it down to only functions whose square is integrable over the entire domain, you get the $L^2(\mathbb{R})$ Hilbert space.
One of the nice things in Hilbert spaces is the Riesz representation theorem. It says roughly that a Hilbert space is isomorphic to its dual space; in this case meaning, the space of linear functionals† $L^2(\mathbb{R}) \to \mathbb{R}$ is isomorphic to $L^2(\mathbb{R})$ itself. IOW, any square-integrable function has a canonical correspondent functional vice versa. These corresponding pairs are always basically given by imitating the integral over the product. For instance, $g(x) = e^{-x^2/2}$ has the corresponding functional
$$
\tilde g(f) = \int_\mathbb{R}\!\!\mathrm{d}x \: g(x)\cdot f(x).
$$
That choice is canonical because you can reconstruct $g$ from that functional, as the unique unit-norm function which maximises the $L^2$ scalar product. (That this is possible in a Hilbert space – thanks to the completeness property – is the interesting bit about the Riesz representation theorem.)
Naïvely, we could follow from this that $\tilde\delta$ has a corresponding function $\mathbb{R}\to\mathbb{R}$. It is after all a functional on functions, and then we can as well consider it only on square-integrable ones... what's the problem?
Well, the problem† is that $L^2(\mathbb{R})$ is not really just an integrability-restriction of the space of functions. It's actually a space of equivalence classes of such functions: when two functions only differ on a Lebesgue null set, they're considered the same element of $L^2(\mathbb{R})$. And that means $\tilde\delta$ isn't actually defined on $L^2(\mathbb{R})$, because if you change the function only on the point 0 you'd get a different result, but from the “same” argument. And that would be your wrong results from naïve use of $\delta$ as a “real-valued function”: if you evaluate it with functions that are tweaked at a single spot, you can get wrong results.
The reason this isn't usually an issue in physics is the “all functions are continuous” paradigm. Because while every element of $L^2(\mathbb{R})$ contains many functions, each differing only in a null set (e.g. only in discrete points), there is always at most a single continuous such function. So, $\tilde\delta$ is actually well defined as a functional $L^2(\mathbb{R}) \cap \mathcal{C}^0(\mathbb{R}) \to \mathbb{R}$. Then again, that is not a Hilbert space, but it's certainly an actual subset of one, so the physicists are doing ok.
†To be precise (as Hans reminds me to be), the dual space in question is only the space of bounded linear functionals (or equivalently, continuous linear functionals, though I'd remark that continuity on functionals should not be confused with continuity on corresponding functions). So even if $\tilde\delta$ was a well-defined functional – which in fact you can make it by restricting yourself further to the $H^1$ Sobolev space, in which each equivalence-class has exactly one continuous member – you wouldn't be able to apply Riesz, because the functional would not be bounded, i.e. you would be able to construct a sequence of $L^2$ functions that have all the same $L^2$ norm but give infinitely-growing results of $\tilde\delta$.