One way to view $\delta$ is as the limit of so-called "nascent" delta functions, which I think can be understood just fine in this context without much advanced math.
For any (positive, for convenience) $\eta$ with $\int_\mathbb{R}\eta=1$, we define
$$
\eta_\epsilon(x)=\frac{1}{\epsilon}\eta\left(\frac{x}{\epsilon}\right)
$$
which still integrates to $1$ (by u substitution).
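Spelled out, with $u=x/\epsilon$ (so $\mathrm dx=\epsilon\,\mathrm du$), the substitution reads
$$
\int_\mathbb{R}\eta_\epsilon(x)\mathrm dx=\int_\mathbb{R}\frac{1}{\epsilon}\eta\left(\frac{x}{\epsilon}\right)\mathrm dx=\int_\mathbb{R}\eta(u)\mathrm du=1.
$$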
Then,
$$
\lim_{\epsilon\to 0}\int_\mathbb{R}\eta_\epsilon(x-y)f(y)\mathrm dy=f(x)
$$
(for, say, bounded continuous $f$), i.e. these nascent deltas behave like the Dirac delta when $\epsilon>0$ is small: they forget the parts of $f$ far away from $x$, and weight the values of $f$ near $x$ more and more heavily as $\epsilon$ shrinks.
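If you want to watch this limit happen numerically, here is a minimal sketch. Everything in it is my own choice for illustration: $\eta$ is taken to be the standard Gaussian density, the test function `f` and the point `x0` are arbitrary, and the integral is approximated by a plain midpoint rule truncated to a window of ten widths around $x$.

```python
import math

def eta(u):
    """Standard Gaussian density: positive and integrates to 1 (an assumed choice of eta)."""
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)

def eta_eps(x, eps):
    """Rescaled kernel (1/eps) * eta(x/eps)."""
    return eta(x / eps) / eps

def smoothed(f, x, eps, half_width=10.0, steps=20000):
    """Midpoint-rule approximation of the integral of eta_eps(x - y) * f(y) dy.

    The integral is truncated to |x - y| <= half_width * eps, where the
    Gaussian tail is negligible.
    """
    a = x - half_width * eps
    h = 2.0 * half_width * eps / steps
    total = 0.0
    for k in range(steps):
        y = a + (k + 0.5) * h
        total += eta_eps(x - y, eps) * f(y)
    return total * h

def f(y):
    """Arbitrary smooth test function."""
    return math.cos(y) + y * y

x0 = 0.7
# Distance between the smoothed value and f(x0) for shrinking eps.
errors = [abs(smoothed(f, x0, eps) - f(x0)) for eps in (1.0, 0.1, 0.01)]
print(errors)
```

For a smooth `f` like this one, the printed errors should shrink roughly like $\epsilon^2$.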
Examples of the above:
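The heat kernel/Gaussian (note the width here is $\sqrt{\epsilon}$ rather than $\epsilon$, i.e. in the notation above this is the standard Gaussian rescaled by $\sqrt{\epsilon}$, which makes no difference in the limit),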
$$
\eta_\epsilon(x)=\frac{1}{\sqrt{2\pi \epsilon}}\exp\left(\frac{-x^2}{2\epsilon}\right)
$$
or a very nice example where we index by integers $n$ (so $\epsilon=1/n$) to approximate the delta at $0$,
$$
\eta_n(x)=\begin{cases}\frac{n}{2}\cos(nx)& |x|<\frac{\pi}{2n}\\
0&\text{otherwise}
\end{cases}
$$
where we have
$$
\lim_{n\to \infty}\int_\mathbb{R}\eta_n(y)f(y)\mathrm dy=\lim_{n\to \infty}\frac{n}{2}\int_{-\frac{\pi}{2n}}^{\frac{\pi}{2n}}\cos(ny)f(y)\mathrm dy=f(0)
$$
for $f$ nice (it's easy to see for smooth $f$ using integration by parts, and still not so bad to prove for continuous $f$). I like this one, since you can really see the approximation happening explicitly.
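To see the cosine example converge explicitly, here is a small numerical sketch. The test function `f` and the helper name `cosine_bump_average` are made up for illustration, and the integral is again just a midpoint rule.

```python
import math

def cosine_bump_average(f, n, steps=20000):
    """Midpoint-rule approximation of (n/2) * integral of cos(n*y) * f(y)
    over |y| < pi/(2n), i.e. the integral of eta_n * f."""
    a = -math.pi / (2.0 * n)
    h = (math.pi / n) / steps
    total = 0.0
    for k in range(steps):
        y = a + (k + 0.5) * h
        total += 0.5 * n * math.cos(n * y) * f(y)
    return total * h

def f(y):
    """Arbitrary smooth test function with f(0) = 1."""
    return math.exp(y) + math.sin(3.0 * y)

# The values should approach f(0) = 1 as n grows.
values = [cosine_bump_average(f, n) for n in (1, 10, 100, 1000)]
print(values)
```

For this particular `f` the gap to $f(0)$ should shrink roughly like $1/n^2$.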
Anyway, to your question: with this definition it becomes not so bad, since taking $f\equiv 1$, we have
$$
\int_{\mathbb{R}}\delta(x-y)\mathrm dy=\lim_{\epsilon\to 0}\int_\mathbb{R}\eta_{\epsilon}(x-y)f(y)\mathrm dy=
\lim_{\epsilon\to 0}\int_\mathbb{R}\eta_{\epsilon}(x-y)\mathrm dy=1
$$
since each $\eta_\epsilon$ integrates to $1$.