I find it helps to look at the problem from multiple perspectives, and maybe tell a story. Let me know if it helps.
The first thing, to address your question: the $\delta$ function is not a function. You are absolutely correct that $"\delta" \equiv 0$ in $L^1$. In fact, there is no $L^1(\mathbb{R})$ function $f$ such that
$$\int_{\mathbb{R}} f(x)\phi(x) dx = \phi(0)$$
for every function $\phi \in C^\infty(\mathbb{R})$. This is not a starting point for understanding; it is where late 19th and early 20th century analysis comes in (measure theory, PDE theory, etc.).
You sound confused because $\delta$ doesn't fit into any structure that you might've seen before. But in fact, it shows up in a lot of the structures you've seen throughout mathematics. Consider group theory. You know that for any group $G$ there is an element $1$ such that $1\cdot g = g$ for all $g \in G$.
Well, consider the vector space $L^1$ of functions. Vector spaces are abelian groups under addition, and the identity for addition is just the 0 function.
That's a little too easy. Let's complicate things. Let's introduce multiplication on $L^1$ by
$$(f*g)(x) = \int_{-\infty}^\infty f(x-y)g(y)dy.$$
This is known as convolution. Motivation #1 for $\delta$: it plays the role of the identity for convolution, and what actually exists in $L^1$ is an "approximate identity".
That's a bit of a mouthful, so take a step back. Consider the discrete case. That is, consider functions $f(x)$ on the cyclic group $G = \mathbb{Z}/n\mathbb{Z} = \{0, 1, \dots, n-1\}$ (so that $x - y$ makes sense, taken mod $n$), and
$$(f*g)(x) = \sum_{y \in G} f(x-y)g(y).$$
I want $(f*g)(x) = f(x)$ for all $x$. How do I do this? Well, you can "pick out" the value $f(x)$ by choosing a specific $g$ such that $f(x-0)g(0) = f(x)g(0) = f(x)$ (so this forces $g(0) = 1$), and you want all the other terms in the sum above to vanish, i.e. $f(x-y)g(y) = 0$ for all $y \neq 0$. This is math, so things are true because you write them down. Let:
$$g(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \neq 0\end{cases}.$$
Bam. You have an identity.
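If you'd rather see it than believe it, here is a minimal sketch of the discrete case. I'm assuming the indices wrap around mod $n$ (i.e. $G$ is treated as $\mathbb{Z}/n\mathbb{Z}$ with elements $0,\dots,n-1$); the names `cyclic_convolve` and `discrete_delta` are just mine for illustration.

```python
def cyclic_convolve(f, g):
    """(f*g)(x) = sum over y of f(x - y) g(y), with x - y taken mod n."""
    n = len(f)
    return [sum(f[(x - y) % n] * g[y] for y in range(n)) for x in range(n)]


def discrete_delta(n):
    """The function g with g(0) = 1 and g(y) = 0 for y != 0."""
    return [1 if y == 0 else 0 for y in range(n)]


f = [3, 1, 4, 1, 5]          # an arbitrary function on Z/5Z
g = discrete_delta(len(f))

# Convolving with g hands f back unchanged, on either side.
print(cyclic_convolve(f, g) == f)
print(cyclic_convolve(g, f) == f)
```

Only the $y = 0$ term of each sum survives, which is exactly the "pick out the value" argument above.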
The $\delta$ (measure, distribution, unit mass, anything but function) guy is the continuous analogue of $g(x)$. However, the continuous analogue of the sum is an integral, so when you integrate arbitrarily close around $0$, you want $\int_{B(0,\epsilon)} \delta = 1$ for every $\epsilon > 0$. Ain't gonna happen for a function: the integral of an $L^1$ function over $B(0,\epsilon)$ goes to $0$ as $\epsilon \to 0$. The best you can do is approximate $\delta$ (these are known as kernels or approximations to the identity). You can in fact choose these guys smoothly -- I'll let you google that.
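For the impatient, here is one standard choice, as a sketch rather than the only option (the names $\eta$ and $C$ are mine). Take the bump
$$\eta(x) = \begin{cases} C\exp\left(-\dfrac{1}{1-x^2}\right) & \text{if } |x| < 1 \\ 0 & \text{if } |x| \geq 1\end{cases},$$
with the constant $C$ chosen so that $\int_{\mathbb{R}} \eta = 1$, and rescale:
$$k_n(x) = n\,\eta(nx).$$
Each $k_n$ is $C^\infty$, vanishes outside $[-1/n, 1/n]$, and still has $\int k_n = 1$ (substitute $u = nx$).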
There are a few important facts about these guys: for any $L^1$ function $f$ and approximation to the identity $k_n$ (note that $\int k_n = 1$ for all $n$), you get, in the $L^1$ norm and (for reasonable kernels) at almost every $x$,
$$\lim_{n\rightarrow\infty} (f*k_n)(x) = f(x).$$
Also, if $k_n$ is $C^\infty$, $f*k_n$ is $C^\infty$. Look up mollifiers.
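Here is a rough numerical sketch of that limit, under my own assumptions: I use a triangular bump `hat` as the kernel (not smooth, but easy to integrate), rescale it as $k_n(x) = n\,\mathrm{hat}(nx)$, and approximate the convolution integral with a midpoint rule. The test function $e^{-|x|}$ and the point $x_0 = 0.5$ are arbitrary choices.

```python
import math


def hat(x):
    """Triangular bump supported on [-1, 1] with total integral 1."""
    return max(0.0, 1.0 - abs(x))


def k(n, x):
    """Rescaled kernel k_n(x) = n * hat(n x): integral 1, support [-1/n, 1/n]."""
    return n * hat(n * x)


def convolve_at(f, n, x, steps=2000):
    """Midpoint-rule estimate of (f * k_n)(x) = integral of f(x - y) k_n(y) dy."""
    h = 2.0 / (n * steps)               # width of each subinterval of [-1/n, 1/n]
    total = 0.0
    for i in range(steps):
        y = -1.0 / n + (i + 0.5) * h    # midpoint of the i-th subinterval
        total += f(x - y) * k(n, y)
    return total * h


def f(x):
    return math.exp(-abs(x))            # an L^1 function with a kink at 0


x0 = 0.5
errors = [abs(convolve_at(f, n, x0) - f(x0)) for n in (1, 4, 16, 64)]
print(errors)  # the gap |(f * k_n)(x0) - f(x0)| shrinks as n grows
```

This checks a single point where $f$ is continuous; it says nothing by itself about what the $k_n$ themselves are doing in $L^1$.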
This should give you enough to chew on. Now ask yourself about convergence in $L^1$ of these guys and what this says about $\delta = 0$ in $L^1$.