This is going to be a long post, but I think it will be useful. Imagine the following discussion, in the Socratic style:
Teacher: What does it mean when we write $$\lim_{x \to a} f(x) = L?$$
Student: It means that the limit of the function $f(x)$ as $x$ approaches $a$ equals $L$.
Teacher: Yes, but what does that actually mean? What are we saying about the behavior of $f$?
Student: [Pauses to think.] Well, I guess what we are saying is that for values of $x$ "close to" $a$, the function $f(x)$ becomes "close to" $L$.
Teacher: Okay. So how are you defining the concept of "close to"? In particular, what is the notion of "closeness" in a mathematical context? Does it mean that $x = a$?
Student: No--well, maybe sometimes! Of course, if $f(a)$ is well-defined, then we just have $f(a) = L$, but that's not interesting. The whole point of limits is to have a way to describe the function's behavior around the point $x = a$ even when $f$ is not defined at $a$.
Teacher: Right. So..."closeness." How would you define this idea mathematically?
Student: [Very long pause.] I'm not sure. Well, hold on. I think I have it, but it's a sort of geometric argument. When a number $x$ is "close to" another number $a$, we are really talking about the distance between these numbers being small. Like, $2.00001$ is "close to" $2$ because the difference is $0.00001$.
Teacher: But that difference, which you call "distance," isn't necessarily "small" in and of itself, is it? After all, isn't $10^{-10^{100}}$ much, much smaller than $10^{-5}$? "Small" is relative.
Student: [With a little irritation] Yeah, but you know what I mean! If the difference is small enough, then the limit exists!
Teacher: [Chuckles] Yes, I see what you're getting at, but so far, all you've been doing is choosing different vocabulary to describe the same concept. What is "distance"? What is "small enough"? We are mathematicians--when language is insufficiently precise, how do we communicate? Take your time to think about this.
Student: [Sighs] So...what I was doing before, I was calculating a difference between $x$ and $a$ and calling it "small" if I thought it looked like a small number. But really, it's not the signed difference, but the absolute difference $|x - a|$ that matters; and since, as you put it, "small is relative," let's instead use a variable, say $\delta$ (for the "difference"), to represent some bound.... [trails off]
Teacher: Go on....
Student: All right. So if $|x-a| < \delta$, then $x$ is "close to" $a$, where $\delta$ is some number that we choose in a way that quantifies the extent of closeness.
Teacher: Okay. Is $\delta$ allowed to be zero?
Student: Oh, of course not, no. I forgot. If $\delta$ were zero, no $x$ could satisfy $|x - a| < \delta$ at all, so we need $\delta > 0$. And we should also exclude $x = a$ itself, since the limit describes the behavior around $a$. So the condition is $$0 < |x - a| < \delta.$$ Then $x$ is, say, "delta-close" to $a$, or in a "delta-neighborhood" of $a$.
Teacher: All right. Now how are you going to tie that to the behavior of $f$?
Student: [Exasperated] Yes, yes, I'm getting to that part. Well, as I said before, the limit is something where if $x$ is "close to" $a$, then $f(x)$ is "close to" $L$. Obviously, it's not necessarily the case that $f(x)$ has the same extent of "closeness" to $L$ as $x$ does to $a$. For example, if $f(x) = 2x$, then when $x$ is within $\delta$ units of, say, $1$, $f(x)$ is only guaranteed to be within $2\delta$ units of $2$, since $0 < |x-1| < \delta$ implies that $0 < |2x - 2| = |f(x) - 2| < 2\delta$. But functions can be arbitrarily (although not infinitely) steep. I don't see how we can quantify how the closeness of $x$ to $a$ controls the closeness of $f(x)$ to $L$.
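(An aside from the dialogue: the student's bound for $f(x) = 2x$ is easy to sanity-check numerically. Here is a quick Python sketch--the random-sampling scheme is just my own way of probing the neighborhood, not anything canonical:)

```python
# Numerical check of the claim: for f(x) = 2x, every x in a punctured
# delta-neighborhood of a = 1 lands within 2*delta of 2.
import random

def f(x):
    return 2 * x

delta = 0.01
for _ in range(10_000):
    x = 1 + random.uniform(-delta, delta)
    if not (0 < abs(x - 1) < delta):
        continue  # skip x = a and any endpoint draws
    assert abs(f(x) - 2) < 2 * delta  # the student's 2*delta bound holds
```

Of course, no amount of sampling proves the implication; the one-line algebra $|2x - 2| = 2|x - 1| < 2\delta$ does that.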
Teacher: You actually kind of touched on it already when you said that functions can be arbitrarily but not infinitely steep. Stated informally another way, it means that the function's value can change very rapidly--in fact, as rapidly as you please--but only finitely so, for some fixed change in $x$. So if you wanted to ensure that the difference between $f(x)$ and $L$, while not necessarily zero, can be made as small as you please, how would you do it?
Student: [Long pause.] I think I need a little more help.
Teacher: So far, you've been thinking about using, as you put it, "delta-closeness" to force $f(x)$ to be "close to" $L$. But what if you turned it around and instead said, "I'm going to force $f(x)$ to be as close as I please to $L$; then what does that say about how close $x$ is to $a$? That way, you are guaranteeing that $f(x)$ becomes close to $L$, but the cost of that guarantee is that we need to be sure that--
Student: [Interrupts] Oh, oh! I get it now! Yes. What we need to say is that for a given amount of "closeness" of $f(x)$ to $L$, there is a $\delta$-neighborhood around $a$ where, if you pick any $x$ in that neighborhood, then $f(x)$ will be..."close enough" to $L$--that it will be within that given amount of closeness. In other words, we pick some "tolerance" or error bound between $f(x)$ and the limit $L$ that is our criterion for "close enough." And for that closeness, there is some set of corresponding $x$-values close to $a$ for which we are guaranteed that $f(x)$ meets the closeness criterion.
Teacher: Good, good. But how do we formalize this?
Student: Well, it's clear that we need another variable to describe the extent of closeness between $f(x)$ and $L$...let's use $\epsilon$, for "error." And as we did before, we use the absolute difference $|f(x) - L|$ to describe the "distance" between $f(x)$ and $L$. So our criterion has to be $$|f(x) - L| < \epsilon,$$ and this time, we get to pick $\epsilon$ freely, because it represents our tolerance for how much error we will accept between the function's value and its limit; we must be able to choose it to be arbitrarily small, but not zero.
Teacher: [Looks on silently, smiling]
Student: So let's define a procedure. Pick some $\epsilon > 0$. Then whenever $0 < |x - a| < \delta$--in other words, for every $x$ in a $\delta$-neighborhood of $a$--we require $|f(x) - L| < \epsilon$. But I feel like something is missing, because there might not be such a $\delta$. Like if $$f(x) = \begin{cases}-1, & x < 0 \\ 1, & x > 0 \end{cases}$$ then if I pick $\epsilon = 1/2$, the "jump" in $f$ at $x = 0$ is of size $2$. So no matter how small I make the $\delta$-neighborhood around $a = 0$, it will always contain $x$-values that are negative, as well as $x$-values that are positive, and that means any such $\delta$-neighborhood will have points where the function has values $1$ and $-1$. It would be impossible to pick a limit $L$ that is simultaneously within $1/2$ unit of $1$ and $-1$, let alone simultaneously arbitrarily close to $1$ and $-1$.
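(Another aside: the impossibility the student describes can be demonstrated concretely. The sketch below checks a handful of candidate values of $L$--any finite list is only illustrative, but the algebra shows every $L$ fails, since $|1 - L|$ and $|-1 - L|$ sum to at least $2$:)

```python
# The jump function from the example: any punctured delta-neighborhood of 0
# contains both a negative and a positive x, so f takes both values -1 and 1
# there, and no single L can be within epsilon = 1/2 of both.
def f(x):
    return -1.0 if x < 0 else 1.0  # the value at x = 0 never matters here

epsilon = 0.5
delta = 1e-9                         # an arbitrarily small neighborhood...
samples = [-delta / 2, delta / 2]    # ...still has one x of each sign
for L in [-1.0, -0.5, 0.0, 0.5, 1.0]:   # a few candidate limits
    # at least one of the two samples violates the epsilon criterion
    assert not all(abs(f(x) - L) < epsilon for x in samples)
```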
Teacher: Correct. So if there's an example of a function that has no such $\delta$, what made it so?
Student: I don't get what you mean.
Teacher: Remember how we were talking about ensuring that the (absolute) difference between $f(x)$ and $L$ can be made as small as you please? What consequence or implications does that have on the $\delta$-neighborhood?
Student: Well, there has to be some relationship there. I mean, as our error tolerance decreases, we have to imagine that, in general, there would be fewer $x$-values around $a$ that will satisfy that tolerance, right? So $\delta$ must depend in some way on our choice of $\epsilon$. Well, except in trivial cases like if $f(x)$ is a constant, then any $\delta$ works. But the point is the existence of a $\delta$. It doesn't have to be the largest one, or even unique. We just have to be able to find a sufficiently "small" neighborhood such that all $x$-values in that neighborhood around $a$ have function values $f(x)$ within the specified error tolerance of $L$.
Teacher: Right. So if you were to put all of this together, how would you propose we define the concept of a limit?
Student: I'd say something like this:
We say that $$\lim_{x \to a} f(x) = L$$ if, for any $\epsilon > 0$, there exists some $\delta > 0$ such that for every $x$ satisfying $0 < |x - a| < \delta$, one also has $|f(x) - L| < \epsilon$.
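To close the loop, the definition can be exercised on the dialogue's own example, $f(x) = 2x$ at $a = 1$. There the limit is $L = 2$, and given any $\epsilon$ the choice $\delta = \epsilon/2$ works, since $0 < |x - 1| < \epsilon/2$ implies $|2x - 2| = 2|x - 1| < \epsilon$. A small Python sketch of that dependence of $\delta$ on $\epsilon$ (the sampling is again just illustrative):

```python
# Exercising the definition on f(x) = 2x at a = 1, L = 2:
# for each epsilon, delta = epsilon / 2 suffices.
import random

def f(x):
    return 2 * x

a, L = 1.0, 2.0
for epsilon in (1.0, 0.1, 1e-3, 1e-6):
    delta = epsilon / 2              # our chosen delta for this epsilon
    for _ in range(1_000):
        x = a + random.uniform(-delta, delta)
        if not (0 < abs(x - a) < delta):
            continue  # the definition only constrains the punctured neighborhood
        assert abs(f(x) - L) < epsilon  # the epsilon criterion is met
```

Note the order of the quantifiers, which the code mirrors: $\epsilon$ is fixed first, and $\delta$ is chosen in response to it.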