The Y combinator is a concept in functional programming, borrowed from the lambda calculus. It is a fixed-point combinator. A fixed point combinator $G$ is a higher-order function (a functional, in mathematical language) that, given a function $f$, returns a fixed point of $f$. In mathematical language,
$$f(G(f)) = G(f)$$
This can be considered the defining equation of a fixed-point combinator.
Note that $f$ might be a function whose range and domain are themselves function spaces -- in fact this is the most common use of a fixed-point combinator: you can define a function $\alpha$ by specifying that it is the fixed point of another function $f$, and then compute $\alpha$ as $G(f)$.
As mathematicians we're used to functions having names, eg $f:x\mapsto x^2$ is the function called $f$ that maps $x$ to $x^2$. But there's no reason why you can't have anonymous function. Since the lambda calculus deals with these a lot, there's a special notation for them:
$$\lambda x.x^2$$
is the function that takes $x$ to $x^2$, so that e.g. $(\lambda x.x^2)(2) = 4$. When there's no ambiguity, we can write function application by concatenation: $(\lambda x.x^2) 2 = 4$, and if we defined $f = \lambda x.x^2$ then $f\; 2 = 4$.
Okay, now we get to the meat of the question. The Y combinator is a higher-order function (functional) defined as
$$Y = \lambda f. (\lambda x. f (x\;x)) \; (\lambda x. f (x\;x))$$
I can follow through the algebra and see that this is indeed a fixed-point combinator:
$$\begin{align} Y\; g & = (\lambda f. (\lambda x. f (x\;x)) \; (\lambda x. f (x\;x))) \; g \\ & = (\lambda x. g (x\;x)) \; (\lambda x. g (x\;x)) \\ & = (\lambda y. g (y\;y)) \; (\lambda x. g (x\;x)) \\ & = g \; (\lambda x. g (x\;x)) (\lambda x. g (x\;x)) \\ & = g\; (Y\; g) \end{align}$$
but I have no intuition as to why it works, or how someone might have come up with it. More to the point, I don't see how it can be practically used to compute functions as fixed-points of functionals.
Anyone got a good 'intuitive' explanation?