37

The Y combinator is a concept in functional programming, borrowed from the lambda calculus. It is a fixed-point combinator. A fixed point combinator $G$ is a higher-order function (a functional, in mathematical language) that, given a function $f$, returns a fixed point of $f$. In mathematical language,

$$f(G(f)) = G(f)$$

This can be considered the defining equation of a fixed-point combinator.

Note that $f$ might be a function whose range and domain are themselves function spaces -- in fact this is the most common use of a fixed-point combinator: you can define a function $\alpha$ by specifying that it is the fixed point of another function $f$, and then compute $\alpha$ as $G(f)$.

As mathematicians we're used to functions having names, eg $f:x\mapsto x^2$ is the function called $f$ that maps $x$ to $x^2$. But there's no reason why you can't have anonymous function. Since the lambda calculus deals with these a lot, there's a special notation for them:

$$\lambda x.x^2$$

is the function that takes $x$ to $x^2$, so that e.g. $(\lambda x.x^2)(2) = 4$. When there's no ambiguity, we can write function application by concatenation: $(\lambda x.x^2) 2 = 4$, and if we defined $f = \lambda x.x^2$ then $f\; 2 = 4$.

Okay, now we get to the meat of the question. The Y combinator is a higher-order function (functional) defined as

$$Y = \lambda f. (\lambda x. f (x\;x)) \; (\lambda x. f (x\;x))$$

I can follow through the algebra and see that this is indeed a fixed-point combinator:

$$\begin{align} Y\; g & = (\lambda f. (\lambda x. f (x\;x)) \; (\lambda x. f (x\;x))) \; g \\ & = (\lambda x. g (x\;x)) \; (\lambda x. g (x\;x)) \\ & = (\lambda y. g (y\;y)) \; (\lambda x. g (x\;x)) \\ & = g \; (\lambda x. g (x\;x)) (\lambda x. g (x\;x)) \\ & = g\; (Y\; g) \end{align}$$

but I have no intuition as to why it works, or how someone might have come up with it. More to the point, I don't see how it can be practically used to compute functions as fixed-points of functionals.

Anyone got a good 'intuitive' explanation?

MJD
  • 65,394
  • 39
  • 298
  • 580
Chris Taylor
  • 28,955
  • 3
    A question about the history of the Y combinator: http://mathoverflow.net/questions/31893/what-is-the-history-of-the-y-combinator – Carl Mummert Jul 13 '11 at 16:18
  • 3
    The Y combinator is informally quite similar to the proof of Kleene's recursion theorem and to the proof of the fixed-point theorem in formal arithmetic. Here is a question where people tried to explain the version in arithmetic: http://mathoverflow.net/questions/30874/arithmetic-fixed-point-theorem/30878#30878 – Carl Mummert Jul 13 '11 at 16:21
  • Actually I'd say $x\mapsto x^2$ is perfectly fine as unnamed function. Indeed, I personally consider it more readable than $\lambda x.x^2$. – celtschk Jan 29 '20 at 15:13

3 Answers3

25

The $Y$ combinator is a function that takes a function $f$ and returns something applied to itself (specifically $\lambda x.f(xx)$). So if we want to make $Y(f)$ a fixed point of $f$, $Y(f)$ has to be equal to $f(Y(f))$. So we want some $a$ such that $aa = f(aa)$. Now, $a$ has access to itself (it is applied to itself). Because of this, we can directly create such an $a$. $$aa=f(aa)$$ $$a=\lambda a.f(aa)$$ $$a=\lambda x.f(xx)$$ $$Y=\lambda f.aa=\lambda f.(\lambda x.f(xx))(\lambda x.f(xx))$$ Essentially, by applying $a$ to itself, you are giving $a$ a reference to itself, allowing it to use itself in a recursive manner. However, $a$ is only an intermediate value - it is not the recursive function itself, as it still needs a reference to itself to do the recursion. The $Y$ combinator completely eliminates this need by finding the fixed point - giving a function its final, recursive form.

gereeter
  • 543
18

If you know Cantor's diagonal argument, then you can discover the Y combinator.

Cantor's Theorem Let $A,B$ be sets and suppose that there exists a fixpoint-free function $f \colon B \to B$. Then there is no surjection $A \to (A \to B)$.

Remark. Usually, $B$ is a $2$-element set, so that $A \to B = \mathcal{P}(A)$, and $f$ is the function that flips the two values.

Proof. Let $g \colon A \to (A \to B)$ be an arbitrary function. We will prove that $g$ is not surjective by constructing a function $h \colon A \to B$ such that $h$ is not in the image of $g$.

We use the diagonal argument: define $h$ to be given by $$ \begin{align*} h(a) = f(g(a)(a)) && (a \in A)\,. \end{align*} $$ Then if $a'$ is any element of $A$, we have that $$ h(a') = f(g(a')(a')) \ne g(a')(a')\,, $$ (since $f$ is fixpoint-free), and therefore that $h\ne g(a')$. Therefore, since $a'$ was arbitrary, $h$ must be outside the image of $g$. $\Box$

In the proof above, $h$ is constructed such that, for any $a' \in A$, $h$ disagrees with the function $g(a')$ on $a'$.

Now it's not said this way often, but Cantor's diagonal argument actually applies to all sorts of type theories other than that of sets and functions, including to the untyped $\lambda$-calculus. The consequences are very different, however.

Write $\Lambda$ for the collection of all $\lambda$-terms. (Technically, we are employing the trick of thinking of the untyped $\lambda$-calculus as a typed $\lambda$-calculus with a single type $\Lambda$ satisfying $\Lambda \to \Lambda = \Lambda$.)

We want to take $A=B=\Lambda$ in our argument above. But what should a function from $\lambda$-terms to $\lambda$-terms be? Why, it is nothing but a $\lambda$-term itself! Therefore, there absolutely is a surjection $$ \Lambda \to (\Lambda \to \Lambda)\,. $$ It is the identity function!

Doesn't this contradict Cantor's argument? The answer is no - since the conclusion of the argument is false, then one of the hypotheses must be false. And in this case, the only hypothesis that can fail is the existence of a fixpoint-free function $\Lambda \to \Lambda$.

That's right - there is no fixpoint-free $\lambda$-term. But since the argument above is completely constructive, we can do even better: we can construct that fixpoint ourselves by carrying out the argument in reverse.

Let's do this in general first, and then we'll see how things work in the untyped $\lambda$-calculus. Everything in the following argument comes about as a direct result of writing out the steps of the previous proof in reverse.

Contrapositive to Cantor's Theorem Let $g \colon A \to (A \to B)$ be a surjection. Then there is no fixpoint-free function $f \colon B \to B$.

Proof. Consider the function $h \colon A \to B$ given by $$ h(a) = f(g(a)(a))\,. $$ Since $g$ is surjective, there is some $a'$ such that $g(a')=h$. Then we have $$ g(a')(a') = h(a') = f(g(a')(a'))\,, $$ and therefore $g(a')(a')$ is a fixpoint of the function $f$.$\Box$

How does this work for the untyped $\lambda$-calculus? Well, this time our surjection $g$ is the identity function, so $h$ becomes $$ h = \lambda x.f(x\,x)\,. $$ Since $g$ is the identity function, we can choose $a' = h$. Then our final expression for the fixpoint is $$ h\,h = (\lambda x.f(x\,x))(\lambda x.f(x\,x))\,. $$ And indeed we have $$ (\lambda x.f(x\,x))(\lambda x.f(x\,x)) = f((\lambda x.f(x\,x))(\lambda x.f(x\,x)))\,. $$ There's more! We can $\lambda$-abstract out the $f$ to give us a general-purpose fixpoint combinator. $$ \lambda f.(\lambda x.f(x\,x))(\lambda x.f(x\,x)) $$ Hello old friend.

jII
  • 3,050
John Gowers
  • 24,959
  • 1
    Related: https://math.stackexchange.com/questions/1002540/fixed-points-in-computability-and-logic – Hanno Jan 29 '20 at 15:09
  • 3
    This is rare and the best answer IMO. While most tutorials teach why $Y$ sends a function to fixed point, this actually teaches you how to reinvent $Y$! – Student May 12 '20 at 18:26
7

The $Y$ combinator can be defined also Turing way. Let us use less formal but more descriptive variable names to define the combinator: $$ \begin{align} A &= \lambda \text{self}.\lambda f.f\ (\text{self}\ \text{self}\ f);\\ Y &= A\ A. \end{align} $$

The $A$ combinator can be considered as a function that takes itself as its first argument and the target function, fixed point of which is to be found, as the second one. It returns the function applied to $A\ A\ f$, just like we wanted.

Then, an arbitrary combinator F which is defined in the terms of itself like $$ F\ x\ y = F\ y\ x\ F $$ can be defined in terms of $Y$ and some other combinator $F_r$. Specifically, $$ F\ x\ y = F\ y\ x\ F $$ is equivalent to $$ F = \lambda x.\lambda y.F\ y\ x\ F, $$ which in turn follows from $$ F = (\lambda \text{self}.\lambda x.\lambda y.\text{self}\ y\ x\ \text{self}) F. $$ Finally, let us call the resulting subexpression $$ F_r = \lambda \text{self}.\lambda x.\lambda y.\text{self}\ y\ x\ \text{self}. $$ Then $F = Y\ F_r$.