Y-combinator

Posted by Nicholas Chen Thu, 13 Jul 2006 17:02:02 GMT

The Y-Combinator is something that has taken up a significant portion of my life these past few days. There are a lot of examples online describing it. However, most of them have not been presented in an intuitive sense (maybe because the Y-combinator is rather counterintuitive?). Also, most of them have not really explain the need for the Y-combinator when using Scheme/Lisp/etc. since we already have stuff like (define...), (letrec...) at our disposal.

Anyway, the explanation given inside the book The Little Schemer was pretty interesting though I believe that it can be made clearer. As aforementioned, the intent of the Y-combinator could have been presented. Nonetheless, the best derivation of the Y-combinator that I can find on the web belongs to this page. Since I have spent the past few days mulling over this, I -- rather presumptuously-- feel that I can do a good job at trying to explain the derivation of the Y-combinator. This derivation is not going to be very formal but will hinge on making things as intuitive as possible. And even if no one else benefits from this derivation, I have the satisfaction of trying to write this clearly to enlighten myself in the future should I need to derive this again. Most of the work here will be inspired and influenced by the link above.

First question: Why do we need the Y-combinator since we can basically accomplish all our programming needs without ever making use of it? After all, we have (define...) which is more than sufficient to make all our functions. Right? Precisely. The Y-combinator is more of an exercise in lambda calculus. It belongs to a class of fixed point combinators. Just like other mathematical theorems, it is something to be admired and to be viewed as beautiful.

The Y-combinator is a fancy way of making recursive functions, without giving a real name to the function. Sounds confusing? Read on.

Let's start with the simple function below.

1  (define length
2    (lambda (l)
3       (if (null? l)
4           0
5              (+ 1 (length (cdr l))))))
6
7  (length '(a b c d e))
8
Listing 1


In the function above, we made use of (define ...) to give a name to the recursive function. The function length takes a list and returns the number of items in that list. It does this recursively, adding 1 to each item it encounters. As you can see above, inside the function length, we made a call to length itself passing it (cdr l) as its argument. This is the recursive step. It continues until it (null? l) evaluates to true (this is the base case). For most people who have at least taken a course in programming, this is usually what recursion means for them: having a function call itself repeatedly until it reaches a base case. The question now: Can we create the function length without being able to give it a name?

But first, a simple refresher course. Scheme has the ability to define functions without giving them names. Here is an example of an anonymous function and how we use that function.

 1  ; Here is the anonymous function
 2  ; It's basically useless since we are not using it.
 3  ; And because it has no name, we cannot refer to it later.
 4  (lambda (x) (+ x 1))
 5
 6  ; Here is how we actually use it
 7  ((lambda (x) (+ x 1)) 6)
 8
 9  ; And if we want to use it anywhere else, we need to type it out.
10  ((lambda (x) (+ x 1)) 7)
Listing 2


The code snippet above hints at the fact, that if we are going to use anonymous functions, everywhere that we use them, we need to type it out all over again. There is another way we can use them though, by passing them in as arguments to another function, but we are getting ahead of ourselves here. So here is the plan: We are going to define the recursive function length without actually giving it a name! And here is the first step. We are applying this function to the list '(a b c d e). Other than that '(a b c d e) does not play any other role in shaping our function.

1  ((lambda (f l)
2     (if (null? l)
3       0
4       (+ 1 (f f (cdr l)))))
5   (lambda (f l)
6     (if (null? l)
7       0
8       (+ 1 (f f (cdr l)))))
9   '(a b c d ee))
Listing 3


There is a quite a bit of stuff going on in there, so I will try to explain it part by part. Below is our anonymous function. Notice how much it looks like the original length function except now it takes an additional argument f. The argument f takes the place of the call to length earlier. Remember, now we have no name for length so we cannot call it length. Also, notice how there is only one '(' before the lambda. this is important. It means that we are not "applying" any arguments to this anonymous function. Which actually means that what we have here is pretty much useless for the time being.

1  (lambda (f l)
2     (if (null? l)
3       0
4       (+ 1 (f (cdr l) f)))
Listing 4


However, notice that instead of just taking in the list l, it also take another argument f. That other argument f should be a function that we want it to apply to itself. As we discussed earlier, recursive functions are functions that call themselves. So in that case, what do we pass in as f? Why, it must be a copy of the same identical function of course! If this confuses you, recall that recursion is calling the same function over and over. Another line that might be confusing is line 4. (f (cdr l) f) is calling the function f on the remaining items on the list and passing f again as the second function. Always repeat to yourself, we also want to pass the same function over and over. That is the mantra.

In case you have not tried it, the code in listing 3 really does work! It is basically the function length defined without every giving the name length to it. And that is the basis of the Y-combinator. If you get the part above, the rest will be simple. If you have done any software engineering before, you will be familiar with the concept of refactoring. What we are going to do below will be a special kind of refactoring: extract method. We want to extract the trick behind what we just did in listing 4 and get it to work with other functions. That way we can define recursive functions without giving the functions any names.

What we want to do now is begin extracting the Y-combinator. Most of these steps might seems arbitrary: we should be able to apply any form of lambda calculus to simplify and extract the Y-combinator. Just like solving algebra equations, doing it in this order makes the extraction that much simpler.

 1  (((lambda (f)
 2      (lambda (l)
 3        (if (null? l)
 4          0
 5          (+ 1 ((f f) (cdr l))))))
 6    (lambda (f)
 7      (lambda (l)
 8        (if (null? l)
 9          0
10          (+ 1 ((f f) (cdr l)))))))
11   '(a b c d e))
Listing 5


What we have just done is called "currying". Basically, it means that any function of two or more arguments can be expressed as a cascade of functions of one argument. More information can be found here. At this stage, the parentheses begin to really makes things a bit harder to read. I suggest you open the code snippet above in DrScheme. It will show the regions where each for each pair of parentheses. Trust me, this makes it much easier to follow. Here is an attempt at explaining what currying is.

 1  ; This is an anonymous function that takes 2 arguments
 2  ((lambda (x y)
 3     (+ x y)) 1 2)
 4
 5  ; Here we curry the function above
 6  ; We have 2 levels of lambda nested together
 7  (lambda (x)
 8    (lambda (y)
 9      (+ x y)))
10
11  ; Think of it as applying the arguments in stages
12  ; Stage 1 -- this returns a function
13  ; It's as though you had written
14  ; (lambda (y)
15  ;  (+ 1 y))
16
17  ((lambda (x)
18     (lambda (y)
19       (+ x y))) 1)
20
21  ; Stage 2
22  (((lambda (x)
23      (lambda (y)
24        (+ x y)))1) 2)
25
Listing 6


Listing 6 should help clarify why there are more parentheses surrounding the lambda in listing 5. But listing 5 is far from complete. The "function" that looks like the original length function is still there. We need to extract that as well. Here is what we will be working on from now on. Notice that all we did is just remove the application of our function to the list '(a b c d e)'

 1  ((lambda (f)
 2     (lambda (l)
 3       (if (null? l)
 4         0
 5         (+ 1 ((f f) (cdr l))))))
 6   (lambda (f)
 7     (lambda (l)
 8       (if (null? l)
 9         0
10         (+ 1 ((f f) (cdr l)))))))
Listing 7


Look carefully at Listing 7, lines 6-10 are being repeated twice. That repetition is just hinting for some refactoring. What we are going to do is extract lines 6-10 and pass it in as a parameter. Here is how we do it:

 1  ; This is the function that calls the parameter twice.
 2  ; We call this parameter mklength since it is 
 3  ; used to "make the function length"
 4  ; This is essentially what we did above.
 5  (lambda (mklength)
 6      (mklength mklength))
 7
 8  ; Here is how we apply it
 9  ((lambda (mklength)
10        (mklength mklength))
11    (lambda (f)
12        (lambda (l)
13            (if (null? l)
14                  0
15                  (+ 1 ((f f) (cdr l)))))))
16
17  ; As a sanity check, here is to test that it still works
18  (((lambda (mklength)
19          (mklength mklength))
20      (lambda (f)
21          (lambda (l)
22              (if (null? l)
23                    0
24                    (+ 1 ((f f) (cdr l)))))))
25    '(a b c d e

Before the next step, we are going to do a simple transformation that preserves identity. It's kind of like substituting (a + b)2 = a2 + 2ab + b2. This step is important because of something called applicative order evaluation which is the default evaluation in Scheme. In a nutshell, applicative evaluation will try to evaluate all the arguments to a function first before actually calling the original function. For instance, (f (+ 2 4) (- 5 3)) will evaluate both (+ 2 4) and (- 5 3) regardless of whether or not the function f actually uses them. This is the case if f is defined as (define f (lambda x y) (if (= x 6) 6 y)). Since (+ 2 4) is actually 6, there is no need for us to evaluate the second argument, (+ 5 3). Our transformation is necessary so that we do not end up in an infinite recursion step when Scheme tries to evaluate the arguments first. For more information about the order of evaluation, please visit this link. Anyway, here is the transformation we are going to do:

 1  ; If f is a function of one argument, then
 2  ; the following is equal to f
 3  (lambda (x) (f x))
 4
 5  ; Similarly, if (f f) returns a function of one argument
 6  ; Then (f f) is identical to
 7  (lambda (x) ((f f) x))
 8
 9  ; Bottom line, wrapping a one-argument function in a lambda
10  ; does not change that function. Here is another example
11  (define add1
12      (lambda (x)
13          (+ 1 x)))
14
15  (add1 2)
16
17  ; Here we wrap the functiona add1 in a lambda first and then 
18  ; evaluate it. They both yield the same answer.
19  ((lambda (x) (add1 x)) 2)

Listing 9

This is what listing 8 looks like after we have performed the transformation. Also notice since I have defined add1 I will be using it from now on in lieu of (+ 1 ...).

1  ((lambda (mklength)
2        (mklength mklength))
3    (lambda (f)
4        (lambda (l)
5            (if (null? l)
6                  0
7                  (add1 ((lambda (x) ((f f) x)) (cdr l)))))))
Listing 10


We are almost done. As our final step, we are going to extract the function that looks like length (a very bastardized copy of length, albeit one that still maintains the structure of length). Follow closely what we are doing to do. We are going to remove ((lambda (x) ((f f) x)) and pass it in as a parameter. How do we do that? We surround lines 3-7 with another lambda that takes one single argument. That argument will be ((lambda (x) ((f f) x)). Here is what it looks like:

1  ((lambda (mklength)
2     (mklength mklength))
3   (lambda (f)
4     ((lambda (length)
5        (lambda (l)
6          (if (null? l)
7            0
8            (add1 (length (cdr l))))))
9      (lambda (x) ((f f) x)))))
Listing 11

Remember, all we are doing it just extracting ((lambda (x) ((f f) x)) by passing it as a parameter. Notice how lines 4-8 now look like the original function length that we had earlier. Finally, here is the end product.

 1  ((lambda (length)
 2        ((lambda (mklength)
 3              (mklength mklength))
 4          (lambda (f)
 5              (length
 6                (lambda (x) ((f f) x))))))
 7    ; This part gets passed in as the argument length in line 1  
 8    (lambda (length)
 9        (lambda (l)
10            (if (null? l)
11                  0
12                  (add1 (length (cdr l)))))))
Listing 12


And now, we can generalize lines 1-6 from listing 12 above. That is the Y-combinator that we were seeking. Here it is in all its glory.

 1  ; Here is the Y-combinator
 2
 3  (lambda (function-to-make-recursive)
 4    ((lambda (mkfunction)
 5       (mkfunction mkfunction))
 6     (lambda (f)
 7       (function-to-make-recursive
 8        (lambda (x) ((f f) x))))))
 9
10  ; And here is how we use it to make the factorial function
11  ; and also apply it to some argument
12
13  (((lambda (function-to-make-recursive)
14      ((lambda (mkfunction)
15         (mkfunction mkfunction))
16       (lambda (f)
17         (function-to-make-recursive
18          (lambda (x) ((f f) x))))))
19
20    (lambda (factorial)
21      (lambda (x)
22        (if (zero? x)
23            1
24            (* x (factorial (- x 1)))))))
25
26   5)
27
28  ; If you wish, you can even define the Y-combinator
29
30  (define Y
31    (lambda (function-to-make-recursive)
32      ((lambda (mkfunction)
33         (mkfunction mkfunction))
34       (lambda (f)
35         (function-to-make-recursive
36          (lambda (x) ((f f) x)))))))
37
38  ; And use it in the future to make more recursive functions.

Hopefully, what I have writing has made some of the aspects of the Y-combinator clearer for some of you. In the process of writing this, I myself have become better acquainted with Scheme's parentheses. And I am now definitely more comfortable with doing the transformations now and extracting common code out from functions. Of course, I feel even more comfortable if I can do those extractions in an application like DrScheme that shows the boundaries clearly.

Links to other interesting articles discussing the derivation of the Y-combinator:

Trackbacks

Use the following link to trackback from your own site:
http://blog.vazexqi.com/trackbacks?article_id=y-combinator&day=13&month=07&year=2006

Comments

Leave a comment

Comments