Y-combinator
The Y-Combinator is something that has taken up a significant portion of my life these past few days. There are a lot of examples online describing it. However, most of them have not been presented in an intuitive sense (maybe because the Y-combinator is rather counterintuitive?). Also, most of them have not really explain the need for the Y-combinator when using Scheme/Lisp/etc. since we already have stuff like (define...), (letrec...) at our disposal.
Anyway, the explanation given inside the book The Little Schemer was pretty interesting though I believe that it can be made clearer. As aforementioned, the intent of the Y-combinator could have been presented. Nonetheless, the best derivation of the Y-combinator that I can find on the web belongs to this page. Since I have spent the past few days mulling over this, I -- rather presumptuously-- feel that I can do a good job at trying to explain the derivation of the Y-combinator. This derivation is not going to be very formal but will hinge on making things as intuitive as possible. And even if no one else benefits from this derivation, I have the satisfaction of trying to write this clearly to enlighten myself in the future should I need to derive this again. Most of the work here will be inspired and influenced by the link above.
First question: Why do we need the Y-combinator since we can basically accomplish all our programming needs without ever making use of it? After all, we have (define...) which is more than sufficient to make all our functions. Right? Precisely. The Y-combinator is more of an exercise in lambda calculus. It belongs to a class of fixed point combinators. Just like other mathematical theorems, it is something to be admired and to be viewed as beautiful.
The Y-combinator is a fancy way of making recursive functions, without giving a real name to the function. Sounds confusing? Read on.
Let's start with the simple function below.
1 (define length 2 (lambda (l) 3 (if (null? l) 4 0 5 (+ 1 (length (cdr l)))))) 6 7 (length '(a b c d e)) 8
In the function above, we made use of (define ...) to give a name to the recursive function. The function length takes a list and returns the number of items in that list. It does this recursively, adding 1 to each item it encounters. As you can see above, inside the function length, we made a call to length itself passing it (cdr l) as its argument. This is the recursive step. It continues until it (null? l) evaluates to true (this is the base case). For most people who have at least taken a course in programming, this is usually what recursion means for them: having a function call itself repeatedly until it reaches a base case. The question now: Can we create the function length without being able to give it a name?
But first, a simple refresher course. Scheme has the ability to define functions without giving them names. Here is an example of an anonymous function and how we use that function.
1 ; Here is the anonymous function 2 ; It's basically useless since we are not using it. 3 ; And because it has no name, we cannot refer to it later. 4 (lambda (x) (+ x 1)) 5 6 ; Here is how we actually use it 7 ((lambda (x) (+ x 1)) 6) 8 9 ; And if we want to use it anywhere else, we need to type it out. 10 ((lambda (x) (+ x 1)) 7)
The code snippet above hints at the fact, that if we are going to use anonymous functions, everywhere that we use them, we need to type it out all over again. There is another way we can use them though, by passing them in as arguments to another function, but we are getting ahead of ourselves here. So here is the plan: We are going to define the recursive function length without actually giving it a name! And here is the first step. We are applying this function to the list '(a b c d e). Other than that '(a b c d e) does not play any other role in shaping our function.
1 ((lambda (f l) 2 (if (null? l) 3 0 4 (+ 1 (f f (cdr l))))) 5 (lambda (f l) 6 (if (null? l) 7 0 8 (+ 1 (f f (cdr l))))) 9 '(a b c d ee))
There is a quite a bit of stuff going on in there, so I will try to explain it part by part. Below is our anonymous function. Notice how much it looks like the original length function except now it takes an additional argument f. The argument f takes the place of the call to length earlier. Remember, now we have no name for length so we cannot call it length. Also, notice how there is only one '(' before the lambda. this is important. It means that we are not "applying" any arguments to this anonymous function. Which actually means that what we have here is pretty much useless for the time being.
1 (lambda (f l) 2 (if (null? l) 3 0 4 (+ 1 (f (cdr l) f)))
However, notice that instead of just taking in the list l, it also take another argument f. That other argument f should be a function that we want it to apply to itself. As we discussed earlier, recursive functions are functions that call themselves. So in that case, what do we pass in as f? Why, it must be a copy of the same identical function of course! If this confuses you, recall that recursion is calling the same function over and over. Another line that might be confusing is line 4. (f (cdr l) f) is calling the function f on the remaining items on the list and passing f again as the second function. Always repeat to yourself, we also want to pass the same function over and over. That is the mantra.
In case you have not tried it, the code in listing 3 really does work! It is basically the function length defined without every giving the name length to it. And that is the basis of the Y-combinator. If you get the part above, the rest will be simple. If you have done any software engineering before, you will be familiar with the concept of refactoring. What we are going to do below will be a special kind of refactoring: extract method. We want to extract the trick behind what we just did in listing 4 and get it to work with other functions. That way we can define recursive functions without giving the functions any names.
What we want to do now is begin extracting the Y-combinator. Most of these steps might seems arbitrary: we should be able to apply any form of lambda calculus to simplify and extract the Y-combinator. Just like solving algebra equations, doing it in this order makes the extraction that much simpler.
1 (((lambda (f) 2 (lambda (l) 3 (if (null? l) 4 0 5 (+ 1 ((f f) (cdr l)))))) 6 (lambda (f) 7 (lambda (l) 8 (if (null? l) 9 0 10 (+ 1 ((f f) (cdr l))))))) 11 '(a b c d e))
What we have just done is called "currying". Basically, it means that any function of two or more arguments can be expressed as a cascade of functions of one argument. More information can be found here. At this stage, the parentheses begin to really makes things a bit harder to read. I suggest you open the code snippet above in DrScheme. It will show the regions where each for each pair of parentheses. Trust me, this makes it much easier to follow. Here is an attempt at explaining what currying is.
1 ; This is an anonymous function that takes 2 arguments 2 ((lambda (x y) 3 (+ x y)) 1 2) 4 5 ; Here we curry the function above 6 ; We have 2 levels of lambda nested together 7 (lambda (x) 8 (lambda (y) 9 (+ x y))) 10 11 ; Think of it as applying the arguments in stages 12 ; Stage 1 -- this returns a function 13 ; It's as though you had written 14 ; (lambda (y) 15 ; (+ 1 y)) 16 17 ((lambda (x) 18 (lambda (y) 19 (+ x y))) 1) 20 21 ; Stage 2 22 (((lambda (x) 23 (lambda (y) 24 (+ x y)))1) 2) 25
Listing 6 should help clarify why there are more parentheses surrounding the lambda in listing 5. But listing 5 is far from complete. The "function" that looks like the original length function is still there. We need to extract that as well. Here is what we will be working on from now on. Notice that all we did is just remove the application of our function to the list '(a b c d e)'
1 ((lambda (f) 2 (lambda (l) 3 (if (null? l) 4 0 5 (+ 1 ((f f) (cdr l)))))) 6 (lambda (f) 7 (lambda (l) 8 (if (null? l) 9 0 10 (+ 1 ((f f) (cdr l)))))))
Look carefully at Listing 7, lines 6-10 are being repeated twice. That repetition is just hinting for some refactoring. What we are going to do is extract lines 6-10 and pass it in as a parameter. Here is how we do it:
1 ; This is the function that calls the parameter twice. 2 ; We call this parameter mklength since it is 3 ; used to "make the function length" 4 ; This is essentially what we did above. 5 (lambda (mklength) 6 (mklength mklength)) 7 8 ; Here is how we apply it 9 ((lambda (mklength) 10 (mklength mklength)) 11 (lambda (f) 12 (lambda (l) 13 (if (null? l) 14 0 15 (+ 1 ((f f) (cdr l))))))) 16 17 ; As a sanity check, here is to test that it still works 18 (((lambda (mklength) 19 (mklength mklength)) 20 (lambda (f) 21 (lambda (l) 22 (if (null? l) 23 0 24 (+ 1 ((f f) (cdr l))))))) 25 '(a b c d e
Before the next step, we are going to do a simple transformation that preserves identity. It's kind of like substituting (a + b)2 = a2 + 2ab + b2. This step is important because of something called applicative order evaluation which is the default evaluation in Scheme. In a nutshell, applicative evaluation will try to evaluate all the arguments to a function first before actually calling the original function. For instance, (f (+ 2 4) (- 5 3)) will evaluate both (+ 2 4) and (- 5 3) regardless of whether or not the function f actually uses them. This is the case if f is defined as (define f (lambda x y) (if (= x 6) 6 y)). Since (+ 2 4) is actually 6, there is no need for us to evaluate the second argument, (+ 5 3). Our transformation is necessary so that we do not end up in an infinite recursion step when Scheme tries to evaluate the arguments first. For more information about the order of evaluation, please visit this link. Anyway, here is the transformation we are going to do:
1 ; If f is a function of one argument, then 2 ; the following is equal to f 3 (lambda (x) (f x)) 4 5 ; Similarly, if (f f) returns a function of one argument 6 ; Then (f f) is identical to 7 (lambda (x) ((f f) x)) 8 9 ; Bottom line, wrapping a one-argument function in a lambda 10 ; does not change that function. Here is another example 11 (define add1 12 (lambda (x) 13 (+ 1 x))) 14 15 (add1 2) 16 17 ; Here we wrap the functiona add1 in a lambda first and then 18 ; evaluate it. They both yield the same answer. 19 ((lambda (x) (add1 x)) 2)
This is what listing 8 looks like after we have performed the transformation. Also notice since I have defined add1 I will be using it from now on in lieu of (+ 1 ...).
1 ((lambda (mklength) 2 (mklength mklength)) 3 (lambda (f) 4 (lambda (l) 5 (if (null? l) 6 0 7 (add1 ((lambda (x) ((f f) x)) (cdr l)))))))
We are almost done. As our final step, we are going to extract the function that looks like length (a very bastardized copy of length, albeit one that still maintains the structure of length). Follow closely what we are doing to do. We are going to remove ((lambda (x) ((f f) x)) and pass it in as a parameter. How do we do that? We surround lines 3-7 with another lambda that takes one single argument. That argument will be ((lambda (x) ((f f) x)). Here is what it looks like:
1 ((lambda (mklength) 2 (mklength mklength)) 3 (lambda (f) 4 ((lambda (length) 5 (lambda (l) 6 (if (null? l) 7 0 8 (add1 (length (cdr l)))))) 9 (lambda (x) ((f f) x)))))
Remember, all we are doing it just extracting ((lambda (x) ((f f) x)) by passing it as a parameter. Notice how lines 4-8 now look like the original function length that we had earlier. Finally, here is the end product.
1 ((lambda (length) 2 ((lambda (mklength) 3 (mklength mklength)) 4 (lambda (f) 5 (length 6 (lambda (x) ((f f) x)))))) 7 ; This part gets passed in as the argument length in line 1 8 (lambda (length) 9 (lambda (l) 10 (if (null? l) 11 0 12 (add1 (length (cdr l)))))))
And now, we can generalize lines 1-6 from listing 12 above. That is the Y-combinator that we were seeking. Here it is in all its glory.
1 ; Here is the Y-combinator 2 3 (lambda (function-to-make-recursive) 4 ((lambda (mkfunction) 5 (mkfunction mkfunction)) 6 (lambda (f) 7 (function-to-make-recursive 8 (lambda (x) ((f f) x)))))) 9 10 ; And here is how we use it to make the factorial function 11 ; and also apply it to some argument 12 13 (((lambda (function-to-make-recursive) 14 ((lambda (mkfunction) 15 (mkfunction mkfunction)) 16 (lambda (f) 17 (function-to-make-recursive 18 (lambda (x) ((f f) x)))))) 19 20 (lambda (factorial) 21 (lambda (x) 22 (if (zero? x) 23 1 24 (* x (factorial (- x 1))))))) 25 26 5) 27 28 ; If you wish, you can even define the Y-combinator 29 30 (define Y 31 (lambda (function-to-make-recursive) 32 ((lambda (mkfunction) 33 (mkfunction mkfunction)) 34 (lambda (f) 35 (function-to-make-recursive 36 (lambda (x) ((f f) x))))))) 37 38 ; And use it in the future to make more recursive functions.
Hopefully, what I have writing has made some of the aspects of the Y-combinator clearer for some of you. In the process of writing this, I myself have become better acquainted with Scheme's parentheses. And I am now definitely more comfortable with doing the transformations now and extracting common code out from functions. Of course, I feel even more comfortable if I can do those extractions in an application like DrScheme that shows the boundaries clearly.
Links to other interesting articles discussing the derivation of the Y-combinator:
- Y-combinator - by Jim Marshall
- What is the Y-combinator at Spark's Pensieve
- Deriving the Y-combinator in Ruby
- The Little Javascripter
Scheme-ing
Nice Spring Break reading: The Little Schemer and The Seasoned Schemer. The Little Schemer is just like The Little Lisper with all the exercises removed. I read The Little Lisper last summer and would like to refresh my memory on some Lisp/Scheme since I have not been using it seriously for a while. I think the authors might have revised some of the contents but there was nothing major. Looking forward to getting the third installment of this series, The Reasoned Schemer.
What is a nice environment for trying out scheme on OS X? There are a couple of implementations as listed here. DrScheme is the only one so far that runs on OS X with a nice little IDE and some help files. The only other one that seems viable is SchemeWay but hooks itself up as a plug-in for Eclipse. And I really do not feel like starting up Eclipse every time I want to try something.
An interesting feature that DrScheme offers -- among the other innovative features that I will not be using yet -- is the ability to embed test cases inside the code. While this might seem like a trivial thing, DrScheme has a nice facility for doing so as can be seen from the screenshot below.
The list of features for DrScheme can be found here.
Revival: Ruby + Lisp
"First, I'm sure you've read a lot of the ruby-lisp equivalence discussions over the weekend. This is great stuff, I hope it gets deeper and we can find some good middleground. The way I feel about Lisp is the way I feel about socialism. Neat ideals and I really wish they worked on humans!"
I will definitely want to read at least two of the pages bookmarked on the link above. This is definitely an interesting revival of my interest in ruby+lisp -- not that it ever died or anything but it must have diminished during the past few months as I ventured into other stuff. Heck, if this ruby+lisp thing is interesting enough, I might even be able to write a thesis on it.
Errr... so why the post? Nothing much yet, just reserving a place for it here so that I do remember to come back and write about it.
Lisp exercises
My temporary foray with Lisp will take a short break. Previously, I had hoped that my Artificial Intelligence class will be utilizing Lisp as the language of choice for its machine problems. Alas, it did not come to be. Instead the class is wallowing in the revolting scent of Java. Well, not that it matters much since I have officially dropped the class after being nettled by the lecturer's use of catechism to teach.
So, here are the source files for the lisp code that I have played around with. Most of these are from the chapter exercises in the book Little Lisper by Daniel Friedman. This book is no longer in print but you should be able to obtain one from your library. Here are the files, nicely formatted in html for anyone's perusal:
And here are come code from two chapters which I found particularly interesting:
Interesting site: Lisperati
Finally finished going through the Casting SPELs in Lisp tutorial. Really interesting to find something along the lines of why's poignant guide to ruby complete with its own illustrations and metaprogramming (macros in Lisp). It is also similar to what Python Programming for the Absolute Beginner does for Python.
Another next thing about this tutorial is the use of Allegro Common Lisp via telnet. This would make it really convenient for people who do not wish to install Lisp on their machines.
It gives a nice introduction to what you can do in Lisp. However, I am not sure if someone who does not know Lisp (or any other functional language) can appreciate this tutorial. The author, Conrad Barski, seems to skim over some thing that I view as important. For instance while using mapcar he explains it tersely as:
"mapcar simply applies another function to every object in the list..."Most programming languages do not provide the luxury of using something similar to
mapchar as an underlying characteristic of the language. You could hack it in but most people would not do it this way since the language itself does not expose this functionality.
Conrad's explanation of macros is a bit short too. A bit too short for people to actually grasp the power of it. He also makes defspel as an alias to defmacro because he believes that most people who think they know macros are being delusional about it. Or as he says it:
"Often, when I try to explain the concept of macros to somebody who has only used other languages, I'll get a response like "Oh yeah! There's macros like that in C++, too!". The moment this happens, it becomes very difficult to explain "true macros", because of the semantic load on the word "macro". After all, "true macros" really are a lot like C++ macros, in that they are a way to talk to the compiler with modified code..."
I have not been using Lisp long enough to decide whether the liberal use of list (this sounds a bit ironic since Lisp is all about lists!) to store information is a good idea. For this simple example it seems decent enough since you can easily remember that the second list in the location list tells you where to go next. If we were to do this in Java, using arrays to store such things could easily put you as having Primitive Obsession - not using objects for "little" things as coined by Martin Fowler in Refactoring.
Anyway, this short tutorial did expose me to some Lisp concepts which I have not encountered in the Little Lisper (one of the best books possible on Lisp, I think). And it was rather fun to see how one would use Lisp to create a text based game.
Video tutorials: New Way of Learning Programming Languages and Frameworks?
My first encounter with video tutorials was actually with the original Ruby on Rails movie. This version did not have sound at all. Instead, whatever important things that needed to be said was done by typing text into the terminal for the viewers to read. For me, this sort of increased its geekiness level. True, sites such as Lynda.com have been offering video tutorials for many years now. However, (let me rephrase that) my first encounter with free, open-source technology was with Ruby on Rails.
Since then, many have jumped on the bandwagon. Personally, this seems to be a good move. The simplest common sense reason being that you can explain so much more using pictures and videos. It's actually quite hard to understand the common nomenclature that a group of developer uses while dealing with their framework. Secondly, (also another common sense reason), videos can create an great first impression. Joel Spolsky says in his book Joel on Software that even if you have the world's greatest program, you will lose to the other competitor if you do not have enough screenshots. Yup, screenshots are that important.
Some of you might snigger now and think that people who fall for screenshots are nothing more than gullible people who eat hype and drink the kool-aid. Not true. In a field where there are so many competitors, gaining a lasting first impression is really important. And, not every technology out there can do it. I doubt that you could create a Java powered blog within 10 minutes! So, as you can tell, the length of the presentation also plays an important role in determining how popular it will be. Make it too long and people actually lose interest in it all together. Too short, and you leave your viewers wondering is that it?
I guess video tutorials are not limited to teaching people about programming languages and frameworks. Some companies are also using them to teach their users how to use the new technologies. Now, this is an excellent thing because it really shows that you want your users to use those tools. 37signals uses videos for all their web applications.
If you are looking for tools to make video tutorials, here are some:
So, what video tutorials out there are worth watching? Here is my list:
I just found several of them today about Lisp so I have yet to watch those.Also, if you are looking for some free videos, this Del.icio.us feed has them. Thanks to LifeHacker for the tip.
Learning LISP
After reading Paul Graham's essays and his bias in favor of Lisp, I finally joined in the bandwagon. Surprisingly, WikiPedia has this to say about Lisp:
Having declined somewhat in the 1990s, Lisp has experienced a regrowth of interest since 2000, partly due to the writings of Paul Graham.I never knew that he had that much influence. Well, anyway, I picked up a copy of the Little LISPer from the library. Unfortunately, this book is no longer available for sale (it has been replaced by the Little Schemer, which I hear has about the same contents). Now, the Little LISPer is an interesting book. Instead of having page after page of words, it has page after page of conversations! Each page is formatted into two columns, one for the dialogue of the teacher and the other for the student. And their conversation revolves around LISP, recursion (lots of it) and food! For a sample of what these conversations sound like, I direct you to Brian Marick's implementation of this style of writing for Ruby.
LISP seems to be an interesting language. So far the Little LISPer has focused more on recursion than features of the language, so I am not able to see how to use LISP to do more complex stuff. There are libraries written for it that can do GUI because it was in the tutorial for my IDE. I have also ordered a copy of ANSI Common LISP from the library. Paul Graham promises that it has the best explanation for macros. And it even has 20 substantial examples for showcasing the power of LISP.
Anyway, my motivation for learning LISP is to verify how much of what Paul Graham says is true. He believes that languages are evolving more toward what LISP is. Here is a diagram that shows how Ruby came about. I pilfered it from the book
Game Programming with Lua, Python and Ruby. I sincerely hope that you do not buy this book. I have browsed through it and am really disappointed with its contents. Then again, I have never been pleased with the Game Developer series; the book lacks depth and tries to make up for it by having a CD that is loaded with all kinds of crap. The reviews for it on Amazon agrees with me. But, anyway, here is the diagram:

No doubt, lambda (or Proc, as they are called in Ruby) seems to the common ground for this languages. Not to mention the fact that metaprogramming is an important feature for LISP. One, that paved the way for its popularity in Artificial Intelligence. Why (yes he is called Why or rather Why, the lucky stiff) has written two nice articles on metaprogramming on Ruby: Seeing Metaclass Clearly and A Sponsored Dragon-Slaying.
So, right now, the method that I am deploying for learning LISP falls along the lines of writing little programs and some test cases for them. This methodology is similar to the one suggested by Mike Clark. And for my IDE, I chose LISPWorks Personal Edition as it was recommended on the LISP newsgroup.
Design Patterns and Languages
The original authority on Design Patterns is the Gang of Four book. The Gang of Four includes my Software Engineering professor, Dr. Ralph Johnson. Design Patterns is still a good book and all; however, it focuses a lot more of implementing it using C++. Thus, I bought the Head First Design Patterns book. Except for its startling front cover, Head First Design Patterns is an excellent book. I read through 5 chapters of it last night after the book arrived. The exercises in there are fun, with the exception of the crossword puzzles. Why are there crossword puzzles?
Anyway, from what I have read -- and this is still very preliminary -- most of the design patterns seem to be used for delaying initialization to runtime. That is, all functionality should be freely dynamic and can be added and changed at runtime. Thus, design patterns seem more like little hacks (pardon me for a lack of a better word) to get the static type-based languages to be more dynamic.
I do not think that I am the only one to think this. Peter Norvig has some slides that show how to use design patterns in dynamic languages. In fact, he says that 16 of the 23 patterns discussed in Design Patterns are more easily implemented using Lisp than C++. Needless to say, this is something that I am really interested in finding out more on. And as a side note, this brings me back to what Paul Graham thinks: there is indeed the dream language. It would not do justice to his excellent essay on how he goes on to describe why some languages are indeed better. If you excuse him for his bigotry toward other languages beside Lisp, you might come to agree with him that there is indeed an evolution of languages. New languages tend to be more dynamic and not as type-based. Also, new languages tend to be flexible enough that you are able to define new constructs for the languages so that it models your software better.
Maybe, I am just language-addicted. Dave Thomas, one of the Pragmatic Programmers, mentions in a recent Ruby on Rails podcast that he is interested in languages too. And that he always strives to spend 3 minutes on each new language that he encounters, devoting more time to it if he finds it interesting.
In short, what I would really want to find out: Are Design Patterns modeled after the limitations of the language? Or do their usefulness transcend between languages?
