flickrPreviewR
For the past couple of days I have been creating an application that I think I would use. And maybe some other people might find it useful too (possibly as an example for doing things in RubyCocoa). For the lack of a better name, I decided to call it flickrPreviewR and hopefully I would not be sued.
flickrPreviewR screen. It even has the typical flickr logo design from flickrlogomakr. The line to the right is the side of a window that apparently shifted a bit onto my other screen and was included in the image capture.
It all started when Flock released a new version of their browser sometime last week. I downloaded it since I was eager to see what the perpetually-in-beta browser has created this time. This time I was really disappointed. The only reason for my using Flock was its integration with Flickr. I just like to quickly browse photos without having to use Flickr's interface. Unfortunately, this new version of Flock really left a sour taste in my mouth. Its interface is hideous and the newly revamped Flickr integration seems buggy and really slow. What does it need to do that requires that much computation power? It just needs to fetch the damn photos damnit!
Anyway, this provided the motivation and gave me another reason to play with RubyCocoa. The last time I dabbled with RubyCocoa, I created a version of the RaiseMan application from the book Cocoa Programming for Mac OS X with bindings. Since then, RubyCocoa has had additional changes and I was curious whether it is now easier to develop programs with it. My conclusion: nothing much has changed in terms of the cosmetics of the language (there are underlying changes for optimizations, etc), and overall there was nothing new to pick up.
For the backend of this application, I relied on rflickr, the Ruby API kit for accessing flickr. I needed to make some slight changes to the code to accommodate some changes in the flickr API. The rflickr source is bundled as part of the source code. I did not want to rely on it being a gem since I am not sure how RubyCocoa handles gems and I did not want to add a dependency on the gem; so I just bundled it. flickrPreviewR is under the Creative Commons Attribution License. This might violate the rflickr license and if it does, then let me know and I will see what I can do.
You can find the set of images on my flickr account. I do not have a pro account so I cannot create a set for it. You should read the description of each photo if you are planning on using this. As usual, use this at your own risk. I am doing this as a personal application that I would use and as the developer, I am aware of its limitations and I do not push them while using the application.
One very important limitation that I need to stress is the inability for it to update itself.
There is a weird quirk with using threads in RubyCocoa and NSNotifications. I have not figured out how to get the view to refresh once all the images have been downloaded. Right now, there is the VERY cumbersome need to click on ANOTHER favorite or photoset to force a refresh. The best way to see if the photos have finished loading is to view the progress bar at the drawer.
THIS NEEDS TO BE FIXED once I figure out how.
Here are some things that I picked up while doing this:
- Interface Builder is a great tool for quickly laying out the interface. I cannot imagine how I would have done this with some other GUI toolkit. Even with my familiarity with Java, it would have taken much longer just to get all the pieces aligned properly.
- CocoaDev is the site for almost all the questions that you can have on how to do things in Cocoa. Without CocoaDev, it would have taken me a longer time to get things done.
- Cocoa Programming for Mac OS X is a decent reference book once you know your way around how certain things in Cocoa work. That still does not make it a good book to start learning Cocoa programming.
- You can save yourself great pains if you make all your classes subclasses of OSX::NSObject explicitly instead of normal Ruby classes. This will save you trouble when you get weird behavior and crashes. Like me, you might want to use normal Ruby classes sometimes because YAML works well with the regular Ruby classes. In that case, make sure that your class is being used as simple data structure with simple functions that do not rely on any of the Cocoa libraries.
- Threading is hard to do with RubyCocoa and there is little reference on what works and what does not.
- Downloading images with Ruby net/http library did not work for me. The image will always end up corrupted. Thus, I had to rely on curl to download the images.
- WebKit is an excellent piece of work. I used it as the primary backend for displaying the images. To intercept the handlers, I made each preview image a link to an imaginary 'flickr://' protocol. I then overrode webView_decidePolicyForNavigationAction_request_frame_decisionListener to intercept the call and perform the appropriate actions.
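The 'flickr://' trick comes down to looking at the URL of each navigation request and routing on its scheme. Here is a minimal plain-Ruby sketch of just that routing step, with WebKit left out entirely. The URL layout (flickr://photo/12345) and the helper name route_for are assumptions made up for illustration; they are not what flickrPreviewR actually uses.

```ruby
require 'uri'

# Hypothetical helper: decide what to do with a URL the web view wants to load.
# Returns [:flickr, kind, id] for the made-up scheme, or [:browser, url] otherwise.
def route_for(url)
  uri = URI.parse(url)
  if uri.scheme == 'flickr'
    # For flickr://photo/12345, host is "photo" and path is "/12345".
    [:flickr, uri.host, uri.path.sub(%r{\A/}, '')]
  else
    [:browser, url]
  end
end

p route_for('flickr://photo/12345')
p route_for('http://www.flickr.com/')
```

In the real application the same decision would be made inside the policy delegate method, which then tells the decision listener whether to ignore or follow the request.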
For those who are interested, the source code is here. I did not package it as a stand-alone application because I am not sure what you actually need to run this application. You probably need RubyCocoa and Ruby 1.8.5. So for simplicity, I just provided the project folder for Xcode. If you can build it then you can probably run it.
Meta-Object Protocol implementation for programming languages 1
The past week I had the opportunity to play around with Groovy. Groovy is an "agile dynamic language for the Java Platform". In short, you can think of Groovy as adding dynamic features to the Java language with some syntactic sugar to the Java syntax. Like JRuby and Jython, Groovy runs on the JVM and plays well with Java code.
Groovy offers functionality similar to that of Ruby, Python and other dynamic languages. Of course, Groovy has its own way of doing things but some of the most interesting things that I have seen are:
- Integrated support for XML processing directly via GPath (akin to XPath)
- Support for multimethods. The type of an object is determined at run-time. This leads to the interesting behavior that even if you write Object foo = "Hello", foo actually has the type String instead of Object.
- Option to declare the type for variables. Though not required, you have the option of declaring the static type of an object. The only other system that I have seen that offers this option is Strongtalk.
While Groovy is certainly an interesting and useful language (personally, any language that reduces the verbosity of Java is useful), what is more interesting is the way in which the dynamic features are done. Groovy relies on an internal implementation of the Metaobject Protocol. As far as I know, the term Metaobject Protocol was first made popular in the book The Art of the Metaobject Protocol. You can read more about this concept from any of the previous links but in a nutshell, designing a programming language so that it conforms to some Metaobject Protocol is a good thing. It allows the programmer (and not only the designer) to change the semantics of the language so that it can not only modify the current features of the language but also incorporate new features easily.
This might seem like an absurd thing to do if you are used to programming in more traditional languages like C which does not even offer the ability for reflection. However, the ability for actually modifying the programming language itself is becoming more useful. Modifying does not only mean adding some form of syntactic sugar on top of the existing language but also the ability to add new features to the programming language itself. For instance, the foreach control structure could be viewed as some form of syntactic sugar. foreach could be easily implemented as a macro that is expanded before execution into the normal for loop or while loop. On the other hand, adding object-oriented features on top of a "non-object-oriented" language would be viewed as adding a new feature (and not just syntactic sugar) to the language itself. This was what was done with CLOS in Lisp.
One could always argue that there is no need for such a feature to be an inherent part of the programming language. A valid argument against implementing something like the Metaobject Protocol is the issue of speed and performance. Having the infrastructure to modify the behavior of the programming language is going to have some form of overhead on the execution of the program. Furthermore, the ability to change the programming language would lead to problems with a proliferation of different versions of the same programming language with subtle differences. Moreover, one might also argue that if a feature is that important, why not just add it in the next version of the programming language. That is exactly the case with Java and C#; newer editions keep adding features to the language.
While those arguments are certainly valid, the ability to modify the semantics of the programming language should not be discounted. As computers get faster, the speed overhead from having the Metaobject Protocol built into the language becomes negligible save for the most performance-intensive applications. And how many people actually want to wait for the designers of the programming language to add some new feature? The desired feature might not be important for everyone and might not be added to the core of the language. Should that be a reason why it is not there if your application actually requires it? For instance, as the ability to express domain knowledge clearly in programming languages becomes more important, more and more forms of domain-specific languages are surfacing. These domain-specific languages provide a concise way to express essential concepts of the domain which might otherwise be hidden by the syntax and semantics of a traditional programming language. No sane programming language designer is going to design multiple domain languages and give them multiple names such as C-Telecommunication, C-Accounting, C-Transactions [1].
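To make the domain-specific language point concrete, here is a small sketch in Ruby of a mini-language built on top of the host language rather than designed from scratch. The Ledger class and its credit and debit words are invented for this illustration; the only real language feature involved is instance_eval, which runs a block with self set to a chosen object.

```ruby
# A tiny, made-up accounting mini-language built from ordinary Ruby blocks.
class Ledger
  attr_reader :balance

  def initialize
    @balance = 0
  end

  def credit(amount)
    @balance += amount
  end

  def debit(amount)
    @balance -= amount
  end

  # Evaluate the block with self set to a fresh ledger, so the block
  # can call credit/debit without naming a receiver.
  def self.record(&block)
    ledger = new
    ledger.instance_eval(&block)
    ledger
  end
end

ledger = Ledger.record do
  credit 100
  debit 30
end
puts ledger.balance
```

The block reads like a small accounting notation, yet nothing about the language itself had to change to support it.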
While the idea of having a full-fledged Metaobject Protocol might be too idealistic for some, some of the newer ideas in the software engineering world are taking advantage of a more moderate form of it. The designers of programming languages can certainly vary the degree of dynamic behavior. For instance, Ruby does not have a fully modifiable underlying semantics but it does provide enough for programmers to accomplish most of what needs to be done.
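As one example of the moderate form that Ruby offers, a class can intercept calls to methods that do not exist and decide at run-time how to answer them. This is a sketch only: the OpenRecord class is made up for illustration, while method_missing and respond_to_missing? are the standard Ruby hooks.

```ruby
# A made-up record class that answers to any attribute name at run-time.
class OpenRecord
  def initialize
    @fields = {}
  end

  # Called by Ruby when no method with the given name exists.
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?('=')
      # A call like record.title = "x" arrives here as :title=.
      @fields[key.chomp('=')] = args.first
    elsif @fields.key?(key)
      @fields[key]
    else
      super # unknown names still raise NoMethodError
    end
  end

  # Keep respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    key = name.to_s
    key.end_with?('=') || @fields.key?(key) || super
  end
end

photo = OpenRecord.new
photo.title = 'Sunset'
puts photo.title
```

The programmer has changed what a method call on this object means without touching the interpreter, which is the kind of hook a fuller Metaobject Protocol would generalize.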
In conclusion, I feel that more and more languages will incorporate some form of the Metaobject Protocol. Dynamic languages are becoming increasingly prominent in the software engineering world today and the logical next step would be to expand on their dynamic behavior to improve their extensibility [2]. It should be possible for both programming language designers and programmers to modify and/or add features to the programming language in a hygienic way without resorting to ugly hacks.
For an example use of the Metaobject Protocol in Groovy, take a look at Practically Groovy: Of MOPs and mini-languages.
[1] An alternative would be to design your own language from scratch if it is simple enough. Tools like Antlr and even Maude have made this process less painful. Even then, designing a language from scratch -- albeit being fun -- requires a lot more devotion than modifying an existing language.
[2] Some features might also be extremely hard if not impossible to add to an existing system whether or not it has a Metaobject Protocol. For instance, (I think) having the ability to do multi-stage programming might require extensive changes to the language itself and cannot be accomplished with just some hacking on top of it.
Rubyfication of Raise Manager
I spent a couple of hours transforming the RaiseMan application from Cocoa Programming for Mac OS X (2nd Edition). In the RubyCocoa examples folder, there is a version of this but it is based on the first edition of the book. My version includes key-value-binding, undo and redo and also alert panels. I also implemented some of the end-of-chapter exercises that I felt were useful. I skipped the part on Localization though.
Rereading the book makes me think of how much I do not like it. There is very little rationale behind each of the examples. Most of the time, the author just says do this or that. And his anecdotes are pretty annoying (I am not sure where he got his stories from). I felt that the book could have been better if the author spent more time explaining why things are done that way instead of listing the API and describing what it does (it's almost identical to the documentation).
Things actually made more sense to me this time around because I was exposed to some design patterns and could see the rationale behind the way of doing things. I am not sure if a beginning programmer would appreciate the way of doing things just from reading this book. I have heard better things about Cocoa Programming but that book is old and has not been updated. I have not read that book yet so I cannot offer my opinions on it.
Anyway, here are some things I learned from this effort that may be useful to people:
- Before starting, you should read this page on RubyCocoa to get acquainted with the conventions. You can choose to use somemethodwithargument0_withargument1_withargument2(arg0,arg1,arg2) or somemethodwithargument0(arg0, :withargument1, arg1, :withargument2, arg2). Once you have decided on one, it is best to stick with it.
- Since you cannot drag-and-drop your MyClass.h (we don't have one, we only have MyClass.rb) into Interface Builder to get it to update any new ib_outlets, you need to do this by hand. The easiest way I can think of is to click on the "Classes" tab on the nib file, locate your MyClass and right click to add actions or outlets. That being said, Interface Builder is pretty good for creating user interfaces. The separation between code and user interface is pretty clean and it is not too hard to get things to work the way you want them to.
- For key-value-binding to work on an array, use kvc_array_accessor. For an example, look at my MyDocument.rb file. More information on key-value-bindings can be found in oc_import.rb in the RubyCocoa source.
- Always, always, always, build and run regularly. There is virtually no good debugging support for RubyCocoa. Sometimes the error message can tell you which file (and, in the optimistic case, which line) the error occurred at. But in general, it is going to be cryptic. By testing early and frequently, you can at least narrow the error down to the last edit that you made.
- Remember to qualify your ib_outlets with the '@' symbol when you use them. For instance, if you have ib_outlet :some_object then in your methods, you refer to that object as @some_object. I am not sure why I keep forgetting this but it has been the cause of many problems.
- Remember to always prefix Cocoa classes with OSX::. You can avoid this by using include OSX. Also be careful that you check the spelling for the Cocoa classes (you need the NS prefix, etc). Misspellings have bitten me quite a few times.
- Read oc_attachment.rb in the RubyCocoa source code to find out how you can use Ruby idioms like [], []=, etc for accessing arrays and dictionaries. Also decide if you want to use those notations or just stick with objectForKey(), etc.
- There are some idiosyncrasies with OSX::NSRunAlertPanel. You cannot do Ruby string substitution in the arguments. If you need string substitution, you can do it using the special %@ placeholder as such: choice = OSX::NSRunAlertPanel("Delete", "Do you want to delete %@ records?","Delete","Cancel",nil, selected.size).
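The two calling conventions from the first point carry the same information, just arranged differently. The plain-Ruby sketch below converts the symbol-interleaved shape into the underscore shape so the correspondence is easy to see. The helper underscore_form is hypothetical and written only for this illustration; it is not part of RubyCocoa and does not call into Cocoa at all.

```ruby
# Hypothetical helper: turn the "symbol" call shape into the underscore one.
# It takes the first name part and its argument, followed by alternating
# keyword symbols and values, and returns [method_name, arguments].
def underscore_form(first_part, first_arg, *rest)
  name_parts = [first_part.to_s]
  args = [first_arg]
  rest.each_slice(2) do |keyword, value|
    name_parts << keyword.to_s
    args << value
  end
  [name_parts.join('_'), args]
end

name, args = underscore_form(:somemethodwithargument0, 'a',
                             :withargument1, 'b',
                             :withargument2, 'c')
puts "#{name}(#{args.join(', ')})"
```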
One problem that I had was that I was not able to build it for release. I had to build it for debug. I think there could be something wrong with the way the project is setup. Anyway, the file is available from here.
Update: I just installed RubyCocoa 0.9 from Subversion. Instructions can be found here. I can now successfully build it as a universal binary. The next test I did was to run this RaiseManager application. I was greeted by a successful build... but the application could not create new employees. The error logs report that there is something wrong with the NSUndoManager but I suspect that it has something to do with key-value bindings since there is supposed to be some change to how that is done in RubyCocoa 0.9. I will have to take a look at that. On the bright side, RubyCocoa is approaching the 1.0 mark after so many years!
Update (Jan 3, 2007): The latest version from the Subversion repository has addressed the issues that I reported above. The current working version (revision 1325) was checked in today by Laurent. Suffice to say that there are major additions from RubyCocoa 0.5 that are worth checking out. It would be good to see how RubyCocoa plays with the new Objective-C 2.0 that is included with Leopard. On a side-note, there is a new Ruby/Objective-C bridge out by Tim Burks here.
ri/fastri with rubygems
For some time, I was not able to get ri to read the documentation for the gems that I have installed. There seems to be multiple ways of doing this. I found this site that gives details on the common ways. Some gems are kind enough to automatically install the documentation files by themselves. This makes it possible for ri to detect the documentation for that gem without any further configuration. However, the gems I am most interested in --the Rails gems-- do not do so automatically.
The simplest way to get documentation for all your gems is to run sudo gem rdoc --all. Though this is the simplest method, it is also the most fragile one. When you run that command, it must finish the process of generating all the documentation for all your gems. A rogue gem that is improperly configured can mess the entire process up. I discovered this the hard way: the terminal would report that the documentation for the relevant Rails gems (activesupport, activerecord, etc) had been generated but because there was a failure toward the end, none of it was accessible. Most people will be able to run this command just fine and get the documentation for all the gems. However, when it fails, you can try the method below.
What I am proposing is a more conservative route for generating the documentation: generate the documentation for each gem on an as-needed basis. If the gem can automatically generate the documentation for you then you need not run this step at all. For most people, all they need is the documentation for Rails. And if you are using TextMate, having the documentation detected by ri can be most convenient.
Here is what I did. Open two terminal windows. In one, do gem list. This gives you a list of all the gems that you have installed. In the other terminal window, do sudo gem rdoc [name_of_gem] where you replace [name_of_gem] with the gem that you are interested in. You can use wildcards such as active* to match activerecord and activesupport. Do this for the gems that you are interested in. This way you avoid generating redundant documentation for gems that are installed as dependencies for other gems.
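If the list of gems is long, the selection step can be sketched in a few lines of Ruby. This is only an illustration of the per-gem approach: the gem names in installed are a made-up sample standing in for the output of gem list, and the script only prints the commands rather than running them.

```ruby
# Hypothetical list; in practice this would come from `gem list`.
installed = %w[actionpack activerecord activesupport rails rake rmagick]

# Shell-style patterns for the gems whose documentation we care about.
wanted = ['active*', 'rails']

selected = installed.select do |gem_name|
  wanted.any? { |pattern| File.fnmatch(pattern, gem_name) }
end

# Print one command per gem, so one failing gem does not affect the rest.
selected.each { |gem_name| puts "sudo gem rdoc #{gem_name}" }
```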
Finally, if you are still using ri, you might want to take a look at fastri. fastri is supposed to be much faster and more intelligent in its searching capabilities. If you are using fastri, you might need to rebuild the index for the server after generating the documentation. This can be done by running fastri-server -b.
RubyCocoa
Since I had some free time this week, I wanted to brush up on some Cocoa programming. However, I have not been using Objective-C for some time and would rather not have to program in it. Instead, I decided to use Ruby via RubyCocoa. I have always had great difficulties installing RubyCocoa on my machine. It never seems to just work. I recall getting version 0.4.3 to work somehow over Spring but the steps seem to elude me now.
I tried searching on the web but there was no clear answer. Even the installation instructions for RubyCocoa seem incomplete (and might even be auto-translated from the Japanese version). Anyway, I downloaded the source code for version 0.5 and followed the instructions to do ruby install.rb config and ruby install.rb setup. The installation failed -- it was not able to locate some of the symbols.
So I reread the instructions on the website. There was some mention of using Ruby 1.8.5 and I only had version 1.8.4 installed. Since Ruby 1.8.5 has been out for some time, I decided to upgrade my version. There are various discussions on what needs to be done to get Ruby 1.8.5 to compile on OS X but I stuck with the instructions from HiveLogic and just substituted the latest version of Ruby into the instructions. Everything works fine, even readline. I had to recompile RMagick though since the links were broken after the upgrade. Some might also find the instructions here useful for upgrading.
I followed the steps on the RubyCocoa website and tried to build RubyCocoa. It still was not working. Then I realized something. I was actually issuing the following command: sudo ruby install.rb config --build-universal. I removed the build-universal just to try the default settings. And it WORKED! I am not sure how the developers themselves got the universal binary to build but it was not working on my machine. Right now, RubyCocoa is not a universal binary but that matters little. At least it is finally working.
It's always the simple things that make life so complicated. It also shows that one should always try the default values for configure. After all this, I really think I should have just stuck with rereading some Objective-C. RubyCocoa is a very nice framework but it takes a lot of guess work to get it working. However, as a consolation, I finally got Ruby 1.8.5 installed.
RubyOSA 1
RubyConf 2006 had a lot of interesting new ideas for Ruby. There is a nice summary of the main points here on the InfoQ website. One feature that actually caught my eye was how Apple was taking part in RubyOSA.
"RubyAEOSA is a Ruby wrapper for Apple events, that was started in 2001, but not active since 2003. The code required is unnecesarrily verbose. Instead, you could wrap and execute with AppleScript, but it's slower and limited by knowledge of AppleScript.
RubyOSA is a new project created by Apple, intended to be a successor to RubyAEOSA, under active development and used today. It has a much more Rubyish API, generating Ruby code on the fly and sending events lazily. Apple events are completely hidden."
This is an interesting thing for me because I have always wanted to interact more with the applications on OS X but I really do not like the syntax of AppleScript nor the tools that are provided to support it.
A side effect of this is the realization that Apple and other companies such as Sun and Microsoft have actually taken a very strong interest in the Ruby community. Java 6.0 is supposed to have scripting support built-in. However, seeing how the Mac version of Java always lags behind its counterparts, we probably will not be using much of this yet.
All the more reason why I should really try to do some research in this language. Since the idea of refactoring for Ruby is partially taken by the RDT team, I might consider doing something in metaprogramming refactoring. That seems like a new field and should be interesting to venture into.
Ruby Manual Kernel-y
"ryan: You know how Kernel.p is a really convenient way to dump ruby structures? The only downside is that it's not as legible as YAML.
_why: (listening)
ryan: I know you don't want to urinate all over your users' namespaces. But, on the other hand, convenience of dumping for debugging is, IMO, a big YAML use case.
_why: Go nuts! Have a pony parade!
ryan: Either way, I certainly will have a pony parade."
I just found this out while browsing the methods in module Kernel in Ruby. In my opinion, this is not even funny. In fact, it is a very inconsiderate thing to do in terms of documentation.
Refactoring support for the Eclipse
It's amazing what results you can miss even if you search with Google. It's all a matter of what you are searching for. And which website you actually search on.
After some thought, I figured that it would be best to take a look at how the RDT: Ruby Development Tools are doing. So far they are one of the few open source projects that are moving along pretty well. They have some nice integration with the Eclipse project. And like it or not, the Eclipse platform offers one of the better environments for developing IDEs if you do not wish to waste too much time designing a GUI and all that. There is some pain involved with using Java and all that, but Eclipse itself can help you write Java in a more productive fashion.
Anyway, here is the thread in question. If you click on the link, they do have a decent Trac website for it. I have not downloaded the project yet but it seems that there has not been much buzz generated about this project. There was nothing much on the mailing list. And I don't think it is a good idea to point this project out on the Ruby mailing list since it is their project and they decide what to do. I have downloaded their nightly .pdf file describing what they are doing. We shall see how much they have covered. The table of contents does seem to address some of the main issues that I am concerned with. And they claim to have done the Rename Local Variable (this is not the same as the more general Rename that goes looking for all occurrences and finds out which are safe to replace) and Push Down Method. They have a few others of the normal refactorings that Martin Fowler discusses in his book.
After reading a bit more, it seems that this was a term project for the students. It was to be a fourteen week project but they are going to take it further and do it as a diploma thesis. All in all, I am pretty impressed with this project. Here is a group of developers that are doing all the "best practices" of software engineering. They have a repository, bug tracking, milestones, auto-generated integration tests and lots of unit tests.
Well, if they have gotten a refactoring tool for Ruby, then it means that I probably have to find something else to do. That is one of the pitfalls of choosing an "in" language. Almost everyone would want to get a hand in it.
Anyway, I am going to check out the source code for the RDT into a new Eclipse workspace. Hopefully I remember enough of the Eclipse plug-in development paradigm to understand the code. I left all my Eclipse books back home when I went back for the summer. The nice thing this time though is that I can develop on my Macbook Pro. Enough RAM, plenty of hard disk space and a speedy processor. And of course, Java 1.5 is now working nicely on OS X.
Parsing Ruby
SapphireSteel :: IntelliSense and Parsing Ruby:
"However, it did strike me that what I was doing was generating a successive approximation to a Ruby parser. The actual C Ruby parser is a combination of Matz's meanderings and bug fixes over the years - and it's neither pretty nor elegant. I can get closer and closer to the real thing but the amount of effort required to do that increases. For example, the one construct I won't deal with at present is nested here documents. For various technical reasons, they are hard hard work to implement. And how many people use them, anyway? I've only come across two examples so far in all those 2600 files."
Refactoring for Ruby: A solution to the first component?:
"The great thing with ParseTree is that it's guaranteed to parse all Ruby code as it actually steals the AST directly from the interpreter. The problem with ParseTree is that it discards everything that is not interesting to actually executing the Ruby code. Things like formatting, comments and so on may not be interesting to the Ruby interpreter but it certainly is interesting to humans reading and writing the code. So we don't want to loose all that interesting (to us) stuff when we execute a refactoring."
So far, there are two solutions to the parsing question. First, use ParseTree. But as stated above, you lose all the useful information such as comments and formatting. The other would be to use Antlr and write your own little grammar that would preserve those comments and formatting. I have not used Antlr before, and as the first quote above mentioned, getting the entire grammar for Ruby to work is not easy. There are some weird nuances that need to be detected.
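The difference between a tree built for execution and a token stream that keeps everything can be seen in a few lines. This sketch uses Ripper, a lexer and parser that ships with later versions of Ruby's standard library; it did not exist as part of Ruby 1.8, so it is shown only to illustrate the trade-off and is not one of the tools discussed in this post.

```ruby
require 'ripper'

source = "x = 1 # keep me\ny = x + 2\n"

# Ripper.lex returns every token, including whitespace and comments.
# Each entry starts with [[line, column], type, text].
tokens = Ripper.lex(source)
comments = tokens.select { |t| t[1] == :on_comment }.map { |t| t[2] }

# Joining the token texts gives back the original source unchanged.
round_trip = tokens.map { |t| t[2] }.join

# Ripper.sexp builds a tree of the program; the comment text is not in it.
tree = Ripper.sexp(source)

p comments
puts round_trip == source
puts tree.inspect.include?('keep me')
```

A refactoring tool needs something like the first view, where comments and spacing survive, while an interpreter only needs the second.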
I know that some people are working on the Antlr grammar for Ruby. However, the project seems pretty abandoned. I might be able to get something out of it as a starting point though. For a fairly large language like Ruby, the cases are rather extensive.
The other possibility would be to look at JRuby's implementation of a parser and see what we can get out of that. I really do not want to have to write the grammar for Ruby.
Update: Seems like I found something that might be useful: RubyFront: Ruby parser powered by Antlr. And they have been kind enough to include the ruby.g file that contains the grammar. And the good news, it parses everything, including the notorious here docs.