April NoVA Ruby Users Group - Sean Mountcastle

Tonight we held the April meeting of the Northern VA RUG. Ray Daly presented RSS and novel uses for it, while Paul Stadig discussed various methods of screen scraping.

Ray provided an overview of RSS and examples of its widespread usage (news feeds, monitoring, podcasts, etc.), then spoke about how RSS could be used to tie together disparate systems. He wrapped up with a Rails demo showing how to create RSS feeds and a brief discussion of the available libraries for creating and consuming RSS/Atom.

Paul’s presentation on screen scraping started with a discussion of why you’d want to gather data this way: most of the interesting data lies in the ‘deep web’, where there is no API or RSS feed to extract it easily. He then gave an overview, with example code, of several techniques: POOR (Plain Old Open-URI and RegExps), POOH (Plain Old Open-URI and Hpricot), WWW::Mechanize, scRUBYt, WATIR and FireWatir, and scrAPI. Of the examples shown, Hpricot looks like an excellent HTML parser (though it does require native code), while scRUBYt and scrAPI seem to have the most promise for making screen scraping easy to do. His slides are here.

The May NoVA RUG will be held on May 23, as the prior week is RailsConf.
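For a flavor of the simplest approach on Paul’s list, POOR, here is a minimal sketch of my own (not code from his slides). It assumes a recent Ruby where Open-URI is called as `URI.open`; the `LINK_PATTERN` regexp and the `extract_links` and `scrape_links` helpers are hypothetical names made up for illustration.

```ruby
require 'open-uri'

# Hypothetical pattern: captures the href and the inner HTML of each <a> tag.
# Regexps like this are brittle against real-world markup, which is the usual
# argument for moving to a real parser such as Hpricot.
LINK_PATTERN = %r{<a\s[^>]*href=["']([^"']+)["'][^>]*>(.*?)</a>}im

# Returns [href, text] pairs found in an HTML string.
def extract_links(html)
  html.scan(LINK_PATTERN).map do |href, inner|
    [href, inner.gsub(/<[^>]+>/, '').strip] # drop nested tags from the link text
  end
end

# Fetches a page with Open-URI and scrapes its links (needs network access).
def scrape_links(url)
  extract_links(URI.open(url).read)
end

# Offline usage with an inline HTML snippet:
sample = '<p><a href="/a">First</a> and <a class="x" href="/b"><b>Second</b></a></p>'
extract_links(sample).each { |href, text| puts "#{text} -> #{href}" }
```

Keeping the fetch (`scrape_links`) separate from the parsing (`extract_links`) means the regexp can be tried against saved HTML without hitting the network.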