March meet of BangPypers
Yesterday, we had the March meetup of the BangPypers at a room in the Thoughtworks office. There were about 15 people present. It was good to see Kuruvila join us this time.
First, we had an introduction to the Beautiful Soup module by Siddharta. Beautiful Soup is a neat module that helps with screen-scraping, i.e. extracting information from websites, no matter how badly their HTML is written. Siddharta demoed his own script, which uses Beautiful Soup to combine Bugzilla, TWiki and code build status into a single dashboard. The script itself was pretty simple as well.
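To give a feel for what screen-scraping involves, here is a minimal sketch that pulls links out of a sloppy HTML fragment. It is not Siddharta's script and it does not use Beautiful Soup: it uses only the standard library's html.parser, and the sample markup, the LinkCollector class and the extract_links function are made up for illustration. Beautiful Soup offers a friendlier API for the same kind of job.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> tags, tolerating sloppy markup."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical, deliberately messy fragment: the <p> and <b> tags are never closed.
SAMPLE = (
    '<p>Builds<b> <a href="/build/42">nightly build</a>'
    '<p><a href="/bugs">open bugs</a>'
)

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, "->", text)
```

A dashboard script would run something like this over each status page and print the interesting bits side by side.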
After that, Anand gave a quick introduction to Uraga, and I picked up from there and explained what we are trying to do. It was good to see that it finally became a BoF session where everybody pitched in with their ideas, doubts and concerns. First, we talked about how we imagine Uraga will work. Pradeep followed up by explaining DistUtils and PyPI and how they fit into our scheme of things.
After much discussion, we decided that our first priority is to make this work for pure Python modules, with no dependency checking (although that’s next on the todo list). The roadblock we have now is how to get a list of download URLs for each module. PyPI does have a Download-URL field, but it is meant to point to a human-readable page, so we do not have an automatic way of downloading the packages. For example, the PKG-INFO for Snakelets points to the Sourceforge download page, where the user has to manually select the download package type and a mirror, and only then download the package. After this, he/she unarchives it and hopefully the only remaining step is python setup.py install … now, how do we automate this?
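The first half of that problem can at least be detected automatically. The sketch below is not Uraga code: it reads the download field out of PKG-INFO text with the standard library's email parser (PKG-INFO uses the same "Name: value" header layout) and then guesses whether the URL is a direct archive link or a human-readable page. The function names, the suffix list and the example.org metadata are assumptions made for illustration.

```python
from email.parser import Parser
from urllib.parse import urlparse

# Assumed set of archive suffixes; a real tool would probably need more.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip")


def download_url_from_pkg_info(text):
    """Return the Download-URL header from PKG-INFO text, or None if absent."""
    headers = Parser().parsestr(text, headersonly=True)
    return headers.get("Download-URL")  # header lookup is case-insensitive


def looks_like_direct_archive(url):
    """Heuristic: True if the URL path ends in a known archive suffix."""
    if not url:
        return False
    path = urlparse(url).path.lower()
    return path.endswith(ARCHIVE_SUFFIXES)


# Hypothetical PKG-INFO whose download field points at a human-readable page.
SAMPLE_PKG_INFO = (
    "Metadata-Version: 1.0\n"
    "Name: example-package\n"
    "Version: 0.1\n"
    "Download-URL: http://example.org/downloads/\n"
)

if __name__ == "__main__":
    url = download_url_from_pkg_info(SAMPLE_PKG_INFO)
    if looks_like_direct_archive(url):
        print("can fetch automatically:", url)
    else:
        print("needs a human to pick the file:", url)
```

Only packages that pass a check like this could be fetched, unpacked and installed without a person in the loop, which is exactly why the missing direct URLs are the roadblock.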
One solution was screen-scraping Freshmeat to get these URLs, but that idea was quickly scrapped since it would not hold up in the long run (and we might have to get permission from the Freshmeat guys before doing such things). The only feasible solutions were either to request packagers to include a direct download URL, or to provide our own register method, similar to what PyPI has, in order to maintain a list of metadata on our BangPypers website.
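To make the second option concrete, here is a rough sketch of what such a register could store: a mapping from module name and version to a direct download URL. Nothing here reflects an actual Uraga or PyPI design; the DownloadRegistry class, its methods and the example.org URLs are hypothetical, and the version comparison naively assumes purely numeric dotted versions.

```python
def _version_key(version):
    """Naive sort key; assumes purely numeric dotted versions like '0.10'."""
    return tuple(int(part) for part in version.split("."))


class DownloadRegistry:
    """In-memory map of module name -> {version: direct download URL}."""

    def __init__(self):
        self._entries = {}

    def register(self, name, version, url):
        """Record (or overwrite) the direct download URL for one release."""
        self._entries.setdefault(name.lower(), {})[version] = url

    def lookup(self, name, version=None):
        """Return the URL for a release, or for the newest one if no version is given."""
        versions = self._entries.get(name.lower())
        if not versions:
            return None
        if version is None:
            version = max(versions, key=_version_key)
        return versions.get(version)


if __name__ == "__main__":
    registry = DownloadRegistry()
    registry.register("Example", "0.2", "http://example.org/example-0.2.tar.gz")
    registry.register("Example", "0.10", "http://example.org/example-0.10.tar.gz")
    print(registry.lookup("example"))
```

A real register would also need to persist the data and decide who is allowed to submit entries, which this sketch leaves out.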
Premshree then talked about how Ruby Gems works. The writers of Gems had an advantage in that they had a free hand to do whatever they liked. Packagers have to write a gemspec and run the gems program to create a ‘gem’ that can be installed on any Ruby system. Unfortunately, we can’t follow the same path, since DistUtils already does so much of the work and building on top of DistUtils is the only way to go. Still, I prefer the Gems way of packaging.
We finally had to force ourselves to stop at about 7:45. The Thoughtworks guys had thoughtfully arranged snacks, so we went to the cafeteria and got to know the newbies in the group. Much later, I found out that the Computer Associates guys had come down from Hyderabad for this meet! Wow. Good to know they think the meetups are worth it.
Even before the meet, Anand had already prepared a 0.02 version of Uraga. I am happy about the way things are turning out and it will be fun once we get this show on the road. More so, because a bunch of people sitting in a room in Bangalore, India are trying to create something that could help 14% of all programmers.
Update: Anand gives a brief overview about the current design of Uraga.
Comments
6 comments
Two days and no comments! Well now there is one
-Anand
March 22nd, 2005, 1:56 pm | #
Now there is two
March 23rd, 2005, 5:57 am | #
hey dude,
i liked beautifulsoup too but its choking up on quite a few websites like nytimes and timesofindia:
March 24th, 2005, 5:33 pm | #
Amit: I don’t know how good BeautifulSoup is, I have never tried it myself
… Have you tried contacting the authors of the software?
March 25th, 2005, 12:34 am | #
Well, that is what you will expect from a module which claims not to use regular expressions! It might be cool to say that we are doing all these without using regular expressions, but then you pay a bit for errors in HTML pages which regexps can catch
If you are looking for something that can handle bad HTML also, try HTMLScraper. It is available from the ASPN Python Cookbook: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/286269
March 25th, 2005, 11:46 am | #
swaroopch: it was my fault, once if it fails it will fail always, so create new BeautifulSoup object. i used it for my furl2delicious, and its pretty great!
anand: i strongly recommend u gave this parser a try if u needed to do some scraping, its quite elegant.
April 7th, 2005, 4:31 pm | #