import.io

Posted on February 17, 2014

My days of unemployment haven’t all been ones of lazing in the Melbourne summer sun, kayaking the bay and push biking to lunch, oh no. Yes, there has been a fair serving of DIY projects around the house and, of course, toys to play with, but I have on occasion been working through a few courses on udemy.com.

One of these was on web scraping, something I could have used a couple of times in previous employment for gathering data from competitors' sites and comparing it to our own. So it was with some passing interest that I began this course. It's actually pretty simple to set up and use, given the application is pretty much a Chrome plugin called import.io. Following the course, within a few hours I was off scraping away happily. By the end of the day I had a list of the top 5000-odd films from IMDB, which is pretty cool 'cause you can usually only see the top 250 unless you go into categories, and then of course you only see that category's top 1000.

So for the interested, here's your list: DataSet-IMDB-orderedAndFiltered