Scraping IMDB’s Search Results

I’ve been learning how to scrape web pages, and given that I’m still a complete programming newbie, it took some getting used to. Credit to Lee Hawthorn for showing how to do this.

I used SelectorGadget, as suggested in Hawthorn’s article, and noticed that it has a Chrome extension, which made the whole process even easier! I’m not certain I did this right, but it gives me the results I want, which, when you’re starting out, is all that seems to matter.

Whenever you search for a word or phrase in IMDB’s search bar and refine the search by movie titles, you get a maximum of 200 search results. I’m still not sure how to get more than 200, but for now, I guess this’ll have to do.

I created the function searchIMDB with the following script:

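library(rvest)   # page parsing and CSS selectors
library(dplyr)   # tbl_df

searchIMDB = function(searchTerm){
  # the first string is the base of IMDB's search URL (blank here);
  # the trailing parameters refine the search by movie titles
  fullURL = paste("", searchTerm, "&s=tt&ttype=ft&ref_=fn_ft", sep = "")

  # download and parse the results page (older rvest versions call this html())
  page = read_html(fullURL)

  # keep the text of every node matching the selector picked with SelectorGadget
  movieTitles = page %>%
    html_nodes(".result_text") %>%
    html_text()

  movieTitles.df = na.omit(tbl_df(data.frame(movieTitles)))
  return(movieTitles.df)
}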


And so running the script…


gives me the following list of movie titles:

Source: local data frame [6 x 1]

1 C.R.A.Z.Y. (2005) aka “Crazy”
2 Crazy (I) (2000)
3 Crazy (II) (2008)
4 Crazy, Stupid, Love. (2011)
5 The Crazies (2010)
6 Like Crazy (2011)
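
If you want to see what the .result_text selector is doing without hitting IMDB at all, here is a small sketch that runs the same html_nodes step on a made-up HTML fragment. The fragment, the variable names sampleHTML and titles, and the extra trimws call are my own assumptions for illustration; the real IMDB markup may well look different.

```r
library(rvest)

# Hypothetical HTML imitating a results table; not copied from IMDB.
sampleHTML = '<html><body><table>
<tr><td class="result_text"><a href="#">Crazy</a> (I) (2000)</td></tr>
<tr><td class="result_text"><a href="#">Like Crazy</a> (2011)</td></tr>
<tr><td class="other">not a title</td></tr>
</table></body></html>'

# read_html also accepts a string of HTML, so no network access is needed
page = read_html(sampleHTML)

# select only the cells with class "result_text" and keep their text
titles = page %>%
  html_nodes(".result_text") %>%
  html_text() %>%
  trimws()   # drop any stray whitespace around each title

print(titles)
```

Only the two cells carrying the result_text class come back; the third cell is ignored because its class does not match the selector.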
