[R] Developing a web crawler

From: antujsrv <antujsrv_at_gmail.com>
Date: Thu, 03 Mar 2011 01:22:44 -0800 (PST)


I wish to develop a web crawler in R. I have been using the functionalities available under the RCurl package.
I am able to extract the html content of the site but i don't know how to go about analyzing the html formatted document. I wish to know the frequency of a word in the document. I am only acquainted with analyzing data sets.
So how should i go about analyzing data that is not available in table format.

Few chunks of code that i wrote:
w <-
getURL("http://www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B003DZ1Y8Q/ref=dp_reviewsanchor#FullQuotes") write.table(w,"test.txt")
t <- readLines(w)

readLines also didnt prove out to be of any help.

Any help would be highly appreciated. Thanks in advance.

View this message in context: http://r.789695.n4.nabble.com/Developing-a-web-crawler-tp3332993p3332993.html
Sent from the R help mailing list archive at Nabble.com.

R-help_at_r-project.org mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 03 Mar 2011 - 13:52:42 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 14 Mar 2011 - 15:10:21 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive