Re: [R] Rapache ( was Developing a web crawler )

From: Mike Marchywka <marchywka_at_hotmail.com>
Date: Sun, 06 Mar 2011 08:06:48 -0500



> Date: Thu, 3 Mar 2011 13:04:11 -0600
> From: Matt.Shotwell_at_vanderbilt.edu
> To: r-help@r-project.org
> Subject: Re: [R] Developing a web crawler / R "webkit" or something similar? [off topic]
>
> On 03/03/2011 08:07 AM, Mike Marchywka wrote:
> >
> >> Date: Thu, 3 Mar 2011 01:22:44 -0800
> >> From: antujsrv_at_gmail.com
> >> To: r-help_at_r-project.org
> >> Subject: [R] Developing a web crawler
> >>
> >> Hi,
> >>
> >> I wish to develop a web crawler in R. I have been using the functionalities
> >> available under the RCurl package.
> >> I am able to extract the html content of the site but i don't know how to go
> >
> > In general this can be a big effort but there may be things in
> > text processing packages you could adapt to execute html and javascript.
> > However, I guess what I'd be looking for is something like a "webkit"
> > package or other open source browser with or without an "R" interface.
> > This actually may be an ideal solution for a lot of things as you get
> > all the content handlers of at least some browser.
> >
> >
> > Now that you mention it, I wonder if there are browser plugins to handle
> > "R" content (I'd have to give this some thought: put a script up as
> > a web page with MIME type "text/R" and have the browser execute it in R).
>
> There are server-side solutions for this sort of thing. See
> http://rapache.net/ . Also, there was a string of messages on R-devel
> some years ago addressing the mime type issue; beginning here:
> http://tolstoy.newcastle.edu.au/R/devel/05/11/3054.html . Though I don't
> know whether there was a resolution. Some suggestions were text/x-R,
> text/x-Rd, application/x-RData.
>
The rApache demo looks like something I could use right away, but I
haven't looked into the handlers yet. I have now installed rApache on my
Debian system (I still have config issues, but I did get apache2 to
restart, LOL). Before I plow too far into this, how does it compare or
compete with something like a PHP library for Rserve? That is the
approach I had been pursuing.

Thanks.

> -Matt



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sun 06 Mar 2011 - 14:41:08 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 06 Mar 2011 - 19:20:21 GMT.
