Re: [R] Reasons to Use R

From: Ramon Diaz-Uriarte <rdiaz02_at_gmail.com>
Date: Fri 06 Apr 2007 - 19:18:29 GMT

Dear Lorenzo,

I'll try not to repeat what others have already answered.

On 4/5/07, Lorenzo Isella <lorenzo.isella@gmail.com> wrote:
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
(...)

> (1)Institutions (not only academia) using R

You can count my institution too: several groups here use R. (I can provide more details off-list if you want.)

> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on different hardware.

We do use R on commodity off-the-shelf clusters. Our two clusters run Debian GNU/Linux and include both 32-bit machines (Xeons) and 64-bit machines (dual-core AMD Opterons). We use parallelization quite a bit, with MPI (mainly via the Rmpi and papply packages). One convenient feature is that, once the LAM universe is up and running, it is completely transparent whether we are using the 4 cores in a single box or the maximum of 120 available. Using R and MPI is, really, a piece of cake. That said, there are things that I miss; in particular, I often wish R were Erlang or Oz, because of their straightforward fault-tolerant distributed computing and their built-in abstractions for distribution and concurrency. The issue of multithreading has come up several times on this list and is something that some people miss.

I am not sure how much R is used in the usual HPC realms. My understanding is that "traditional HPC" is still dominated by things such as HPF, and C with MPI, OpenMP, UPC, or Cilk. The usual answer to "but R is too slow" is "but you can write Fortran or C code for the bottlenecks and call it from R". I guess you could use, say, UPC in the C code that is linked to R, but I have no experience with that. And I think such code can become a pain to write and maintain (especially if you want to experiment with what you parallelize). My feeling (based on no information or documentation whatsoever) is that how far R can be stretched or extended into HPC is still an open question.

> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some people are scared by the lack of professional support,
> though this mailing list is simply wonderful.
>

(In addition to all the answers already mentioned:) Complete source code availability. Being able to look at the C source code for a few things has been invaluable for me. And, of course, an extremely active, responsive, and vibrant community that, among other things, has contributed packages and code for an incredible range of problems.

Best,

R.

P.S. I'd be interested in hearing about the responses you get to your presentation.

> Kind Regards
>
> Lorenzo Isella
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz

Received on Sat Apr 07 05:34:13 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
