Re: [R] Reasons to Use R

From: Stephen Tucker <brown_emu_at_yahoo.com>
Date: Fri 06 Apr 2007 - 09:19:44 GMT


Hi Lorenzo,

I don't think I'm qualified to provide solid information on the first three questions, but I'd like to drop a few thoughts on (4). While there is no shortage of language advocates out there, I'd like to join in just this once. My background is in chemical engineering and atmospheric science; I've done simulation on a smaller scale but spend much of my time analyzing large sets of experimental data. I am comfortable programming in Matlab, R, Python, C, Fortran, and Igor Pro, and I also know a little IDL but have not programmed in it extensively.

As you are probably aware, among these I would count Matlab, R, Python, and IDL as good candidates for processing large data sets, as they are high-level languages and can read and write netCDF files (which I imagine will be used to transfer data).

Each language boasts an impressive array of libraries, but what I think gives R the advantage for analyzing data is the level of abstraction in the language. I am extremely impressed with the objects available to represent data sets, and the functions support them very well: I need to carry around fewer objects to hold information about my data (and I don't have to "unpack" them to feed them into functions). The language is also very "expressive" in that it lets you write a procedure in many different ways, some shorter, some more readable, depending on what your situation requires. System commands and text processing are integrated into the language, and the input/output facilities are excellent, for both data and graphics. Once I have my data object I am only a few keystrokes away from splitting, sorting, and visualizing multivariate data; even after several years I keep discovering new functions for basic things like manipulation of data objects, descriptive statistics, and plotting. Truly, an analyst's needs have been well anticipated.
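The "split, then sort" step mentioned above can be sketched in a few lines. The sketch below is in Python using only the standard library; in R the corresponding calls would be split() and order(). The records and variable names are invented for illustration and do not come from any real data set.

```python
from collections import defaultdict

# Hypothetical records: (site, hour, concentration).
records = [
    ("urban", 2, 41.5),
    ("rural", 1, 12.0),
    ("urban", 1, 38.0),
    ("rural", 2, 15.5),
]

# Split: group the (hour, concentration) pairs by site.
by_site = defaultdict(list)
for site, hour, conc in records:
    by_site[site].append((hour, conc))

# Sort: order each group by hour (tuples compare on their first element first).
sorted_by_site = {site: sorted(rows) for site, rows in by_site.items()}
```

Each value in sorted_by_site is then a per-site series ready to be summarized or plotted.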

And this is a recent obsession of mine, which I was introduced to through Python, but the functional programming support in R is amazing. By using higher-order functions like lapply(), I rarely need for-loops, which have often caused me trouble in the past because I had forgotten to re-initialize a variable, or incremented the wrong variable, etc. Though I'm definitely not militant about functional programming, in general I try to write functions and then apply them to the data (if the functions don't exist in R already), often through higher-order functions such as lapply(). This approach keeps most variables out of the global namespace, so I am less likely to reassign a value to a variable that I had intended to keep. It also makes my code more modular, so that I can re-use bits of my code as my analysis inevitably grows much larger than I had originally intended.
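A minimal sketch of the pattern described above, written in Python, where map() plays roughly the role that lapply() plays in R. The readings and the helper name summarize are made up for the example; this is one way to express the idea, not a prescribed method.

```python
from statistics import mean

# Hypothetical data: a few measurement series keyed by site name.
readings = {
    "site_a": [1.0, 2.0, 3.0],
    "site_b": [10.0, 20.0],
    "site_c": [5.0],
}

def summarize(values):
    """Return (count, mean) for one series; all working state stays local."""
    return len(values), mean(values)

# Loop version: the result container is created and filled by hand,
# which is where forgotten resets and wrong indices tend to creep in.
loop_result = {}
for site in readings:
    loop_result[site] = summarize(readings[site])

# Higher-order version: apply the function to every series in one expression,
# much as lapply(readings, summarize) would in R.
mapped_result = dict(zip(readings, map(summarize, readings.values())))
```

Both versions give the same result; the second has no loop variable or accumulator to get wrong, and summarize can be reused unchanged on any other series.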

Furthermore, my code in R ends up being much, much shorter than the code I imagine I would write in other languages to accomplish the same task. I believe this leaves fewer places for errors to occur, and the intent of the code is immediately comprehensible (though a series of nested functions can get pretty hard to read at times), not to mention that it takes less effort to write. I think this also makes it easier to interact with the data, because after making a plot I can set up the next plot with only a few function calls instead of setting out to write a block of code with loops, etc.

I have actually recommended R to colleagues who needed to analyze the output of large-scale air quality/global climate simulations, and they are extremely pleased. I think the capability for statistics and graphics is well-established enough that I don't need to do a hard sell on that so much, but R's language is something I get very excited about. I do appreciate all the contributors who have made this available.

Best regards,
ST

> Dear All,
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
> I am planning to attend it and talk a bit about fluid dynamics, but
> there is also quite a lot of interest devoted to data post-processing
> and management of huge data sets.
> A lot of people are interested in image processing/pattern recognition
> and statistic applied to geography/ecology, but I would like not to
> post this on too many lists.
> The final aim of the workshop is understanding hardware requirements
> and drafting a list of the equipment we would like to buy. I think
> this could be the venue to talk about R as well.
> Therefore, even if it is not exactly a typical mailing list question,
> I would like to have suggestions about where to collect info about:
> (1)Institutions (not only academia) using R
> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on different
> hardware.
> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some people are scared by the lack of professional support,
> though this mailing list is simply wonderful.
>
> Kind Regards
>
> Lorenzo Isella
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
 




Received on Fri Apr 06 19:31:40 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 06 Apr 2007 - 20:32:52 GMT.
