# Re: [R] A comment about R:

From: Robert W. Baer, Ph.D. <rbaer_at_atsu.edu>
Date: Thu 05 Jan 2006 - 18:20:41 EST

>> On Wed, 4 Jan 2006, Roger Bivand wrote:
>> >
>> > source(url("http://spatial.nhh.no/R/etc/capabilities.R"), echo=TRUE)
>> >
>> > as a reproduction of the Stata capabilities session? Both the t test and
>> > the chi-square from our side point up oddities. I didn't succeed on
>> > putting fit lines on a grouped xyplot, so backed out to base graphics.
>> > This could be Swoven, possibly using the RweaveHTML driver.
>> >
>>
Excellent! I will point out, though, that the Stata summarize command is a little different from the R summary command. The summarize command is closer to:

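```r
summarize <- function(x) {
  x <- x[!is.na(x)]  # Stata's summarize ignores missing values
  # Local names avoid masking base functions such as sd(), min() and max()
  stats <- c(Obs = length(x), Mean = mean(x), `Std. Dev.` = sd(x),
             Min = min(x), Max = max(x))
  print(stats)
  invisible(stats)
}
```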
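
When run without a variable list, Stata's summarize reports one row per variable in the dataset. Here is a minimal sketch of the same idea for a whole R data frame. It assumes that only numeric columns should be summarized (other columns are silently dropped), and `summarize_df` is a hypothetical name, not a function from any package:

```r
# Hypothetical helper: a Stata-like "summarize" table, one row per numeric column
summarize_df <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]  # keep numeric columns only
  out <- t(sapply(num, function(x) {
    x <- x[!is.na(x)]                            # ignore missing values, as Stata does
    c(Obs = length(x), Mean = mean(x), `Std. Dev.` = sd(x),
      Min = min(x), Max = max(x))
  }))
  as.data.frame(out)
}

# Usage with a dataset that ships with R
summarize_df(mtcars[, c("mpg", "wt")])
```

Building the table with sapply() over the columns keeps the statistics for each variable in one place. Transposing the result gives the variables-as-rows layout that Stata prints.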

As a user of statistics rather than a statistician, I have to agree with the original author, whose premise was that different statistical packages have different strengths. Reading between the lines, I think his comments on R were based mostly on what he had heard from friends. For those of us in the back rows, any statistical tool is only as easy as our mentors make it. At my institution there is a paucity of good mentors. I have found the learning curve equally steep for Stata 7, for which I have many, many volumes of documentation, and for R, for which I have benefited greatly from several of the terrific contributed documents and books already mentioned.

The original article was about the strengths of SAS, Stata, and SPSS for carrying out 'traditional statistics'. What are R's strengths? In the hands of the right users, they are too numerous to mention. However, I would point to things like the tools at the Bioconductor site as a broad illustration of the nearly infinite flexibility and extensibility of R for specialized statistical tasks. Does this mean that R is a poor choice for the basic, traditional procedures? Hardly! Well-written documentation such as John Fox's CAR, Peter Dalgaard's ISwR, and John Verzani's Simple R contributed documentation puts introductory statistical procedures in R within easy grasp of users. I have found that non-statistics students catch on rapidly with 'problem-specific' guidance once they get past the lack of a GUI (R Commander is certainly one solution there). As the number of R mentors grows to rival that of SAS, Stata, and SPSS, everyday tasks may even come to seem easier to new initiates than the corresponding syntax and thought processes in the other programs.
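
To illustrate how short the traditional procedures are in base R, here is a sketch of a t test and a chi-square test. It uses two datasets that ship with R (`sleep` and `HairEyeColor`). The choice of data and variables is purely illustrative and does not reproduce any analysis from the original article:

```r
# Two-sample t test (Welch, R's default): extra sleep by drug group
tt <- t.test(extra ~ group, data = sleep)
print(tt)

# Chi-square test of independence on a two-way table:
# collapse the 3-way HairEyeColor table over Sex to get Hair x Eye counts
hair_eye <- margin.table(HairEyeColor, c(1, 2))
ct <- chisq.test(hair_eye)
print(ct)
```

Both functions return an "htest" object. The statistic, degrees of freedom and p-value can therefore be pulled out programmatically (for example `ct$p.value`) instead of being read off printed output.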

So, what are R's major weaknesses? I do not think they are statistical. Rather, the weakness is a shortage of 'mentors' who have gone before you, done the type of analysis that you (the end user) wish to do, and graciously left behind a paper trail showing how to address a specific statistical task syntactically. There is a huge amount of such material out there, but it is hard to find at the beginning. [BTW: This list is of course a tremendous resource, and we should all read the posting guide out of simple respect for those who have given us such a great resource. I don't like getting flamed either, but darn it, sometimes I deserve it ;-).]

Finally, this thread has made me think back 3-4 years to when I first discovered R. The thing that frustrated me most in the early weeks was getting data into R. It took me no time to learn to generate data from all kinds of distributions, no time to discover the built-in datasets through the data() function, and no time to learn to enter data a number at a time with the c() function. BUT HOW was I to get the datasets (spreadsheets, databases) from my laboratory into R? Somehow this has been much easier to figure out in the other (often GUI-based) statistical environments I have used. [Of course, I eventually discovered the documentation for the foreign package and later learned about RODBC, and I was blown away by the flexibility available.]
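
The data-entry routes mentioned above can be sketched in a few lines of R. This is a minimal sketch under several assumptions. A temporary CSV file stands in for a real laboratory spreadsheet export. The foreign and RODBC calls are left as comments because the file names (`lab_data.dta`, `lab_data.sav`), the ODBC data source name (`lab_dsn`) and the table name (`results`) are hypothetical:

```r
# 1. Generate data from a distribution
set.seed(1)
x <- rnorm(20, mean = 10, sd = 2)

# 2. Use a built-in dataset via data()
data(iris)
str(iris)

# 3. Enter values one at a time with c()
weights <- c(61.2, 72.5, 68.0)

# 4. Read a spreadsheet saved as CSV; this temporary file is written
#    only for the demo and stands in for a real lab export
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, weight = weights), csv_file, row.names = FALSE)
lab <- read.csv(csv_file)
print(lab)

# 5. Other sources (not run; file names, DSN and table name are hypothetical)
# library(foreign)
# stata_data <- read.dta("lab_data.dta")
# spss_data  <- read.spss("lab_data.sav", to.data.frame = TRUE)
# library(RODBC)
# ch <- odbcConnect("lab_dsn")
# db <- sqlQuery(ch, "SELECT * FROM results")
# odbcClose(ch)
```

Exporting a spreadsheet to CSV and reading it with read.csv() needs no extra packages. foreign and RODBC become useful when the data already live in another package's file format or in a database.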

Well, just the thoughts of one end-user type...

Rob

R-help@stat.math.ethz.ch mailing list