Re: [R] A comment about R:

From: Thomas Lumley <>
Date: Wed 04 Jan 2006 - 06:23:18 EST

On Tue, 3 Jan 2006, Peter Dalgaard wrote:
> One thing that is often overlooked, and hasn't yet been mentioned in
> the thread, is how much *simpler* R can be for certain completely
> basic tasks of practical or pedagogical relevance: Calculate a simple
> derived statistic, confidence intervals from estimate and SE,
> percentage points of the binomial distribution - using dbinom or from
> the formula, take the sum of each of 10 random samples from a set of
> numbers, etc. This is where other packages get stuck in the
> procedure+dataset mindset.

Some of these things are actually fairly straightforward in Stata. For example, Stata will give confidence intervals and tests for linear combinations of coefficients and even (using symbolic differentiation and the delta method) for nonlinear combinations. The first is available in R packages; the second is described in "S Programming" (Venables and Ripley) but doesn't seem to be packaged.
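For the R side of the nonlinear case, here is a minimal sketch of the delta method that uses base R's deriv() for the symbolic differentiation. The helper name delta_ci() and the simulated data are hypothetical and do not come from any package. The sketch assumes every variable in the formula is a coefficient name that is also a valid R symbol (so "(Intercept)" can't be used directly), and it uses a normal-approximation interval.

```r
# Hypothetical helper (not from any package): delta-method CI for a
# nonlinear function of regression coefficients, e.g. the ratio x1/x2.
# Assumes every variable in `expr` is a coefficient name in coef(fit).
delta_ci <- function(fit, expr, level = 0.95) {
  b <- coef(fit)
  V <- vcov(fit)
  vars <- all.vars(expr)
  # Symbolic derivative of expr with respect to the named coefficients
  d <- deriv(expr, vars)
  val <- eval(d, as.list(b[vars]))
  g <- attr(val, "gradient")              # 1 x k matrix of partial derivatives
  se <- sqrt(drop(g %*% V[vars, vars, drop = FALSE] %*% t(g)))
  est <- as.numeric(val)
  z <- qnorm(1 - (1 - level) / 2)         # normal-approximation critical value
  c(estimate = est, se = se, lower = est - z * se, upper = est + z * se)
}

# Usage on simulated data (hypothetical example)
set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
y <- 1 + 2 * x1 + 4 * x2 + rnorm(50)
fit <- lm(y ~ x1 + x2)
res <- delta_ci(fit, ~ x1 / x2)
print(res)
```

The gradient returned by deriv() is exactly the vector of partial derivatives that the delta method needs, so the standard error is just sqrt(g V g').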

. di Binomial(10,4,0.2)
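For comparison, this is the R version. It assumes that Stata's Binomial(n,k,p) returns the upper tail P(X >= k), that is, the probability of k or more successes in n trials. Under that assumption, the same number can be computed three ways: with pbinom(), by summing dbinom(), or directly from the formula, as the quoted text describes.

```r
# P(X >= 4) for X ~ Binomial(n = 10, p = 0.2)
n <- 10; k <- 4; p <- 0.2

# 1. Upper tail of the distribution function: P(X > 3) = P(X >= 4)
p_tail <- pbinom(k - 1, size = n, prob = p, lower.tail = FALSE)

# 2. Summing the point probabilities with dbinom()
p_sum <- sum(dbinom(k:n, size = n, prob = p))

# 3. Directly from the formula choose(n, x) p^x (1 - p)^(n - x)
x <- k:n
p_formula <- sum(choose(n, x) * p^x * (1 - p)^(n - x))

print(c(p_tail, p_sum, p_formula))   # all about 0.121
```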

Taking the sum of each of ten random samples, or other things of that sort, obviously requires creating a new data set, but again there are facilities to automate this. I have, for example, computed bootstrap confidence intervals for the ratio or difference of medians in a service course using Stata. It would be easier in R, but not that much easier.
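In R, both tasks reduce to replicate() and sample(). The following is a minimal sketch. It uses simulated data in place of real measurements, and the group sizes, rates, set of numbers, and number of resamples are arbitrary choices. It also uses the simple percentile method rather than any of the refined intervals available in the boot package.

```r
set.seed(42)
# Hypothetical data for two groups
grp1 <- rexp(30, rate = 1)
grp2 <- rexp(40, rate = 0.5)

# Percentile bootstrap CI for the difference of medians:
# resample each group independently, with replacement
B <- 2000
boot_diff <- replicate(B,
  median(sample(grp1, replace = TRUE)) - median(sample(grp2, replace = TRUE)))
ci_diff <- quantile(boot_diff, c(0.025, 0.975))

# The same idea for the ratio of medians
boot_ratio <- replicate(B,
  median(sample(grp1, replace = TRUE)) / median(sample(grp2, replace = TRUE)))
ci_ratio <- quantile(boot_ratio, c(0.025, 0.975))

# Sum of each of 10 random samples of size 5 (without replacement)
# from a hypothetical set of numbers
values <- c(3, 8, 1, 9, 4, 7, 2, 6, 5, 10)
sums <- replicate(10, sum(sample(values, 5)))

print(ci_diff); print(ci_ratio); print(sums)
```

No new data set is needed: each resample exists only inside the replicate() call.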

> For much the same reason, those packages make you tend to treat
> practical data analysis as something distinct from theoretical
> understanding of the methods: You just don't use SAS or SPSS or Stata
> to illustrate the concept of a random sample by setting up a small
> simulation study as the first thing you do in a statistics class,
> whereas you could quite conceivably do it in R. (What *is* the
> equivalent of rnorm(25) in those languages, actually?)

set obs 25
gen x = invnorm(uniform())

[This does create a new data set, of course]
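The small simulation study described in the quoted text, used to illustrate the idea of a random sample, might look like this in R. It draws many samples of size 25 from a standard normal distribution and looks at the spread of their means. The seed and the number of replications (1000) are arbitrary choices. Theory says the standard deviation of the means should be close to 1/sqrt(25) = 0.2.

```r
set.seed(2006)
# One random sample of size 25: the rnorm(25) from the quoted question
x <- rnorm(25)

# Repeat the sampling many times and keep each sample mean
means <- replicate(1000, mean(rnorm(25)))

# The means should centre near 0 with SD near 1 / sqrt(25) = 0.2
print(c(mean = mean(means), sd = sd(means)))
```

Adding hist(means) gives the usual classroom picture of the sampling distribution.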

> Even when using SAS in teaching, I sometimes fire up R just to
> calculate simple things like
> pbar <- (p1+p2)/2
> sqrt(pbar*(1-pbar))

local pbar=(0.3+0.5)/2
display sqrt(`pbar'*(1-`pbar'))

Now, I still prefer R both for data analysis and (even more so) for programming. Some things are genuinely difficult to program in Stata. As evidence that this isn't just my ignorance of the best approaches, the language was substantially reworked in both versions 8 and 9 so that the vendor could implement better graphics and linear mixed models.

On the question of which system really is easier to learn, I can only comment that this isn't the only question where education, as a field, would benefit from some good randomized controlled trials.


Thomas Lumley			Assoc. Professor, Biostatistics	University of Washington, Seattle

Received on Wed Jan 04 06:30:44 2006
