Re: [R] Can R replicate this data manipulation in SAS?

From: peter dalgaard <pdalgd_at_gmail.com>
Date: Fri, 22 Apr 2011 00:34:55 +0200

On Apr 21, 2011, at 16:00 , Bert Gunter wrote:

> Folks:
>
> It is perhaps worth noting that this is probably a Type III error: right
> answer to the wrong question. The right question would be: what data
> structures and analysis strategy are appropriate in R? As usual, different
> language architectures mean that different paradigms should be used to best
> fit a language's strengths and weaknesses. Direct translations do not
> necessarily do this.

Hum, there is a point, though: If you take the crude translation approach, you will soon realize that there is very little that SAS (or SPSS, or...) can do that you literally can't do in R.

It is often the case that there is a much neater and better-structured approach in R, but the flip side is that there are cases where the neat solution is hard to find, and maybe some cases where it doesn't really exist (e.g. not everything can be vectorized). This is the sort of thing that in some circles gives R a reputation for being poorly suited for data handling, compared to the DATA step in SAS. Do notice the circular logic that occurs when defining "typical statistical task" as "something you can do in SAS", though.

(One example is "last observation carried forward", a rather dubious technique for filling in missing observations in longitudinal studies, which probably stems directly from the RETAIN statement in SAS.

In R, you may find yourself doing something like

  x[is.na(x)] <- x[!is.na(x)][cumsum(!is.na(x))[is.na(x)]]

which isn't even completely failsafe. However, you'll get the result soon enough with

  for (i in seq_along(x)) if (is.na(x[i])) x[i] <- t else t <- x[i]

and this time, you can actually read the code.

Of course, approx() with method = "constant" will do the trick much more swiftly than either of the above.)
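
To make the three variants concrete, here is a rough sketch, not a definitive implementation. The function names locf_index, locf_loop and locf_approx are invented for the illustration, and the test vector is made up. The indexed version guards against leading NAs, which is where the one-liner above goes wrong: cumsum() gives a zero index there, zero indices are dropped, and the replacement comes up short. The approx() version assumes that x is numeric and has at least one non-missing value.

```r
# Sketch only: locf_index, locf_loop and locf_approx are hypothetical
# helper names, not functions from base R or any package.

# Vectorized indexing, guarded against leading NAs.
locf_index <- function(x) {
  ok  <- !is.na(x)
  idx <- cumsum(ok)      # how many observed values have been seen so far
  idx[idx == 0] <- NA    # nothing observed yet: the result stays NA
  x[ok][idx]
}

# Explicit loop; the carried value starts out as NA.
locf_loop <- function(x) {
  last <- NA
  for (i in seq_along(x)) {
    if (is.na(x[i])) x[i] <- last else last <- x[i]
  }
  x
}

# Step-function interpolation with approx().
# f = 0 takes the value from the left; rule = 1:2 returns NA to the left
# of the first observation and extends the last observation to the right.
locf_approx <- function(x) {
  ok <- !is.na(x)
  i  <- seq_along(x)
  approx(i[ok], x[ok], xout = i,
         method = "constant", f = 0, rule = 1:2)$y
}

# Made-up test vector with a leading NA.
x <- c(NA, 1, NA, NA, 4, NA)
expected <- c(NA, 1, 1, 1, 4, 4)

stopifnot(identical(locf_index(x), expected))
stopifnot(identical(locf_loop(x), expected))
stopifnot(isTRUE(all.equal(locf_approx(x), expected)))
```

Note that the loop version initializes the carried value to NA, so a leading NA simply stays NA instead of picking up whatever t happened to be beforehand.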

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes_at_cbs.dk  Priv: PDalgd_at_gmail.com

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 22 Apr 2011 - 02:31:16 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 22 Apr 2011 - 17:10:32 GMT.
