Re: [R] sas to r

From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>
Date: Sat 17 Jul 2004 - 22:36:20 EST

Greg Adkison wrote:
> I would be incredibly grateful to anyone who'll help me translate some
> SAS code into R code.
>
> Say for example that I have a dataset named "dat1" that includes five
> variables: wshed, site, species, bda, and sla. I can calculate with the
> following SAS code the mean, CV, se, and number of observations of
> "bda" and "sla" for each combination of "wshed," "species," and "site,"
> restricting the species considered to only three of several species in
> dat1 (b, c, and p). Moreover, I can output these calculations and
> grouping variables to a dataset named "dat2" that will reside in RAM
> and include the variables wshed, site, species, mBdA, msla, cBda,
> sBdA, ssla, nBda, and nsla.
>
> proc sort data=dat1;
> by wshed site species;
> proc means data=dat1 noprint mean cv stderr n;
> by wshed site species;
> where species in ('b', 'c', 'p');
> var BdA sla;
> output out=dat2
> mean=mBdA msla
> cv=cBdA csla
> stderr=sBdA ssla
> n=nBdA nsla;
>
> Thanks,
> Greg

The following handles any number of analysis variables, with automatic naming of all statistics computed from them. It requires the Hmisc package.

# Generate some data. Put one NA in sla. set.seed(1)
dat1 <- expand.grid(wshed=1:2, site=c('A','B'),

                     species=c('a','b','c','p'),
                     reps=1:10)

n <- nrow(dat1)
dat1 <- transform(dat1,
                   BdA = rnorm(n, 100, 20),
                   sla = c(rnorm(n-1, 200, 30), NA))
# Can use upData function in Hmisc in place of transform

# Summarization function, per stratum, for a matrix of analysis # variables

g <- function(y) {
   n <- apply(y, 2, function(z) sum(!is.na(z)))
   m <- apply(y, 2, mean, na.rm=TRUE)
   s <- apply(y, 2, sd,   na.rm=TRUE)

   cv <- s/m
   se <- s/sqrt(n)
   w <- c(m, cv, se, n)
   names(w) <- t(outer(c('m','c','s','n'), colnames(y), paste, sep=''))    w
}
library(Hmisc)
dat2 <- with(dat1,
               summarize(cbind(BdA, sla),
                         llist(wshed, site, species),
                         g,
                         subset=species %in% c('b','c','p'),
                         stat.name='mBdA')
               )

options(digits=3)
dat2 # is a data frame

    wshed site species mBdA msla cBdA csla sBdA ssla nBdA nsla

1      1    A       b 100.5  195 0.133 0.1813 4.23 11.20   10   10
2      1    A       c  99.7  206 0.101 0.1024 3.17  6.68   10   10
3      1    A       p 101.4  188 0.239 0.1580 7.65  9.39   10   10
4      1    B       b 109.9  203 0.118 0.1433 4.09  9.21   10   10
5      1    B       c  98.4  221 0.193 0.1250 6.01  8.72   10   10
6      1    B       p 102.9  203 0.216 0.1446 7.03  9.29   10   10
7      2    A       b  95.8  195 0.241 0.2011 7.31 12.40   10   10
8      2    A       c  98.7  207 0.194 0.1274 6.04  8.33   10   10
9      2    A       p 102.2  191 0.217 0.1709 7.01 10.31   10   10
10     2    B       b  97.8  191 0.235 0.2079 7.27 12.58   10   10
11     2    B       c 100.9  194 0.164 0.0987 5.24  6.07   10   10
12     2    B       p 103.0  209 0.144 0.0769 4.69  5.35   10    9


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Sat Jul 17 22:47:29 2004

This archive was generated by hypermail 2.1.8 : Fri 18 Mar 2005 - 02:36:41 EST