# Re: [R] Calculation of group summaries

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Fri 15 Jul 2005 - 12:43:03 EST

nsites <- 6
yearList <- 1999:2001
fakesub <- data.frame(

```	year = rep(yearList, nsites/length(yearList), each = 11),
site_id  = rep(c('site1','site2'), each = 11*nsites),
visit_no = rep(1, 11*2*nsites),
transect = rep(LETTERS[1:11], nsites, each = 2),
transdir = rep(c('LF','RT'), 11*nsites),
undercut = abs(rnorm(11*2*nsites, 10)),
angle    = runif(11*2*nsites, 0, 180)
```

)

f <- function(x) cbind(year = x[1,1], site_id = x[1,2], visit_no = x[1,3],

mean = mean(x[,6]), sd = sd(x[,6]), length = length(x[,6])) do.call("rbind", by(fakesub, fakesub[,1:3], f))

On 7/14/05, Seeliger.Curt@epamail.epa.gov <Seeliger.Curt@epamail.epa.gov> wrote:
> Several people suggested specific functions (by, tapply, sapply and
> others); thanks for not blowing off a simple question regarding how to
> do the following SQL in R:
> > select year,
> > site_id,
> > visit_no,
> > mean(undercut) AS meanUndercut,
> > count(undercut) AS nUndercut,
> > std(undercut) AS stdUndercut
> > from channelMorphology
> > group by year, site_id, visit_no
> > ;
>
> I'd spent quite a bit of time with the suggested functions earlier but
> had no luck as I'd misread the docs and put the entire dataframe where
> it only wants the columns to be processed. Sometimes it's the simplest
> of things.
>
> This has lead to another confoundment-- sd() acts differently than
> mean() for some reason, at least with R 1.9.0. For some reason, means
> generate NA results and a warning message for each group:
>
> argument is not numeric or logical: returning NA in:
> mean.default(data[x, ], ...)
>
> Of course, the argument is numeric, or there'd be no sd value. Or more
> likely, I'm still missing something really basic. If I wrap the value in
> as.numeric() things work fine. Why should I have to do this for mean
> and median, but not sd? The code below should reproduce this error
>
> # Fake data for demo:
> nsites<-6
> yearList<-1999:2001
> fakesub<-as.data.frame(cbind(
> year =rep(yearList,nsites/length(yearList),each=11)
> ,site_id =rep(c('site1','site2'),each=11*nsites)
> ,visit_no =rep(1,11*2*nsites)
> ,transect =rep(LETTERS[1:11],nsites,each=2)
> ,transdir =rep(c('LF','RT'),11*nsites)
> ,undercut =abs(rnorm(11*2*nsites,10))
> ,angle =runif(11*2*nsites,0,180)
> ))
>
> # Create group summaries:
> sdmets<-by(fakesub\$undercut
> ,list(fakesub\$year,fakesub\$site_id,fakesub\$visit_no)
> ,sd
> )
> nmets<-by(fakesub\$undercut
> ,list(fakesub\$year,fakesub\$site_id,fakesub\$visit_no)
> ,length
> )
> xmets<-by(fakesub\$undercut
> ,list(fakesub\$year,fakesub\$site_id,fakesub\$visit_no)
> ,mean
> )
> xmets<-by(as.numeric(fakesub\$undercut)
> ,list(fakesub\$year,fakesub\$site_id,fakesub\$visit_no)
> ,mean
> )
>
> # Put site id values (year, site_id and visit_no) into results:
> # List unique id combinations as a list of lists. Then
> # reorganize that into 3 vectors for final results.
> # Certainly, there MUST be a better way...
> foo<-strsplit(unique(paste(fakesub\$year
> ,fakesub\$site_id
> ,fakesub\$visit_no
> ,sep='#'))
> ,split='#'
> )
> year<-list()
> for(i in 1:length(foo)) {year<-rbind(year,foo[[i]][1])}
> site_id<-list()
> for(i in 1:length(foo)) {site_id<-rbind(site_id,foo[[i]][2])}
> visit_no<-list()
> for(i in 1:length(foo)) {visit_no<-rbind(visit_no,foo[[i]][3])}
>
> # Final result, more or less
> data.frame(cbind(a=year,b=site_id,c=visit_no,sdmets,nmets,xmets))
>
>
> cur
>
> --
> Curt Seeliger, Data Ranger
> CSC, EPA/WED contractor
> 541/754-4638
> seeliger.curt@epa.gov
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help