Re: [R] sapply following using by with a list of factors

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Mon 30 May 2005 - 13:42:12 EST

On 5/29/05, McClatchie, Sam (PIRSA-SARDI) <mcclatchie.sam@saugov.sa.gov.au> wrote:
> Background:
> OS: Linux Mandrake 10.1
> release: R 2.0.0
> editor: GNU Emacs 21.3.2
> front-end: ESS 5.2.3
> ---------------------------------
> Colleagues
>
> I am having some trouble extracting results from the function by, used to
> average variables in a data.frame first by one factor (depth) and then by a
> second factor (station). The real data.frame is quite large
> > dim(data.2001)
> [1] 32049 11
>
> Here is a snippet of code:
>
> ## bin density data for each station into 1 m depth bins, containing means
> data.2001.test$integer.Depth <- as.factor(round(data.2001.test$Depth, digits=0))
> attach(data.2001.test)
> binned.data.2001 <- by(data.2001.test[,5:11], list(depth=integer.Depth, station=Station), mean)
>
> and here is a snippet of the data.frame
>
> > dim(data.2001.test)
> [1] 150 11
> > dump("data.2001.test", file=stdout())
> data.2001.test <-
> structure(list(Cruise = structure(as.integer(c(1, 1, 1, 1, 1,

Try the following. To keep this short, let's just take a subset of rows called dd. We also drop the unused Station levels, since this subset uses only 2 of the 288 Station levels. The function that we apply using by returns a vector consisting of integer.Depth, Station and the column means of columns 5 to 10. (Asking for just the mean of those, as in your example, would take all the numbers in all the columns passed to mean and give back a single grand mean rather than a mean per column.) Finally we rbind it all back together.
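A small toy illustration of two points in the paragraph above. The data frame `toy` and its values are invented for illustration and are not taken from the poster's data. Subsetting a factor with `[drop = TRUE]` discards levels that no longer occur, and `colMeans()` returns one mean per column, whereas pooling every value gives a single grand mean.

```r
# Hypothetical toy data: 3 stations declared as levels, only 2 actually used
toy <- data.frame(
  Station     = factor(c("A", "A", "B", "B"), levels = c("A", "B", "C")),
  Depth       = c(1, 3, 2, 4),
  Temperature = c(18, 20, 17, 19)
)

# Subsetting with drop = TRUE removes the unused level "C"
toy$Station <- toy$Station[drop = TRUE]
levels(toy$Station)                                  # "A" "B"

# Grand mean: every number from both columns pooled together
mean(as.matrix(toy[, c("Depth", "Temperature")]))    # 10.5

# Per-column means, which is what is wanted for each depth/station bin
colMeans(toy[, c("Depth", "Temperature")])           # Depth 2.5, Temperature 18.5
```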

> # data.2001.test is your data frame including the integer.Depth column
> dd <- data.2001.test[50:60,]
> dd$Station <- dd$Station[drop = TRUE]
> dd.bin <- by(dd, list(dd$integer.Depth, dd$Station), function(x)
+    c(integer.Depth = x$integer.Depth[1], Station = x$Station[1],
+      colMeans(x[,5:10])))
> do.call("rbind", dd.bin)

     integer.Depth Station    Depth Temperature.oC Salinity Fluoresence.Volts
[1,]            20       1 23.90167       17.67420 35.47650          1.107433
[2,]            21       1 24.75350       17.33355 35.59050          1.060400
[3,]             1       2  5.19000       19.61510 35.54870          0.726500
[4,]             2       2  5.82950       19.61305 35.55025          0.719200
[5,]             3       2  6.81250       19.61300 35.58345          0.741150
[6,]             4       2  7.55000       19.61180 35.60460          0.754600
     Density.kg.m3 Brunt.Vaisala.Freq.cycl.h
[1,]      25.82400                 -5.095467
[2,]      25.99820                 16.030975
[3,]      25.30560                 -6.261240
[4,]      25.31015                  4.051561
[5,]      25.33985                  8.893225
[6,]      25.35960                 -8.167610
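One caveat worth checking: because `c()` is applied to factor elements (`x$integer.Depth[1]` and `x$Station[1]`), the first two columns of the result above hold the factors' internal integer codes rather than their labels. For example, the bin with a mean Depth of 23.9 m must have been rounded to 24 m, yet it is reported with integer.Depth 20; the Station codes 1 and 2 likewise need not match the real station labels. Below is a minimal self-contained sketch that keeps the same `by()` / `do.call("rbind", ...)` approach but returns a one-row data frame with the labels recovered via `as.character()`. The data frame `casts`, its station labels and values are hypothetical.

```r
# Hypothetical CTD-style data: two stations, depths in metres
casts <- data.frame(
  Station     = factor(c("S1", "S1", "S1", "S2", "S2")),
  Depth       = c(23.6, 24.2, 25.1, 5.2, 5.4),
  Temperature = c(17.7, 17.6, 17.3, 19.6, 19.7)
)

# 1 m depth bins stored as a factor, as in the original post
casts$integer.Depth <- as.factor(round(casts$Depth))

# One row per non-empty depth/station cell; labels converted from the factor
bins <- by(casts, list(casts$integer.Depth, casts$Station), function(x)
  data.frame(integer.Depth = as.numeric(as.character(x$integer.Depth[1])),
             Station       = as.character(x$Station[1]),
             t(colMeans(x[, c("Depth", "Temperature")])),
             stringsAsFactors = FALSE))

# Depth/station combinations with no data come back as NULL; drop them, then stack
res <- do.call("rbind", Filter(Negate(is.null), bins))
res
```

Returning a data frame rather than a numeric vector lets the Station column stay character even when station labels are not numbers; with the vector approach everything is coerced to one numeric type.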

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Mon May 30 13:51:06 2005
