Re: [R] Odp: question about "mean"

From: Allan Engelhardt <allane_at_cybaea.com>
Date: Tue, 15 Jun 2010 17:08:20 +0100

This solution (split/sapply) also seems to be the fastest of the proposed options for this data set:

```
library("rbenchmark")
benchmark(columns = c("test", "elapsed", "relative"), order = "elapsed",
          apply = apply(iris[, -5], 2, tapply, iris$Species, mean),
          with = with(iris, rowsum(iris[, -5], Species)/table(Species)),
          aggregate = aggregate(iris[,-5],list(iris[,5]),mean),
          sapply = sapply(split(iris[,1:4], iris$Species), mean))
#        test elapsed relative
# 4    sapply   0.148 1.000000
# 1     apply   0.248 1.675676
# 2      with   0.310 2.094595
# 3 aggregate   0.313 2.114865
```
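
A note for anyone rerunning this on a current R: `mean()` no longer averages the columns of a data frame, so the `sapply` candidate as written returns NA with warnings. Below is a minimal sketch that swaps in `colMeans()` for that case (my substitution, not part of the original benchmark) and checks that the four candidates compute the same per-species means. The helper `as_plain` and the `by_*` names are made up for this example.

```r
# Sketch: confirm the four candidates compute the same per-species means.
# colMeans() replaces mean() where a data-frame piece is averaged
# (an assumption made for current R; the original used mean()).
by_apply     <- apply(iris[, -5], 2, tapply, iris$Species, mean)
by_with      <- with(iris, rowsum(iris[, -5], Species) / as.vector(table(Species)))
by_aggregate <- aggregate(iris[, -5], list(Species = iris[, 5]), mean)
by_sapply    <- sapply(split(iris[, 1:4], iris$Species), colMeans)

# Bring every result to a plain species-by-variable numeric matrix
# (hypothetical helper, introduced only for this comparison).
as_plain <- function(x) unname(as.matrix(x))
reference <- as_plain(by_apply)

agree <- c(
  with      = isTRUE(all.equal(as_plain(by_with), reference)),
  aggregate = isTRUE(all.equal(as_plain(by_aggregate[, -1]), reference)),
  sapply    = isTRUE(all.equal(as_plain(t(by_sapply)), reference))
)
print(agree)
```

Note that the `sapply` result comes back with variables in rows and species in columns, hence the `t()` before comparing.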

However, the 'with/rowsum/table' option proposed by Bill Venables appears to scale better:

```
i <- rbind(iris, iris, iris, iris, iris)
i <- rbind(i, i, i, i, i); i <- rbind(i, i, i, i, i); i <- rbind(i, i, i, i, i)
NROW(i)
# [1] 93750
benchmark(columns=c("test", "elapsed", "relative"), order="elapsed",
          apply=apply(i[, -5], 2, tapply, i$Species, mean),
          with=with(i, rowsum(i[, -5], Species)/table(Species)),
          aggregate=aggregate(i[,-5],list(i[,5]),mean),
          sapply=sapply(split(i[,1:4], i$Species), mean))
#        test elapsed  relative
# 2      with   2.708  1.000000
# 4    sapply   5.189  1.916174
# 3 aggregate  15.990  5.904727
# 1     apply  31.646 11.686115
```
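
If rbenchmark is not installed, a rough version of the same comparison can be done with base R only. This is a sketch under stated assumptions, not the benchmark used above: `time_it` and `big` are hypothetical names, the replication count of 10 is arbitrary, and `colMeans()` stands in for `mean()` in the `sapply` candidate because current R no longer averages a data frame with `mean()`. Timings will differ by machine and R version, so no expected output is shown.

```r
# Sketch: rough timing without rbenchmark, using only base R.
# `time_it` and `big` are hypothetical names introduced for this example.
time_it <- function(f, replications = 10) {
  unname(system.time(for (k in seq_len(replications)) f())["elapsed"])
}

# Same 93750 rows as the enlarged data set, built by repeating the row index.
big <- iris[rep(seq_len(nrow(iris)), times = 5^4), ]

elapsed <- c(
  apply     = time_it(function() apply(big[, -5], 2, tapply, big$Species, mean)),
  with      = time_it(function() with(big, rowsum(big[, -5], Species) / as.vector(table(Species)))),
  aggregate = time_it(function() aggregate(big[, -5], list(big[, 5]), mean)),
  sapply    = time_it(function() sapply(split(big[, 1:4], big$Species), colMeans))
)

# Fastest first, with times relative to the fastest candidate.
print(data.frame(elapsed = sort(elapsed), relative = sort(elapsed) / min(elapsed)))
```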

(Because I care about these things...)

Allan

On 10/06/10 09:44, Petr PIKAL wrote:
> Hi
>
> split/sapply can be used besides other options
>
> sapply(split(iris[,1:4], iris$Species), mean)
>
> Regards
> Petr
>
> r-help-bounces_at_r-project.org wrote on 10.06.2010 00:43:29:
>
>
>> Hi there:
>> I have a question about generating mean value of a data.frame. Take
>> iris data for example, if I have a data.frame looking like the following:
>>
>> ---------------------
>>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
>> 1          5.1         3.5          1.4         0.2  setosa
>> 2          4.9         3.0          1.4         0.2  setosa
>> 3          4.7         3.2          1.3         0.2  setosa
>> .            .           .            .           .       .
>> .            .           .            .           .       .
>> .            .           .            .           .       .
>> -----------------------
>> There are three different species in this table. I want to make a table and
>> calculate the mean value for each species, as in the following table:
>>
>> -----------------
>>                 Sepal.Length Sepal.Width Petal.Length Petal.Width
>> mean.setosa            5.006       3.428        1.462       0.246
>> mean.versicolor        5.936       2.770        4.260       1.326
>> mean.virginica         6.588       2.974        5.552       2.026
>> -----------------
>> Is there any short syntax that can do it? I mean shorter than the code I
>> wrote, as follows:
>>
>> attach(iris)
>> mean.setosa<-mean(iris[Species=="setosa", 1:4])
>> mean.versicolor<-mean(iris[Species=="versicolor", 1:4])
>> mean.virginica<-mean(iris[Species=="virginica", 1:4])
>> data.mean<-rbind(mean.setosa, mean.versicolor, mean.virginica)
>> detach(iris)
>> ------------------
>>
>> Thanks a million!!!
>>
>>
>> --
>> =====================================
>> Shih-Hsiung, Chou
>> System Administrator / PH.D Student at
>> Department of Industrial Manufacturing
>> and Systems Engineering
>> Kansas State University
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>>
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.