From: Gabor Grothendieck <ggrothendieck_at_gmail.com>

Date: Fri 15 Sep 2006 - 00:55:29 GMT

Here are three different ways to do it:

# base R

fb <- function(x)
  c(V1 = x$V1[1], V4 = x$V4[1], V2.mean = mean(x$V2),
    V3.mean = mean(x$V3), n = length(x$V1))
do.call(rbind, by(DF, DF[c(1,4)], fb))

# package doBy

library(doBy)

summaryBy(V2 + V3 ~ V1 + V4, DF, FUN = c(mean, length))[,-5]

# package reshape

library(reshape)

f <- function(x) c(mean = mean(x), n = length(x))
cast(melt(DF, id = c(1,4)), V1 + V4 ~ variable, fun.aggregate = f)[,-6]
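To make the snippets above reproducible, here is a minimal sketch. It assumes DF is the six-row data frame quoted later in this thread, with columns named V1 to V4; the construction itself is an assumption, since the messages never show how DF was created. It also sketches a fourth, base-R-only route (aggregate twice, then merge), which is the "temporary table merged later" idea from the question rather than one of the three ways listed above. The names means, counts and res are illustrative only.

```r
# Assumed reconstruction of DF from the data quoted in this thread.
DF <- data.frame(
  V1 = c("A", "A", "A", "B", "B", "C"),
  V2 = c(1.0, 3.0, 2.0, 0.5, 0.9, 5.0),
  V3 = c(200, 800, 200, 20, 50, 70),
  V4 = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID1"),
  stringsAsFactors = TRUE
)

# Group means of V2 and V3 for each (V1, V4) combination.
means <- aggregate(DF[2:3], DF[c(1, 4)], mean)

# Group sizes: number of rows collapsed into each (V1, V4) combination.
counts <- aggregate(list(n = DF$V2), DF[c(1, 4)], length)

# Join the two summaries on the grouping columns.
res <- merge(means, counts, by = c("V1", "V4"))
res
```

This needs no add-on packages, at the cost of running aggregate() twice.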

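> # base R
> fb <- function(x)
+   c(V1 = x$V1[1], V4 = x$V4[1], V2.mean = mean(x$V2),
+     V3.mean = mean(x$V3), n = length(x$V1))
> do.call(rbind, by(DF, DF[c(1,4)], fb))
     V1 V4 V2.mean V3.mean n
[1,]  1  1     2.0     400 3
[2,]  3  1     5.0      70 1
[3,]  2  2     0.7      35 2
>
> # package doBy
> library(doBy)
> summaryBy(V2 + V3 ~ V1 + V4, DF, FUN = c(mean, length))[,-5]
  V1  V4 mean.V2 mean.V3 length.V3
1  A ID1     2.0     400         3
2  C ID1     5.0      70         1
3  B ID2     0.7      35         2
>
> # package reshape
> library(reshape)
> f <- function(x) c(mean = mean(x), n = length(x))
> cast(melt(DF, id = c(1,4)), V1 + V4 ~ variable, fun.aggregate = f)[,-6]
  V1  V4 V2_mean V2_n V3_mean
1  A ID1     2.0    3     400
2  B ID2     0.7    2      35
3  C ID1     5.0    1      70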

---Received on Fri Sep 15 10:58:45 2006

On 9/14/06, Emmanuel Levy <emmanuel.levy@gmail.com> wrote:

> Thanks Gabor, that is much faster than using a loop!
>
> I've got a last question:
>
> Can you think of a fast way of keeping track of the number of
> observations collapsed for each entry?
>
> i.e. I'd like to end up with:
>
> A 2.0 400 ID1 3 (3 obs in the first matrix)
> B 0.7  35 ID2 2 (2 obs in the first matrix)
> C 5.0  70 ID1 1 (1 obs in the first matrix)
>
> Or is it required to use a temporary matrix that is merged later? (As
> exemplified by Mark in a previous email?)
>
> Thanks a lot for your help,
>
> Emmanuel
>
> On 9/13/06, Gabor Grothendieck <ggrothendieck@gmail.com> wrote:
> > See below.
> >
> > On 9/13/06, Emmanuel Levy <emmanuel.levy@gmail.com> wrote:
> > > Thanks for pointing me to "aggregate", that works fine!
> > >
> > > There is one complication though: I have mixed types (numerical and character),
> > >
> > > So the matrix is of the form:
> > >
> > > A 1.0 200 ID1
> > > A 3.0 800 ID1
> > > A 2.0 200 ID1
> > > B 0.5  20 ID2
> > > B 0.9  50 ID2
> > > C 5.0  70 ID1
> > >
> > > One letter always has the same ID but one ID can be shared by many
> > > letters (like ID1)
> > >
> > > I just want to keep track of the ID, and get a matrix like:
> > >
> > > A 2.0 400 ID1
> > > B 0.7  35 ID2
> > > C 5.0  70 ID1
> > >
> > > Any idea on how to do that without a loop?
> >
> > If V4 is a function of V1 then you can aggregate by it too and it will
> > appear but have no effect on the classification:
> >
> > > aggregate(DF[2:3], DF[c(1,4)], mean)
> >   V1  V4  V2  V3
> > 1  A ID1 2.0 400
> > 2  C ID1 5.0  70
> > 3  B ID2 0.7  35
> >
> > >
> > > Many thanks,
> > >
> > > Emmanuel
> > >
> > > On 9/12/06, Emmanuel Levy <emmanuel.levy@gmail.com> wrote:
> > > > Hello,
> > > >
> > > > I'd like to group the lines of a matrix so that:
> > > > A 1.0 200
> > > > A 3.0 800
> > > > A 2.0 200
> > > > B 0.5  20
> > > > B 0.9  50
> > > > C 5.0  70
> > > >
> > > > Would give:
> > > > A 2.0 400
> > > > B 0.7  35
> > > > C 5.0  70
> > > >
> > > > So all lines corresponding to a letter (level) become a single line
> > > > where all the values of each column are averaged.
> > > >
> > > > I've done that with a loop but it doesn't sound right (it is very
> > > > slow). I imagine there is a
> > > > sort of "apply" shortcut but I can't figure it out.
> > > >
> > > > Please note that it is not exactly a matrix I'm using, the function
> > > > "typeof" tells me it's a list, however I access it as if it were a
> > > > matrix.
> > > >
> > > > Could someone help me with the right function to use, a help topic or
> > > > a piece of code?
> > > >
> > > > Thanks,
> > > >
> > > > Emmanuel

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Fri 15 Sep 2006 - 01:30:05 GMT.
