From: Gabor Grothendieck <ggrothendieck_at_gmail.com>

Date: Fri 16 Dec 2005 - 00:21:37 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Fri Dec 16 00:29:39 2005

On 12/15/05, January Weiner <january@uni-muenster.de> wrote:

> Hello again,
>
> On 12/14/05, Thomas Lumley <tlumley@u.washington.edu> wrote:
> > You want
> >
> > by(df[,-1], df$Day, function.that.means.each.column)
>
> OK, slowly :-) I don't understand it.
>
> - why df[,-1] and not df? don't we lose the df$Day entries?

You don't get them as a column but you get them as the component labels.

by(df, df$Day, function(x) colMeans(x[,-1]))

If you rbind the components together you get a matrix with them as the rownames (as.data.frame() then turns that into a data frame):

do.call("rbind", by(df, df$Day, function(x) colMeans(x[,-1])))
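
As a minimal sketch of that conversion, assuming the example df from the quoted message: the rbind step gives a matrix whose row names are the days, and the days are then copied into an ordinary Day column. The names means and result are only illustrative.

```r
# Example data from the quoted message
df <- data.frame(Day = c("Tue", "Tue", "Tue", "Wed", "Wed"),
                 val1 = seq(1, 5), val2 = 3 * seq(1, 5))

# One vector of column means per day, stacked into a matrix;
# the day labels end up as the row names
means <- do.call("rbind", by(df, df$Day, function(x) colMeans(x[, -1])))

# Turn the matrix into a data frame and move the row names into a column
result <- as.data.frame(means)
result <- cbind(Day = rownames(means), result)
rownames(result) <- NULL
result
```

With several thousand groups the same steps apply; only the number of rows in result changes.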

> (by the way, why does typeof(df) show "list"? I thought that
> read.table() returns a data frame?)

I think you want class(df), which shows it's a data frame. typeof() reports the underlying storage, and a data frame is stored as a list of columns.
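
A short sketch of the difference, using the example df from the quoted message (the variable cm is only illustrative). It also shows why typeof(colMeans(...)) says "double": the result is an ordinary named numeric vector.

```r
df <- data.frame(Day = c("Tue", "Tue", "Tue", "Wed", "Wed"),
                 val1 = seq(1, 5), val2 = 3 * seq(1, 5))

typeof(df)         # "list": how the object is stored internally
class(df)          # "data.frame": how R treats the object
is.data.frame(df)  # TRUE
is.list(df)        # TRUE, a data frame is a list of equal-length columns

# colMeans() on the numeric columns gives a named numeric vector
cm <- colMeans(df[, -1])
typeof(cm)         # "double"
names(cm)          # "val1" "val2"
```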

> > so all you need to do is write function.that.means.each.column()
> > In this case there is a built-in function, colMeans, so you don't even
> > have to write it.
>
> Hmmmmm, I tried it and it did not work. That is, it works - but not as
> intended :-).
>
> Fake example:
>
> > df <- data.frame(Day=c("Tue","Tue","Tue", "Wed", "Wed"), val1=seq(1,5), val2=3*seq(1,5))
> > df
>   Day val1 val2
> 1 Tue    1    3
> 2 Tue    2    6
> 3 Tue    3    9
> 4 Wed    4   12
> 5 Wed    5   15
> > ddf <- by(df[,-1], df$Day, colMeans)
> > ddf
> df$Day: Tue
> val1 val2
>    2    6
> ------------------------------------------------------------
> df$Day: Wed
> val1 val2
>  4.5 13.5
> > ddf$Day
> NULL
> > ddf$val1
> NULL
>
> In real data, instead of "days", I have around 6000 items, so I need
> them to be in one column called "Days" (or whatever). OK. So correct
> me if I understand wrongly what is happening here:
>
> by() divides df in data frame subsets and applies a function
> (colMeans) to each of them. The result of colMeans ... manual says
> that colMeans returns the following:
>
>     A numeric or complex array of suitable size, or a vector if the
>     result is one-dimensional. The 'dimnames' (or 'names' for a
>     vector result) are taken from the original array.
>
> ...which doesn't tell me much. typeof(colMeans(...)) tells me
> "double" but I think it lies. OK, let's assume it is a vector (should
> be, I assume the result is one-dimensional, as I can hardly imagine a
> multidimensional result).
>
> So in the end I have a list with as many columns as I have days, and
> in each column I have a vector with N named dimensions, where N is the
> number of variables in the original data frame bar one. But what I
> would like to have is a data frame with exactly the same column names,
> and rows being just a summary. And no clue how to convert one in the
> other :-)
>
> > More generally (eg the approach would work for medians as well)
> >
> > by(df[,1], df$Day, function(today) apply(today, 2, mean))
>
> Huh? why is it df[,1] now? I think I'm completely lost.

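df[,1] and df$Day both refer to the same first column, so that call would pass only the Day column to by(). The quoted line was presumably meant to use df[,-1], as in the first example.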
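
A small sketch of both points, assuming the example df from the quoted message and assuming the "more general" call was meant to use df[,-1]. The median is used here only to show a summary other than the mean; meds and m are illustrative names.

```r
df <- data.frame(Day = c("Tue", "Tue", "Tue", "Wed", "Wed"),
                 val1 = seq(1, 5), val2 = 3 * seq(1, 5))

# Two ways of selecting the same first column
identical(df[, 1], df$Day)

# Column-wise medians per day: apply() runs median over each column
# of the subset that by() hands to the function
meds <- by(df[, -1], df$Day, function(today) apply(today, 2, median))

# Stack the per-day results; the day labels become the row names
m <- do.call("rbind", meds)
m
```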

> > Finally, you could just use aggregate().
>
> Probably, yes. As soon as I figure out how to use it, that is :-) (an

aggregate(df[,-1], df[,1,drop = FALSE], mean)

or

aggregate(df[,-1], list(Day = df$Day), mean)

The second argument of aggregate must be a list, which is why we used drop = FALSE in the first instance (keeping a one-column data frame, which is itself a list) and an explicit list in the second.
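
For illustration, a sketch of both calls on the example df from the quoted message, showing that they give the same data frame with Day as a regular column. The last call uses the formula interface of aggregate(), which is an assumption about your R version since it was added after this thread; agg1, agg2 and agg3 are illustrative names.

```r
df <- data.frame(Day = c("Tue", "Tue", "Tue", "Wed", "Wed"),
                 val1 = seq(1, 5), val2 = 3 * seq(1, 5))

# drop = FALSE keeps a one-column data frame (a list) instead of a vector
is.list(df[, 1, drop = FALSE])   # TRUE
is.list(df[, 1])                 # FALSE

agg1 <- aggregate(df[, -1], df[, 1, drop = FALSE], mean)
agg2 <- aggregate(df[, -1], list(Day = df$Day), mean)

# Formula interface (newer R versions only)
agg3 <- aggregate(cbind(val1, val2) ~ Day, data = df, FUN = mean)

agg2
```

Unlike the by() result, the output here is already a data frame, so no further conversion is needed.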

Another alternative is to use summaryBy from the doBy package found at http://genetics.agrsci.dk/~sorenh/misc/ :

library(doBy)

summaryBy(cbind(val1, val2) ~ Day, data = df)

> hour later: OK, I got it! yuppie!) However what I really needed was
> smth like this:
>
> ddf <- by(df[,-1], df$Day, function(z) { return(cor(z$val1,z$val2)) ; } )
>
> (but I still don't know how to convert it to a friendly data frame...)
>

do.call("rbind", ddf)
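
One caveat, offered as a hedged note rather than a tested claim about every R version: when the function returns a single number per group, by() may simplify its result to a plain array instead of a list, and do.call() expects a list. A sketch that sidesteps the question by using split() and sapply() on the example df from the quoted message (cors and result are illustrative names):

```r
df <- data.frame(Day = c("Tue", "Tue", "Tue", "Wed", "Wed"),
                 val1 = seq(1, 5), val2 = 3 * seq(1, 5))

# One correlation per day: split the value columns by Day,
# then compute cor() within each piece
cors <- sapply(split(df[, -1], df$Day), function(z) cor(z$val1, z$val2))

# Move the group labels from the names into a proper column
result <- data.frame(Day = names(cors), cor = unname(cors))
result
```

In this toy data val2 is exactly 3 * val1, so every group has a correlation of 1; real data would of course give different values per group.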

> Thanks for the answers!
>
> January
>
> --
> ------------ January Weiner 3 ---------------------+---------------
> Division of Bioinformatics, University of Muenster | Schloßplatz 4
> (+49)(251)8321634                                  | D48149 Münster
> http://www.uni-muenster.de/Biologie.Botanik/ebb/   | Germany