Re: [Rd] Suggestion to extend aggregate() to return multiple and/or named values

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Fri, 13 Jul 2007 13:56:35 -0400

Note that it does not work in this case:

> aggregate(CO2[4:5], CO2[1:2], mean)

   Plant        Type conc   uptake
1    Qn1      Quebec  435 33.22857
2    Qn2      Quebec  435 35.15714
3    Qn3      Quebec  435 37.61429
4    Qc1      Quebec  435 29.97143
5    Qc3      Quebec  435 32.58571
6    Qc2      Quebec  435 32.70000
7    Mn3 Mississippi  435 24.11429
8    Mn2 Mississippi  435 27.34286
9    Mn1 Mississippi  435 26.40000
10   Mc2 Mississippi  435 12.14286
11   Mc3 Mississippi  435 17.30000
12   Mc1 Mississippi  435 18.00000
> agg(CO2[4:5], CO2[1:2], mean)
Error in `[[<-.data.frame`(`*tmp*`, i, value = c(1L, 2L, 3L, 4L, 6L, 5L, :
  replacement has 7056 rows, data has 84
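
One likely contributor to the error above, offered as a reading of the code rather than a confirmed diagnosis: agg() overwrites each element of Ind with unique(Ind[[i]]). That works when Ind is a plain list, but CO2[1:2] is a data frame, and a data frame column cannot be replaced by a shorter vector. Note also that unique() returns values in order of appearance, while by() arranges its cells in factor-level order, so the two can disagree (Plant in CO2 is such a case). The sketch below is a hypothetical variant, agg2(), that is not from this thread. It assumes z can be coerced to a data frame, applies FUN to each column of z separately, takes the group labels from the dimnames of a tapply() result so that labels and cells stay aligned, and drops empty combinations.

```r
# Hypothetical variant of agg(); the name agg2 and its design are assumptions,
# not code posted in this thread.
agg2 <- function(z, Ind, FUN, ...) {
  z   <- as.data.frame(z)   # assumption: z is (or can become) a data frame
  Ind <- as.list(Ind)       # a data frame of factors becomes a plain list

  # One cell per combination of grouping levels; FUN is applied to each
  # column of the rows that fall in the cell. Empty cells stay NULL.
  cells <- tapply(seq_len(nrow(z)), Ind, function(rows) {
    unlist(lapply(z[rows, , drop = FALSE], FUN, ...))
  }, simplify = FALSE)

  # dimnames(cells) lists the levels in the same order as the cells, and
  # expand.grid() varies its first variable fastest, matching array order.
  grid <- expand.grid(dimnames(cells), KEEP.OUT.ATTRS = FALSE,
                      stringsAsFactors = FALSE)

  keep <- as.vector(!vapply(cells, is.null, logical(1)))
  vals <- do.call(rbind, cells[keep])   # one row per non-empty cell

  out <- cbind(grid[keep, , drop = FALSE], vals)
  rownames(out) <- NULL
  out
}

# The call that failed with agg(), using the built-in CO2 data set
agg2(CO2[4:5], CO2[1:2], mean)
```

With the CO2 data this should return the same twelve Plant/Type rows as the aggregate() call above. When FUN returns several values per column (summary, for example), the result column names come from unlist() and combine the column name with the value name.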

On 7/13/07, Mike Lawrence <Mike.Lawrence_at_dal.ca> wrote:
> bugfix already :P prior version fails when there is only one factor
> in Ind. This version also might be faster as I avoid using aggregate
> to create the dummy frame.
>
> agg=function(z,Ind,FUN,...){
> FUN.out=by(z,Ind,FUN,...)
> num.cells=length(FUN.out)
> num.values=length(FUN.out[[1]])
>
> for(i in 1:length(Ind)){
> Ind[[i]]=unique(Ind[[i]])
> }
> temp=expand.grid(Ind)
>
> for(i in 1:num.values){
> temp$new=NA
> n=names(FUN.out[[1]])[i]
> names(temp)[length(temp)]=ifelse(!is.null(n),n,ifelse(i==1,'x',paste('x',i,sep='')))
> for(j in 1:num.cells){
> temp[j,length(temp)]=FUN.out[[j]][i]
> }
> }
> return(temp)
> }
>
>
> On 13-Jul-07, at 1:29 PM, Mike Lawrence wrote:
>
> > Hi all,
> >
> > This is my first post to the developers list. As I understand it,
> > aggregate() currently repeats a function across cells in a
> > dataframe but is only able to handle functions with single value
> > returns. aggregate() also lacks the ability to retain the names
> > given to the returned value. I've created an agg() function (pasted
> > below) that is apparently backwards compatible (i.e. returns
> > the same results as aggregate() if the function returns a single
> > unnamed value), but is able to handle named and/or multiple return
> > values. The code may be a little inefficient (there must be an
> > easier way to set up the 'temp' data frame than to call aggregate
> > and remove the final column), but I'm suggesting that something
> > similar to this may be profitably used to replace aggregate entirely.
> >
> > #modified aggregate command, allowing for multiple/named output values
> > agg=function(z,Ind,FUN,...){
> > FUN.out=by(z,Ind,FUN,...)
> > num.cells=length(FUN.out)
> > num.dv=length(FUN.out[[1]])
> >
> > temp=aggregate(z,Ind,length) #dummy data frame
> > temp=temp[,c(1:(length(temp)-1))] #remove last column from dummy frame
> >
> > for(i in 1:num.dv){
> > temp=cbind(temp,NA)
> > n=names(FUN.out[[1]])[i]
> > names(temp)[length(temp)]=ifelse(!is.null(n),n,ifelse(i==1,'x',paste('x',i,sep='')))
> > for(j in 1:num.cells){
> > temp[j,length(temp)]=FUN.out[[j]][i]
> > }
> > }
> > return(temp)
> > }
> >
> > #create some factored data
> > z=rnorm(100) # the DV
> > A=rep(1:2,each=25,2) #one factor
> > B=rep(1:2,each=50) #another factor
> > Ind=list(A=A,B=B) #the factor list
> >
> > aggregate(z,Ind,mean) #show the means of each cell
> > agg(z,Ind,mean) #should be identical to aggregate
> >
> > aggregate(z,Ind,summary) #returns an error
> > agg(z,Ind,summary) #returns named columns
> >
> > #Make a function that returns multiple unnamed values
> > summary2=function(x){
> > s=summary(x)
> > names(s)=NULL
> > return(s)
> > }
> > agg(z,Ind,summary2) #returns multiple columns, default names
> >
> >
> > --
> > Mike Lawrence
> > Graduate Student, Department of Psychology, Dalhousie University
> >
> > Website: http://memetic.ca
> >
> > Public calendar: http://icalx.com/public/informavore/Public
> >
> > "The road to wisdom? Well, it's plain and simple to express:
> > Err and err and err again, but less and less and less."
> > - Piet Hein
> >
> >
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Received on Fri 13 Jul 2007 - 18:00:54 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 14 Jul 2007 - 20:38:37 GMT.
