[R] by inconsistently strips class - with fix

From: Alex Brown <fishtank_at_compsoc.man.ac.uk>
Date: Tue, 15 Apr 2008 12:28:15 +0100


summary:

The function 'by' inconsistently strips class from the data to which it is applied.

quick reason:

tapply strips class when simplify is set to TRUE (the default) due to the class stripping behaviour of unlist.

quick answer:

This can be fixed by invoking tapply with simplify=FALSE, or changing tapply to use do.call(c instead of unlist

executable example:

mytimes=data.frame(date = 1:3 + Sys.time(), set = c(1,1,2))

by(mytimes$date, mytimes$set, function(x)x)

INDICES: 1
[1] "2008-04-15 11:41:38 BST" "2008-04-15 11:41:39 BST"



INDICES: 2
[1] "2008-04-15 11:41:40 BST" by(mytimes[1,]$date, mytimes[1,]$set, function(x)x)

INDICES: 1
[1] 1208256099

why this is a problem:

This is a problem when you are feeding the output of this by into a function which expects the class to be maintained. I see this problem when constructing

reason:

tapply strips class when simplify is set to TRUE (the default) due to the behaviour of unlist:

"Where possible the list elements are coerced to a common mode during the unlisting, and so the result often ends up as a character vector. Vectors will be coerced to the highest type of the components in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression: pairlists are treated as lists."

solution:

This problem can be fixed in the function by.data.frame by modifying the call to tapply in the function "by":

by.data.frame = function (data, INDICES, FUN, ...) {

   if (!is.list(INDICES)) {

       IND <- vector("list", 1)
       IND[[1]] <- INDICES
       names(IND) <- deparse(substitute(INDICES))[1]

}

   else IND <- INDICES
   FUNx <- function(x) FUN(data[x, ], ...)    nd <- nrow(data)
<<<<

   ans <- eval(substitute(tapply(1:nd, IND, FUNx)), data)


   ans <- eval(substitute(tapply(1:nd, IND, FUNx, simplify=FALSE)), data)
 >>>>
   attr(ans, "call") <- match.call()
   class(ans) <- "by"
   ans
}

alternative solution:

the call in tapply to unlist(ans, recursive=F) can be replaced by do.call(c,ans, recursive=F) to fix this issue, since c does not strip class.

However, I haven't taken the time to work out if this will work in all cases.

for example:

function (X, INDEX, FUN = NULL, ..., simplify = TRUE) {

   FUN <- if (!is.null(FUN))

       match.fun(FUN)
   if (!is.list(INDEX))

       INDEX <- list(INDEX)
   nI <- length(INDEX)
   namelist <- vector("list", nI)
   names(namelist) <- names(INDEX)
   extent <- integer(nI)
   nx <- length(X)
   one <- 1L
   group <- rep.int(one, nx)
   ngroup <- one
   for (i in seq.int(INDEX)) {

       index <- as.factor(INDEX[[i]])
       if (length(index) != nx)
           stop("arguments must have same length")
       namelist[[i]] <- levels(index)
       extent[i] <- nlevels(index)
       group <- group + ngroup * (as.integer(index) - one)
       ngroup <- ngroup * nlevels(index)

}

   if (is.null(FUN))

       return(group)

   ans <- lapply(split(X, group), FUN, ...)
   index <- as.integer(names(ans))
   if (simplify && all(unlist(lapply(ans, length)) == 1)) {
       ansmat <- array(dim = extent, dimnames = namelist)
<<<<
       ans <- unlist(ans, recursive = FALSE)
====
	ans <- do.call(c, ans, recursive = FALSE)
 >>>>
}

   else {
       ansmat <- array(vector("list", prod(extent)), dim = extent,
           dimnames = namelist)

}

   if (length(index)) {
       names(ans) <- NULL
       ansmat[index] <- ans

}

   ansmat
}

Alexander Brown
Principal Engineer
Transitive
Maybrook House, 40 Blackfriars Street, Manchester M3 2EG Phone: +44 (0)161 836 2321 Fax: +44 (0)161 836 2399 Mobile: +44 (0)7980 708 221
www.transitive.com
* The leader in cross-platform virtualization



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Tue 15 Apr 2008 - 11:32:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 17 Apr 2008 - 08:30:29 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive