Re: [R] by inconsistently strips class - with fix

From: Alex Brown <fishtank_at_compsoc.man.ac.uk>
Date: Thu, 17 Apr 2008 11:49:18 +0100

Adding a simplify argument to by would suit me fine.

In my (limited) experience of using R, the automatic simplification that R performs in various situations is one of its most troublesome features. It means that I cannot expect a program to work even when I give it data of the same types as I always have before; any time a dimension is reduced to 1, bad things happen.

Is there a master switch I can set so dropping never happens automatically?
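
As far as I know there is no global switch for this in base R. For indexing, the closest thing is the per-call drop argument of `[`. A minimal sketch, using made-up objects m and df purely for illustration:

```r
# Illustrative objects only; the names m and df are not from the thread.
m <- matrix(1:6, nrow = 2)

# Default indexing drops the dimension that has been reduced to 1.
dim(m[1, ])                  # NULL: the result is a plain vector

# drop = FALSE keeps the result as a 1 x 3 matrix.
dim(m[1, , drop = FALSE])

df <- data.frame(a = 1:3, b = 4:6)

# Selecting one column drops to a vector unless drop = FALSE is given.
class(df[, "a"])
class(df[, "a", drop = FALSE])
```

This only covers indexing; simplification inside functions such as tapply or sapply is controlled separately by their own simplify arguments.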

Could by also read a global option, so that I can indicate that it should never drop or simplify?
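
For illustration only: in versions of R where by() accepts a simplify argument (it is passed on to tapply), the per-call form would look like the sketch below. The data frame mirrors the one in the example quoted further down, but with a fixed time instead of Sys.time() so that the result is reproducible; the variable names one, simplified and kept are my own.

```r
# Same shape as the quoted example, with a fixed start time (an assumption
# made here only for reproducibility).
mytimes <- data.frame(
    date = as.POSIXct("2008-04-15 11:41:38", tz = "UTC") + 0:2,
    set  = c(1, 1, 2)
)

# A single row gives a single group with a single value, which is the
# case that triggers simplification.
one <- mytimes[1, ]

simplified <- by(one, one$set, function(d) d$date)
kept       <- by(one, one$set, function(d) d$date, simplify = FALSE)

# With the default, the value has been reduced to a bare number.
class(unclass(simplified)[[1]])

# With simplify = FALSE the result is a list and the element keeps its class.
class(unclass(kept)[[1]])
```

A global option would still need a wrapper around by(); the sketch above only shows the per-call argument.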

-Alex

On 17 Apr 2008, at 07:03, Prof Brian Ripley wrote:

> Unfortunately your proposed change changes the type of the output:
> simplification is intended in many applications of by().
>
> Before:
>
>> str(by(mytimes$date[1], mytimes$set[1], function(x)x))
> by [, 1] 1.21e+09
> - attr(*, "dimnames")=List of 1
> ..$ mytimes$set[1]: chr "1"
> - attr(*, "call")= language by.default(data = mytimes$date[1],
> INDICES = mytimes$set[1], FUN = function(x) x)
>
> After:
>
>> str(by(mytimes$date[1], mytimes$set[1], function(x)x))
> List of 1
> $ 1: POSIXct[1:1], format: "2008-04-17 06:53:31"
> - attr(*, "dim")= int 1
> - attr(*, "dimnames")=List of 1
> ..$ mytimes$set[1]: chr "1"
> - attr(*, "call")= language by.default(data = mytimes$date[1],
> INDICES = mytimes$set[1], FUN = function(x) x)
> - attr(*, "class")= chr "by"
>
> c() does not do the same thing as unlist() in general, and it is
> untrue that 'c does not strip class'. What happens in your example
> is that there is a c() method for your class (and not many others).
>
> What we could do is add a 'simplify' argument to by() so you can
> control the simplification.
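
The point about c() methods can be checked with a short sketch. The class name "myclass" below is hypothetical and deliberately has no c() method; t1 is just a fixed time chosen for the example.

```r
t1 <- as.POSIXct("2008-04-17 06:53:31", tz = "UTC")

# POSIXct has its own c() method, so the class survives.
class(c(t1, t1))

# unlist() does not keep the class.
class(unlist(list(t1, t1)))

# A hypothetical class with no c() method: the default c() drops the
# class attribute, so c() is not class-preserving in general.
x <- structure(1:2, class = "myclass")
class(c(x, x))
```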
>
>
> On Tue, 15 Apr 2008, Alex Brown wrote:
>
>> summary:
>>
>> The function 'by' inconsistently strips class from the data to which
>> it is applied.
>>
>> quick reason:
>>
>> tapply strips class when simplify is set to TRUE (the default) due to
>> the class stripping behaviour of unlist.
>>
>> quick answer:
>>
>> This can be fixed by invoking tapply with simplify=FALSE, or by changing
>> tapply to use do.call(c, ...) instead of unlist.
>>
>> executable example:
>>
>> mytimes=data.frame(date = 1:3 + Sys.time(), set = c(1,1,2))
>>
>> by(mytimes$date, mytimes$set, function(x)x)
>>
>> INDICES: 1
>> [1] "2008-04-15 11:41:38 BST" "2008-04-15 11:41:39 BST"
>> ----------------------------------------------------------------------------------------
>> INDICES: 2
>> [1] "2008-04-15 11:41:40 BST"
>>
>> by(mytimes[1,]$date, mytimes[1,]$set, function(x)x)
>>
>> INDICES: 1
>> [1] 1208256099
>>
>> why this is a problem:
>>
>> This is a problem when you are feeding the output of this by into a
>> function which expects the class to be maintained. I see this problem
>> when constructing
>>
>> reason:
>>
>> tapply strips class when simplify is set to TRUE (the default) due to
>> the behaviour of unlist:
>>
>> "Where possible the list elements are coerced to a common mode during
>> the unlisting, and so the result often ends up as a character vector.
>> Vectors will be coerced to the highest type of the components in the
>> hierarchy NULL < raw < logical < integer < real < complex < character
>> < list < expression: pairlists are treated as lists."
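
A small sketch of the behaviour described in that passage. Note that the quoted text is about coercion to a common type; the loss of class is a separate effect, which the last two lines show with a Date value d chosen only for illustration.

```r
# Coercion up the hierarchy logical < integer < double < character.
typeof(unlist(list(TRUE, 2L)))    # "integer"
typeof(unlist(list(2L, 3.5)))     # "double"
typeof(unlist(list(3.5, "a")))    # "character"

# Even with a single type involved, the class attribute is not kept.
d <- as.Date("2008-04-15")
class(d)                  # "Date"
class(unlist(list(d)))    # "numeric"
```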
>>
>> solution:
>>
>> This problem can be fixed in the function by.data.frame by modifying
>> its call to tapply:
>>
>> by.data.frame = function (data, INDICES, FUN, ...)
>> {
>>     if (!is.list(INDICES)) {
>>         IND <- vector("list", 1)
>>         IND[[1]] <- INDICES
>>         names(IND) <- deparse(substitute(INDICES))[1]
>>     }
>>     else IND <- INDICES
>>     FUNx <- function(x) FUN(data[x, ], ...)
>>     nd <- nrow(data)
>> <<<<
>>     ans <- eval(substitute(tapply(1:nd, IND, FUNx)), data)
>> ====
>>     ans <- eval(substitute(tapply(1:nd, IND, FUNx, simplify=FALSE)),
>>         data)
>> >>>>
>>     attr(ans, "call") <- match.call()
>>     class(ans) <- "by"
>>     ans
>> }
>>
>> alternative solution:
>>
>> the call in tapply to unlist(ans, recursive = FALSE) can be replaced by
>> do.call(c, ans) to fix this issue, since c does not strip
>> class.
>>
>> However, I haven't taken the time to work out if this will work in
>> all cases.
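
A rough sketch of the do.call(c, ...) idea on its own, outside tapply. The names times, ans, flat_unlist, flat_c and mixed are illustrative. Note that do.call() itself has no recursive argument; anything meant for c() has to go inside the list of arguments. The last lines show one case where the approach does not keep the class.

```r
times <- as.POSIXct("2008-04-15 11:41:38", tz = "UTC") + 0:2

# One length-1 POSIXct value per group, as tapply would see it.
ans <- lapply(split(times, c("a", "b", "c")), function(x) x)

# unlist() drops the class.
flat_unlist <- unlist(ans, recursive = FALSE)
class(flat_unlist)

# do.call(c, ...) dispatches to the POSIXct method of c() and keeps it.
# unname() avoids group names being matched to argument names of c().
flat_c <- do.call(c, unname(ans))
class(flat_c)

# Dispatch is on the first element only, so a plain number in first
# position means the default c() is used and the class is lost.
mixed <- do.call(c, list(1, times[1]))
class(mixed)
```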
>>
>> for example:
>>
>> function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
>> {
>>     FUN <- if (!is.null(FUN))
>>         match.fun(FUN)
>>     if (!is.list(INDEX))
>>         INDEX <- list(INDEX)
>>     nI <- length(INDEX)
>>     namelist <- vector("list", nI)
>>     names(namelist) <- names(INDEX)
>>     extent <- integer(nI)
>>     nx <- length(X)
>>     one <- 1L
>>     group <- rep.int(one, nx)
>>     ngroup <- one
>>     for (i in seq.int(INDEX)) {
>>         index <- as.factor(INDEX[[i]])
>>         if (length(index) != nx)
>>             stop("arguments must have same length")
>>         namelist[[i]] <- levels(index)
>>         extent[i] <- nlevels(index)
>>         group <- group + ngroup * (as.integer(index) - one)
>>         ngroup <- ngroup * nlevels(index)
>>     }
>>     if (is.null(FUN))
>>         return(group)
>>     ans <- lapply(split(X, group), FUN, ...)
>>     index <- as.integer(names(ans))
>>     if (simplify && all(unlist(lapply(ans, length)) == 1)) {
>>         ansmat <- array(dim = extent, dimnames = namelist)
>> <<<<
>>         ans <- unlist(ans, recursive = FALSE)
>> ====
>>         ans <- do.call(c, ans)
>> >>>>
>>     }
>>     else {
>>         ansmat <- array(vector("list", prod(extent)), dim = extent,
>>             dimnames = namelist)
>>     }
>>     if (length(index)) {
>>         names(ans) <- NULL
>>         ansmat[index] <- ans
>>     }
>>     ansmat
>> }
>>
>> Alexander Brown
>> Principal Engineer
>> Transitive
>> Maybrook House, 40 Blackfriars Street, Manchester M3 2EG
>> Phone: +44 (0)161 836 2321 Fax: +44 (0)161 836 2399 Mobile: +44
>> (0)7980 708 221
>> www.transitive.com
>> * The leader in cross-platform virtualization
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Brian D. Ripley, ripley_at_stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595



Received on Thu 17 Apr 2008 - 11:36:09 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 17 Apr 2008 - 12:30:30 GMT.
