From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>

Date: Thu, 17 Apr 2008 11:54:11 +0100 (BST)

On Thu, 17 Apr 2008, Alex Brown wrote:

> Adding a simplify argument to by would suit me fine.
>
> In my (limited) experience in using R, the automatic simplification that R
> does in various situations is one of its most troublesome features. It
> means that I cannot expect a program to work even if I give it data of the
> same types as I always have before; any time a dimension is reduced to 1 bad
> things happen.
>
> Is there a master switch I can set so dropping never happens automatically?

No, and you would break a lot of code with such a switch, which is why we are very much against having global options.
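
For readers of the archive: although there is no global switch, dropping and simplification can be turned off call by call. The following is a minimal sketch using only base R; the objects `m` and `x` are illustrative and do not come from the thread.

```r
# Illustrative objects (not from the thread)
m <- matrix(1:6, nrow = 2)

dropped <- m[1, ]                 # the extent-1 dimension is dropped: plain vector
kept    <- m[1, , drop = FALSE]   # stays a 1 x 3 matrix

# sapply() simplifies by default; simplify = FALSE always returns a list
x <- list(a = 1, b = 2)
simplified <- sapply(x, function(v) v * 2)
as_list    <- sapply(x, function(v) v * 2, simplify = FALSE)
```

Because the choice is made per call, code that relies on the default behaviour elsewhere is unaffected.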

> Can you please have an option that by reads so I can indicate that by should
> never drop/simplify?

No, as it will break lots of other people's code. You can have your own version, and then namespaces will protect other code from your changes.
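
As an illustration of "your own version", here is a minimal sketch for the vector case discussed in this thread. The name `by_list` is hypothetical (it is not part of base R), and the sketch assumes `data` is an atomic or classed vector rather than a data frame. It relies only on the `simplify = FALSE` argument of tapply(), which is mentioned later in this thread.

```r
# Hypothetical helper (not part of base R): like by() for a vector,
# but never simplifies, so each group's class is kept.
by_list <- function(data, INDICES, FUN, ...) {
  if (!is.list(INDICES)) INDICES <- list(INDICES)
  FUNx <- function(i) FUN(data[i], ...)
  # simplify = FALSE makes tapply() return a list array in every case
  tapply(seq_along(data), INDICES, FUNx, simplify = FALSE)
}

# Fixed timestamps instead of Sys.time(), so the example is reproducible
dates <- as.POSIXct("2008-04-15 11:41:38", tz = "UTC") + 0:2
set   <- c(1, 1, 2)

res_all <- by_list(dates, set, function(x) x)        # two groups
res_one <- by_list(dates[1], set[1], function(x) x)  # single element, single group
```

Under these assumptions the single-element case still yields a list whose element is a POSIXct value, at the cost of always having to index with `[[`.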

>
> -Alex
>
> On 17 Apr 2008, at 07:03, Prof Brian Ripley wrote:
>
>> Unfortunately your proposed change changes the type of the output:
>> simplification is intended in many applications of by().
>>
>> Before:
>>
>> > str(by(mytimes$date[1], mytimes$set[1], function(x)x))
>> by [, 1] 1.21e+09
>>  - attr(*, "dimnames")=List of 1
>>   ..$ mytimes$set[1]: chr "1"
>>  - attr(*, "call")= language by.default(data = mytimes$date[1], INDICES =
>> mytimes$set[1], FUN = function(x) x)
>>
>> After:
>>
>> > str(by(mytimes$date[1], mytimes$set[1], function(x)x))
>> List of 1
>>  $ 1: POSIXct[1:1], format: "2008-04-17 06:53:31"
>>  - attr(*, "dim")= int 1
>>  - attr(*, "dimnames")=List of 1
>>   ..$ mytimes$set[1]: chr "1"
>>  - attr(*, "call")= language by.default(data = mytimes$date[1], INDICES =
>> mytimes$set[1], FUN = function(x) x)
>>  - attr(*, "class")= chr "by"
>>
>> c() does not do the same thing as unlist() in general, and it is untrue
>> that 'c does not strip class'. What happens in your example is that there
>> is a c() method for your class (and not many others).
>>
>> What we could do is to add a 'simplify' argument to by() so you can control
>> the simplification.
>>
>> On Tue, 15 Apr 2008, Alex Brown wrote:
>>
>>> summary:
>>>
>>> The function 'by' inconsistently strips class from the data to which
>>> it is applied.
>>>
>>> quick reason:
>>>
>>> tapply strips class when simplify is set to TRUE (the default) due to
>>> the class stripping behaviour of unlist.
>>>
>>> quick answer:
>>>
>>> This can be fixed by invoking tapply with simplify=FALSE, or changing
>>> tapply to use do.call(c instead of unlist
>>>
>>> executable example:
>>>
>>> mytimes=data.frame(date = 1:3 + Sys.time(), set = c(1,1,2))
>>>
>>> by(mytimes$date, mytimes$set, function(x)x)
>>>
>>> INDICES: 1
>>> [1] "2008-04-15 11:41:38 BST" "2008-04-15 11:41:39 BST"
>>> ----------------------------------------------------------------------------------------
>>> INDICES: 2
>>> [1] "2008-04-15 11:41:40 BST"
>>>
>>> by(mytimes[1,]$date, mytimes[1,]$set, function(x)x)
>>>
>>> INDICES: 1
>>> [1] 1208256099
>>>
>>> why this is a problem:
>>>
>>> This is a problem when you are feeding the output of this by into a
>>> function which expects the class to be maintained. I see this problem
>>> when constructing
>>>
>>> reason:
>>>
>>> tapply strips class when simplify is set to TRUE (the default) due to
>>> the behaviour of unlist:
>>>
>>> "Where possible the list elements are coerced to a common mode during
>>> the unlisting, and so the result often ends up as a character vector.
>>> Vectors will be coerced to the highest type of the components in the
>>> hierarchy NULL < raw < logical < integer < real < complex < character
>>> < list < expression: pairlists are treated as lists."
>>>
>>> solution:
>>>
>>> This problem can be fixed in the function by.data.frame by modifying
>>> the call to tapply in the function "by":
>>>
>>> by.data.frame = function (data, INDICES, FUN, ...)
>>> {
>>>     if (!is.list(INDICES)) {
>>>         IND <- vector("list", 1)
>>>         IND[[1]] <- INDICES
>>>         names(IND) <- deparse(substitute(INDICES))[1]
>>>     }
>>>     else IND <- INDICES
>>>     FUNx <- function(x) FUN(data[x, ], ...)
>>>     nd <- nrow(data)
>>> <<<<
>>>     ans <- eval(substitute(tapply(1:nd, IND, FUNx)), data)
>>> ====
>>>     ans <- eval(substitute(tapply(1:nd, IND, FUNx, simplify=FALSE)),
>>>         data)
>>> >>>>
>>>     attr(ans, "call") <- match.call()
>>>     class(ans) <- "by"
>>>     ans
>>> }
>>>
>>> alternative solution:
>>>
>>> the call in tapply to unlist(ans, recursive=F) can be replaced by
>>> do.call(c,ans, recursive=F) to fix this issue, since c does not strip
>>> class.
>>>
>>> However, I haven't taken the time to work out if this will work in all
>>> cases.
>>>
>>> for example:
>>>
>>> function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
>>> {
>>>     FUN <- if (!is.null(FUN))
>>>         match.fun(FUN)
>>>     if (!is.list(INDEX))
>>>         INDEX <- list(INDEX)
>>>     nI <- length(INDEX)
>>>     namelist <- vector("list", nI)
>>>     names(namelist) <- names(INDEX)
>>>     extent <- integer(nI)
>>>     nx <- length(X)
>>>     one <- 1L
>>>     group <- rep.int(one, nx)
>>>     ngroup <- one
>>>     for (i in seq.int(INDEX)) {
>>>         index <- as.factor(INDEX[[i]])
>>>         if (length(index) != nx)
>>>             stop("arguments must have same length")
>>>         namelist[[i]] <- levels(index)
>>>         extent[i] <- nlevels(index)
>>>         group <- group + ngroup * (as.integer(index) - one)
>>>         ngroup <- ngroup * nlevels(index)
>>>     }
>>>     if (is.null(FUN))
>>>         return(group)
>>>     ans <- lapply(split(X, group), FUN, ...)
>>>     index <- as.integer(names(ans))
>>>     if (simplify && all(unlist(lapply(ans, length)) == 1)) {
>>>         ansmat <- array(dim = extent, dimnames = namelist)
>>> <<<<
>>>         ans <- unlist(ans, recursive = FALSE)
>>> ====
>>>         ans <- do.call(c, ans, recursive = FALSE)
>>> >>>>
>>>     }
>>>     else {
>>>         ansmat <- array(vector("list", prod(extent)), dim = extent,
>>>             dimnames = namelist)
>>>     }
>>>     if (length(index)) {
>>>         names(ans) <- NULL
>>>         ansmat[index] <- ans
>>>     }
>>>     ansmat
>>> }
>>>
>>> Alexander Brown
>>> Principal Engineer
>>> Transitive
>>> Maybrook House, 40 Blackfriars Street, Manchester M3 2EG
>>> Phone: +44 (0)161 836 2321 Fax: +44 (0)161 836 2399 Mobile: +44
>>> (0)7980 708 221
>>> www.transitive.com
>>> The leader in cross-platform virtualization
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Brian D. Ripley, ripley_at_stats.ox.ac.uk
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595

--
Brian D. Ripley, ripley_at_stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu 17 Apr 2008 - 11:45:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Thu 17 Apr 2008 - 12:30:30 GMT.
