Re: [R] Subset by Factor by date

From: Charilaos Skiadas <cskiadas_at_gmail.com>
Date: Sat, 14 Jun 2008 01:46:24 -0400

On Jun 14, 2008, at 1:25 AM, T.D.Rudolph wrote:

>
> aggregate() is indeed a useful function in this case, but it only
> returns the
> columns by which it was grouped. Is there a way I can use this while
> simultaneously retaining all the other column values in the dataframe?
>
> e.g. add superfluous (yet pertinent for later) column containing any
> information at all and retain it in the final output

I had exactly this kind of need many times, and I have finally created a function for it, which I hope to include soon in an upcoming package. Here is a run of it (I added an extra "A" column containing just the numbers 1:8):

 > DF
   id day diff A

1  1 01-01-09  0.5 1
2  1 01-01-09  0.7 2
3  2 01-01-09  0.2 3
4  2 01-01-09  0.4 4
5  1 01-02-09  0.1 5
6  1 01-02-09  0.3 6
7  2 01-02-09  0.3 7
8  2 01-02-09  0.4 8
 > byDataFrame(DF, list(id, day), function(x) x[which.min(x$diff),])
   diff A id      day
1  0.5 1  1 01-01-09
2  0.2 3  2 01-01-09

3 0.1 5 1 01-02-09
4 0.3 7 2 01-02-09

Would that do what you want?

I've appended the function byDataFrame, and its prerequisite, a function parseIndexList. I'm not quite set on the names yet, but anyway. Hope this helps. I haven't really tested it on large sets, it might perform poorly. Any suggestions on speeding the code / corrections are welcome.

Haris Skiadas
Department of Mathematics and Computer Science Hanover College

parseIndexList <- function(indexList) {

   # browser()
   if (!is.list(indexList))
     indexList <- as.list(indexList)
   nI <- length(indexList)
   namelist <- vector("list", nI)
   names(namelist) <- names(indexList)
   extent <- integer(nI)
   nx <- length(indexList[[1]])
   one <- as.integer(1)
   group <- rep.int(one, nx)
   ngroup <- one
   for (i in seq.int(indexList)) {

       index <- as.factor(indexList[[i]])
       if (length(index) != nx)
           stop("arguments must have same length")
       namelist[[i]] <- sort(unique(indexList[[i]]))
       extent[i] <- length(namelist[[i]])
       group <- group + ngroup * (as.integer(index) - one)
       ngroup <- ngroup * nlevels(index)
   }
   nms <- do.call(expand.grid, namelist)
   ind <- unique(sort(group))
   res <- data.frame(index=ind, nms[ind, , drop=FALSE])
   return(list(cases=group, groups=res)) }

byDataFrame <- function (data, INDEX, FUN, newnames, omit.index.cols=TRUE, ...) {
# # Part of the code shamelessly stolen from tapply

   IND <- eval(substitute(INDEX), data)
   nms <- as.character(as.list(substitute(INDEX)))    if (!is.list(IND)) {

     IND <- list(IND)
     names(IND) <- nms
   } else {
     names(IND) <- nms[-1]

   }
   funname <- paste(as.character(substitute(FUN)), collapse=".")    indexInfo <- parseIndexList(IND)
   FUNx <- if (omit.index.cols) {
     omit.cols <- match(names(indexInfo$groups)[-1], names(data))
     function(x, ...) FUN(data[x, -omit.cols], ...)
   } else {
     function(x, ...) FUN(data[x, ], ...)
   }
   ans <- lapply(split(1:nrow(data), indexInfo$cases), FUNx, ...)    index <- as.numeric(names(ans))
   if (!is.data.frame(ans[[1]])) {
     ans <- lapply(ans, function(x) {
       dframe <- as.data.frame(t(x))
       if (is.null(names(x)))
         names(dframe) <- funname
       dframe
     })

   }
   lengths <- sapply(ans, nrow)
   ans <- do.call(rbind, ans)
   if (!missing(newnames))
     names(ans) <- newnames
   nms <- indexInfo$groups[rep(index, lengths),-1, drop=FALSE]    res <- cbind(ans, nms)
   res
}

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 14 Jun 2008 - 05:53:15 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 14 Jun 2008 - 08:30:46 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive