Re: [Rd] sweep sanity checking?

From: Petr Savicky <savicky_at_cs.cas.cz>
Date: Fri, 27 Jul 2007 09:10:06 +0200

When I was preparing the patch of sweep submitted on July 25, I was unaware of the code by Heather Turner. She suggested a very elegant solution, if STATS is a vector and we want to use meaningful recycling in full generality. I would like to suggest a combined solution, which uses Heather Turner's algorithm if check.margin=FALSE (default) and STATS is a vector and my previous algorithm, if check.margin=TRUE or STATS is an array. The suggestion is

# combined from the original code of sweep without warnings and from
# https://stat.ethz.ch/pipermail/r-help/2005-June/073989.html by Robin Hankin
# https://stat.ethz.ch/pipermail/r-help/2005-June/074001.html by Heather Turner
# https://stat.ethz.ch/pipermail/r-devel/2007-June/046217.html by Ben Bolker
# with some further modifications by Petr Savicky
  sweep <- function(x, MARGIN, STATS, FUN = "-", check.margin=FALSE, ...)   {

      FUN <- match.fun(FUN)
      dims <- dim(x)
      dimmargin <- dims[MARGIN]
      if (is.null(dim(STATS))) {
          dimstats <- length(STATS)
      } else {
          dimstats <- dim(STATS)
          check.margin <- TRUE
      }
      s <- length(STATS)
      if (s > prod(dimmargin)) {
          warning("length of STATS greater than the extent of dim(x)[MARGIN]")
      } else if (check.margin) {
          dimmargin <- dimmargin[dimmargin > 1]
          dimstats <- dimstats[dimstats > 1]
          if (length(dimstats) > length(dimmargin) ||
              any(dimstats != dimmargin[seq(along.with=dimstats)]))
              warning("length(STATS) or dim(STATS) do not match dim(x)[MARGIN]")
      } else {
          cumDim <- c(1, cumprod(dimmargin))
          upper <- min(cumDim[cumDim >= s])
          lower <- max(cumDim[cumDim <= s])
          if (upper %% s != 0 || s %% lower != 0)
              warning("STATS does not recycle exactly across MARGIN")
      }
      perm <- c(MARGIN, (1:length(dims))[ - MARGIN])
      FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
  }

Heather presented four examples testing her code:

  sweep(array(1:24, dim = c(4,3,2)), 1, 1:2)    # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1, 1:12)   # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1, 1:24)   # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3)  # warning
The second and third example are not really correct, since STATS extends also to dimensions not included in MARGIN. The problem is better visible for example in
  sweep(array(1:24, dim = c(4,4,3,3,2,2)), c(1,3), 1:12) where MARGIN clearly has to contain two dimensions explicitly. So, I use the examples with a larger margin corresponding to STATS as follows
  sweep(array(1:24, dim = c(4,3,2)), 1, 1:2)    # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:12) # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1:3, 1:24) # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3)  # warning
The current proposal for sweep indeed gives no warning in the first three examples and gives a warning in the last one.

I did not use the suggestion to call the option warn with default warn = getOption("warn"). The reason is that there are two different decisions:
 (1) whether to generate a warning
 (2) what to do with the warning, if it is generated. The warn option influences (2): the warning may be suppressed, printed after the return to the top level, printed immediately or it may be converted to an error. I think that the option controling (2) should not be mixed with an option which controls (1). If R has an option controling to which extent recycling is allowed, then this could be used, but warn has a different purpose.

I appreciate feedback.

Petr.



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri 27 Jul 2007 - 07:13:49 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 09 Aug 2007 - 03:38:18 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.