Re: [Rd] sweep sanity checking?

From: Petr Savicky <savicky_at_cs.cas.cz>
Date: Wed, 25 Jul 2007 09:03:21 +0200

I would like to suggest a patch against R-devel-2007-07-24 that modifies the function sweep to issue a warning if dim(STATS) is not consistent with dim(x)[MARGIN]. If check.margin=FALSE, only a simple test is performed: whether prod(dim(x)[MARGIN]) is a multiple of length(STATS). If check.margin=TRUE, a more restrictive test is used, although limited recycling is still allowed without a warning. Apart from generating a warning in some situations, the behavior of sweep is unchanged. The patch is:

The patch uses the default check.margin=FALSE, since this is more backward compatible. Changing the default to check.margin=TRUE would also be fine with me, and with Ben Bolker, who told me so in a separate email.
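
For illustration only, the simple test used with check.margin=FALSE can be sketched as below. This is not the patch itself: the helper name simple_margin_ok is hypothetical, and the treatment of a zero-length STATS as a failure is an assumption.

```r
# Hypothetical helper, not taken from the patch: returns TRUE when the
# simple divisibility test would raise no warning.
simple_margin_ok <- function(x, MARGIN, STATS) {
  n <- length(STATS)
  # Assumption: an empty STATS is treated as failing the test.
  n > 0 && prod(dim(x)[MARGIN]) %% n == 0
}

x <- array(1:24, dim = c(2, 3, 4))

# prod(dim(x)[c(2, 3)]) is 12
simple_margin_ok(x, c(2, 3), 1:3)  # 3 divides 12
simple_margin_ok(x, c(2, 3), 1:4)  # 4 divides 12, although 4 is the second extent
simple_margin_ok(x, c(2, 3), 1:5)  # 5 does not divide 12
```

The second call shows why divisibility alone is a weak check: a STATS of length 4 passes even though it does not line up with the first swept dimension.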

Let me add some comments on the stricter test. If check.margin=TRUE, the patch tests whether (after deleting any dimensions with only one level) dim(STATS) is a prefix of dim(x)[MARGIN]. Hence, for example, if dim(x)[MARGIN] = c(k1,k2), the cases

  length(STATS) = 1,
  dim(STATS) = k1,
  dim(STATS) = NULL and length(STATS) = k1,
  dim(STATS) = c(k1,k2)

are accepted without warning. On the other hand, if k1 != k2, then, for example, dim(STATS) = k2 and dim(STATS) = c(k2,k1) generate a warning, although the simple divisibility condition

  length(STATS) divides prod(dim(x)[MARGIN])

is satisfied. The warning is generated because, in these last two cases, recycling produces an incorrect or at least suspicious result.

In the simplest case, when length(MARGIN)=1 and STATS is a vector, the cases accepted by the stricter test without warning are exactly the following two: length(STATS) = 1, length(STATS) = dim(x)[MARGIN].
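
As a rough sketch of the stricter test described above (again not the patch itself), the prefix condition could be written as follows. The helper name strict_margin_ok is hypothetical. Two points are assumptions: extents equal to 1 are dropped from both dim(STATS) and dim(x)[MARGIN], and a STATS without a dim attribute is treated as one-dimensional with extent length(STATS).

```r
# Hypothetical helper, not taken from the patch: returns TRUE when
# dim(STATS) is a prefix of dim(x)[MARGIN] after dropping extents of 1.
strict_margin_ok <- function(x, MARGIN, STATS) {
  dm <- dim(x)[MARGIN]
  ds <- dim(STATS)
  if (is.null(ds)) ds <- length(STATS)  # assumption: plain vector is 1-d
  dm <- dm[dm > 1]                      # assumption: dropped on both sides
  ds <- ds[ds > 1]
  length(ds) <= length(dm) && all(ds == dm[seq_along(ds)])
}

x <- array(1:24, dim = c(2, 3, 4))      # dim(x)[c(2, 3)] is c(3, 4)

strict_margin_ok(x, c(2, 3), 5)                            # length 1
strict_margin_ok(x, c(2, 3), 1:3)                          # vector of length k1
strict_margin_ok(x, c(2, 3), array(1:3, dim = 3))          # dim(STATS) = k1
strict_margin_ok(x, c(2, 3), matrix(1:12, 3, 4))           # dim(STATS) = c(k1,k2)
strict_margin_ok(x, c(2, 3), 1:4)                          # length k2: rejected
strict_margin_ok(x, c(2, 3), matrix(1:12, 4, 3))           # c(k2,k1): rejected

# Simplest case: one margin, vector STATS.
m <- matrix(1:8, nrow = 2)                                 # dim(m)[2] is 4
strict_margin_ok(m, 2, 1:4)                                # full length: accepted
strict_margin_ok(m, 2, 1:2)                                # 2 divides 4, but rejected
```

The last call is a case where the divisibility condition holds but the stricter test would warn.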

I tested the patch using the script
  http://www.cs.cas.cz/~savicky/R-devel/verify_sweep1.R
Ben Bolker also tested the patch in his environment.

I would appreciate hearing the opinion of the R core developers on this patch. Thank you in advance.

Petr.



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 25 Jul 2007 - 07:09:37 GMT

