[Rd] max.col oddity

From: John Zedlewski <jzedlewski_at_hbs.edu>
Date: Sat 16 Dec 2006 - 00:09:42 GMT


I've noticed that the max.col function with the default "random" option often gives unexpected results. For instance, in this test, it seems clear what the answer should be:

> # second col should always be max
> x1 = cbind(1:10, 2:11, -Inf)
>
> # this works fine
> max.col(x1, "first")

 [1] 2 2 2 2 2 2 2 2 2 2
>
> # this gives random answers
> max.col(x1)
> [1] 3 1 1 2 3 3 1 3 1 1

Ouch! max.col is randomizing across all values. Even without infinite values, something similar can happen:

> # test 2 --- tolerance problems
>
> # clearly column 3 is the max
> x1 = cbind(-1e9 * 1:10, 1:10, 2:11)
>
> # again, first method works:
> max.col(x1, "first")

 [1] 3 3 3 3 3 3 3 3 3 3
>
> # but random doesn't
> max.col(x1)

 [1] 2 3 2 3 3 2 2 2 3 2
>

The max.col docs say " there is a relative tolerance of 1e-5, relative to the largest entry in the row", but it's really using the maximum absolute value entry in the row (appl/maxcol.c, line 35 in R 2.4.0). Is this necessary for some sort of S-plus compatibility? If so, I think it would be good to make this absolute value property very clear in the docs, since it can cause subtle bugs (and did for me).

Personally, I think the behavior is much nicer with the following patch:

    for (r = 0; r < n_r; r++) {

-       /* first check row for any NAs and find the largest abs(entry) */
+       /* first check row for any NAs and find the largest entry */
       large = 0.0;
       isna = FALSE;
       for (c = 0; c < *nc; c++) {
           a = matrix[r + c * n_r];
           if (ISNAN(a)) { isna = TRUE; break; }
-           if (do_rand) large = fmax2(large, fabs(a));
+           if (!R_FINITE(a)) continue;
+           if (do_rand) large = fmax2(large, a);
       }
       if (isna) { maxes[r] = NA_INTEGER; continue; }

---------------- END   ----------------------

This gives the expected behavior in the two examples above.

(Sorry to crosspost to both this list and R-help, but it was suggested that R-devel would be a more appropriate forum for this.)



R-devel@r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sat Dec 16 17:28:12 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sat 16 Dec 2006 - 12:31:01 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.