From: Duncan Murdoch <murdoch_at_stats.uwo.ca>

Date: Sat 16 Dec 2006 - 17:45:40 GMT


On 12/16/2006 12:25 PM, John Zedlewski wrote:

> Duncan--
> Ah, good point, clearly setting the tolerance to 0 is bad in that
> case. Also, my code has another problem when the max is negative -- it
> will set a negative tolerance! One good fix for both problems is the
> following: set the initial value of "large" to the first value in the
> row instead of 0.0, then keep the "fmax2(a, large)" approach from my
> patch, but at the end, take the absolute value of large. That will
> always follow the current docs -- using the largest value, not the
> largest absolute value in the row, for comparison.
>
> Your proposal of changing the docs and just fixing the infinite
> problem sounds like a reasonable approach too, especially if people
> are already depending on this behavior (??), although I still think
> it's a little weird that comparing "(-1e10, 2, 3)" will say that 2
> or 3 could be the max.

I think the explanation for this is in the man page: "In this case, the determination of "tie" assumes that the entries are probabilities..." If you transform that vector linearly to force it into the [0,1] range of probabilities, the behaviour is clear:

> x <- c(-1e10, 2, 3)

> (x - min(x))/diff(range(x))

[1] 0 1 1

Duncan Murdoch

> Thanks,
> --JRZ

>> I think I'd prefer to be clear about what "largest" means in the docs
>> rather than dropping the absolute value, because if all entries are
>> negative, your version sets the tolerance to 0.

> On 12/16/06, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:

>> On 12/15/2006 7:09 PM, John Zedlewski wrote:
>>> I've noticed that the max.col function with the default "random"
>>> option often gives unexpected results. For instance, in this test, it
>>> seems clear what the answer should be:
>>>
>>>> # second col should always be max
>>>> x1 = cbind(1:10, 2:11, -Inf)
>>>>
>>>> # this works fine
>>>> max.col(x1, "first")
>>> [1] 2 2 2 2 2 2 2 2 2 2
>>>> # this gives random answers
>>>> max.col(x1)
>>> [1] 3 1 1 2 3 3 1 3 1 1
>>> Ouch! max.col is randomizing across all values.
>>> Even without infinite values, something similar can happen:
>>>
>>>> # test 2 --- tolerance problems
>>>>
>>>> # clearly column 3 is the max
>>>> x1 = cbind(-1e9 * 1:10, 1:10, 2:11)
>>>>
>>>> # again, first method works:
>>>> max.col(x1, "first")
>>> [1] 3 3 3 3 3 3 3 3 3 3
>>>> # but random doesn't
>>>> max.col(x1)
>>> [1] 2 3 2 3 3 2 2 2 3 2
>>>
>>> The max.col docs say "there is a relative tolerance of 1e-5, relative
>>> to the largest entry in the row", but it's really using the maximum
>>> absolute value entry in the row (appl/maxcol.c, line 35 in R 2.4.0).
>>> Is this necessary for some sort of S-plus compatibility? If so, I
>>> think it would be good to make this absolute value property very clear
>>> in the docs, since it can cause subtle bugs (and did for me).
>>>
>>> Personally, I think the behavior is much nicer with the following patch:
>>>
>>> --- rplain/R-2.4.0/src/appl/maxcol.c 2006-04-09 18:19:58.000000000 -0400
>>> +++ R-2.4.0/src/appl/maxcol.c 2006-12-14 15:30:56.000000000 -0500
>>> @@ -26,13 +26,14 @@
>>>     do_rand = *ties_meth == 1;
>>>
>>>     for (r = 0; r < n_r; r++) {
>>> -	/* first check row for any NAs and find the largest abs(entry) */
>>> +	/* first check row for any NAs and find the largest entry */
>>> 	large = 0.0;
>>> 	isna = FALSE;
>>> 	for (c = 0; c < *nc; c++) {
>>> 	    a = matrix[r + c * n_r];
>>> 	    if (ISNAN(a)) { isna = TRUE; break; }
>>> -	    if (do_rand) large = fmax2(large, fabs(a));
>>> +	    if (!R_FINITE(a)) continue;
>>> +	    if (do_rand) large = fmax2(large, a);
>>> 	}
>>> 	if (isna) { maxes[r] = NA_INTEGER; continue; }
>>>
>>> ---------------- END ----------------------
>>>
>>> This gives the expected behavior in the two examples above.
>> I think I'd prefer to be clear about what "largest" means in the docs
>> rather than dropping the absolute value, because if all entries are
>> negative, your version sets the tolerance to 0.
>>
>> But ignoring infinite values when calculating the largest absolute value
>> might be a good idea: I can't see why someone would want an infinite
>> tolerance.
>>
>> So I'd propose to add your R_FINITE check, and make this change to the docs:
>>
>> "there is a relative tolerance of 1e-5, relative
>> to the largest finite absolute value in the row"
>>
>> The copyright notice in the file says this function is from MASS/MASS.c
>> by W. N. Venables and B. D. Ripley, so I'd like to hear from at least
>> one of them before making a change.
>>
>> Duncan Murdoch
>>
>>> (Sorry to crosspost to both this list and R-help, but it was suggested
>>> that R-devel would be a more appropriate forum for this.)
>>>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>

> R-devel@r-project.org mailing list

> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Sun Dec 17 07:01:13 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Sun 17 Dec 2006 - 00:31:03 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.