From: John Zedlewski <jzedlewski_at_hbs.edu>

Date: Sat 16 Dec 2006 - 17:25:14 GMT

*> I think I'd prefer to be clear about what "largest" means in the docs
*

> rather than dropping the absolute value, because if all entries are

> negative, your version sets the tolerance to 0.

> But ignoring infinite values when calculating the largest absolute value

*> might be a good idea: I can't see why someone would want an infinite
*

*> tolerance.
*

> So I'd propose to add your R_FINITE check, and make this change to the docs:

> " there is a relative tolerance of 1e-5, relative

> to the largest finite absolute value in the row"

> The copyright notice in the file says this function is from MASS/MASS.c

*> by W. N. Venables and B. D. Ripley, so I'd like to hear from at least
*

*> one of them before making a change.
*

> Duncan Murdoch

*> > R-devel@r-project.org mailing list
*

*> > https://stat.ethz.ch/mailman/listinfo/r-devel
*

>

>

R-devel@r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sun Dec 17 13:07:29 2006

Date: Sat 16 Dec 2006 - 17:25:14 GMT

Duncan--

Ah, good point, clearly setting the tolerance to 0 is bad in that
case. Also, my code has another problem when the max is negative -- it
will set a negative tolerance! One good fix for both problems is the
following: set the initial value of "large" to the first value in the
row instead of 0.0, then keep the "fmax2(a,large)" approach from my
patch, but at the end, take the absolute value of large. That will
always follow the current docs -- using the largest value, not the
largest absolute value in the row, for comparison.

You proposal of changing the docs and just fixing the infinite
problem sounds like a reasonable approach too, especially if people
are already depending on this behavior (??), although I still think
it's s a little weird that comparing "(-1e10, 2, 3)" will say that 2
or 3 could be the max.

Thanks,

**--JRZ
**

*> I think I'd prefer to be clear about what "largest" means in the docs
*

> rather than dropping the absolute value, because if all entries are

> negative, your version sets the tolerance to 0.

On 12/16/06, Duncan Murdoch <murdoch@stats.uwo.ca> wrote:

> On 12/15/2006 7:09 PM, John Zedlewski wrote:

*> > I've noticed that the max.col function with the default "random"
**> > option often gives unexpected results. For instance, in this test, it
**> > seems clear what the answer should be:
**> >
**> >> # second col should always be max
**> >> x1 = cbind(1:10, 2:11, -Inf)
**> >>
**> >> # this works fine
**> >> max.col(x1, "first")
**> > [1] 2 2 2 2 2 2 2 2 2 2
**> >> # this gives random answers
**> >> max.col(x1)
**> >> [1] 3 1 1 2 3 3 1 3 1 1
**> >
**> > Ouch! max.col is randomizing across all values.
**> > Even without infinite values, something similar can happen:
**> >
**> >> # test 2 --- tolerance problems
**> >>
**> >> # clearly column 3 is the max
**> >> x1 = cbind(-1e9 * 1:10, 1:10, 2:11)
**> >>
**> >> # again, first method works:
**> >> max.col(x1, "first")
**> > [1] 3 3 3 3 3 3 3 3 3 3
**> >> # but random doesn't
**> >> max.col(x1)
**> > [1] 2 3 2 3 3 2 2 2 3 2
**> >
**> > The max.col docs say " there is a relative tolerance of 1e-5, relative
**> > to the largest entry in the row", but it's really using the maximum
**> > absolute value entry in the row (appl/maxcol.c, line 35 in R 2.4.0).
**> > Is this necessary for some sort of S-plus compatibility? If so, I
**> > think it would be good to make this absolute value property very clear
**> > in the docs, since it can cause subtle bugs (and did for me).
**> >
**> > Personally, I think the behavior is much nicer with the following patch:
**> >
**> > --- rplain/R-2.4.0/src/appl/maxcol.c 2006-04-09 18:19:58.000000000 -0400
**> > +++ R-2.4.0/src/appl/maxcol.c 2006-12-14 15:30:56.000000000 -0500
**> > @@ -26,13 +26,14 @@
**> > do_rand = *ties_meth == 1;
**> >
**> > for (r = 0; r < n_r; r++) {
**> > - /* first check row for any NAs and find the largest abs(entry) */
**> > + /* first check row for any NAs and find the largest entry */
**> > large = 0.0;
**> > isna = FALSE;
**> > for (c = 0; c < *nc; c++) {
**> > a = matrix[r + c * n_r];
**> > if (ISNAN(a)) { isna = TRUE; break; }
**> > - if (do_rand) large = fmax2(large, fabs(a));
**> > + if (!R_FINITE(a)) continue;
**> > + if (do_rand) large = fmax2(large, a);
**> > }
**> > if (isna) { maxes[r] = NA_INTEGER; continue; }
**> >
**> > ---------------- END ----------------------
**> >
**> > This gives the expected behavior in the two examples above.
*

>

> rather than dropping the absolute value, because if all entries are

> negative, your version sets the tolerance to 0.

>

> But ignoring infinite values when calculating the largest absolute value

>

> So I'd propose to add your R_FINITE check, and make this change to the docs:

>

> " there is a relative tolerance of 1e-5, relative

> to the largest finite absolute value in the row"

>

> The copyright notice in the file says this function is from MASS/MASS.c

>

> Duncan Murdoch

>

> >

> > (Sorry to crosspost to both this list and R-help, but it was suggested

> > that R-devel would be a more appropriate forum for this.)> >> > ______________________________________________

>

>

R-devel@r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sun Dec 17 13:07:29 2006

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.1.8, at Sun 17 Dec 2006 - 02:31:01 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel.
Please read the posting
guide before posting to the list.
*