[Rd] validObject() -> slow down ?! [was "package:Matrix handling ..."]

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Mon 10 Jul 2006 - 10:30:44 GMT

[Diverted from R-help to R-devel]

>>>>> "roger" == roger koenker <roger@ysidro.econ.uiuc.edu>
>>>>>     on Sun, 9 Jul 2006 12:31:16 -0500 writes:

    roger> On 7/8/06, Thaden, John J <ThadenJohnJ@uams.edu> wrote:

>> As there is nothing inherent in either compressed,
>> sparse, format that would prevent recognition and
>> handling of duplicated index pairs, I'm curious why the
>> dgCMatrix class doesn't also add x values in those
>> instances?

    roger> why not multiply them?  or take the larger one, or
    roger> ...?  I would interpret this as a case of user
    roger> negligence -- there is no "natural" default behavior
    roger> for such cases.

    roger> On Jul 9, 2006, at 11:06 AM, Douglas Bates wrote:

>> Your matrix Mc should be flagged as invalid. Martin and
>> I should discuss whether we want to add such a test to
>> the validity method. It is not difficult to add the test
>> but there will be a penalty in that it will slow down all
>> operations on such matrices

hmm, maybe "all operations" is slightly pessimistic. The issue seems to be *when* (under what exact circumstances) the 'validity' method for a class will be called, i.e., when the equivalent of validObject(<obj>) should be called automatically.

We (those from R-core present) discussed this question a bit last summer in Seattle, and we had a proposal by Robert Gentleman, that this should both be better defined and documented and also slightly changed -- such that validObject() is called less frequently.

IIRC, one consequence of that is the 'complete = FALSE' default that validObject() has got in the meantime. But I don't know about the other issue, of ensuring (or not) that validObject() is not called too often.
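
To make the "when is it called" question concrete, here is a minimal sketch using a toy class (the class "NonNeg" and the counter 'calls' are made up for illustration, they are not part of Matrix). It relies on the behaviour of the default initialize() method as I understand it: the validity method runs when new() is given slot values, but not on a plain slot assignment.

```r
library(methods)

# Hypothetical toy class: slot 'x' must be non-negative.
calls <- 0L
setClass("NonNeg", representation(x = "numeric"),
         validity = function(object) {
           calls <<- calls + 1L   # count how often the validity method runs
           if (any(object@x < 0)) "negative values in slot 'x'" else TRUE
         })

a <- new("NonNeg", x = c(1, 2))   # slots supplied: validity is checked
n_after_new <- calls

a@x <- c(-1, 2)                   # direct slot assignment: no validity check
n_after_assign <- calls

# An explicit call is needed to notice that 'a' is now invalid.
res <- tryCatch(validObject(a), error = function(e) "invalid")
```

So an object can silently become invalid after creation, and only an explicit validObject() call will flag it.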

I wonder if we should consider a new optional argument to new(..) [ well actually, initialize() ] :

the default new(....., .check.validity = TRUE) would call {the equivalent of} validObject() after object creation, but one could always explicitly use

          new(....., .check.validity = FALSE)
for fast "but dangerous" object creation.
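
Since '.check.validity' is only a proposal and not an existing argument of new() or initialize(), the idea can at best be emulated today. A rough sketch, assuming a toy class "Pos" and a hypothetical constructor newPos() with an illustrative 'check.validity' argument:

```r
library(methods)

# Hypothetical toy class: slot 'x' must be strictly positive.
setClass("Pos", representation(x = "numeric"),
         validity = function(object)
           if (any(object@x <= 0)) "slot 'x' must be positive" else TRUE)

# Hypothetical constructor; 'check.validity' mimics the proposed argument.
newPos <- function(x, check.validity = TRUE) {
  obj <- new("Pos")   # no slots supplied, so no validity check happens here
  obj@x <- x          # plain slot assignment: only the slot's class is checked
  if (check.validity) validObject(obj)   # the optional, possibly costly, step
  obj
}

ok  <- newPos(c(1, 2))                            # validated
bad <- newPos(c(-1, 2), check.validity = FALSE)   # fast, but invalid
```

The second call returns without complaint although the object violates the class's validity condition, which is exactly the "dangerous" part.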

>> and I'm not sure if we want to pay that price to catch a
>> rather infrequently occurring problem.

    roger> Elaborating the validity procedure to flag such
    roger> instances seems to be well worth the speed penalty in
    roger> my view.  Of course, anticipating every such misstep
    roger> imposes a heavy burden on developers and constitutes
    roger> the real "cost" of more elaborate validity checking.

At the moment I tend to agree with Roger that we (Matrix authors) should try to add more stringent testing even at some cost, particularly if that penalty would only occur at object creation time. One important "use case" of our sparse matrices is of course lmer() calls. They shouldn't become noticeably slower.
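
For the record, the kind of test under discussion could look roughly like the following. This is only a sketch in plain R, not the Matrix validity method; the function name has_dup_entries() is made up, and it assumes the usual compressed column layout with 0-based row indices 'i' and 0-based column pointers 'p' of length ncol + 1.

```r
# Returns TRUE if some (row, column) index pair is stored more than once.
has_dup_entries <- function(i, p) {
  # column index of every stored entry, recovered from the pointer differences
  j <- rep.int(seq_len(length(p) - 1L), diff(p))
  # anyDuplicated() on a matrix looks for duplicated rows, i.e. (i, j) pairs
  anyDuplicated(cbind(i, j)) > 0L
}

# 3 x 2 example: column 1 stores rows 0 and 2, column 2 stores row 1 twice
dup   <- has_dup_entries(i = c(0L, 2L, 1L, 1L), p = c(0L, 2L, 4L))   # TRUE
nodup <- has_dup_entries(i = c(0L, 2L, 1L),     p = c(0L, 2L, 3L))   # FALSE
```

The check has to visit every stored entry, which is the penalty mentioned above if it runs on each validObject() call.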

    roger> [My 2cents based on experience with SparseM.]

Martin



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon Jul 10 20:34:18 2006
