Re: [R] package:Matrix handling of data with identical indices

From: Thaden, John J <>
Date: Mon 10 Jul 2006 - 04:53:48 EST

On Sunday, July 09, 2006 12:31 PM, Roger Koenker = RK <> wrote

RK> On 7/8/06, Thaden, John J <> wrote:

    JT> As there is nothing inherent in either compressed, sparse,
    JT> format that would prevent recognition and handling of
    JT> duplicated index pairs, I'm curious why the dgCMatrix
    JT> class doesn't also add x values in those instances?

RK> why not multiply them?  or take the larger one, 
RK> or ...?  I would interpret this as a case of user
RK> negligence -- there is no "natural" default behavior
RK> for such cases.

This user created example data to illustrate his question, but of course he faces real data (analytical chemistry data in this case) that happen to come with an 8.4% occurrence of non-unique index pairs and also, quite literally, with a "natural" way to treat such cases (the ~nature~ of the assay makes it correct to sum them). I can think of other natural data sets where averaging would be the "natural" behavior. So you are right that there is no "default" natural behavior; hence my suggestion to leave that to user choice via a function argument or class slot, defaulting to summing.
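
To make the suggestion concrete, here is a minimal sketch in plain Python (not R, and not the Matrix package's API) of a combining rule chosen by a function argument that defaults to summing. The function name `combine_duplicates`, the `combine` argument, and the small triplet data are all hypothetical, invented only for illustration.

```python
from collections import defaultdict
from statistics import mean


def combine_duplicates(i, j, x, combine=sum):
    """Collapse triplets that share an (i, j) index pair into one entry.

    `combine` receives the list of x values found for one index pair and
    returns a single value. It defaults to summing.
    """
    groups = defaultdict(list)
    for row, col, val in zip(i, j, x):
        groups[(row, col)].append(val)
    return {pair: combine(vals) for pair, vals in sorted(groups.items())}


# Made-up triplets: the index pair (1, 2) occurs twice.
i = [0, 1, 1, 2]
j = [0, 2, 2, 1]
x = [1.0, 2.0, 3.0, 4.0]

summed = combine_duplicates(i, j, x)                  # default: sum
averaged = combine_duplicates(i, j, x, combine=mean)  # average instead
largest = combine_duplicates(i, j, x, combine=max)    # "take the larger one"
```

Here the duplicated pair (1, 2) becomes 5.0 under summing, 2.5 under averaging, and 3.0 under `max`; entries with unique indices pass through unchanged.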

Actually, in this case there ~is~ one behavior superior to summing -- abstracting one member of each data pair that shares indices into a second (very sparse) "overlay" matrix. Perhaps it is my negligence not to have done this instead of querying the list :-) I am doing it now.
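
The overlay idea can be sketched the same way, again in plain Python rather than R. This assumes that no index pair occurs more than twice and that it does not matter which member of a pair goes to the overlay (here, the one seen second); `split_overlay` is a hypothetical name, not an existing function.

```python
def split_overlay(i, j, x):
    """Split triplets into a base set and a sparse overlay set.

    The first value seen for an (i, j) pair goes to `base`; a second
    value for the same pair goes to `overlay`. A third occurrence is
    treated as an error, since this sketch assumes pairs only.
    """
    base, overlay = {}, {}
    for row, col, val in zip(i, j, x):
        pair = (row, col)
        if pair not in base:
            base[pair] = val
        elif pair not in overlay:
            overlay[pair] = val
        else:
            raise ValueError(f"index pair {pair} occurs more than twice")
    return base, overlay


# Made-up triplets: the index pair (1, 2) occurs twice.
base, overlay = split_overlay([0, 1, 1, 2], [0, 2, 2, 1], [1.0, 2.0, 3.0, 4.0])
```

Both results have unique index pairs, so each could be stored as an ordinary sparse matrix without losing either value.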

-John Thaden

RK> On Jul 9, 2006, at 11:06 AM, Douglas Bates wrote:

  DB> Your matrix Mc should be flagged as invalid. Martin and I should
  DB> discuss whether we want to add such a test to the validity method. It
  DB> is not difficult to add the test but there will be a penalty in that
  DB> it will slow down all operations on such matrices and I'm not sure if
  DB> we want to pay that price to catch a rather infrequently occurring
  DB> problem.

RK> Elaborating the validity procedure to flag such instances seems
RK> to be well worth the speed penalty in my view.  Of course,
RK> anticipating every such misstep imposes a heavy burden
RK> on developers and constitutes the real "cost" of more elaborate
RK> validity checking.
RK> [My 2cents based on experience with SparseM.]
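
For a sense of what such a validity test involves, here is a rough sketch in plain Python. It is not the Matrix package's validity method. The arguments `p` and `i` mirror the column-pointer and row-index slots of a compressed column matrix, with 0-based indices; the function name and example data are hypothetical.

```python
def has_duplicate_indices(p, i):
    """Return True if any column stores the same row index more than once.

    p: column pointers, length ncol + 1 (entries of column k are
       i[p[k]:p[k + 1]])
    i: row indices of the stored entries, 0-based
    """
    for col in range(len(p) - 1):
        rows = i[p[col]:p[col + 1]]
        if len(set(rows)) != len(rows):
            return True
    return False


# Made-up 3-column example: column 2 stores row 1 twice.
p_bad = [0, 1, 2, 4]
i_bad = [0, 2, 1, 1]

# Same layout without the duplicated entry.
p_ok = [0, 1, 2, 3]
i_ok = [0, 2, 1]

bad_flagged = has_duplicate_indices(p_bad, i_bad)
ok_flagged = has_duplicate_indices(p_ok, i_ok)
```

The check has to visit every stored entry once, so its cost grows with the number of stored values. That is the kind of speed penalty under discussion if the test runs each time validity is checked.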

Received on Mon Jul 10 04:57:55 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 10 Jul 2006 - 06:16:17 EST.
