[Rd] Table vs unique

From: Terry Therneau <therneau_at_mayo.edu>
Date: Wed, 21 Jul 2010 07:20:46 -0500

A bug in the survival routines was reported to me today. The root cause is a difference between table, unique, and sort.

> temp <- rep(c(1, sqrt(2)^2, 2), 1:3)
> unique(temp)
[1] 1 2 2
> table(temp)
1 2
1 5

  I'm using R 2.10 on Linux; the user reported it from R 2.9 on Windows.
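
The mismatch can be made visible with a short sketch. It assumes only base R; the variable names n_unique and n_cells are introduced here for illustration and are not part of the original report. The idea is that sqrt(2)^2 and 2 are different doubles, so unique() keeps both, while their character labels coincide, so table() counts them in one cell.

```r
# Rebuild the vector from the post: one 1, two copies of sqrt(2)^2, three 2s.
temp <- rep(c(1, sqrt(2)^2, 2), 1:3)

# The two "2" values are distinct doubles, although close within tolerance.
print(sqrt(2)^2 == 2)                    # exact comparison
print(isTRUE(all.equal(sqrt(2)^2, 2)))   # comparison with a tolerance
print(unique(temp), digits = 17)         # digits that default printing hides

# Their character labels, which is what factor levels are built from.
print(as.character(unique(temp)))

n_unique <- length(unique(temp))  # distinct doubles seen by unique()
n_cells  <- length(table(temp))   # distinct labels seen by table()
print(c(n_unique = n_unique, n_cells = n_cells))
```

Any code that indexes the output of table() by the positions of unique() will be off by one here, which matches the inconsistent survfit object described in point 2 below.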

  1. Minor issue: I think the underlying rounding occurs in factor. I didn't see any discussion of this in the help page; perhaps something should be added.
  2. The error popped up in summary.survfit but the root cause is an inconsistent survfit object. The survfit routine uses sort and unique to create the unique survival times and most of the output, but table to count them for another component. Lumping the two versions of "2.0000...." together is the preferable output. I think the best solution will be to preprocess the time variable so that the three operators are consistent.

        as.numeric(as.character(as.factor(time))) ?

Rather ugly. But most importantly, what is a guaranteed construct that would ensure consistency? Should we use a rounding level that is more or less equivalent to all.equal()?
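
As a sketch of what such preprocessing might look like, the round trip through factor can be compared with rounding to a fixed number of significant digits. The helper normalize_time and its default of 15 digits are assumptions made for illustration; they are not part of the survival package, and this is not a claim that either construct is guaranteed in every edge case.

```r
temp <- rep(c(1, sqrt(2)^2, 2), 1:3)

# Construct proposed in the post: round-trip through the factor labels.
via_factor <- as.numeric(as.character(as.factor(temp)))

# Hypothetical alternative: round to a fixed number of significant digits.
# The name and the default of 15 digits are assumptions for illustration.
normalize_time <- function(time, digits = 15) signif(time, digits)
via_signif <- normalize_time(temp)

# After either step, count the distinct times seen by unique() and by table().
counts <- c(
  factor_unique = length(unique(via_factor)),
  factor_table  = length(table(via_factor)),
  signif_unique = length(unique(via_signif)),
  signif_table  = length(table(via_signif))
)
print(counts)
```

For this example both approaches collapse the two versions of 2 onto one value. A level closer to all.equal() would be about 8 digits, since its default tolerance is roughly 1.5e-8, but rounding is not the same as a tolerance test: two times that sit on either side of a rounding boundary stay distinct however close they are.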

The solution will have to be incorporated into survfit, coxph, ... perhaps a dozen places in the survival suite, so I'd like to get it right the first time.

Terry T

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 21 Jul 2010 - 12:28:17 GMT

