Re: [Rd] Table vs unique

From: Therneau, Terry M., Ph.D. <>
Date: Thu, 22 Jul 2010 16:30:03 -0500

 Thanks to both Brian and Goran for the suggestions. I've now added the following to the code:

    # Round the survival times just a little.  Otherwise unique() and
    #  table() can give a different list of unique times, and both are
    #  used in the subsequent routines.  A test case with times of
    #  sqrt(2)^2 and 2 revealed this issue.  Normally double.digits is
    #  53 in base 2, which leads to 15 decimal digits.
    digits <- floor((.Machine$double.digits -1) * 
                    logb(.Machine$double.base,10)) #base 10 digits
    Y[,1] <- signif(Y[,1], digits)
    if (ncol(Y)==3) Y[,2] <- signif(Y[,2],digits)

It will migrate to CRAN in due course.

   Terry T

-----Original Message-----
From: Göran Broström [] Sent: Thursday, July 22, 2010 3:54 PM
To: Prof Brian Ripley
Cc: Therneau, Terry M., Ph.D.; Subject: Re: [Rd] Table vs unique

On 07/21/2010 03:54 PM, Prof Brian Ripley wrote:
> I believe this is a misdiagnosis: the 'rounding' is done by
> as.character (as the help for argument 'levels' in ?factor does say)
> and ?as.character has a full explanation (and is linked from the
> relevant part of ?factor).
> as.numeric(as.character()) should do the trick.


temp <- signif(temp, digits = 15)

which seems to be much faster.

Göran B.

> On Wed, 21 Jul 2010, Terry Therneau wrote:
>> A bug in the survival routines was reported to me today. The root cause
>> is a difference between table, unique, and sort.
>>> temp<- rep(c(1, sqrt(2)^2, 2), 1:3)
>>> unique(temp)
>> [1] 1 2 2
>>> table(temp)
>> temp
>> 1 2
>> 1 5
>> I'm using 2.10 on Linux, the user reported from 2.9 on Windows.
>> 1. Minor issue: I think the root rounding occurs in factor. I didn't
>> see any discussion of this in the help page, perhaps something should be
>> added.
>> 2. The error popped up in summary.survfit but the root cause is an
>> inconsistent survfit object. The survfit routine uses sort and unique
>> to create the unique survival times and most of the output, but table to
>> count them for another component.
>> Lumping the two versions of "2.0000...." together is the preferable
>> output. I think the best solution will be to preprocess the time
>> variable so that the three operators are consistent.
>> as.numeric(as.character(as.factor(time))) ?
>> Rather ugly. But most importantly what is a guarranteed construct that
>> would ensure consistency? Should we use a rounding level that is more
>> or less equivalent to all.equal()?
>> The solution will have to be incorporated into survfit, coxph, ...
>> perhaps a dozen places in the survival suite so I'd like to get it right
>> the first time.
>> Terry T
>> ______________________________________________
>> mailing list
> mailing list Received on Thu 22 Jul 2010 - 21:33:49 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 25 Jul 2010 - 09:40:19 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.

list of date sections of archive