Re: [Rd] unique.matrix issue [Was: Anomaly with unique and match]

From: jochen laubrock <jochen.laubrock_at_gmail.com>
Date: Mon, 28 Mar 2011 16:54:47 +0200

Still, from a user's perspective this behavior is somewhat irritating. Wouldn't it be better to rewrite unique.matrix to use formatC or sprintf instead of as.character, which the paste call in line 9 of unique.matrix implicitly relies on (at least in R version 2.12.2 (2011-02-25))?

For example, use

temp <- apply(x, MARGIN, function(x) paste(formatC(x, digits = 324, format = "f"), collapse = "\r"))

instead of

temp <- apply(x, MARGIN, function(x) paste(x, collapse = "\r"))

Don't know whether this affects performance, though.
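A minimal sketch of the idea, assuming a numeric matrix and rows as the margin (MARGIN = 1). It uses sprintf("%a"), the exact hexadecimal notation for doubles that Simon uses in the quoted example, rather than formatC. The helper name unique_rows_exact is hypothetical and not part of base R.

```r
# Hypothetical helper, not part of base R: drop duplicated rows of a numeric
# matrix, keying each row on an exact text form of its values.
unique_rows_exact <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  # sprintf("%a", ...) writes each double in hexadecimal floating-point
  # notation, which is exact: doubles that differ only in the last bit
  # still give different strings (as.character() keeps 15 significant digits).
  keys <- apply(x, 1, function(row) paste(sprintf("%a", as.double(row)), collapse = "\r"))
  # Keep the first occurrence of each key, as unique() does.
  x[!duplicated(keys), , drop = FALSE]
}

x <- c(1, 1 + 0.2e-15)
unique_rows_exact(matrix(x, 2))  # both rows are kept
```

One caveat of any text key: sprintf("%a") writes -0 and 0 differently, whereas unique() on a plain numeric vector treats them as equal.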

Sorry to chime in late.
Cheers,
Jochen

> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

On Mar 9, 2011, at 20:11, Simon Urbanek wrote:

> match() is a red herring here -- it is really a very specific thing that has to do with the fact that you're running unique() on a matrix. Also, it's much easier to reproduce:
>
>> x=c(1,1+0.2e-15)
>> x
> [1] 1 1
>> sprintf("%a",x)
> [1] "0x1p+0" "0x1.0000000000001p+0"
>> unique(x)
> [1] 1 1
>> sprintf("%a",unique(x))
> [1] "0x1p+0" "0x1.0000000000001p+0"
>> unique(matrix(x,2))
>      [,1]
> [1,]    1
>
> and this comes from the fact that unique.matrix uses a string representation: since it has to take into account all values of a row/column, it pastes all values into one string, but for these two numbers that string is the same:
>
>> as.character(x)
> [1] "1" "1"
> 
> Cheers,
> Simon
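To see the collision at the level of the per-row keys, the sketch below builds the key the way the paste() line in unique.matrix does and compares it with a key built from sprintf("%a"). The names key_paste and key_hex are illustrative only.

```r
x <- c(1, 1 + 0.2e-15)
m <- matrix(x, 2)

# Key built like the paste() line in unique.matrix: paste() turns each
# double into text via as.character(), i.e. 15 significant digits.
key_paste <- apply(m, 1, function(row) paste(row, collapse = "\r"))

# Key built from the exact hexadecimal form of each double instead.
key_hex <- apply(m, 1, function(row) paste(sprintf("%a", row), collapse = "\r"))

print(x[1] == x[2])           # FALSE: the two doubles differ
print(duplicated(key_paste))  # second row flagged when both values print as "1"
print(duplicated(key_hex))    # FALSE FALSE: the hex keys differ
```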
> 
> 
> On Mar 9, 2011, at 9:48 AM, Terry Therneau wrote:
> 
>> I stumbled onto this working on an update to coxph. The last 6 lines
>> below are the question, the rest create a test data set.
>>
>> tmt585% R
>> R version 2.12.2 (2011-02-25)
>> Copyright (C) 2011 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> # Lines of code from survival/tests/singtest.R
>>> library(survival)
>> Loading required package: splines
>>> test1 <- data.frame(time=  c(4, 3,1,1,2,2,3),
>> + status=c(1,NA,1,0,1,1,0),
>> + x= c(0, 2,1,1,1,0,0))
>>>
>>> temp <- rep(0:3, rep(7,4))
>>>
>>> stest <- data.frame(start = 10*temp,
>> + stop = 10*temp + test1$time,
>> + status = rep(test1$status,4),
>> + x = c(test1$x+ 1:7, rep(test1$x,3)),
>> + epoch = rep(1:4, rep(7,4)))
>>>
>>> fit1 <- coxph(Surv(start, stop, status) ~ x * factor(epoch), stest)
>>
>> ## New lines
>>> temp1 <- fit1$linear.predictor
>>> temp2 <- as.matrix(temp1)
>>> match(temp1, unique(temp1))
>> [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 8 8 8 6 6 6 9 9 9 6 6
>>> match(temp2, unique(temp2))
>> [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 NA NA NA 6 6 6 8 8 8 6 6
>>
>> -----------------------
>>
>> I've solved it for my code by not calling match on a one-column matrix.
>> In general, however, should I be using some other paradigm for this "map
>> to unique" operation? For example, match(as.character(x),
>> unique(as.character(x)))?
>>
>> Terry T
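For the "map to unique" question, one option (a sketch, not an idiom established in this thread) is to drop the dimensions and call match() on the plain vector, which is what worked for temp1 in the transcript: unique() and match() on doubles compare the values themselves rather than a printed form. The name map_to_unique is hypothetical.

```r
# Hypothetical helper: index each element of x into unique(x), comparing
# the doubles themselves rather than a text form of them.
map_to_unique <- function(x) {
  v <- as.vector(x)  # drop matrix dims so unique() uses the vector method
  match(v, unique(v))
}

x <- c(1, 1 + 0.2e-15, 1)
map_to_unique(x)             # 1 2 1
map_to_unique(as.matrix(x))  # same result for a one-column matrix

# The as.character() variant compares 15-significant-digit strings, so
# values that differ only beyond that precision can share an index.
match(as.character(x), unique(as.character(x)))
```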
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
> 

Received on Mon 28 Mar 2011 - 14:59:02 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 28 Mar 2011 - 15:50:38 GMT.
