Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

From: Gordon Brown <Gordon.Brown_at_cancer.org.uk>
Date: Wed, 07 Dec 2011 15:03:15 +0000


Hi, folks,

Underscores are, in fact, ignored in some collation orders, including (if I recall correctly) en_CA.UTF-8. This has caused me a bit of confusion now and then. I have no idea about "English_United States.1252", but since Joris' result does not agree with Hervé's, it most likely does not ignore them.
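
To make the "underscores are ignored" explanation concrete, here is a rough sketch that mimics it by stripping underscores and comparing what is left byte by byte. The helper name rank_ignoring_underscores is made up for illustration, and it assumes a recent R (3.3.0 or later) where order(method = "radix") sorts character vectors in C-locale order. It is only an approximation of what a real collation table does, not how en_CA.UTF-8 is actually implemented.

```r
x  <- c("_1_", "1_9", "2_9")
xa <- paste(x, "a", sep = "")

# Hypothetical helper: approximate a collation that ignores "_" by
# removing underscores and ranking the remainder in C-locale byte order.
rank_ignoring_underscores <- function(s) {
  stripped <- gsub("_", "", s, fixed = TRUE)
  # method = "radix" orders character vectors in the C locale,
  # independent of the session's LC_COLLATE setting
  o <- order(stripped, method = "radix")
  r <- integer(length(s))
  r[o] <- seq_along(s)  # invert the ordering to get ranks (no ties here)
  r
}

rank_ignoring_underscores(x)   # 1 2 3
rank_ignoring_underscores(xa)  # 2 1 3
```

With the underscores gone, xa becomes "1a", "19a", "29a", and "19a" sorts before "1a" because '9' precedes 'a' in byte order. That reproduces the 2 1 3 that Hervé saw and is the "19a would rank before 1a" case Joris describes. Sys.getlocale("LC_COLLATE") shows which collation a given session is using.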

Cheers,

On 2011/12/07 14:48, "Joris Meys" <jorismeys_at_gmail.com> wrote:

> @Barry : regardless of whether '_' comes before or after '1' , it
> should be consistent. Adding an 'a' shouldn't shift '_' from before
> '1' to between '1' and '2', that's clearly an error. The help files
> are not stating anything about that. The only thing I can imagine, is
> that '_' gets ignored (in that case 19a would rank before 1a).
> 
> This said, I can't reproduce.
> 
> > x <- c("_1_", "1_9", "2_9")
> > xa <- paste(x,'a',sep='')
> > rank(x)
> [1] 1 2 3
> > rank(xa)
> [1] 1 2 3
>
> > sessionInfo()
> R version 2.14.0 Patched (2006-00-00 r00000)
> Platform: i386-pc-mingw32/i386 (32-bit)
> 
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base
> 
> other attached packages:
> [1] svSocket_0.9-51 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     survival_2.36-9
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.14.1  grid_2.14.0     lattice_0.19-33 svMisc_0.9-63   tools_2.14.0
> 
> 
> 2011/12/7 Hervé Pagès <hpages_at_fhcrc.org>:

>> Hi,
>>
>> This looks OK:
>>
>> > x <- c("_1_", "1_9", "2_9")
>> > rank(x)
>> [1] 1 2 3
>>
>> But this does not:
>>
>> > xa <- paste(x, "a", sep="")
>> > xa
>> [1] "_1_a" "1_9a" "2_9a"
>> > rank(xa)
>> [1] 2 1 3
>>
>> Cheers,
>> H.
>>
>> > sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.14.0
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages_at_fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Thu 08 Dec 2011 - 12:36:26 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 08 Dec 2011 - 15:00:16 GMT.