Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

From: peter dalgaard <pdalgd_at_gmail.com>
Date: Wed, 07 Dec 2011 19:30:10 +0100

On Dec 7, 2011, at 15:48 , Joris Meys wrote:

> @Barry: regardless of whether '_' comes before or after '1', it
> should be consistent. Adding an 'a' shouldn't shift '_' from before
> '1' to between '1' and '2'; that's clearly an error. The help files
> don't say anything about that. The only thing I can imagine is
> that '_' gets ignored (in that case 19a would rank before 1a).

As far as I remember, that is exactly the case. In some locales, and not even consistently across different OS versions of the "same" locale, there are characters that are ignored for collation. With that in mind, what we see is really not any stranger than "a" < "ab" but "ac" > "abc".
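A minimal sketch in R of the mechanism described above. It assumes, for illustration only, a locale that ignores '_' when collating and otherwise compares like the C locale; real locale collation tables are multi-level and more elaborate. The helpers `collation_key()` and `rank_ignoring_underscore()` are hypothetical, not part of base R. `order(..., method = "radix")` (R >= 3.3.0) is used because it always compares strings in C-locale (byte) order.

```r
# Hypothetical model of a locale that ignores '_' when collating.
# Assumption: real locales use multi-level tables; here we simply drop '_'
# and compare what is left byte-wise (C-locale order via method = "radix").
x  <- c("_1_", "1_9", "2_9")
xa <- paste0(x, "a")

# Hypothetical collation key: the string with every underscore removed
collation_key <- function(s) gsub("_", "", s, fixed = TRUE)

# Hypothetical helper, not part of base R: rank by the collation key
rank_ignoring_underscore <- function(s) {
  o <- order(collation_key(s), method = "radix")
  r <- integer(length(s))
  r[o] <- seq_along(s)  # position in sorted order becomes the rank
  r
}

collation_key(x)              # "1"  "19"  "29"
rank_ignoring_underscore(x)   # 1 2 3
collation_key(xa)             # "1a" "19a" "29a"
rank_ignoring_underscore(xa)  # 2 1 3, since "19a" < "1a" ('9' < 'a')

# Peter's analogy under plain byte-wise comparison
order(c("a", "ab"), method = "radix")    # 1 2: "a" < "ab"
order(c("ac", "abc"), method = "radix")  # 2 1: "abc" < "ac"
```

With the underscores dropped, "1" < "19" but "1a" > "19a", the same pattern as "a" < "ab" but "ac" > "abc", and it reproduces Hervé's rank(xa) result of 2 1 3.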

R just uses what the OS supplies, so if you want to use words like "inconsistent" or "error", please direct them at those who define the locales. (And be prepared to realize that you may have kicked a hornet's nest...)
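Since R defers to the OS, the practical workaround is to inspect or pin the collation locale yourself. A sketch, assuming plain byte order (the "C" locale) is acceptable for your data: there '_' (0x5F) sorts after the digits, so "_1_" ranks last whether or not an 'a' is appended.

```r
# Which collation locale are rank(), order(), sort() and is.unsorted()
# using for character vectors? (Output depends on the machine.)
Sys.getlocale("LC_COLLATE")

x  <- c("_1_", "1_9", "2_9")
xa <- paste0(x, "a")

# Option 1: temporarily switch collation to the C locale (byte order),
# then restore the previous setting.
old <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))
r_x  <- rank(x)   # 3 1 2
r_xa <- rank(xa)  # 3 1 2
invisible(Sys.setlocale("LC_COLLATE", old))

# Option 2 (R >= 3.3.0): method = "radix" always uses C-locale order,
# independent of LC_COLLATE.
o_xa <- order(xa, method = "radix")  # 2 3 1
```

Option 2 avoids touching global state, which matters inside package code.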

> 
> This said, I can't reproduce.
>
>> x <- c("_1_", "1_9", "2_9")
>> xa <- paste(x,'a',sep='')
>> rank(x)
> [1] 1 2 3
>
>> rank(xa)
> [1] 1 2 3
>
>> sessionInfo()
> R version 2.14.0 Patched (2006-00-00 r00000)
> Platform: i386-pc-mingw32/i386 (32-bit)
> 
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base
>
> other attached packages:
> [1] svSocket_0.9-51 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     survival_2.36-9
>
> loaded via a namespace (and not attached):
> [1] cluster_1.14.1  grid_2.14.0     lattice_0.19-33 svMisc_0.9-63   tools_2.14.0
> 
> 
> 2011/12/7 Hervé Pagès <hpages_at_fhcrc.org>:

>> Hi,
>>
>> This looks OK:
>>
>>> x <- c("_1_", "1_9", "2_9")
>>> rank(x)
>> [1] 1 2 3
>>
>> But this does not:
>>
>>> xa <- paste(x, "a", sep="")
>>> xa
>> [1] "_1_a" "1_9a" "2_9a"
>>> rank(xa)
>> [1] 2 1 3
>>
>> Cheers,
>> H.
>>
>>> sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
>> [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.14.0
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages_at_fhcrc.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 
> 
> -- 
> Joris Meys
> Statistical consultant
> 
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
> 
> tel : +32 9 264 59 87
> Joris.Meys_at_Ugent.be

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes_at_cbs.dk  Priv: PDalgd_at_gmail.com

Received on Wed 07 Dec 2011 - 18:33:01 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 07 Dec 2011 - 18:50:15 GMT.
