Re: [Rd] sort yields different results on OS X (PR#14163)

From: Jeffrey M Sullivan <jeffreys_at_rand.org>
Date: Tue, 22 Dec 2009 11:55:35 -0800

On Dec 22, 2009, at 4:18 AM, Prof Brian Ripley wrote:

> As the help says
>
> The sort order for character vectors will depend on the collating
> sequence of the locale in use: see ‘Comparison’.
>
> and that ref says
>
> Collation of
> non-letters (spaces, punctuation signs, hyphens, fractions and so
> on) is even more problematic.
>
> That different OSes use the same name for a locale does not make
> them the same locale.
>
> Note that R can be compiled to use ICU, which provides a well-
> considered collation suite. R on Mac OS X uses ICU, as does a Linux
> build if it is available -- so I would say that it is RHEL that is
> out of line here (it makes little sense to have < and > far apart in
> the collation sequence).
>
> Why did you report a documented difference as a bug?
>

Because it wasn't clear to me from the documentation which "problematic" behaviors count as documented differences and which count as unexpected behavior. Other open-source projects I have been involved with have a "when in doubt, file a bug" policy. If that isn't the case with R, I won't do so in the future.

Thank you for the pointer towards ICU. RHEL ships some of the ICU libraries, but calling the icuSetCollate function on our installation only produces a warning that R was not built with them. Adding icuSetCollate to the "See Also" section of the Comparison help page would make this information a little easier to find.
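A minimal sketch, not taken from the thread: when a platform-independent order matters more than a "natural" one, one option is to sort under the C collation locale, which compares strings byte by byte. This assumes, as the Comparison help page describes for recent R versions, that R uses plain byte comparison (not ICU or the OS collator) in the C locale; the helper name sort_c_locale is hypothetical.

```r
# Hypothetical helper: sort under the "C" collation locale, where strings are
# compared byte by byte, so the result does not depend on the OS collator.
sort_c_locale <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")         # remember the current setting
  on.exit(Sys.setlocale("LC_COLLATE", old))  # restore it, even on error
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

v <- c("1", "<0", ">3", "2")
sort_c_locale(v)  # returns c("1", "2", "<0", ">3")
```

In ASCII the digits (0x31, 0x32) come before "<" (0x3C) and ">" (0x3E), so this order matches neither of the orderings in the original report; its only advantage is that it is the same everywhere.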

Thanks for your time,
Jeff

> On Mon, 21 Dec 2009, jeffreys@rand.org wrote:
>
>> Full_Name: Jeffrey Sullivan
>> Version: 2.10
>> OS: Mac
>> Submission from: (NULL) (130.154.0.250)
>>
>>
>> Sort produces different results when sorting strings with non-alphanumeric
>> characters, depending on the operating system:
>>
>> RHEL 5.2, R 2.10.0
>> -------------
>>> v <- c("1","<0",">3","2")
>>> Sys.setlocale("LC_COLLATE","en_US.UTF-8")
>> [1] "en_US.UTF-8"
>>> sort(v)
>> [1] "<0" "1" "2" ">3"
>>
>> Mac OS X 10.5.8, R 2.10.1
>> -------------------
>>> v <- c("1","<0",">3","2")
>>> Sys.setlocale("LC_COLLATE","en_US.UTF-8")
>> [1] "en_US.UTF-8"
>>> sort(v)
>> [1] "<0" ">3" "1" "2"
>>
>> ______________________________________________
>> R-devel_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> --
> Brian D. Ripley, ripley_at_stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595

-- 
Jeffrey Sullivan
Senior Project Associate
RAND Corporation

Work : (310) 393-0411 x6883
Fax  : (310) 260-8147
SIPR : jeffreys_at_sm.rand.pentagon.smil.mil
JWICS: sullivanj_at_la.ic.gov


Received on Wed 23 Dec 2009 - 12:41:15 GMT