Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

From: Gabriel Becker <gmbecker_at_ucdavis.edu>
Date: Wed, 07 Dec 2011 07:24:39 -0800

I'm not an expert on locales, but the locales of those who get this behavior and of those who don't appear to differ (in fact, all three reported locale settings are slightly different).

Isn't the sort order determined by the locale rather than by any internal R code anyway?
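
A minimal sketch of how one might check this, assuming the difference really does come from LC_COLLATE: switch the collation locale to "C" (plain byte order) for the comparison, then restore it. The names old_collate, r_x and r_xa are only illustrative, not from anyone's session above.

```r
x  <- c("_1_", "1_9", "2_9")
xa <- paste(x, "a", sep = "")

# Remember the current collation locale so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# In the "C" locale strings are compared byte by byte,
# so "_" (0x5F) sorts after the digits "1" (0x31) and "2" (0x32).
invisible(Sys.setlocale("LC_COLLATE", "C"))

r_x  <- rank(x)   # 3 1 2
r_xa <- rank(xa)  # 3 1 2

# Put the original collation locale back.
invisible(Sys.setlocale("LC_COLLATE", old_collate))

print(r_x)
print(r_xa)
```

If the two results agree under "C" but differ under en_GB.UTF-8 or en_CA.UTF-8, that would point at the locale's collation rules rather than at rank() itself.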

~G

On Wed, Dec 7, 2011 at 7:06 AM, Rainer M Krug <r.m.krug_at_gmail.com> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 07/12/11 15:48, Joris Meys wrote:
> > @Barry : regardless of whether '_' comes before or after '1' , it
> > should be consistent. Adding an 'a' shouldn't shift '_' from
> > before '1' to between '1' and '2', that's clearly an error. The
> > help files are not stating anything about that. The only thing I
> > can imagine, is that '_' gets ignored (in that case 19a would rank
> > before 1a).
> >
> > This said, I can't reproduce.
>
> I can:
>
> > x <- c("_1_", "1_9", "2_9")
> > xa <- paste(x,'a',sep='')
> > rank(x)
> [1] 1 2 3
> > rank(xa)
> [1] 2 1 3
> > sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> > version
> _
> platform i686-pc-linux-gnu
> arch i686
> os linux-gnu
> system i686, linux-gnu
> status
> major 2
> minor 14.0
> year 2011
> month 10
> day 31
> svn rev 57496
> language R
> version.string R version 2.14.0 (2011-10-31)
> >
>
> Interesting.
>
> Rainer
>
>
> >
> >> x <- c("_1_", "1_9", "2_9")
> >> xa <- paste(x,'a',sep='')
> >> rank(x)
> > [1] 1 2 3
> >> rank(xa)
> > [1] 1 2 3
> >
> >> sessionInfo()
> > R version 2.14.0 Patched (2006-00-00 r00000)
> > Platform: i386-pc-mingw32/i386 (32-bit)
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C  LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] grDevices datasets splines graphics stats tcltk utils methods base
> >
> > other attached packages:
> > [1] svSocket_0.9-51 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.36-9
> >
> > loaded via a namespace (and not attached):
> > [1] cluster_1.14.1 grid_2.14.0 lattice_0.19-33 svMisc_0.9-63 tools_2.14.0
> >
> >
> > 2011/12/7 Hervé Pagès <hpages_at_fhcrc.org>:
> >> Hi,
> >>
> >> This looks OK:
> >>
> >>> x <- c("_1_", "1_9", "2_9")
> >>> rank(x)
> >> [1] 1 2 3
> >>
> >> But this does not:
> >>
> >>> xa <- paste(x, "a", sep="")
> >>> xa
> >> [1] "_1_a" "1_9a" "2_9a"
> >>
> >>> rank(xa)
> >> [1] 2 1 3
> >>
> >> Cheers, H.
> >>
> >>> sessionInfo()
> >> R version 2.14.0 (2011-10-31)
> >> Platform: x86_64-unknown-linux-gnu (64-bit)
> >>
> >> locale:
> >> [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
> >> [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
> >> [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
> >> [7] LC_PAPER=C LC_NAME=C
> >> [9] LC_ADDRESS=C LC_TELEPHONE=C
> >> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
> >>
> >> attached base packages:
> >> [1] stats graphics grDevices utils datasets methods base
> >>
> >> loaded via a namespace (and not attached):
> >> [1] tools_2.14.0
> >>
> >>
> >> --
> >> Hervé Pagès
> >>
> >> Program in Computational Biology
> >> Division of Public Health Sciences
> >> Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Ave. N, M1-B514
> >> P.O. Box 19024
> >> Seattle, WA 98109-1024
> >>
> >> E-mail: hpages_at_fhcrc.org
> >> Phone: (206) 667-5791
> >> Fax: (206) 667-1319
> >>
> >> ______________________________________________
> >> R-devel_at_r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
> >
>
>
> - --
> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
> Biology, UCT), Dipl. Phys. (Germany)
>
> Centre of Excellence for Invasion Biology
> Stellenbosch University
> South Africa
>
> Tel : +33 - (0)9 53 10 27 44
> Cell: +33 - (0)6 85 62 59 98
> Fax : +33 - (0)9 58 10 27 44
>
> Fax (D): +49 - (0)3 21 21 25 22 44
>
> email: Rainer_at_krugs.de
>
> Skype: RMkrug
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk7fgQMACgkQoYgNqgF2egrjvACffUhSUEriYGSQY8MstwVbvAj6
> +w8An1FrwX0YXqDUqDoRq/zW31FW7WOj
> =zQr1
> -----END PGP SIGNATURE-----

-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis


Received on Wed 07 Dec 2011 - 15:27:45 GMT
