[R] unexpected sort order with merge

From: Johann Hibschman <jhibschman_at_gmail.com>
Date: Wed, 06 Apr 2011 12:37:34 -0500


`merge` sorts its result as if the by-columns were character, not according to their actual class.

> tmp <- merge(data.frame(f=ordered(c("a","b","b","a","b"),
                                    levels=c("b","a")),
                          x=1:5),
               data.frame(f=ordered(c("a","b"),
                                    levels=c("b","a")),
                          y=c(10,20)))

> tmp
  f x  y
1 a 1 10
2 a 4 10
3 b 2 20
4 b 3 20
5 b 5 20

> tmp[order(tmp$f),]
  f x  y
3 b 2 20
4 b 3 20
5 b 5 20
1 a 1 10
2 a 4 10

I expected the second order (by factor level, b before a), not the first.

I actually ran into this issue when merging zoo `yearmon` columns, but that example adds a package dependency. In that context, I observed different behavior depending on whether I had one key or two:

> library(zoo)
> d1 <- data.frame(date=as.yearmon(2000 + (0:5)/12), icpn=500, foo=1:6)
> d2 <- data.frame(date=as.yearmon(2000 + (0:5)/12), icpn=500, bar=10*1:6)
> merge(d1,d2)
      date icpn foo bar
1 Apr 2000  500   4  40
2 Feb 2000  500   2  20
3 Jan 2000  500   1  10
4 Jun 2000  500   6  60
5 Mar 2000  500   3  30
6 May 2000  500   5  50

> d1 <- data.frame(date=as.yearmon(2000 + (0:5)/12), foo=1:6)
> d2 <- data.frame(date=as.yearmon(2000 + (0:5)/12), bar=10*1:6)
> merge(d1,d2)
      date foo bar
1 Jan 2000   1  10
2 Feb 2000   2  20
3 Mar 2000   3  30
4 Apr 2000   4  40
5 May 2000   5  50
6 Jun 2000   6  60

In the first example (two keys), the result appears to be sorted by the formatted month name rather than by the actual date value; with a single key, it comes out in date order.

The documentation of `merge` says the sort is "lexicographic", but I assumed that meant lexicographic over the tuple of key columns (by the first key, then the next, each compared according to its own class), not that everything is first converted to character.
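A minimal workaround sketch, assuming the goal is simply to get the rows back in the key columns' own order: merge with `sort = FALSE` and re-order afterwards with `order()`, which compares the columns as they are (factor levels, numeric values) instead of their character form. The helper name `merge_sorted` is hypothetical, and the approach assumes each key class has sensible `order()`/`xtfrm()` behavior.

```r
# Hypothetical helper (not part of base R): merge without merge()'s own
# sorting, then re-order the rows by the key columns themselves.
merge_sorted <- function(x, y, by = intersect(names(x), names(y)), ...) {
  # Skip merge()'s internal sort of the result.
  res <- merge(x, y, by = by, sort = FALSE, ...)
  # order() works on the original columns: factors sort by level order,
  # numeric-based classes by value. Names are dropped so a key column
  # called e.g. "method" cannot clash with order()'s own arguments.
  keys <- unname(as.list(res[by]))
  res[do.call(order, keys), , drop = FALSE]
}

# The ordered-factor example from above, with levels b < a.
x <- data.frame(f = ordered(c("a", "b", "b", "a", "b"), levels = c("b", "a")),
                x = 1:5)
y <- data.frame(f = ordered(c("a", "b"), levels = c("b", "a")),
                y = c(10, 20))
res <- merge_sorted(x, y)
print(res)
```

With several keys in `by`, `order()` sorts by the first key and breaks ties with the next, which is the tuple-wise reading of "lexicographic".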

Is this behavior expected?

Thanks,
Johann

P.S.

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
[1] C

attached base packages:
[1] grid      splines   stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] ggplot2_0.8.8   reshape_0.8.3   Rauto_1.0       plyr_1.1       
[5] zoo_1.6-4       Hmisc_3.7-0     survival_2.35-8 ascii_0.7      
[9] proto_0.3-8

loaded via a namespace (and not attached):
[1] cluster_1.12.1  digest_0.4.2    lattice_0.17-26 tools_2.10.1



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 06 Apr 2011 - 17:40:07 GMT