[R] 'merge' function creating duplicate columns names in the output

From: jim holtman <jholtman_at_gmail.com>
Date: Thu, 03 Mar 2011 17:04:52 -0500


The "merge" command is creating duplicate column names in a dataframe that is the result of the merge. The following is the 'merge' command:

x <- merge(invType

    , allocSlots
    , by.x = 'index'
    , by.y = 'indx'
    , all.x = TRUE

    )

The 'invType' dataframe was the result of a previous merge and has the following column names that are probably causing the problem: height.x
height.y
height

> str(invType)
'data.frame': 2219 obs. of 30 variables:
$ loc : chr "F0AA63" "F0AA65" "F0AA73" "F0AA75" ...
$ KLN : int 3569383 3515513 3565497 3555138 3565162 3555001
3565139 3555886 3565796 3556647 ...
$ comm : int 451 57 560 40 560 39 560 40 560 46 ...
$ case : num 7.70e+09 1.00e+12 3.00e+12 1.00e+12 1.11e+09 ...
$ desc : chr "PGPR RTC BONELESS WINGS" "GRTN POT CRNCH FISH

FILET" "TYSON CORNISH HENS TWN PK" "GGNT RSTD POT GRLC HERB" ...

$ height.x: num 7.2 12.6 11 7.8 6.8 10.1 11.2 10 11 10.5 ...
$ length : num 14.5 15.8 20 15.6 22.2 15 20.2 15 17 19.8 ...
$ weight : num 11 16.3 39 11 35.6 6.5 36 4 30 12.5 ...
$ width : num 9.7 9.2 14.3 8 15.2 7.5 13.2 8.5 13 10 ...
$ high : int 5 2 3 3 4 3 3 3 3 3 ...
$ pqty : int 65 26 18 45 20 45 21 39 24 30 ...
$ boh : int 4372 58 1199 51 836 116 64 312 371 389 ...
$ awm : num 694 44.3 53.8 35 0.8 ...
$ cubes : num 0.586 1.06 1.821 0.563 1.328 ...
$ pallet : num 42 31.2 39 29.4 33.2 36.3 39.6 36 39 37.5 ...
$ adm : num 99.143 6.329 7.686 5 0.114 ...
$ tie : num 13 13 6 15 5 15 7 13 8 10 ...
$ origComm: int 457 57 547 40 541 39 552 40 552 46 ...
$ days : num 0.656 6.162 2.342 11.998 216.853 ...
$ class : chr "single" "double" "single" "double" ...
$ top.x : logi TRUE TRUE FALSE TRUE FALSE TRUE ...
$ comm_ord: Factor w/ 30 levels "37A","38A","43A",..: 25 5 23 15 23 8
23 15 23 16 ...
$ type.x : int 2 2 2 2 2 2 2 2 2 2 ...
$ height.y: num 47 47 47 47 47 47 47 47 47 47 ...
$ top.y : logi FALSE TRUE FALSE TRUE FALSE TRUE ...
$ noChange: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ type.y : int 2 2 2 2 2 2 2 2 2 2 ...
$ depth : num 48 48 48 48 48 48 48 48 48 48 ...
$ height : num 47 47 47 47 47 47 47 47 47 47 ...
$ index : int 1 2 3 4 5 6 7 8 9 10 ...

Now the "allocSlots" dataframe also has a column name 'height'

> str(allocSlots)
'data.frame': 2462 obs. of 6 variables:
$ loc : chr "F1AA02" "F1AA12" "F1AA22" "F1AA32" ...
$ height: num 72 72 72 72 72 72 72 72 72 72 ...
$ depth : num 48 48 48 48 48 48 48 48 48 48 ...
$ bay : chr "F1AA0" "F1AA0" "F1AA2" "F1AA2" ...
$ indx : int 1675 1617 1386 1096 1077 963 816 471 275 259 ...
$ type : int 1 1 1 1 1 1 1 1 1 1 ...

Here is the result of the 'merge': (notice that there are now two 'height.x' and 'height.y' columns in the dataframe:

> str(x)
'data.frame': 2219 obs. of 35 variables:
$ index : int 1 2 3 4 5 6 7 8 9 10 ...
$ loc.x : chr "F0AA63" "F0AA65" "F0AA73" "F0AA75" ...
$ KLN : int 3569383 3515513 3565497 3555138 3565162 3555001
3565139 3555886 3565796 3556647 ...
$ comm : int 451 57 560 40 560 39 560 40 560 46 ...
$ case : num 7.70e+09 1.00e+12 3.00e+12 1.00e+12 1.11e+09 ...
$ desc : chr "PGPR RTC BONELESS WINGS" "GRTN POT CRNCH FISH

FILET" "TYSON CORNISH HENS TWN PK" "GGNT RSTD POT GRLC HERB" ...

$ height.x: num 7.2 12.6 11 7.8 6.8 10.1 11.2 10 11 10.5 ...
$ length : num 14.5 15.8 20 15.6 22.2 15 20.2 15 17 19.8 ...
$ weight : num 11 16.3 39 11 35.6 6.5 36 4 30 12.5 ...
$ width : num 9.7 9.2 14.3 8 15.2 7.5 13.2 8.5 13 10 ...
$ high : int 5 2 3 3 4 3 3 3 3 3 ...
$ pqty : int 65 26 18 45 20 45 21 39 24 30 ...
$ boh : int 4372 58 1199 51 836 116 64 312 371 389 ...
$ awm : num 694 44.3 53.8 35 0.8 ...
$ cubes : num 0.586 1.06 1.821 0.563 1.328 ...
$ pallet : num 42 31.2 39 29.4 33.2 36.3 39.6 36 39 37.5 ...
$ adm : num 99.143 6.329 7.686 5 0.114 ...
$ tie : num 13 13 6 15 5 15 7 13 8 10 ...
$ origComm: int 457 57 547 40 541 39 552 40 552 46 ...
$ days : num 0.656 6.162 2.342 11.998 216.853 ...
$ class : chr "single" "double" "single" "double" ...
$ top.x : logi TRUE TRUE FALSE TRUE FALSE TRUE ...
$ comm_ord: Factor w/ 30 levels "37A","38A","43A",..: 25 5 23 15 23 8
23 15 23 16 ...
$ type.x : int 2 2 2 2 2 2 2 2 2 2 ...
$ height.y: num 47 47 47 47 47 47 47 47 47 47 ...
$ top.y : logi FALSE TRUE FALSE TRUE FALSE TRUE ...
$ noChange: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ type.y : int 2 2 2 2 2 2 2 2 2 2 ...
$ depth.x : num 48 48 48 48 48 48 48 48 48 48 ...
$ height.x: num 47 47 47 47 47 47 47 47 47 47 ...
$ loc.y : chr "F1KC22" "F1BM34" "F1HC73" "F1FJ65" ...
$ height.y: num 72 44 72 44 72 44 72 44 72 72 ...
$ depth.y : num 48 48 48 48 48 48 48 48 48 48 ...
$ bay : chr "F1KC2" "F1BM2" "F1HC7" "F1FJ5" ...
$ type : int 1 2 1 2 1 2 1 2 1 1 ...

My workaround is to change one of the "height" to something else to avoid the problem, but someone else might stumble on the same error. Should we expect 'merge' to ensure that the column names are unique in the result?

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 03 Mar 2011 - 22:25:12 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 Mar 2011 - 22:30:18 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive