# [R] duplicated() and unique() problems

From: christiaan pauw <cjpauw_at_gmail.com>
Date: Tue, 08 Jun 2010 08:44:39 +0200

I have found behaviour in duplicated() and unique() that seems strange (to me at least). I will first give a reproducible example of the behaviour I find odd, and then a sample of unexpected results from my own data. I hope someone can help me understand this.

Consider the following

# this works as expected

ex=sample(1:20, replace=TRUE)

ex

duplicated(ex)

ex=sort(ex)

ex

duplicated(ex)

# but why does duplicated() not work after order()?

ex=sample(1:20, replace=TRUE)

ex

duplicated(ex)

ex=order(ex)

duplicated(ex)
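
One likely reason, sketched here on a small fixed vector (the name `v` and its values are illustrative, not from the session above): order() returns the positions that would sort its argument, not the sorted values. That result is a permutation of 1..n, so it contains no repeats and duplicated() has nothing to flag.

```r
# Illustrative vector with repeated values (an assumption, not the poster's data)
v <- c(3, 1, 3, 2, 1)

sort(v)               # the sorted values: 1 1 2 3 3
order(v)              # the positions that would sort v: 2 5 4 1 3

duplicated(sort(v))   # FALSE TRUE FALSE FALSE TRUE
duplicated(order(v))  # all FALSE, because a permutation has no repeats

# Indexing by order() recovers the sorted values
identical(v[order(v)], sort(v))  # TRUE
```

If the intent was to sort first and then look for duplicates, `duplicated(v[order(v)])` or `duplicated(sort(v))` would be the equivalent calls.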

> str(Masechaba$PROPDESC)
 Factor w/ 24545 levels " 06"," 71Hemilton str",..: 14527 8043 16113 16054 13875 15780 12522 7771 14824 12314 ...
> # Create an indicator if the PROPDESC is unique. Default false
> Masechaba$unique=FALSE
> Masechaba$unique[which(is.na(unique(Masechaba$PROPDESC))==FALSE)]=TRUE
> # Check if something happened
> length(which(Masechaba$unique==TRUE))
[1] 2174
> length(which(Masechaba$unique==FALSE))
[1] 476
> Masechaba$duplicate=FALSE
> Masechaba$duplicate[which(duplicated(Masechaba$PROPDESC)==TRUE)]=TRUE
> length(which(Masechaba$duplicate==TRUE))
[1] 476
> length(which(Masechaba$duplicate==FALSE))
[1] 2174
> # Looks OK so far
> # Test on a known duplicate. I expect one to be true and one to be false
> Masechaba[which(Masechaba$PROPDESC==2363),10:12]

```
      PROPDESC unique duplicate
24874     2363   TRUE     FALSE
31280     2363   TRUE      TRUE
```

# This is strange. I expected that unique() and duplicated() would give the same results. The variable PROPDESC is clearly not unique in both cases.
# The totals are the same but not the individual results
> table(Masechaba$unique,Masechaba$duplicate)

        FALSE TRUE
  FALSE   342  134
  TRUE   1832  342

I don't understand this. Is there something I am missing?
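
A possible explanation, sketched on toy data (the data frame `df` and the columns `first` and `once` are hypothetical, not the Masechaba data): unique() returns only the distinct values, so its result is shorter than the column and is not aligned with the rows. Indexing with `which(!is.na(unique(x)))` then simply selects positions 1 to the number of distinct values, which would account for the matching totals (2174 and 476) alongside mismatched individual rows.

```r
# Toy data standing in for the real data frame (names are hypothetical)
df <- data.frame(PROPDESC = c("a", "b", "a", "c", "b"))

# unique() returns the distinct values, so it is shorter than the column
u <- unique(df$PROPDESC)
length(u)   # 3 distinct values, while nrow(df) is 5

# which(!is.na(u)) is just 1:3, so this marks the first three rows,
# whatever values they hold
df$unique <- FALSE
df$unique[which(!is.na(u))] <- TRUE

# duplicated() is row-aligned: TRUE for every repeat after the first occurrence
df$duplicate <- duplicated(df$PROPDESC)

# A row-aligned "first occurrence" flag is the negation of duplicated()
df$first <- !duplicated(df$PROPDESC)

# A row-aligned "occurs exactly once" flag has to scan in both directions
df$once <- !(duplicated(df$PROPDESC) |
             duplicated(df$PROPDESC, fromLast = TRUE))

df
```

Under these assumptions, `!duplicated(x)` is the row-wise counterpart of `duplicated(x)`, and the `fromLast = TRUE` variant is needed only when every member of a duplicated group should be flagged.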

Best regards
Christiaan

P.S
> sessionInfo()

R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines stats graphics grDevices utils datasets methods base

other attached packages:

```
[1] plyr_0.1.9      maptools_0.7-34 lattice_0.18-8  foreign_0.8-40  Hmisc_3.8-0     survival_2.35-8 rgdal_0.6-26
[8] sp_0.9-64
```

loaded via a namespace (and not attached):
[1] cluster_1.12.3 grid_2.11.1    tools_2.11.1


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 08 Jun 2010 - 08:44:03 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 08 Jun 2010 - 12:10:29 GMT.
