Re: [Rd] Ordering of values returned by unique

From: Tony Plate <>
Date: Thu 30 Sep 2004 - 02:09:18 EST

AFAIK, it has always worked that way in S-plus and R. Furthermore, the documentation in R for 'unique' says that it removes duplicated elements. This does seem to leave open the possibility that an element other than the first of a set of duplicates is retained, which could change the order. However, the documentation for 'duplicated' is clearer: it says that 'duplicated' identifies elements that are duplicates of earlier elements. Also, the examples for 'duplicated' state that x[!duplicated(x)] == unique(x) (paraphrased).
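The paraphrased identity from the 'duplicated' examples can be checked directly at the R prompt. This is a small sketch using the vector from the original question, not the text of the help page itself.

```r
# Vector from the original question: 2 appears first, then 1
x <- c(2, 2, 1, 2)

# duplicated() flags every element that repeats an *earlier* one:
# here FALSE, TRUE, FALSE, TRUE
dups <- duplicated(x)
print(dups)

# Dropping the flagged elements keeps first-occurrence order: 2, 1
first <- x[!dups]
print(first)

# unique() returns the same values in the same order
print(identical(first, unique(x)))
```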

I depend on this all the time, so I also checked some references. In the Blue book the documentation for the functions unique and duplicated is combined and implies the above. In MASS 4th Ed, the page referred to by the index entry for 'unique' (p48, #9 in my copy) states that 'unique' removes duplicates as identified by 'duplicated', which implies that the order of retained elements is not changed. The Green book has no index entry for 'unique'. In S-plus the implementation of unique.default(x) uses x[!duplicated(x)].

So, I think the evidence is pretty strong that unique(x) will always return elements in the order in which they first appear in x. But it would be nice if the documentation for 'unique' stated explicitly that this is the behavior for all methods. (It does state this for the array method of 'unique'.)
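For the array method, a small matrix shows the same first-occurrence rule applied to rows (the default MARGIN = 1). The matrix below is made up purely for illustration.

```r
# Rows of a small numeric matrix; row 3 repeats row 1
m <- rbind(c(2, 1),
           c(1, 1),
           c(2, 1))

# For matrices, duplicated() works on rows by default (MARGIN = 1):
# rows 1 and 2 are new, row 3 repeats row 1
print(duplicated(m))

# unique() keeps the first occurrence of each row, in original order
u <- unique(m)
print(u)

# Same result as dropping the flagged rows by hand
print(identical(u, m[!duplicated(m), , drop = FALSE]))
```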

At Wednesday 09:17 AM 9/29/2004, Witold Eryk Wolski wrote:
>Is the ordering of the values returned something I can rely on, a
>form of a standard, that a function called unique in R (in further
>versions) will return the unique elements in order of their first occurrence?
> > x<-c(2,2,1,2)
> > unique(x)
>[1] 2 1
>It seems not to be the standard, e.g. in MATLAB:
> >> x=[2,2,1,2]
>x =
> 2 2 1 2
> >> unique(x)
>ans =
> 1 2
>I just noted it because the way it works now is extremely
>useful for some applications (e.g. tree traversal), so I use it in a
>script. But I am a little worried whether I can rely on this behaviour in
>future versions. Furthermore, can I assume that someone reading the
>code will think that it works that way?
>Or is it better to define an additional function?
> res <- rep(NA, length(unique(x)))
> count <- 0
> for (i in x)
> {
>   if (!(i %in% res))
>   {
>     count <- count + 1
>     res[count] <- i
>   }
> }
> res
>Dipl. bio-chem. Witold Eryk Wolski
>MPI-Moleculare Genetic
>Ihnestrasse 63-73 14195 Berlin
>tel: 0049-30-83875219
>mail:
Received on Thu Sep 30 02:14:38 2004
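The loop in the quoted question can be wrapped as a function and compared against unique(). The name first_occurrences is hypothetical, not part of R. The sketch also shows that the MATLAB-style result in the question corresponds to sort(unique(x)) in R.

```r
# Hypothetical helper: the quoted loop, wrapped as a function.
# It collects each value the first time it is seen.
first_occurrences <- function(x) {
  res <- x[0]                 # empty vector of the same type as x
  for (i in x) {
    if (!(i %in% res)) {      # value not seen yet
      res <- c(res, i)
    }
  }
  res
}

x <- c(2, 2, 1, 2)

print(first_occurrences(x))   # order of first appearance: 2 1
print(unique(x))              # same order: 2 1
print(sort(unique(x)))        # sorted, like the MATLAB output: 1 2
```

Growing res with c() makes the loop quadratic in the worst case; x[!duplicated(x)] gives the same first-occurrence order without an explicit loop.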

This archive was generated by hypermail 2.1.8 : Fri 18 Mar 2005 - 09:00:23 EST