Re: [R] Basic question on concatenating factors

From: Stavros Macrakis <macrakis_at_alum.mit.edu>
Date: Sun, 23 Nov 2008 11:53:38 -0500

I originally wrote:

>>> This algorithm depends crucially on union preserving the order of the
>>> elements of its arguments. As far as I can tell, the spec of union
>>> does not require this. If union were to (for example) sort its
>>> arguments then merge them (generally a more efficient algorithm), this
>>> function would no longer work.

On Sun, 23 Nov 2008, jim holtman replied
>
>> You are right. union used 'unique(c(x,y))' and I am not sure if
>> 'unique' preserves the order, but the help page seems to indicate that
>> "an element is omitted if it is identical to any previous element "...

On Sun, Nov 23, 2008 at 2:36 AM, Prof Brian Ripley <ripley_at_stats.ox.ac.uk> wrote:
>
> It says
>
> 'unique' returns a vector, data frame or array like 'x' but with
> duplicate elements/rows removed.
>
> Although it is a generic function, it is hard to see how that can be
> interpreted to allow the order to be changed.

First of all, the original issue was about "union", not about "unique". I see nothing in the "union" help page specifying the order of the resulting vector. Indeed, since union is specified to perform "set union", and the normal definition of set does not include a notion of order, it seems fair to assume that the result order is undefined in the absence of an explicit statement. What's more, there could well be set representations (e.g. bitstrings) which are inherently unordered -- but union is not (currently) generic.

The fact that union is currently implemented using unique should be of no concern of the user.

> The claim that union would be more efficiently implemented via sorting is made with no evidence:

I simply gave sorted lists as an *example* of a possible alternative implementation that the spec appears to allow.

> do look up a basic computer science textbook for this kind of thing

Having taught algorithms and data structure courses at a local university, and having written the "set" package for the Maxima computer algebra system, I am fairly well versed in the alternatives, thank you.

> as well as how R actually does it

How "R actually does it" today is not the issue. Do look up a basic computer science textbook on data abstraction or software engineering for this kind of thing.

           -s

Stavros Macrakis
(oh, and since the habit on this list seems to be to include degrees: PhD, Computer Science)
Cambridge, MA



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun 23 Nov 2008 - 16:55:39 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 23 Nov 2008 - 17:30:27 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive