Re: [R] subset

From: Marc Schwartz (via MN) <mschwartz_at_mn.rr.com>
Date: Wed 17 May 2006 - 04:49:33 EST

On Tue, 2006-05-16 at 14:37 -0400, Guenther, Cameron wrote:
> Hello everyone,
>
> I have a large dataset (x) with some rows that have duplicate variables
> that I would like to remove. I find which rows are the duplicates with
> X1<-which(duplicated(x)). That gives me the rows with duplicated
> variables. Now, how can I remove just those rows from the original data
> frame. I think I can create a new data frame without the duplicates
> using subset. I have tried:
> Subset(x,!x1) and subset(x,!x[x1,])
> I can't seem to find the correct syntax. Any advice.
> Thanks in advance

Even easier would be to use unique():

  NewDF <- unique(x)

NewDF will contain rows from 'x' with duplicates removed.

See ?unique for more information.

unique(), which has a data.frame method, is basically:

  x[!duplicated(x), , drop = FALSE]

The 'drop = FALSE' argument ensures that the result remains a data frame, even when it is reduced to a single row or when 'x' has only a single column.
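
As a rough sketch of the equivalence described above, using a small made-up data frame (the column names 'id' and 'grp', and the objects 'dup_rows', 'by_unique', 'by_index', 'by_which' and 'y', are only illustrative and are not from the original post):

```r
# Hypothetical toy data: rows 2 and 4 repeat rows 1 and 3
x <- data.frame(id  = c(1, 1, 2, 2, 3),
                grp = c("a", "a", "b", "b", "c"))

# Row numbers of the duplicated rows, as in the original question
dup_rows <- which(duplicated(x))

by_unique <- unique(x)                           # data.frame method of unique()
by_index  <- x[!duplicated(x), , drop = FALSE]   # logical indexing
by_which  <- x[-dup_rows, , drop = FALSE]        # negative indexing

identical(by_unique, by_index)

# Single-column case: without drop = FALSE the result collapses to a vector
y <- data.frame(v = c(1, 1, 2))
is.data.frame(y[!duplicated(y), ])
is.data.frame(y[!duplicated(y), , drop = FALSE])
```

One caution on the negative-index form: if there are no duplicates, which() returns an empty vector and x[-dup_rows, ] then selects zero rows rather than all of them, so the logical form (or unique()) is the safer choice.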

Note that the above presumes that you want to test all columns in 'x' for dups.
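
If only some columns should define a duplicate, one possible approach is to call duplicated() on just those columns and use the result to index the full data frame. A minimal sketch, assuming a made-up data frame in which 'id' is the key column ('key_cols' and 'x_first' are illustrative names, not anything from the original post):

```r
# Hypothetical data: every full row is distinct, but 'id' repeats
x <- data.frame(id    = c(1, 1, 2, 3),
                value = c(10, 20, 30, 30))

key_cols <- "id"   # assumed key column(s) for this illustration

# Keep the first row for each distinct combination of the key columns
x_first <- x[!duplicated(x[, key_cols, drop = FALSE]), , drop = FALSE]

nrow(unique(x))   # all columns tested: nothing is removed
nrow(x_first)     # only 'id' tested: the second id == 1 row is removed
```

This keeps the first occurrence of each key; duplicated() also takes a 'fromLast' argument if the last occurrence should be kept instead.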

HTH, Marc Schwartz



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed May 17 04:59:44 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 17 May 2006 - 06:10:18 EST.
