Re: [R] Fast Removing Duplicates from Every Column

From: jim holtman <jholtman_at_gmail.com>
Date: Thu 18 Jan 2007 - 01:27:41 GMT

Here is one way of doing it by 'padding' all the elements to the same length:

> x <- "Col1 Col2 Col3 Col159 Col160
+  Row1      0     0     LD  0       VD
+  Row2      HD    0     0      0       MD
+  Row3      0     HD    HD     0       LD
+  Row4      LD    HD    HD     0       0
+  LastRow    HD    HD    LD     0       MD"

> input <- read.table(textConnection(x), header=TRUE)
> Uniq <- apply(input, 2, unique)
> # find the maximum length of any element
> maxLen <- max(sapply(Uniq, length))
> # pad every element with '0' up to maxLen
> Uniq <- lapply(Uniq, function(x){
+     c(x, rep('0', maxLen - length(x)))
+ })
> as.data.frame(Uniq)

  Col1 Col2 Col3 Col159 Col160
1    0    0   LD      0     VD
2   HD   HD    0      0     MD
3   LD    0   HD      0     LD
4    0    0    0      0      0
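
A minimal sketch of the same padding idea, wrapped in a function. It assumes the columns can be treated as character data, and it uses lapply() over the columns rather than apply(), because apply() returns a matrix instead of a list when every column happens to have the same number of unique values. The names pad_unique and pad are illustrative only and do not come from the thread.

```r
# Sketch only: pad_unique() and its 'pad' argument are hypothetical names.
pad_unique <- function(df, pad = "0") {
  # unique values per column, always returned as a list of character vectors
  uniq <- lapply(df, function(col) unique(as.character(col)))
  # length of the longest set of unique values
  max_len <- max(lengths(uniq))
  # extend every shorter vector with the padding value
  padded <- lapply(uniq, function(u) c(u, rep(pad, max_len - length(u))))
  as.data.frame(padded, stringsAsFactors = FALSE)
}

# Same sample data as above; the first field of each row becomes the row name
input <- read.table(text = "Col1 Col2 Col3 Col159 Col160
Row1    0  0  LD 0 VD
Row2    HD 0  0  0 MD
Row3    0  HD HD 0 LD
Row4    LD HD HD 0 0
LastRow HD HD LD 0 MD", header = TRUE, colClasses = "character")

result <- pad_unique(input)
print(result)
```

Padding with '0' makes the filler indistinguishable from a genuine '0' in the data; calling pad_unique(input, pad = NA) instead would keep the two apart.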



On 1/17/07, Bert Jacobs <b.jacobs@pandora.be> wrote:
>
> Hi,
>
>
>
> Working further on this dataframe : my_data
>
>
>
>          Col1  Col2  Col3  ...  Col 159  Col 160
> Row 1    0     0     LD    ...  0        VD
> Row 2    HD    0     0          0        MD
> Row 3    0     HD    HD         0        LD
> Row 4    LD    HD    HD         0        0
> ...      ...
> LastRow  HD    HD    LD         0        MD
>
>
>
> Running this line of code:
>
> Test = apply(X=my_data, MARGIN=2, FUN=unique)
>
>
>
> I get this list:
>
>
>
> $Col1
> [1] "0" "HD" "LD"
>
> $Col2
> [1] "0" "HD"
>
> $Col3
> [1] "LD" "0" "HD"
>
> ...
>
> $Col159
> [1] "0"
>
> $Col160
> [1] "VD" "MD" "LD" "0"
>
>
>
> Now I was wondering how I can get this list into a data.frame:
>
> because a simple data.frame doesn't work (error: arguments imply differing
> number of rows)
>
>
>
> Can someone help me out on this. Thx
>
>
>
> So that I get the following result:
>
>          Col1  Col2  Col3  ...  Col 159  Col 160
> Row 1    0     0     LD         0        VD
> Row 2    HD    HD    0          0        MD
> Row 3    LD    0     HD         0        LD
> Row 4    0     0     0          0        0
>
>
>
>
>
>
>
>
>
> -----Original Message-----
> From: Petr Pikal [mailto:petr.pikal@precheza.cz]
> Sent: 05 January 2007 11:51
> To: Bert Jacobs; 'R help list'
> Subject: Re: [R] Fast Removing Duplicates from Every Column
>
>
>
> Hi
>
>
>
> I am not sure I understand how you want to select unique items.
>
>
>
> with
>
> sapply(DF, function(x) !duplicated(x))
>
> you get a logical matrix that is TRUE where a value appears for the first
> time in its column and FALSE where it repeats an earlier value. However, you
> then need to choose which rows to keep or discard.
>
>
>
> e.g.
>
>
>
> DF[rowSums(sapply(DF, function(x) !duplicated(x)))>1,]
>
>
>
> selects all rows that contain 2 or more values not seen earlier in their column.
>
>
>
> HTH
>
> Petr
>
>
>
>
>
> On 5 Jan 2007 at 9:54, Bert Jacobs wrote:
>
>
>
> From: "Bert Jacobs" <b.jacobs@pandora.be>
>
> To: "'R help list'" <r-help@stat.math.ethz.ch>
>
> Date sent: Fri, 5 Jan 2007 09:54:17 +0100
>
> Subject: Re: [R] Fast Removing Duplicates from Every Column
>
>
>
> > Hi,
>
> >
>
> > I'm looking for some lines of code that do the following:
>
>
> > I have a dataframe with 160 Columns and a number of rows (max 30):
>
> >
>
> >          Col1  Col2  Col3  ...  Col 159  Col 160
> > Row 1    0     0     LD    ...  0        VD
> > Row 2    HD    0     0          0        MD
> > Row 3    0     HD    HD         0        LD
> > Row 4    LD    HD    HD         0        0
> > ...      ...
> > LastRow  HD    HD    LD         0        MD
>
> >
>
> >
>
> > Now I want a dataframe that looks like this. As you see all duplicates
>
> > are removed. Can this dataframe be constructed in a fast way?

>
> >
>
> >          Col1  Col2  Col3  ...  Col 159  Col 160
> > Row 1    0     0     LD         0        VD
> > Row 2    HD    HD    0          0        MD
> > Row 3    LD    0     HD         0        LD
>
> >
>
> > Thx for helping me out.

>
> > Bert
>
> >
>
> > ______________________________________________
>
> > R-help@stat.math.ethz.ch mailing list
>
> > https://stat.ethz.ch/mailman/listinfo/r-help
>
> > PLEASE do read the posting guide
>
> > http://www.R-project.org/posting-guide.html and provide commented,
>
> > minimal, self-contained, reproducible code.
>
>
>
> Petr Pikal
>
> petr.pikal@precheza.cz
>
>
>
>
>
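
The duplicated() suggestion quoted above can be sketched on the same sample data. This is only an illustration under the assumption that DF holds character columns; DF here is a small stand-in for the poster's my_data, and the names first_seen and kept are made up for the example.

```r
# Sketch only: DF is a small stand-in for the poster's my_data.
DF <- read.table(text = "Col1 Col2 Col3 Col159 Col160
Row1    0  0  LD 0 VD
Row2    HD 0  0  0 MD
Row3    0  HD HD 0 LD
Row4    LD HD HD 0 0
LastRow HD HD LD 0 MD", header = TRUE, colClasses = "character")

# Logical matrix: TRUE marks the first occurrence of a value within its column
first_seen <- sapply(DF, function(x) !duplicated(x))

# Keep the rows that hold at least two first occurrences
kept <- DF[rowSums(first_seen) > 1, ]
print(kept)
```

This filters whole rows, so it does not by itself produce the per-column list of unique values asked for in the original question.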

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Received on Thu Jan 18 12:33:24 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 18 Jan 2007 - 02:30:29 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.