# Re: [R] How to remove some rows from a data.frame

From: affy snp <affysnp_at_gmail.com>
Date: Mon, 24 Dec 2007 10:57:36 -0500

Pattern:  (-1 -1)  (-1 0)  (-1 1)  (0 -1)  (0 0)  (0 1)  (1 -1)  (1 0)  (1 1)
Count:    104      131     57      631     305    668    33      15     107

There are 9 patterns, that is, the 9 combinations of -1, 0 and 1 given in parentheses above, each with its number of occurrences. What I would like is this: scan the data from the beginning, and whenever consecutive rows have the same pattern (one of the 9 combinations above), 'memorize' the following information:

the number in the 'chr' column, the number in the 'pos' column for the first of the consecutive rows, the number in the 'pos' column for the last of the consecutive rows, the number of consecutive rows, and their pattern.

I forgot to state one requirement earlier in the definition of consecutive rows: they must be adjacent in the data and have the same 'chr' number.

To illustrate, take the following data:

```
BAC            chr    pos          s1    s2
RP11-80G24     1      77465510     0     0
RP11-198H14    1      78696291     -1    0
RP11-267M21    1      79681704     -1    0
RP11-89A19     1      80950808     -1    0
RP11-6B16      1      82255496     -1    0
RP11-210E16    2      228801510    -1    0
```

Rows 2 to 6 all have the same pattern, (-1 0), and are consecutive, but row 6 has a different 'chr' number from the others. Therefore row 6 is not counted, and we end up with:

```
chr    Start       End         #of_rows    pattern
1      78696291    82255496    4           (-1 0)
```
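One possible way to compute such a summary in base R is sketched below, using `rle()` on a per-row key built from 'chr', 's1' and 's2'. The names `key`, `runs`, `segments` and `n_rows` are illustrative choices, not from the thread, and dropping single-row runs is an assumption based on the expected result shown above.

```r
# The six example rows from this message (row 6 is on chr 2).
M <- data.frame(
  BAC = c("RP11-80G24", "RP11-198H14", "RP11-267M21",
          "RP11-89A19", "RP11-6B16", "RP11-210E16"),
  chr = c(1, 1, 1, 1, 1, 2),
  pos = c(77465510, 78696291, 79681704, 80950808, 82255496, 228801510),
  s1  = c(0, -1, -1, -1, -1, -1),
  s2  = c(0, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# One key per row: a run ends whenever chr or the (s1 s2) pattern changes.
key  <- paste(M$chr, M$s1, M$s2)
runs <- rle(key)

last  <- cumsum(runs$lengths)       # index of the last row of each run
first <- last - runs$lengths + 1    # index of the first row of each run

segments <- data.frame(
  chr     = M$chr[first],
  Start   = M$pos[first],
  End     = M$pos[last],
  n_rows  = runs$lengths,
  pattern = paste0("(", M$s1[first], " ", M$s2[first], ")"),
  stringsAsFactors = FALSE
)

# Assumption: a run of a single row is not reported.
segments <- segments[segments$n_rows > 1, ]
print(segments)
```

Because 'chr' is part of the key, a change of chromosome ends a run even when the (s1 s2) pattern stays the same, which is the requirement described above.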

Hope this is clear. Thank you once again and Merry X'mas!

Best,

Allen

> BAC chr pos s1 s2
> RP11-80G24 1 77465510 -1 0
> RP11-198H14 1 78696291 -1 0
> RP11-267M21 1 79681704 -1 0
> RP11-89A19 1 80950808 -1 0
> RP11-6B16 1 82255496 -1 0
> RP11-210E16 1 228801510 0 -1
> RP11-155C15 1 230957584 0 -1
> RP11-210F8 1 237932418 0 -1
> RP11-263L17 2 65724492 0 1
> RP11-340F16 2 65879898 0 1
> RP11-68A1 2 67718674 0 0
> RP11-474G23 2 68318411 0 0
> RP11-218N6 2 68454651 0 0
> CTD-2003M22 2 68567494 0 0
> .....
>

On Dec 24, 2007 3:54 AM, Moshe Olshansky <m_olshansky_at_yahoo.com> wrote:

>
> M[-which( M$s1 == 0 & M$s2 == 0),]
>
> precise definition of the grouping criterion.
>
> --- affy snp <affysnp_at_gmail.com> wrote:
>
> > Hello list,
> >
> > I have a data frame M like:
> >
> > BAC chr pos s1 s2
> > RP11-80G24 1 77465510 -1 0
> > RP11-198H14 1 78696291 -1 0
> > RP11-267M21 1 79681704 -1 0
> > RP11-89A19 1 80950808 -1 0
> > RP11-6B16 1 82255496 -1 0
> > RP11-210E16 1 228801510 0 -1
> > RP11-155C15 1 230957584 0 -1
> > RP11-210F8 1 237932418 0 -1
> > RP11-263L17 2 65724492 0 1
> > RP11-340F16 2 65879898 0 1
> > RP11-68A1 2 67718674 0 0
> > RP11-474G23 2 68318411 0 0
> > RP11-218N6 2 68454651 0 0
> > CTD-2003M22 2 68567494 0 0
> > .....
> >
> > How can I remove the rows that have 0 in both
> > columns s1 and s2?
> > Something like M[!M$s1=0&!M$s2=0]?
> >
> > Moreover, I want to get a list that identifies
> > subsets of rows with the same data pattern.
> > For example, the first 8 rows in M can be
> > clustered into 2 groups (one per row below),
> > shown as:
> >
> > chr Start     End       # of rows Pattern
> > 1   77465510  82255496  5         (-1 0)
> > 1   228801510 237932418 3         (0 -1)
> >
> > Can anybody help me out of this? Thank you very much
> > and happy holiday!
> >
> > Best,
> > Allen
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> > reproducible code.
> >
>
>
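For the first question in the quoted message (dropping rows where both s1 and s2 are 0), a logical index is a common alternative to `-which()`. The sketch below uses five rows copied from the quoted data; `both_zero`, `kept` and `none` are illustrative names. It also shows one known pitfall of the `-which()` form: when no row matches, `which()` returns an empty vector and the subset ends up with zero rows.

```r
# Five rows taken from the quoted data frame M.
M <- data.frame(
  BAC = c("RP11-6B16", "RP11-210E16", "RP11-263L17",
          "RP11-68A1", "RP11-474G23"),
  chr = c(1, 1, 2, 2, 2),
  pos = c(82255496, 228801510, 65724492, 67718674, 68318411),
  s1  = c(-1, 0, 0, 0, 0),
  s2  = c(0, -1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# TRUE for rows where both s1 and s2 are 0.
both_zero <- M$s1 == 0 & M$s2 == 0

# Keep everything else; note the trailing comma (rows, all columns).
kept <- M[!both_zero, ]
print(kept)

# Pitfall: a condition that matches no row at all.
none <- M$s1 == 5
drop_none_which   <- M[-which(none), ]  # zero rows: -integer(0) selects nothing
drop_none_logical <- M[!none, ]         # all five rows, as intended
```

When at least one row matches, both forms give the same result; the logical form simply avoids the empty-match case.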

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon 24 Dec 2007 - 16:02:26 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 24 Dec 2007 - 17:30:20 GMT.
