From: Henrique Dallazuanna <wwwhsd_at_gmail.com>

Date: Mon, 24 Dec 2007 14:27:10 -0200

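# df is the data frame from the question (called M there).
# Summarise one group of rows: chromosome, first and last position,
# number of rows, and the (s1 s2) pattern.
f <- function(x)
{
  cbind.data.frame(chr = unique(x$chr),
                   Start = min(x$pos), End = max(x$pos), Rows = nrow(x),
                   Pattern = paste("(", x$s1, x$s2, ")"))
}

# Split df by its (s1, s2) pattern, summarise each group, drop the
# duplicated rows within each summary, and stack the results.
do.call("rbind", lapply(lapply(split(df, paste(df$s1, df$s2)), f), unique))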
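The split above groups on the (s1, s2) pattern alone, so rows with the same pattern are pooled even when they are not adjacent or lie on different chromosomes. If the requirement in the quoted message (consecutive rows, same 'chr') has to hold, one possible sketch is to number the runs first and split on that number. The names key, run_id, summarise_run and runs are made up for this illustration, and the six rows are the small example from the quoted message.

```r
# Example rows copied from the illustration in the quoted message.
df <- data.frame(
  BAC = c("RP11-80G24", "RP11-198H14", "RP11-267M21",
          "RP11-89A19", "RP11-6B16", "RP11-210E16"),
  chr = c(1, 1, 1, 1, 1, 2),
  pos = c(77465510, 78696291, 79681704, 80950808, 82255496, 228801510),
  s1  = c(0, -1, -1, -1, -1, -1),
  s2  = c(0, 0, 0, 0, 0, 0)
)

# A row starts a new run when its chr/s1/s2 key differs from the
# key of the row just before it; cumsum() turns those starts into run numbers.
key    <- paste(df$chr, df$s1, df$s2)
run_id <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

# Summarise one run: chromosome, first and last position, row count, pattern.
# Assumes the rows are already in the order in which they should be scanned.
summarise_run <- function(x) {
  data.frame(chr = x$chr[1],
             Start = x$pos[1], End = x$pos[nrow(x)],
             Rows = nrow(x),
             Pattern = paste0("(", x$s1[1], " ", x$s2[1], ")"))
}

runs <- do.call("rbind", lapply(split(df, run_id), summarise_run))
runs
```

This keeps runs of length one as well; if only longer runs are wanted, runs[runs$Rows > 1, ] would drop them.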

On 24/12/2007, affy snp <affysnp_at_gmail.com> wrote:

> Thanks Moshe! I apologize for not being so clear about the
> second part. Again, below is what the data looks like. The
> patterns for columns s1 and s2 will be:
>
> (-1 -1)  (-1 0)  (-1 1)  (0 -1)  (0 0)  (0 1)  (1 -1)  (1 0)  (1 1)
> 104      131     57      631     305    668    33      15     107
>
> There are 9 patterns, in other words, 9 combinations of -1, 1, 0,
> given in parentheses. The number of occurrences of each is underneath.
> What I wish to have is this: scan the data from the beginning, and
> if any consecutive rows have the same pattern (one of the 9
> combinations above), 'memorize' the following information:
>
> the number in the 'chr' column, the number in the 'pos' column for the
> first of the consecutive rows, the number in the 'pos' column for the
> last of the consecutive rows, how many consecutive rows there are,
> and the corresponding pattern for them.
>
> I forgot to state one requirement in the definition of
> consecutive rows: they must be adjacent in the data and
> have the same 'chr' value.
>
> Just to illustrate this, an example based on the data:
>
> BAC          chr  pos        s1  s2
> RP11-80G24   1    77465510   0   0
> RP11-198H14  1    78696291   -1  0
> RP11-267M21  1    79681704   -1  0
> RP11-89A19   1    80950808   -1  0
> RP11-6B16    1    82255496   -1  0
> RP11-210E16  2    228801510  -1  0
>
> Even though rows 2 to 6 have the same pattern, which is -1 0,
> and are consecutive, row 6 has a different 'chr' value
> from the other rows. Therefore, we will not count row 6 and
> end up with:
>
> chr  Start     End       #of_rows  pattern
> 1    78696291  82255496  4         (-1 0)
>
> Hope this is clear. Thank you once again and Merry X'mas!
>
> Best,
> Allen
>
> > BAC          chr  pos        s1  s2
> > RP11-80G24   1    77465510   -1  0
> > RP11-198H14  1    78696291   -1  0
> > RP11-267M21  1    79681704   -1  0
> > RP11-89A19   1    80950808   -1  0
> > RP11-6B16    1    82255496   -1  0
> > RP11-210E16  1    228801510  0   -1
> > RP11-155C15  1    230957584  0   -1
> > RP11-210F8   1    237932418  0   -1
> > RP11-263L17  2    65724492   0   1
> > RP11-340F16  2    65879898   0   1
> > RP11-68A1    2    67718674   0   0
> > RP11-474G23  2    68318411   0   0
> > RP11-218N6   2    68454651   0   0
> > CTD-2003M22  2    68567494   0   0
> > .....
> >
>
> On Dec 24, 2007 3:54 AM, Moshe Olshansky <m_olshansky_at_yahoo.com> wrote:
>
> > To answer your first question, try
> >
> > M[-which(M$s1 == 0 & M$s2 == 0), ]
> >
> > For the second question, you must start with a more
> > precise definition of the grouping criterion.
> >
> > --- affy snp <affysnp_at_gmail.com> wrote:
> >
> > > Hello list,
> > >
> > > I have a data frame M like:
> > >
> > > BAC          chr  pos        s1  s2
> > > RP11-80G24   1    77465510   -1  0
> > > RP11-198H14  1    78696291   -1  0
> > > RP11-267M21  1    79681704   -1  0
> > > RP11-89A19   1    80950808   -1  0
> > > RP11-6B16    1    82255496   -1  0
> > > RP11-210E16  1    228801510  0   -1
> > > RP11-155C15  1    230957584  0   -1
> > > RP11-210F8   1    237932418  0   -1
> > > RP11-263L17  2    65724492   0   1
> > > RP11-340F16  2    65879898   0   1
> > > RP11-68A1    2    67718674   0   0
> > > RP11-474G23  2    68318411   0   0
> > > RP11-218N6   2    68454651   0   0
> > > CTD-2003M22  2    68567494   0   0
> > > .....
> > >
> > > How can I remove the rows which have 0 in both
> > > columns s1 and s2?
> > > Something like M[!M$21=0&!M$s2=0]?
> > >
> > > Moreover, I want to get a list of the subsets of rows
> > > which have the same pattern of data. For example, the
> > > first 8 rows in M can be clustered into 2 groups
> > > (represented below in 2 rows) and shown as:
> > >
> > > chr  Start      End        # of rows  Pattern
> > > 1    77465510   82255496   5          (-1 0)
> > > 1    228801510  237932418  3          (0 -1)
> > >
> > > Can anybody help me with this? Thank you very much
> > > and happy holidays!
> > >
> > > Best,
> > > Allen
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help_at_r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained,
> > > reproducible code.
> > >
> >
>

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Received on Mon 24 Dec 2007 - 16:32:39 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Mon 24 Dec 2007 - 17:30:20 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting guide before posting to the list.