Re: [R] 'Record' row values every time the binary value in acollumn changes

From: William Dunlap <wdunlap_at_tibco.com>
Date: Wed, 20 Apr 2011 10:30:26 -0700

> -----Original Message-----
> From: r-help-bounces_at_r-project.org
> [mailto:r-help-bounces_at_r-project.org] On Behalf Of jim holtman
> Sent: Wednesday, April 20, 2011 9:59 AM
> To: baboon2010
> Cc: r-help_at_r-project.org
> Subject: Re: [R] 'Record' row values every time the binary
> value in acollumn changes
>
> Here is an answer to part 1:
>
> > binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> > Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> > start<-c(12,17,18,20,25,36,12,15,16,17,19)
> > Table<-cbind(Chromosome,start,binary)
> > # determine where the start/end of each group is
> > # use indices since the size is large
> > startEnd <- lapply(split(seq(nrow(Table))
> + , list(Table[, "Chromosome"], Table[,
> 'binary'])
> + , drop = TRUE
> + )
> + , function(.indx){
> + se <- range(.indx)
> + c(Chromosome2 = unname(Table[se[1L], "Chromosome"])
> + , position_start = unname(Table[se[1L], 'start'])
> + , position_end = unname(Table[se[2L], 'start'])
> + , binary2 = unname(Table[se[1L], 'binary'])
> + )
> + })
> > do.call(rbind, startEnd)
> Chromosome2 position_start position_end binary2
> 1.0 1 20 36 0
> 2.0 2 17 19 0
> 1.1 1 12 18 1
> 2.1 2 12 16 1

The following will likely be quicker way to find where a column changes values than that lapply() when there are lots of rows:

  f1 <- function (Table) {

      isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
      isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
      with(data.frame(Table), {
          first <- isFirstInRun(binary)
          last <- isLastInRun(binary)
          cbind(Chromosome2 = Chromosome[first], position_start = start[first], 
              position_end = start[last], binary2 = binary[first])
    })
  }

E.g.,

  > f1(Table)

       Chromosome2 position_start position_end binary2
  [1,]           1             12           18       1
  [2,]           1             20           36       0
  [3,]           2             12           16       1
  [4,]           2             17           19       0

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> >
> >
>
>
> On Wed, Apr 20, 2011 at 5:01 AM, baboon2010
> <nielsvanderaa_at_live.be> wrote:
> > My question is twofold.
> >
> > Part 1:
> > My data looks like this:
> >
> > (example set, real data has 2*10^6 rows)
> > binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> > Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> > start<-c(12,17,18,20,25,36,12,15,16,17,19)
> > Table<-cbind(Chromosome,start,binary)
> >      Chromosome start binary
> >  [1,]          1    12      1
> >  [2,]          1    17      1
> >  [3,]          1    18      1
> >  [4,]          1    20      0
> >  [5,]          1    25      0
> >  [6,]          1    36      0
> >  [7,]          2    12      1
> >  [8,]          2    15      1
> >  [9,]          2    16      1
> > [10,]          2    17      0
> > [11,]          2    19      0
> >
> > As output I need a shortlist for each binary block: giving
> me the starting
> > and ending position of each block.
> > Which for these example would look like this:
> >     Chromosome2 position_start position_end binary2
> > [1,]           1             12           18       1
> > [2,]           1             20           36       0
> > [3,]           2             12           16       1
> > [4,]           2             17           19       0
> >
> > Part 2:
> > Based on the output of part 1, I need to assign the binary
> to rows of
> > another data set. If the position value in this second data
> set falls in one
> > of the blocks defined in the shortlist made in part1,the
> binary value of the
> > shortlist should be assigned to an extra column for this
> row.  This would
> > look something like this:
> >     Chromosome3 position Value binary3
> >  [1,] "1"         "12"     "a"   "1"
> >  [2,] "1"         "13"     "b"   "1"
> >  [3,] "1"         "14"     "c"   "1"
> >  [4,] "1"         "15"     "d"   "1"
> >  [5,] "1"         "16"     "e"   "1"
> >  [6,] "1"         "18"     "f"   "1"
> >  [7,] "1"         "20"     "g"   "0"
> >  [8,] "1"         "21"     "h"   "0"
> >  [9,] "1"         "22"     "i"   "0"
> > [10,] "1"         "23"     "j"   "0"
> > [11,] "1"         "25"     "k"   "0"
> > [12,] "1"         "35"     "l"   "0"
> > [13,] "2"         "12"     "m"   "1"
> > [14,] "2"         "13"     "n"   "1"
> > [15,] "2"         "14"     "o"   "1"
> > [16,] "2"         "15"     "p"   "1"
> > [17,] "2"         "16"     "q"   "1"
> > [18,] "2"         "17"     "s"   "0"
> > [19,] "2"         "18"     "d"   "0"
> > [20,] "2"         "19"     "f"   "0"
> >
> >
> > Many thanks in advance,
> >
> > Niels
> >
> > --
> > View this message in context:
> http://r.789695.n4.nabble.com/Record-row-values-every-time-the
-binary-value-in-a-collumn-changes-tp3462496p3462496.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 20 Apr 2011 - 17:38:33 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 20 Apr 2011 - 17:50:33 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive