Re: [R] rarefy a matrix of counts

From: Brian Frappier <brian.frappier_at_gmail.com>
Date: Fri 13 Oct 2006 - 14:35:37 GMT

Thank you, Alex! That's exactly what I was looking to do. I'm going to remove the loops and use your apply function approach. Best regards and much thanks, brian

On 10/13/06, Alex Brown <alex@transitive.com> wrote:
>
> I thought at first that you could use a weighted sample (the sample
> function with the counts as prob), but you can't: with replace = TRUE
> the counts are never depleted, and with replace = FALSE each type can
> be drawn only once.
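A small sketch of the problem with weighted sampling, using a made-up vector x of 2 red and 3 blue candies (not data from the thread). With replace = TRUE the draw behaves like a multinomial sample, so red can come up more often than it actually occurs. With replace = FALSE each name is a single item, so you cannot take more draws than there are types.

```r
# Toy data (hypothetical): 2 red and 3 blue candies.
x <- c(red = 2, blue = 3)

set.seed(42)
# Weighted sampling WITH replacement: counts are never depleted.
draws <- replicate(1000, {
  s <- sample(names(x), size = 5, replace = TRUE, prob = x)
  sum(s == "red")
})
# Fraction of replicates that drew more red candies than exist (> 2).
mean(draws > x[["red"]])

# Weighted sampling WITHOUT replacement treats each name as one item,
# so asking for 5 draws from only 2 names is an error.
res <- try(sample(names(x), size = 5, replace = FALSE, prob = x), silent = TRUE)
inherits(res, "try-error")
```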
>
> You can use the list approach, but through the power of R, you don't
> need a lot of loops to do it...
>
> I can't speak for the efficiency of this approach in terms of CPU cycles.
>
> In short:
>
> apply(z2,2,function(x)sample(rep(names(x),x),100))
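Because the original question asks for a rarefied matrix of counts rather than lists of names, the sampled names can be tabulated back into counts per candy type. A minimal sketch, assuming the counts from the thread are entered directly as z2 so the block runs on its own; fixing the factor levels to rownames(z2) keeps types that are never drawn as zero counts and preserves the row order.

```r
# Counts from the thread, entered directly (types in rows, samples in columns).
z2 <- data.frame(sample.1 = c(400, 100, 300),
                 sample.2 = c(300, 0, 1000),
                 sample.3 = c(2500, 200, 500),
                 row.names = c("red.candy", "green.candy", "black.candy"))

set.seed(1)
# Alex's one-liner: a 100 x 3 matrix of sampled candy names.
picked <- apply(z2, 2, function(x) sample(rep(names(x), x), 100))

# Tabulate each column back into counts; fixing the factor levels keeps
# types that were not drawn (count 0) in the original row order.
rarefied <- apply(picked, 2, function(s) table(factor(s, levels = rownames(z2))))
rarefied
```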
>
> In long:
>
> #let's load the data:
>
> z = c("sample.1 sample.2 sample.3",
>       "red.candy 400 300 2500",
>       "green.candy 100 0 200",
>       "black.candy 300 1000 500")
>
> #and turn into a table
>
> z2 = read.table(textConnection(z), header=TRUE, row.names=1)
>
> # let's create a function to expand a sample column into individuals:
>
> expand <- function(x) rep(names(x), x)
>
> # test it on a smaller set:
>
> ex <- expand( c( red = 2, blue = 3) )
>
> ex
> [1] "red" "red" "blue" "blue" "blue"
>
> # and sample 2 things from that:
>
> sample( ex, 2 )
>
> # combine the two
>
> samplex <- function( x, size ) sample(expand(x), size )
>
> samplex( c( red = 2, blue = 3), size = 2 )
>
> # ok, now we use the apply function to apply this to each column
>
> apply(z2, 2, samplex, size = 2 )
>
> # you wanted 100?
>
> apply(z2, 2, samplex, size = 100 )
>
> # all done.
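The original question also mentions rarefying each sample several times. One way to do that, sketched here as an assumption rather than as Alex's code, is to wrap a tabulated rarefaction in replicate(), which stacks the per-replicate count matrices into a 3-D array (types x samples x replicates). rarefy_once is a hypothetical helper name, and the number of replicates (10) is arbitrary.

```r
z2 <- data.frame(sample.1 = c(400, 100, 300),
                 sample.2 = c(300, 0, 1000),
                 sample.3 = c(2500, 200, 500),
                 row.names = c("red.candy", "green.candy", "black.candy"))

# Hypothetical helper: one rarefaction of every column to `size` items,
# returned as a types x samples matrix of counts.
rarefy_once <- function(counts, size) {
  apply(counts, 2, function(x)
    table(factor(sample(rep(names(x), x), size), levels = names(x))))
}

set.seed(2)
# 10 independent rarefactions, stacked into a 3 x 3 x 10 array.
reps <- replicate(10, rarefy_once(z2, size = 100))
dim(reps)

# Example summary: mean rarefied count per type and sample across replicates.
apply(reps, c(1, 2), mean)
```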
>
> # You should note that if any sample contains fewer candies than the
> # requested size (100 here), this function will fail. For example:
>
> apply(z2, 2, samplex, size = 2000 )
>
> Error in sample(length(x), size, replace, prob) :
> cannot take a sample larger than the population
> when 'replace = FALSE'
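One way to avoid that error, sketched under the assumption that samples with fewer than size candies should simply be left out (rather than, say, kept whole), is to check the column totals first. rarefy_or_drop is a hypothetical helper name; with size = 2000, only sample.3 (3200 candies) is large enough.

```r
z2 <- data.frame(sample.1 = c(400, 100, 300),
                 sample.2 = c(300, 0, 1000),
                 sample.3 = c(2500, 200, 500),
                 row.names = c("red.candy", "green.candy", "black.candy"))

# Hypothetical helper: rarefy only the columns whose total is at least
# `size`; warn about and drop the rest instead of stopping with an error.
rarefy_or_drop <- function(counts, size) {
  ok <- colSums(counts) >= size
  if (any(!ok))
    warning("dropping samples with fewer than ", size, " items: ",
            paste(names(counts)[!ok], collapse = ", "))
  apply(counts[, ok, drop = FALSE], 2,
        function(x) sample(rep(names(x), x), size))
}

set.seed(3)
# sample.1 (800) and sample.2 (1300) are dropped; sample.3 is rarefied.
out <- suppressWarnings(rarefy_or_drop(z2, size = 2000))
colnames(out)
```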
>
> -Alex
>
> On 11 Oct 2006, at 15:10, Brian Frappier wrote:
>
> > Hi Petr,
> >
> > Thanks for your response. I have data that looks like the following:
> >
> >              sample 1  sample 2  sample 3  ...
> > red candy         400       300      2500
> > green candy       100         0       200
> > black candy       300      1000       500
> >
> > I don't want to randomly select either the samples (columns) or the
> > "candy" types (rows), which, as you say, sample would let me do.
> > Instead, I want to randomly sample 100 candies from each sample and
> > retain info on their associated type. I could make a list of all the
> > candies in each sample:
> >
> > sample 1
> > red
> > red
> > red
> > red
> > green
> > green
> > black
> > red
> > black
> > ...
> >
> > and then randomly sample those rows, repeating for each sample. But I
> > am not sure how to do that without a lot of loops, and am wondering if
> > there is an easier way in R. Thanks! I should have laid this out in
> > the first email... sorry.
> >
> >
> > On 10/11/06, Petr Pikal <petr.pikal@precheza.cz> wrote:
> >>
> >> Hi
> >>
> >> I am not experienced in Matlab, and from your explanation I do not
> >> understand exactly what you want. It seems that you want to randomly
> >> choose a sample of 100 rows from your matrix, which can be achieved
> >> with sample.
> >>
> >> DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
> >> DF[sample(1:100, 10),]
> >>
> >> If you want to do this several times, you need to save your results,
> >> and then it depends on what you want to do next. One suitable form is
> >> a list of matrices, another is an array, and you can use a for loop
> >> to fill it in.
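Petr's suggestion of storing repeated draws as a list of matrices or as an array can also be sketched without an explicit for loop. This reuses his example data frame; the number of repeats (5) is an arbitrary choice for illustration.

```r
set.seed(4)
DF <- data.frame(rnorm(100), 1:100, 101:200, 201:300)

# A list of 5 data frames, each holding 10 randomly chosen rows.
draws_list <- lapply(1:5, function(i) DF[sample(nrow(DF), 10), ])

# The same idea stored as a 10 x 4 x 5 array (rows x columns x repeats).
draws_array <- replicate(5, as.matrix(DF[sample(nrow(DF), 10), ]))
dim(draws_array)
```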
> >>
> >> HTH
> >> Petr
> >>
> >>
> >> On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> >>
> >> Date sent: Tue, 10 Oct 2006 17:40:47 -0400
> >> From: "Brian Frappier" <brian.frappier@gmail.com>
> >> To: r-help@stat.math.ethz.ch
> >> Subject: [R] rarefy a matrix of counts
> >>
> >>> Hi all,
> >>>
> >>> I have a matrix of counts for objects (rows) by samples (columns). I
> >>> aimed for about 500 counts in each sample (I have about 80 samples)
> >>> and would now like to rarefy these down to 100 counts in each sample
> >>> using simple random sampling without replacement. I plan on
> >>> rarefying several times for each sample. I could do the tedious
> >>> looping task of making a list of all objects (each with its
> >>> associated identifier) in each sample and then use the wonderful
> >>> "sampling" package to select a sub-sample of 100 for each sample and
> >>> thereby get a logical vector of inclusions. I would then regroup the
> >>> resulting logical vector into a vector of counts by object, rinse
> >>> and repeat several times for each sample.
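Brian's first plan (expand into individual objects, draw a logical vector of inclusions, then regroup into counts) can be written in base R without looping over individuals. A minimal sketch for a single sample: sample.int() stands in for the "sampling" package here, and rarefy_by_inclusion is a hypothetical helper name.

```r
# Hypothetical helper: rarefy one sample's counts to `size` via a logical
# inclusion vector over the expanded list of individual objects.
rarefy_by_inclusion <- function(x, size) {
  ids <- factor(rep(names(x), x), levels = names(x))  # one entry per object
  keep <- logical(length(ids))
  keep[sample.int(length(ids), size)] <- TRUE         # SRS without replacement
  # Regroup: count the included objects of each type (unused levels give 0).
  table(ids[keep])
}

set.seed(5)
counts <- c(red = 400, green = 100, black = 300)
r <- rarefy_by_inclusion(counts, 100)
r
```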
> >>>
> >>> Alternatively, using the same list, I could create a random index of
> >>> integers between 1 and the number of objects in a sample (without
> >>> repeats) and then select those objects from the list. Again, rinse
> >>> and repeat several times for each sample.
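The random-index idea also works without building the list at all: draw distinct positions from 1 to sum(x) and map each position to the type whose block of cumulative counts contains it. This sketch assumes the objects are imagined as laid out type by type; rarefy_by_index is a hypothetical helper name, and findInterval() does the position-to-type mapping.

```r
# Hypothetical helper: rarefy one sample without expanding it into a list.
# Objects are imagined as laid out type by type, so positions 1..x[1] are
# type 1, x[1]+1..x[1]+x[2] are type 2, and so on.
rarefy_by_index <- function(x, size) {
  pos <- sample.int(sum(x), size)   # distinct random positions
  # findInterval(v, cumsum(x)) counts block ends <= v; using pos - 1 and
  # adding 1 gives the index of the type that owns each position.
  type <- findInterval(pos - 1, cumsum(x)) + 1
  table(factor(names(x)[type], levels = names(x)))
}

set.seed(6)
counts <- c(red = 400, green = 0, black = 300)
rarefy_by_index(counts, 100)
```

Only the drawn positions are stored, so memory grows with size rather than with the total count of the sample.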
> >>>
> >>> Is there a way to directly rarefy a matrix of counts without having
> >>> to create a list of objects first? I am trying to switch to R from
> >>> Matlab and am trying to pick up good programming habits from the
> >>> start.
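A matrix of counts can also be rarefied directly, with no list of objects, by drawing the per-type counts from a multivariate hypergeometric distribution, one type at a time with rhyper(). This is a different technique from the expand-and-sample answer given in the thread. The sketch assumes non-negative integer counts with types in rows; rarefy_hyper is a hypothetical helper name.

```r
# Hypothetical helper: draw `size` objects without replacement from one
# sample's counts, returning counts per type. Type by type, the number drawn
# of type i is hypergeometric given the draws not yet allocated.
rarefy_hyper <- function(x, size) {
  stopifnot(size <= sum(x))
  out <- numeric(length(x))
  names(out) <- names(x)
  left <- size     # draws not yet allocated to a type
  rest <- sum(x)   # objects of the current and all later types
  for (i in seq_along(x)) {
    if (left == 0) break
    rest <- rest - x[i]   # now: objects of later types only
    out[i] <- if (x[i] == 0) {
      0
    } else if (rest == 0) {
      left                # only type i remains, so it takes every draw
    } else {
      # rhyper(nn, m, n, k): m = objects of type i, n = objects of later
      # types, k = draws still to allocate
      rhyper(1, x[i], rest, left)
    }
    left <- left - out[i]
  }
  out
}

counts <- matrix(c(400, 100, 300,  300, 0, 1000,  2500, 200, 500), nrow = 3,
                 dimnames = list(c("red.candy", "green.candy", "black.candy"),
                                 c("sample.1", "sample.2", "sample.3")))
set.seed(8)
rarefied <- apply(counts, 2, rarefy_hyper, size = 100)
rarefied
```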
> >>>
> >>> Much appreciation!
> >>>
> >>> ______________________________________________
> >>> R-help@stat.math.ethz.ch mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html and provide commented,
> >>> minimal, self-contained, reproducible code.
> >>
> >> Petr Pikal
> >> petr.pikal@precheza.cz
> >>
> >>
> >
>
>


Received on Sat Oct 14 03:06:00 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 13 Oct 2006 - 17:30:10 GMT.
