From: Brian Frappier <brian.frappier_at_gmail.com>

Date: Fri 13 Oct 2006 - 14:35:37 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat Oct 14 03:06:00 2006

Thank you, Alex! That's exactly what I was looking to do. I'm going to remove the loops and use your apply function approach. Best regards and much thanks, brian
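The apply approach quoted below returns the drawn labels for each sample. A minimal sketch of folding those draws back into a matrix of counts, which is what the original question asked for, might look like the following. The names `counts` and `rarefy_counts` are illustrative assumptions and not part of Alex's code; the data are the candy example from this thread.

```r
# Candy counts from the thread: rows are types, columns are samples.
counts <- matrix(
  c(400, 100, 300,     # sample.1
    300,   0, 1000,    # sample.2
    2500, 200, 500),   # sample.3
  nrow = 3,
  dimnames = list(c("red.candy", "green.candy", "black.candy"),
                  c("sample.1", "sample.2", "sample.3"))
)

# Hypothetical helper: expand one column into individuals (stored as
# row indices), draw `size` of them without replacement, and count
# how many of each type were drawn.
rarefy_counts <- function(x, size) {
  drawn <- sample(rep(seq_along(x), x), size)
  out <- tabulate(drawn, nbins = length(x))
  names(out) <- names(x)
  out
}

set.seed(1)  # only so the draw can be repeated
rarefied <- apply(counts, 2, rarefy_counts, size = 100)
rarefied
```

Each column of `rarefied` sums to 100, and no entry can exceed the corresponding original count because the draw is without replacement.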

On 10/13/06, Alex Brown <alex@transitive.com> wrote:

> I thought at first that you could use a weighted sample (the sample
> function), but you can't, since it doesn't take proper account of
> replacement if you try that.
>
> You can use the list approach, but through the power of R, you don't
> need a lot of loops to do it...
>
> I can't speak for the efficiency of this approach in terms of CPU cycles.
>
> In short:
>
> apply(z2,2,function(x)sample(rep(names(x),x),100))
>
> In long:
>
> # let's load the data:
>
> z = scan(,"",sep="\n")
> sample.1 sample.2 sample.3
> red.candy 400 300 2500
> green.candy 100 0 200
> black.candy 300 1000 500
>
> # and turn it into a table
>
> z2 = read.table(textConnection(z), header=TRUE, row.names=1)
>
> # let's create a function to expand a sample column into individuals:
>
> expand <- function(x) rep(names(x), x)
>
> # test it on a smaller set:
>
> ex <- expand( c( red = 2, blue = 3) )
>
> ex
> [1] "red" "red" "blue" "blue" "blue"
>
> # and sample 2 things from that:
>
> sample( ex, 2 )
>
> # combine the two
>
> samplex <- function( x, size ) sample(expand(x), size )
>
> samplex( c( red = 2, blue = 3), size = 2 )
>
> # ok, now we use the apply function to apply this to each column
>
> apply(z2, 2, samplex, size = 2 )
>
> # you wanted 100?
>
> apply(z2, 2, samplex, size = 100 )
>
> # all done.
>
> # You should note that if there are fewer than 100 (samplenumber)
> # candies in any given sample, this function will fail.
> # eg:
>
> apply(z2, 2, samplex, size = 2000 )
>
> Error in sample(length(x), size, replace, prob) :
> cannot take a sample larger than the population
> when 'replace = FALSE'
>
> -Alex
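One way to keep a single small sample from aborting the whole run is to check the column total before drawing. The sketch below is an assumption about how such a guard could look, not part of Alex's code: `samplex_safe` is a hypothetical name, and returning a column of `NA` for samples that are too small is only one of several reasonable choices.

```r
# The thread's example data, built directly as a data frame.
z2 <- data.frame(
  sample.1 = c(400, 100, 300),
  sample.2 = c(300, 0, 1000),
  sample.3 = c(2500, 200, 500),
  row.names = c("red.candy", "green.candy", "black.candy")
)

# As in Alex's message: expand counts into one label per individual.
expand <- function(x) rep(names(x), x)

# Hypothetical guarded variant: if the sample holds fewer than `size`
# individuals, return NAs instead of letting sample() raise an error.
samplex_safe <- function(x, size) {
  if (sum(x) < size) return(rep(NA_character_, size))
  sample(expand(x), size)
}

# sample.1 (800) and sample.2 (1300) are too small for 2000 draws;
# sample.3 (3200) is large enough.
res <- apply(z2, 2, samplex_safe, size = 2000)
colSums(is.na(res))
```

Columns that come back entirely `NA` mark the samples that could not be rarefied to the requested size.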
>
> On 11 Oct 2006, at 15:10, Brian Frappier wrote:
>
> > Hi Petr,
> >
> > Thanks for your response. I have data that looks like the following:
> >
> >              sample 1  sample 2  sample 3 ....
> > red candy         400       300      2500
> > green candy       100         0       200
> > black candy       300      1000       500
> >
> > I don't want to randomly select either the samples (columns) or the
> > "candy" types (rows), which, as you state, sample would allow me to
> > do. Instead, I want to randomly sample 100 candies from each sample
> > and retain info on their associated type. I could make a list of all
> > the candies in each sample:
> >
> > sample 1
> > red
> > red
> > red
> > red
> > green
> > green
> > black
> > red
> > black
> > ...
> >
> > and then randomly sample those rows. Repeat for each sample. But I am
> > not sure how to do that without a lot of loops, and am wondering if
> > there is an easier way in R. Thanks! I should have laid this out in
> > the first email...sorry.
> >
> >
> > On 10/11/06, Petr Pikal <petr.pikal@precheza.cz> wrote:
> >>
> >> Hi
> >>
> >> I am not experienced in Matlab, and from your explanation I do not
> >> understand exactly what you want. It seems that you want to randomly
> >> choose a sample of 100 rows from your matrix, which can be achieved
> >> with sample.
> >>
> >> DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
> >> DF[sample(1:100, 10),]
> >>
> >> If you want to do this several times, you need to save your result,
> >> and then it depends on what you want to do next. One suitable form is
> >> a list of matrices, another is an array, and you can use a for loop
> >> to fill it.
> >>
> >> HTH
> >> Petr
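For the "several times" part, a `for` loop works, and `replicate()` is a compact alternative that collects the repeated draws for you. The sketch below is an illustration under assumptions: it reuses the `expand` and `samplex` helpers and the candy data from elsewhere in this thread, and the number of repetitions (5) is arbitrary.

```r
# Candy data and helpers as used elsewhere in the thread.
z2 <- data.frame(
  sample.1 = c(400, 100, 300),
  sample.2 = c(300, 0, 1000),
  sample.3 = c(2500, 200, 500),
  row.names = c("red.candy", "green.candy", "black.candy")
)
expand  <- function(x) rep(names(x), x)
samplex <- function(x, size) sample(expand(x), size)

set.seed(42)  # only so the draws can be repeated

# A list of 5 matrices, each 100 draws (rows) by 3 samples (columns).
reps <- replicate(5, apply(z2, 2, samplex, size = 100), simplify = FALSE)

# The same results stacked into a 100 x 3 x 5 array.
arr <- simplify2array(reps)
dim(arr)
```

The list form is convenient with `lapply()`; the array form makes it easy to index one repetition as `arr[, , k]`.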
> >>
> >>
> >> On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> >>
> >> Date sent: Tue, 10 Oct 2006 17:40:47 -0400
> >> From: "Brian Frappier" <brian.frappier@gmail.com>
> >> To: r-help@stat.math.ethz.ch
> >> Subject: [R] rarefy a matrix of counts
> >>
> >>> Hi all,
> >>>
> >>> I have a matrix of counts for objects (rows) by samples (columns).
> >>> I aimed for about 500 counts in each sample (I have about 80
> >>> samples) and would now like to rarefy these down to 100 counts in
> >>> each sample using simple random sampling without replacement. I
> >>> plan on rarefying several times for each sample. I could do the
> >>> tedious looping task of making a list of all objects (with their
> >>> associated identifiers) in each sample and then use the wonderful
> >>> "sampling" package to select a sub-sample of 100 for each sample
> >>> and thereby get a logical vector of inclusions. I would then
> >>> regroup the resulting logical vector into a vector of counts by
> >>> object, rinse and repeat several times for each sample.
> >>>
> >>> Alternatively, using the same list, I could create a random index
> >>> of integers between 1 and the number of objects for a sample
> >>> (without repeats) and then select those objects from the list.
> >>> Again, rinse and repeat several times for each sample.
> >>>
> >>> Is there a way to directly rarefy a matrix of counts without
> >>> having to create a list of objects first? I am trying to switch to
> >>> R from Matlab and am trying to pick up good programming habits
> >>> from the start.
> >>>
> >>> Much appreciation!
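On the question of rarefying counts directly, without first building a list of objects: one possibility, which is not what the replies in this thread use, is to draw the count for each type in turn from a hypergeometric distribution with `rhyper()`. Done sequentially, this gives a simple random sample without replacement expressed as counts. The sketch below is an assumption-laden illustration; `rarefy_direct` is a hypothetical name, and the loop runs over object types, not over individuals.

```r
# Hypothetical helper: rarefy one vector of counts to `size` individuals
# by sequential hypergeometric draws (no expanded list is built).
rarefy_direct <- function(x, size) {
  stopifnot(size <= sum(x))
  out <- numeric(length(x))
  names(out) <- names(x)
  remaining <- sum(x)  # individuals in the types not yet processed
  need <- size         # draws still to be allocated
  for (i in seq_along(x)) {
    if (need == 0) break
    remaining <- remaining - x[[i]]
    # How many of the `need` draws fall on type i, given x[[i]]
    # individuals of this type and `remaining` of all later types.
    out[i] <- rhyper(1, m = x[[i]], n = remaining, k = need)
    need <- need - out[i]
  }
  out
}

set.seed(7)  # only so the draw can be repeated
x <- c(red.candy = 300, green.candy = 0, black.candy = 1000)
r <- rarefy_direct(x, size = 100)
r
```

To cover a whole matrix, the helper can be passed to `apply(counts, 2, rarefy_direct, size = 100)` in the same way as the list-based version.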
> >>
> >> Petr Pikal
> >> petr.pikal@precheza.cz

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Fri 13 Oct 2006 - 17:30:10 GMT.
