Re: [R] Fwd: rarefy a matrix of counts

From: Brian Frappier <brian.frappier_at_gmail.com>
Date: Wed 11 Oct 2006 - 18:25:33 GMT

I tried all of the approaches below.

The problem with:

> x <- data.frame(matrix(NA,100,3))
> for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> if you want the result in a data frame
> or
> x<-vector("list", 3)
> for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)

is that this code still samples the rows rather than the individual elements: the cells come back as 100 or 300 instead of "red". What I want is either the labels or a matrix of counts by color (object type), like:

         x1   x2   x3
red      32    5   60
green    68   95   40
sum     100  100  100
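One plausible explanation for the numbers in the cells, not confirmed anywhere in the thread: if the colors are stored as row names (as in Tony's df) rather than in a first column (as in Petr's DF), then DF[,1] is the first sample's counts, and rep(DF[,1], DF[,i]) enumerates those counts instead of colors. The sketch below assumes that row-name layout and uses rownames() as the labels instead.

```r
# Assumed layout: colors as row names, as in Tony's df. This is a guess
# about Brian's real data, not something the thread confirms.
df <- data.frame(sample1 = c(400, 100, 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500),
                 row.names = c("red", "green", "black"))

# Petr's loop starts at i = 2 and treats column 1 as the labels. Here
# column 1 holds the sample1 counts, so the "labels" are numbers.
wrong <- rep(df[, 1], df[, 2])
unique(wrong)   # 400 and 300, never "red"

# Using the row names as labels gives one color per candy instead.
pool <- rep(rownames(df), df[, 2])
set.seed(42)
draw <- sample(pool, 100)   # 100 candies, without replacement
# A factor with fixed levels keeps colors that were never drawn (count 0).
table(factor(draw, levels = rownames(df)))
```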

It looks like Tony is right: sampling without replacement requires listing all of the elements to be sampled. But the code Petr provided

x1 <- sample(c(rep("red",400),rep("green", 100),rep("black",300)),100)

did give me a clue about how to quickly build such a list with rep(). I will loop a rep() statement over the columns of my original matrix to create a list of elements for each sample.
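A minimal sketch of that plan, assuming the counts are a numeric matrix with the candy types as row names; rarefy_enum is a hypothetical helper name, not something from the thread. Each column is enumerated with rep(), 100 items are drawn with sample() (which samples without replacement by default), and the draw is tabulated back into counts so that types with no draws still get a 0.

```r
# Hypothetical helper (not from the thread): rarefy every column of a
# count matrix to `depth` items by enumerating the items with rep()
# and drawing without replacement.
rarefy_enum <- function(counts, depth = 100) {
  types <- rownames(counts)
  out <- matrix(0L, nrow(counts), ncol(counts), dimnames = dimnames(counts))
  for (j in seq_len(ncol(counts))) {
    pool <- rep(types, counts[, j])   # one entry per candy in sample j
    draw <- sample(pool, depth)       # replace = FALSE is the default
    out[, j] <- as.vector(table(factor(draw, levels = types)))
  }
  out
}

# The example counts from this thread, types as rows, samples as columns.
m <- matrix(c( 400, 100,  300,
               300,   0, 1000,
              2500, 200,  500),
            nrow = 3,
            dimnames = list(c("red", "green", "black"),
                            c("sample1", "sample2", "sample3")))
set.seed(1)
r <- rarefy_enum(m)
colSums(r)   # each column sums to 100
```

Repeated rarefactions can be stacked with replicate(): replicate(10, rarefy_enum(m)) gives a 3 x 3 x 10 array, one slice per replicate. The enumerated vector is only as long as one column's total, so at roughly 500 counts per sample the memory cost is small.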

Thanks Petr and Tony for your help!

On 10/11/06, Tony Plate <tplate@acm.org> wrote:
>
> Here's a way using apply(), and the prob= argument of sample():
>
> > df <- data.frame(sample1=c(red=400,green=100,black=300),
> +                  sample2=c(300,0,1000), sample3=c(2500,200,500))
> > df
>       sample1 sample2 sample3
> red       400     300    2500
> green     100       0     200
> black     300    1000     500
> > set.seed(1)
> > apply(df, 2, function(counts) sample(seq(along=counts), rep=T,
> +       size=7, prob=counts))
>      sample1 sample2 sample3
> [1,]       1       3       1
> [2,]       1       3       1
> [3,]       3       3       1
> [4,]       2       3       2
> [5,]       1       3       1
> [6,]       2       3       1
> [7,]       2       3       3
> >
>
> Note that this does sampling WITH replacement.
> AFAIK, sampling without replacement requires enumerating the entire
> population to be sampled from. I.e., you cannot do
> > sample(1:3, prob=1:3, rep=F, size=4)
> instead of
> > sample(c(1,2,2,3,3,3), rep=F, size=4)
>
> -- Tony Plate
>
> From reading ?sample, I was a little unclear on whether sampling
> without replacement could work with the prob= argument.
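On the ?sample question: the help page states that when replace = FALSE, any prob= weights are applied sequentially, so each next item is chosen with probability proportional to the weights of the remaining items. Weighting the three types that way is not the same as a simple random draw of individual candies, and the call fails outright once size exceeds the number of values. A small sketch of Tony's two calls:

```r
# Tony's first call: only 3 values to choose from, so 4 draws without
# replacement are impossible and sample() signals an error.
res <- tryCatch(sample(1:3, size = 4, replace = FALSE, prob = 1:3),
                error = function(e) conditionMessage(e))
print(res)   # the error message, not a sample

# Tony's second call: enumerate the population first
# (1 once, 2 twice, 3 three times), then sample from it.
pop <- rep(1:3, times = 1:3)   # 1 2 2 3 3 3
set.seed(1)
draw <- sample(pop, size = 4)  # replace = FALSE is the default
print(draw)
```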
>
> Petr Pikal wrote:
> > Hi
> >
> > a little bit different story. But
> >
> > x1 <- sample(c(rep("red",400),rep("green", 100),
> > rep("black",300)),100)
> >
> > is maybe close. With a data frame (if it is not big):
> >
> >
> > > DF
> >   color sample1 sample2 sample3
> > 1   red     400     300    2500
> > 2 green     100       0     200
> > 3 black     300    1000     500
> >
> > x <- data.frame(matrix(NA,100,3))
> > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > if you want the result in a data frame
> > or
> > x<-vector("list", 3)
> > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> >
> > if you want it in a list. Maybe somebody is clever enough to get rid of
> > the for loop, but you said you have 80 columns, which should be no problem.
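Petr's loop can be written without an explicit for loop by applying a function over the count columns. A minimal sketch, assuming Petr's DF layout with the colors in a character column named color (stringsAsFactors = FALSE is set explicitly, since R before 4.0.0 turned strings into factors by default):

```r
# Petr's DF layout, with the colors in the first column.
DF <- data.frame(color   = c("red", "green", "black"),
                 sample1 = c(400, 100, 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500),
                 stringsAsFactors = FALSE)

set.seed(1)
# A list of 100-candy draws, one element per sample column.
draws <- lapply(DF[-1], function(n) sample(rep(DF$color, n), 100))

# Or tabulate each draw directly into a color-by-sample count matrix.
counts <- sapply(DF[-1], function(n)
  table(factor(sample(rep(DF$color, n), 100), levels = DF$color)))
counts
```

lapply() mirrors the list version of Petr's code; sapply() simplifies the per-column tables into the color-by-sample matrix Brian asked for.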
> >
> > HTH
> > Petr
> >
> > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> >
> > Date sent: Wed, 11 Oct 2006 10:11:33 -0400
> > From: "Brian Frappier" <brian.frappier@gmail.com>
> > To: "Petr Pikal" <petr.pikal@precheza.cz>
> > Subject: Fwd: [R] rarefy a matrix of counts
> >
> >
> >>---------- Forwarded message ----------
> >>From: Brian Frappier <brian.frappier@gmail.com>
> >>Date: Oct 11, 2006 10:10 AM
> >>Subject: Re: [R] rarefy a matrix of counts
> >>To: r-help@stat.math.ethz.ch
> >>
> >>Hi Petr,
> >>
> >>Thanks for your response. I have data that looks like the following:
> >>
> >>             sample 1  sample 2  sample 3  ....
> >>red candy         400       300      2500
> >>green candy       100         0       200
> >>black candy       300      1000       500
> >>
> >>I don't want to randomly select either the samples (columns) or the
> >>"candy" types (rows), which, as you say, sample() would let me do.
> >>Instead, I want to randomly sample 100 candies from each sample and
> >>retain info on their associated type. I could make a list of all the
> >>candies in each sample:
> >>
> >>sample 1
> >>red
> >>red
> >>red
> >>red
> >>green
> >>green
> >>black
> >>red
> >>black
> >>...
> >>
> >>and then randomly sample those rows. Repeat for each sample. But I
> >>am not sure how to do that without a lot of loops, and am wondering if
> >>there is an easier way in R. Thanks! I should have laid this out in
> >>the first email...sorry.
> >>
> >>
> >>On 10/11/06, Petr Pikal <petr.pikal@precheza.cz> wrote:
> >>
> >>>Hi
> >>>
> >>>I am not experienced in Matlab, and from your explanation I do not
> >>>understand exactly what you want. It seems that you want to randomly
> >>>choose a sample of 100 rows from your matrix, which can be achieved
> >>>with sample().
> >>>
> >>>DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
> >>>DF[sample(1:100, 10),]
> >>>
> >>>If you want to do this several times, you need to save your results,
> >>>and then it depends on what you want to do next. One suitable form
> >>>is a list of matrices, another is an array, and you can use a for loop
> >>>to fill it.
> >>>
> >>>HTH
> >>>Petr
> >>>
> >>>
> >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> >>>
> >>>Date sent: Tue, 10 Oct 2006 17:40:47 -0400
> >>>From: "Brian Frappier" <brian.frappier@gmail.com>
> >>>To: r-help@stat.math.ethz.ch
> >>>Subject: [R] rarefy a matrix of counts
> >>>
> >>>
> >>>>Hi all,
> >>>>
> >>>>I have a matrix of counts for objects (rows) by samples (columns).
> >>>> I aimed for about 500 counts in each sample (I have about 80
> >>>>samples) and would now like to rarefy these down to 100 counts in
> >>>>each sample using simple random sampling without replacement. I
> >>>>plan on rarefying several times for each sample. I could do the
> >>>>tedious looping task of making a list of all objects (with its
> >>>>associated identifier) in each sample and then use the wonderful
> >>>>"sampling" package to select a sub-sample of 100 for each sample
> >>>>and thereby get a logical vector of inclusions. I would then
> >>>>regroup the resulting logical vector into a vector of counts by
> >>>>object, rinse and repeat several times for each sample.
> >>>>
> >>>>Alternatively, using the same list, I could create a random index of
> >>>>integers between 1 and the number of objects for a sample (without
> >>>>repeats) and then select those objects from the list. Again,
> >>>>rinse and repeat several times for each sample.
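Brian's random-index idea can also skip building the list: draw distinct positions in 1..N with sample.int(), where N is the sample's total count, and map each position to its candy type through the cumulative counts with findInterval(). This is a sketch; rarefy_index is a hypothetical helper name, and the counts for one sample are assumed to be a named numeric vector.

```r
# Hypothetical helper (not from the thread): draw `depth` distinct
# positions out of 1..N, N = sum(counts), and map each position to its
# type via the cumulative counts instead of building the N-item list.
rarefy_index <- function(counts, depth = 100) {
  idx <- sample.int(sum(counts), depth)   # distinct positions, no repeats
  # Position p is of type k when cumsum(counts)[k - 1] < p <= cumsum(counts)[k];
  # findInterval() counts how many cumulative totals are <= p - 1.
  type <- findInterval(idx - 1, cumsum(counts)) + 1
  out <- tabulate(type, nbins = length(counts))
  names(out) <- names(counts)
  out
}

candies <- c(red = 400, green = 100, black = 300)
set.seed(1)
rarefy_index(candies, 100)
```

For a whole matrix of counts, apply(m, 2, rarefy_index) would run it column by column. Types with a zero count produce tied cumulative totals, so no position ever maps to them.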
> >>>>
> >>>>Is there a way to directly rarefy a matrix of counts without
> >>>>having to create a list of objects first? I am trying to switch
> >>>>to R from Matlab and am trying to pick up good programming habits
> >>>>from the start.
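Not from the thread: if only the per-type counts are needed, rather than the individual draws, they can be generated without enumerating anything. The counts from a simple random sample without replacement follow a multivariate hypergeometric distribution, which can be drawn one type at a time with rhyper(nn, m, n, k): for type k, m is its count, n is the number of items of all later types, and k is how many items are still to be drawn. A sketch under that approach; rarefy_hyper is a hypothetical helper name.

```r
# Hypothetical helper (not from the thread): counts for one sample after
# drawing `depth` items without replacement, generated type by type.
rarefy_hyper <- function(counts, depth = 100) {
  out <- numeric(length(counts))
  names(out) <- names(counts)
  need <- depth                   # items still to be drawn
  rest <- sum(counts)             # items of type k and all later types
  for (k in seq_along(counts)) {
    rest <- rest - counts[k]      # now only the later types
    if (need == 0) break
    # Zero counts and the last remaining type are handled directly.
    out[k] <- if (counts[k] == 0) 0 else
      if (rest == 0) need else rhyper(1, counts[k], rest, need)
    need <- need - out[k]
  }
  out
}

m <- matrix(c( 400, 100,  300,
               300,   0, 1000,
              2500, 200,  500),
            nrow = 3,
            dimnames = list(c("red", "green", "black"),
                            c("sample1", "sample2", "sample3")))
set.seed(1)
apply(m, 2, rarefy_hyper, depth = 100)
```

The loop runs once per type rather than once per item, so its cost does not grow with the sample totals.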
> >>>>
> >>>>Much appreciation!
> >>>>
> >>>>
> >>>>______________________________________________
> >>>>R-help@stat.math.ethz.ch mailing list
> >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>PLEASE do read the posting guide
> >>>>http://www.R-project.org/posting-guide.html and provide commented,
> >>>>minimal, self-contained, reproducible code.
> >>>
> >>>Petr Pikal
> >>>petr.pikal@precheza.cz
> >>>
> >>>
> >>
> >
> > Petr Pikal
> > petr.pikal@precheza.cz
> >
>
>




Received on Thu Oct 12 04:35:25 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 11 Oct 2006 - 20:30:26 GMT.
