From: Brian Frappier <brian.frappier_at_gmail.com>

Date: Wed 11 Oct 2006 - 18:25:33 GMT


R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu Oct 12 04:35:25 2006


I tried all of the approaches below.

> x <- data.frame(matrix(NA,100,3))
> for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> if you want the result in a data frame
> or
> x<-vector("list", 3)
> for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)

The problem is that this code still samples the rows rather than the elements: it puts 100 or 300 in the matrix cells instead of a label such as "red", or instead of a matrix of counts by color (object type) like:

      x1   x2   x3
red   32    5   60
gr    68   95   40
sum  100  100  100

It looks like Tony is right: sampling without replacement requires listing all the elements to be sampled. But the code Petr provided

x1 <- sample(c(rep("red",400),rep("green", 100),rep("black",300)),100)

did give me a clue about how to quickly build such a list with the rep() function. I will loop a rep() call over the columns of my original matrix to create a list of elements for each sample.
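A minimal sketch of that plan, assuming the counts sit in a numeric matrix with the colors as row names and the samples as columns (the name `counts` and the helper `rarefy_column()` are hypothetical, not from the thread). It uses sapply() over the columns instead of an explicit for loop, and tabulates each draw with table() so the result has the colors-by-samples layout of the counts table above.

```r
# Hypothetical count matrix: colors (object types) as row names,
# samples as columns, using the counts from this thread.
counts <- matrix(c(400, 100, 300,
                   300, 0, 1000,
                   2500, 200, 500),
                 nrow = 3,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

# Hypothetical helper: rarefy one column by expanding its counts into one
# label per object with rep(), drawing `size` labels without replacement,
# and counting the drawn labels per color.
rarefy_column <- function(col_counts, labels, size = 100) {
  population <- rep(labels, times = col_counts)
  drawn <- sample(population, size)   # replace = FALSE is the default
  table(factor(drawn, levels = labels))
}

set.seed(1)
rarefied <- sapply(seq_len(ncol(counts)), function(j)
  rarefy_column(counts[, j], rownames(counts), size = 100))
dimnames(rarefied) <- dimnames(counts)
rarefied
colSums(rarefied)   # every sample now sums to 100
```

Because rep() builds a vector with one entry per object, memory use grows with the column totals; with a few hundred to a few thousand objects per sample this should not matter.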

Thanks Petr and Tony for your help!
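For comparison with Tony's with-replacement approach quoted below: when sampling with replacement, the per-color counts in one sample of size 100 follow a multinomial distribution, so base R's rmultinom() can return a counts matrix directly, without enumerating any objects. A minimal sketch reusing Tony's `df`; note that this is still sampling WITH replacement, so it is not the without-replacement rarefaction asked about in this thread.

```r
# Tony's example counts: colors as row names, samples as columns.
df <- data.frame(sample1 = c(red = 400, green = 100, black = 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500))

set.seed(1)
# One multinomial draw of 100 objects per column. rmultinom() normalises
# prob internally, so the raw counts can be passed as weights.
with_replacement <- apply(df, 2, function(counts)
  rmultinom(1, size = 100, prob = counts))
rownames(with_replacement) <- rownames(df)
with_replacement
```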

On 10/11/06, Tony Plate <tplate@acm.org> wrote:


> Here's a way using apply(), and the prob= argument of sample():

>
> > df <- data.frame(sample1=c(red=400,green=100,black=300), sample2=c(300,0,1000), sample3=c(2500,200,500))
> > df
>       sample1 sample2 sample3
> red       400     300    2500
> green     100       0     200
> black     300    1000     500
> > set.seed(1)
> > apply(df, 2, function(counts) sample(seq(along=counts), rep=T, size=7, prob=counts))
>      sample1 sample2 sample3
> [1,]       1       3       1
> [2,]       1       3       1
> [3,]       3       3       1
> [4,]       2       3       2
> [5,]       1       3       1
> [6,]       2       3       1
> [7,]       2       3       3
> >
>
> Note that this does sampling WITH replacement.
> AFAIK, sampling without replacement requires enumerating the entire
> population to be sampled from. I.e., you cannot do
> > sample(1:3, prob=1:3, rep=F, size=4)
> instead of
> > sample(c(1,2,2,3,3,3), rep=F, size=4)
>
> -- Tony Plate
>
> From reading ?sample, I was a little unclear on whether sampling
> without replacement could work
>
> Petr Pikal wrote:
> > Hi
> >
> > a little bit of a different story, but
> >
> > x1 <- sample(c(rep("red",400), rep("green",100), rep("black",300)), 100)
> >
> > is maybe close. With a data frame (if it is not big):
> >
> >>DF
> >
> >   color sample1 sample2 sample3
> > 1   red     400     300    2500
> > 2 green     100       0     200
> > 3 black     300    1000     500
> >
> > x <- data.frame(matrix(NA,100,3))
> > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > if you want the result in a data frame
> > or
> > x<-vector("list", 3)
> > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> >
> > if you want it in a list. Maybe somebody is clever enough to get rid of
> > the for loop, but you said you have 80 columns, which should be no problem.
> >
> > HTH
> > Petr
> >
> > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> >
> > Date sent: Wed, 11 Oct 2006 10:11:33 -0400
> > From: "Brian Frappier" <brian.frappier@gmail.com>
> > To: "Petr Pikal" <petr.pikal@precheza.cz>
> > Subject: Fwd: [R] rarefy a matrix of counts
> >
> >>---------- Forwarded message ----------
> >>From: Brian Frappier <brian.frappier@gmail.com>
> >>Date: Oct 11, 2006 10:10 AM
> >>Subject: Re: [R] rarefy a matrix of counts
> >>To: r-help@stat.math.ethz.ch
> >>
> >>Hi Petr,
> >>
> >>Thanks for your response. I have data that looks like the following:
> >>
> >>              sample 1  sample 2  sample 3 ....
> >>red candy          400       300      2500
> >>green candy        100         0       200
> >>black candy        300      1000       500
> >>
> >>I don't want to randomly select either the samples (columns) or the
> >>"candy" types (rows), which, as you state, sample would allow me to do.
> >>Instead, I want to randomly sample 100 candies from each sample and
> >>retain info on their associated type. I could make a list of all the
> >>candies in each sample:
> >>
> >>sample 1
> >>red
> >>red
> >>red
> >>red
> >>green
> >>green
> >>black
> >>red
> >>black
> >>...
> >>
> >>and then randomly sample those rows. Repeat for each sample. But I
> >>am not sure how to do that without a lot of loops, and am wondering if
> >>there is an easier way in R. Thanks! I should have laid this out in
> >>the first email...sorry.
> >>
> >>On 10/11/06, Petr Pikal <petr.pikal@precheza.cz> wrote:
> >>
> >>>Hi
> >>>
> >>>I am not experienced in Matlab and from your explanation I do not
> >>>understand what exactly you want. It seems that you want to randomly
> >>>choose a sample of 100 rows from your matrix, which can be achieved
> >>>with sample.
> >>>
> >>>DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
> >>>DF[sample(1:100, 10),]
> >>>
> >>>If you want to do this several times, you need to save your result,
> >>>and then it depends on what you want to do next. One suitable form
> >>>is a list of matrices, the other is an array, and you can use a for
> >>>loop to fill it.
> >>>
> >>>HTH
> >>>Petr
> >>>
> >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> >>>
> >>>Date sent: Tue, 10 Oct 2006 17:40:47 -0400
> >>>From: "Brian Frappier" <brian.frappier@gmail.com>
> >>>To: r-help@stat.math.ethz.ch
> >>>Subject: [R] rarefy a matrix of counts
> >>>
> >>>>Hi all,
> >>>>
> >>>>I have a matrix of counts for objects (rows) by samples (columns).
> >>>>I aimed for about 500 counts in each sample (I have about 80
> >>>>samples) and would now like to rarefy these down to 100 counts in
> >>>>each sample using simple random sampling without replacement. I
> >>>>plan on rarefying several times for each sample. I could do the
> >>>>tedious looping task of making a list of all objects (with their
> >>>>associated identifiers) in each sample and then use the wonderful
> >>>>"sampling" package to select a sub-sample of 100 for each sample
> >>>>and thereby get a logical vector of inclusions. I would then
> >>>>regroup the resulting logical vector into a vector of counts by
> >>>>object, rinse and repeat several times for each sample.
> >>>>
> >>>>Alternatively, using the same list, I could create a random index of
> >>>>integers between 1 and the number of objects for a sample (without
> >>>>repeats) and then select those objects from the list. Again,
> >>>>rinse and repeat several times for each sample.
> >>>>
> >>>>Is there a way to directly rarefy a matrix of counts without
> >>>>having to create a list of objects first? I am trying to switch
> >>>>to R from Matlab and am trying to pick up good programming habits
> >>>>from the start.
> >>>>
> >>>>Much appreciation!
> >>>>
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>
> >>>Petr Pikal
> >>>petr.pikal@precheza.cz
> >>>
> >>
> >
> > Petr Pikal
> > petr.pikal@precheza.cz
> >

[[alternative HTML version deleted]]
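On the original question of rarefying a matrix of counts without first building a list of objects: drawing 100 objects without replacement from one column of counts is a multivariate hypergeometric draw, which can be generated one category at a time with base R's rhyper(). This technique was not proposed in the thread; the sketch below is an illustration under that assumption, and the helper name `rarefy_counts()` is hypothetical.

```r
# Hypothetical helper: rarefy one vector of counts to `size` objects drawn
# without replacement, without expanding it into one entry per object.
# Categories are visited in order; the number drawn from category k is
# hypergeometric given the objects left in k and in the categories after it.
rarefy_counts <- function(counts, size) {
  stopifnot(size <= sum(counts))
  out <- integer(length(counts))
  names(out) <- names(counts)
  later_total <- sum(counts)
  remaining_size <- size
  for (k in seq_along(counts)) {
    later_total <- later_total - counts[k]   # objects in categories after k
    if (remaining_size == 0) break
    out[k] <- rhyper(1, m = counts[k], n = later_total, k = remaining_size)
    remaining_size <- remaining_size - out[k]
  }
  out
}

# The counts from this thread: colors as row names, samples as columns.
counts <- matrix(c(400, 100, 300,
                   300, 0, 1000,
                   2500, 200, 500),
                 nrow = 3,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

set.seed(1)
rarefied <- apply(counts, 2, rarefy_counts, size = 100)   # one rarefaction
reps <- replicate(5, apply(counts, 2, rarefy_counts, size = 100),
                  simplify = "array")                      # 3 x 3 x 5 array
```

Each column is handled independently and no vector of length sum(counts) is ever built, so the same call should apply unchanged to a matrix with 80 sample columns; replicate() stacks the repeated rarefactions into an array, one slice per repetition.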


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Wed 11 Oct 2006 - 20:30:26 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.