Re: [R] help on permutation/randomization test

From: Greg Snow <Greg.Snow_at_imail.org>
Date: Tue, 24 May 2011 20:03:33 -0600

OK, I was not sure from your description. If there had been a large number of small clusters, my suggestion would have worked, but it now looks like there would be too much overlap between clusters.

I know that there are bootstrap methods for time series data that sample blocks of data to preserve at least some of the correlation; you could try those techniques. I have read about them but never used them myself, so the most help I can be is to point you in that direction. I think they are described in Efron's original bootstrap book, and probably in other places as well.
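As a rough illustration of the block-resampling idea mentioned above, here is a minimal base-R sketch of a moving-block bootstrap. The function name block_bootstrap, the block length, and the simulated AR(1) series are all assumptions made for illustration; nothing here comes from the original poster's data, and choosing a sensible block length for a real problem needs more care than shown.

```r
# Moving-block bootstrap of a statistic for one time-ordered series.
# block_bootstrap is a hypothetical helper, not a function from any package.
# x: numeric series in time order; block_len: assumed block length.
block_bootstrap <- function(x, statistic = mean, block_len = 10, n_boot = 1000) {
  n <- length(x)
  stopifnot(block_len >= 1, block_len <= n)
  n_blocks   <- ceiling(n / block_len)          # blocks needed to cover n points
  starts_all <- seq_len(n - block_len + 1)      # every admissible block start
  replicate(n_boot, {
    starts <- sample(starts_all, n_blocks, replace = TRUE)
    idx <- unlist(lapply(starts, function(s) s:(s + block_len - 1)))
    statistic(x[idx[seq_len(n)]])               # trim to the original length
  })
}

# Illustrative use on simulated, serially correlated data (assumed AR(1) model)
set.seed(123)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 500))
boot_means <- block_bootstrap(x, mean, block_len = 20, n_boot = 500)
sd(boot_means)   # block-bootstrap standard error of the mean
```

Resampling whole blocks keeps neighbouring observations together, which is how some of the within-group correlation survives. The boot package also provides tsboot for block resampling of time series, which may be preferable to a hand-rolled version.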

The random number generators in R are based on good theory; I doubt that there would be any problems with using the sample function for randomization tests.
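To show the mechanics of a sample-based randomization test at the group sizes mentioned in this thread, here is a minimal sketch. It assumes exchangeable (independent) observations under the null hypothesis, which is exactly what is in doubt for the poster's data, so it illustrates the use of sample and the random number generator rather than a valid analysis of the dependent case. The helper name randomization_test and the simulated data are hypothetical.

```r
# Monte Carlo randomization test for a difference in means.
# randomization_test is a hypothetical helper; it assumes the pooled
# observations are exchangeable under the null hypothesis.
randomization_test <- function(x, y, n_perm = 9999) {
  pooled   <- c(x, y)
  n        <- length(x)
  observed <- mean(x) - mean(y)
  permuted <- replicate(n_perm, {
    idx <- sample(length(pooled), n)        # random relabelling via sample()
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # two-sided Monte Carlo p-value, counting the observed statistic itself
  (sum(abs(permuted) >= abs(observed)) + 1) / (n_perm + 1)
}

RNGkind()       # reports the generators in use (Mersenne-Twister by default)
set.seed(42)    # makes the random permutations reproducible
p <- randomization_test(rnorm(1000), rnorm(1500, mean = 0.2), n_perm = 999)
p
```

Only n_perm random relabellings are drawn, so the cost grows with n_perm rather than with the number of possible permutations, which is why sample sizes of 1000 and 1500 are not a problem here.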

From: Wenjin Mao [mailto:wenj.mao_at_gmail.com]
Sent: Tuesday, May 24, 2011 6:54 PM
To: Greg Snow
Cc: Meyners, Michael; r-help_at_r-project.org
Subject: Re: [R] help on permutation/randomization test

Thanks, Greg.

I also considered clusters. The difficulty is that those objects not only enter the system at different times but may also stay in the system for different durations. Whenever two objects overlap in time in the system, they may affect each other. If I split the data into two clusters by setting a time threshold t, I need to drop all objects that enter before time t and leave after time t. The more clusters I create, the more objects I have to drop, which I would rather avoid. But two or three clusters may be too small a sample size. My purpose is to test the difference between two systems.
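The threshold-based splitting described above could be sketched as follows. The helper assign_clusters, the enter/leave vectors, and the threshold values are hypothetical and chosen only to show how quickly objects get dropped; an object whose exit time falls exactly on a threshold is treated here as straddling it.

```r
# Assign objects to time-window clusters and drop those that straddle a
# threshold. assign_clusters is a hypothetical helper; enter and leave are
# assumed to hold each object's entry and exit times.
assign_clusters <- function(enter, leave, thresholds) {
  thresholds <- sort(thresholds)
  cl_enter <- findInterval(enter, thresholds)   # window containing the entry time
  cl_leave <- findInterval(leave, thresholds)   # window containing the exit time
  # keep an object only if it enters and leaves within the same window
  ifelse(cl_enter == cl_leave, cl_enter + 1L, NA_integer_)   # NA = dropped
}

# Made-up example: five objects, thresholds at t = 5 and t = 10
enter <- c(0, 1, 4, 6, 7)
leave <- c(2, 5, 8, 9, 12)
cl <- assign_clusters(enter, leave, thresholds = c(5, 10))
cl              # 1 NA NA 2 NA
sum(is.na(cl))  # number of objects that would have to be dropped
```

In this toy case three of the five objects are lost, which is the trade-off described above: more thresholds give more clusters but discard more data.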

Back to the R function question. When the sample size is large, the full permutation test is infeasible and we have to use a randomization test that draws permutations at random. One factor that I know affects the quality of the randomization is the random number generator. I am not sure how good the randomness of the function "sample" is.

Thanks,
Wenjin

On Tue, May 24, 2011 at 4:45 PM, Greg Snow <Greg.Snow_at_imail.org> wrote:

If the x's that don't enter at the same time can be considered independent of each other, and only clusters that enter at the same time are dependent, then you can still do a permutation test. Create clusters so that values are dependent within each cluster but independent between clusters, then permute the clusters rather than the individual data points. This maintains the dependency.

I don't know of any existing functions that will do the whole thing for you, but it would take only a few lines of R code to do this type of permutation test. The split function can help with separating the clusters, sample can do the permutations, and unlist or sapply can be used in calculating the statistic of interest.
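A minimal sketch of the cluster-level permutation test outlined above, using split, sample, sapply and unlist as suggested. The function name cluster_perm_test, the "A"/"B" group labels, the difference-in-means statistic, and the simulated data are all assumptions for illustration; it also assumes that every observation in a cluster belongs to the same group and that clusters are independent of one another.

```r
# Cluster-level permutation test: permute group labels of whole clusters.
# cluster_perm_test is a hypothetical helper, not a function from any package.
# values:  numeric observations from both groups
# cluster: cluster id per observation (dependent within, independent between)
# group:   "A" or "B" per observation, assumed constant within a cluster
cluster_perm_test <- function(values, cluster, group, n_perm = 2000) {
  by_cluster <- split(values, cluster)          # one vector per cluster
  cl_group   <- sapply(split(group, cluster), function(g) as.character(g[1]))
  # statistic: difference in observation-level means between the groups
  stat <- function(labels) {
    a <- unlist(by_cluster[labels == "A"], use.names = FALSE)
    b <- unlist(by_cluster[labels == "B"], use.names = FALSE)
    mean(a) - mean(b)
  }
  observed <- stat(cl_group)
  permuted <- replicate(n_perm, stat(sample(cl_group)))  # shuffle cluster labels
  p_value  <- (sum(abs(permuted) >= abs(observed)) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

# Simulated example: 40 clusters of 5, with a shared effect within each cluster
set.seed(1)
cluster <- rep(1:40, each = 5)
group   <- rep(c("A", "B"), each = 100)
values  <- rnorm(200) + rep(rnorm(40), each = 5)
res <- cluster_perm_test(values, cluster, group, n_perm = 999)
res$p_value
```

Because whole clusters are relabelled, the within-cluster dependence is left intact in every permuted data set. The test has power only when there are enough clusters, which is the limitation raised elsewhere in this thread.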

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Wenjin Mao
Sent: Tuesday, May 24, 2011 11:22 AM
To: Meyners, Michael
Cc: r-help_at_r-project.org
Subject: Re: [R] help on permutation/randomization test

Thank you, Michael.

I don't think the data within the same group can be treated as repeated measurements. Let's say I have 1000 observations from group 1 and 1500 observations from group 2. Some of the 1000 objects of group 1 entered the system at the same time and may affect each other; the same holds for the other group. It's hard to measure the strength of the dependency.

Even if the correlation can be reduced by some adjustment or transformation, the R function "permtest" cannot handle such a large sample size. Is there any other R function I can use?

Thanks,
Wenjin

On Tue, May 24, 2011 at 1:37 AM, Meyners, Michael <meyners.m_at_pg.com> wrote:

> I suspect you need to give more information/background on the data (though
> this is not primarily an R-related question; you might want to try other
> resources instead). Unless I'm missing something here, I cannot think of ANY
> reasonable test: A permutation (using permtest or anything else) would
> destroy the correlation structure and hence give invalid results, and the
> assumptions of parametric tests are violated as well. Basically, you only
> have two observations, one for each group; with some good will you might
> consider these as repeated measurements, but still on the same subject or
> whatever. Hence there is no way to distinguish the subject effect from a
> treatment effect. There is not enough data to permute or to base a
> statistical test on. So unless you can get rid of the dependency within groups (or at least
> reasonably assume observations to be independent), I'm not very
> optimistic...
> HTH, Michael
>
> > -----Original Message-----
> > From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Wenjin Mao
> > Sent: Monday, May 23, 2011 20:56
> > To: r-help_at_r-project.org
> > Subject: [R] help on permutation/randomization test
> >
> > Hi,
> >
> > I have two groups of data of different size:
> > group A: x1, x2, ..., x_n;
> > group B: y1, y2, ..., y_m; (m is not equal to n)
> >
> > The two groups are independent but observations within each group are
> > not independent,
> > i.e., x1, x2, ..., x_n are not independent; but x's are independent
> > from y's
> >
> > I wonder whether a randomization test is still applicable in this case. Does
> > R have any function that can do this test for large m and n? I notice
> > that "permtest" can only handle small (m+n<22) samples.
> >
> > Thank you very much,
> > Wenjin
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

Received on Wed 25 May 2011 - 02:07:08 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 25 May 2011 - 06:10:10 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
