From: Wolfgang Huber <huber_at_ebi.ac.uk>

Date: Sat, 07 Jun 2008 23:40:22 +0100

Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 07 Jun 2008 - 22:44:50 GMT

Date: Sat, 07 Jun 2008 23:40:22 +0100

Dear Mark,

try out the example code below. Such a p-value distribution often occurs if you have "batch" effects, i.e. if the between-group variability is in fact less than the within-group variability.

In the example below, I do, for each row of x, a t-test between the values in the even and odd columns; for rt2, a "batch effect" has been added to columns 1:10.

hope this helps

Wolfgang

library("genefilter")

nr = 31000

nc = 20

x = matrix(rnorm(nr*nc), nrow=nr, ncol=nc)

rt1 = rowttests(x, factor(1:nc %% 2))

## add a batch effect

x[, 1:10] = x[, 1:10] + pi/2

rt2 = rowttests(x, factor(1:nc %% 2))

par(mfrow=c(2,1))

hist(rt1$p.value, breaks=100, col="mistyrose")
hist(rt2$p.value, breaks=100, col="mistyrose")

Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber

Mark Kimpel a écrit 07/06/2008 18:39:

> I'm working with a genomic data-set with ~31k end-points and have

*> performed an F-test across 5 groups for each end-point. The QA
**> measurments on the individual micro-arrays all look good. One of the
**> first things I do in my work-flow is take a look at the p-valued
**> distribution. it is my understanding that, if the findings are due to
**> chance alone, the p-value distribution should be uniform. In this case
**> the histogram, even with 1000 break points, starts low on the left and
**> climbs almost linearly to the right. In other words, very skewed
**> towards high p-values. I understand that this could be happening by
**> chance alone, but the same behavior is seen in the two contrasts of
**> interest I looked at and I have seen it in a couple of our other
**> genomic, high-dimensional experiments as well. I might also add that I
**> looked at the actual numbers of genes with p-val < X and indeed, for
**> each X < 0.05, there are far fewer sig. genes than one would expect by
**> chance.
**>
**> I can't figure out what is causing this and, if there is a cause, I'd
**> like to be able to tell the experimenter if it indicates a technical
**> factor. I've had other experiments where the p-value dist approximates
**> normal and of course those that have nice spikes at low p-values
**> indicating we have some significant genes.
**>
**> I'm addressing this hear rather than to BioC because I suspect there
**> is some basis statistical mechanism that could explain this. Is there?
**>
**> Mark
**>
*

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 07 Jun 2008 - 22:44:50 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Sun 08 Jun 2008 - 09:30:37 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*