# Re: [R] strange (to me) p-value distribution

From: Wolfgang Huber <huber_at_ebi.ac.uk>
Date: Sat, 07 Jun 2008 23:40:22 +0100

Dear Mark,

Try out the example code below. Such a p-value distribution often occurs if you have "batch" effects, i.e. if the between-group variability is in fact less than what the within-group variability would lead you to expect under the null hypothesis.

In the example below, I run, for each row of x, a t-test between the values in the even and odd columns; for rt2, a "batch effect" has been added to columns 1:10, i.e. to five columns from each of the two groups.

Hope this helps,

Wolfgang

library("genefilter")

nr = 31000
nc = 20

x = matrix(rnorm(nr*nc), nrow=nr, ncol=nc)

rt1 = rowttests(x, factor(1:nc %% 2))

x[, 1:10] = x[, 1:10] + pi/2
rt2 = rowttests(x, factor(1:nc %% 2))

par(mfrow=c(2,1))
hist(rt1$p.value, breaks=100, col="mistyrose")
hist(rt2$p.value, breaks=100, col="mistyrose")
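
The simulation above needs the Bioconductor package genefilter. For readers who do not have it installed, here is a minimal base-R sketch of the same idea. It is not the code from this post: the helper `row_p`, the fixed seed, and the reduced row count (2000 instead of 31000, to keep `apply()` quick) are assumptions made for illustration. It also prints the fraction of rows with p < 0.05, which makes the shortage of small p-values visible without a histogram.

```r
# Base-R sketch of the batch-effect simulation (no genefilter required).
set.seed(1)                       # assumption: fixed seed, for reproducibility only
nr <- 2000                        # assumption: fewer rows than 31000, for speed
nc <- 20
grp <- factor(1:nc %% 2)          # odd columns -> "1", even columns -> "0"

# Hypothetical helper: equal-variance two-sample t-test p-value for one row
row_p <- function(v) {
  t.test(v[grp == "1"], v[grp == "0"], var.equal = TRUE)$p.value
}

x <- matrix(rnorm(nr * nc), nrow = nr, ncol = nc)
p_null <- apply(x, 1, row_p)      # pure noise, no batch effect

# The shift hits 5 columns of each group: group means move together,
# but the spread within each group grows.
x[, 1:10] <- x[, 1:10] + pi / 2
p_batch <- apply(x, 1, row_p)

# Fraction of rows below 0.05: close to 0.05 for pure noise,
# clearly smaller once the batch effect is present.
frac_null  <- mean(p_null  < 0.05)
frac_batch <- mean(p_batch < 0.05)
print(c(null = frac_null, batch = frac_batch))
```

Because the shift is balanced across the two groups, it cancels in the difference of group means and only inflates the pooled variance estimate, so the t-statistics shrink and the p-values pile up towards 1.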

Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber

Mark Kimpel wrote on 07/06/2008 18:39:
> I'm working with a genomic data-set with ~31k end-points and have
> performed an F-test across 5 groups for each end-point. The QA
> measurements on the individual micro-arrays all look good. One of the
> first things I do in my work-flow is take a look at the p-value
> distribution. It is my understanding that, if the findings are due to
> chance alone, the p-value distribution should be uniform. In this case
> the histogram, even with 1000 break points, starts low on the left and
> climbs almost linearly to the right. In other words, very skewed
> towards high p-values. I understand that this could be happening by
> chance alone, but the same behavior is seen in the two contrasts of
> interest I looked at and I have seen it in a couple of our other
> genomic, high-dimensional experiments as well. I might also add that I
> looked at the actual numbers of genes with p-val < X and indeed, for
> each X < 0.05, there are far fewer sig. genes than one would expect by
> chance.
>
> I can't figure out what is causing this and, if there is a cause, I'd
> like to be able to tell the experimenter if it indicates a technical
> factor. I've had other experiments where the p-value dist approximates
> normal and of course those that have nice spikes at low p-values
> indicating we have some significant genes.
>
> I'm addressing this here rather than to BioC because I suspect there
> is some basic statistical mechanism that could explain this. Is there?
>
> Mark
>
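
The check described in the quoted message (counting end-points with p-value below X and comparing with the number expected by chance) can be sketched roughly as follows. This is an illustration only: `count_below` is a hypothetical helper, and the p-values are simulated stand-ins rather than Mark's data. `runif()` plays the role of a well-behaved null, and `rbeta(n, 2, 1)`, whose density rises linearly from 0 to 2, is used as an assumed stand-in for a histogram that "climbs almost linearly to the right".

```r
# Hypothetical helper: observed vs. expected number of p-values below each cutoff.
# Under a uniform null, P(p < a) = a, so the expected count is a * length(p).
count_below <- function(p, cutoffs = c(0.001, 0.01, 0.05)) {
  data.frame(
    cutoff   = cutoffs,
    observed = sapply(cutoffs, function(a) sum(p < a)),
    expected = cutoffs * length(p)
  )
}

set.seed(2)                       # assumption: fixed seed, for reproducibility only
n <- 31000                        # roughly the number of end-points in the question

p_unif <- runif(n)                # stand-in for a uniform null distribution
p_skew <- rbeta(n, 2, 1)          # stand-in for p-values skewed towards 1

res_unif <- count_below(p_unif)   # observed counts close to expected
res_skew <- count_below(p_skew)   # observed counts far below expected
print(res_unif)
print(res_skew)
```

Applied to real results, `p` would simply be the vector of per-end-point p-values from the F-tests.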

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sat 07 Jun 2008 - 22:44:50 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 08 Jun 2008 - 09:30:37 GMT.
