Re: [Rd] p.adjust(<NA>s), was 'Re: [BioC] limma and p-values'

From: Gordon K Smyth <smyth_at_wehi.EDU.AU>
Date: Tue 18 Jan 2005 - 09:10:42 EST

On Tue, January 18, 2005 7:45 am, Martin Maechler said:
>>>>>> "GS" == Gordon Smyth <smyth@wehi.edu.au>
>>>>>> on Sun, 16 Jan 2005 19:44:26 +1100 writes:
>
> GS> The new committed version of p.adjust() contains some
> GS> problems:
> >> p.adjust(c(0.05,0.5),method="hommel")
> GS> [1] 0.05 0.50
>
> GS> No adjustment!
>
> yes, but that's still better than what the current version of
> R 2.0.1 does, namely to give NA NA + two warnings ..

The R 2.0.1 version has some problems, no question, and needs to be fixed. Thanks for giving time to it. Given a choice though between a wrong answer and no answer/warning/error, I think I'd prefer the latter.

The problem with n=2 is easily fixed here because Hommel's method coincides with Hochberg's when n=2.
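
As a small illustration of the n=2 case (a sketch only, assuming a version of R in which p.adjust() already special-cases n=2 for "hommel"; the variable names are just for this example), the two methods can be compared on the p-values from the report above:

```r
# The two p-values from the example quoted above.
p <- c(0.05, 0.5)

# Hochberg's step-up adjustment for n = 2.
hochberg <- p.adjust(p, method = "hochberg")   # 0.10 0.50

# Hommel's adjustment; expected to coincide with Hochberg's when n = 2
# (assumes an R version in which the n = 2 case has been fixed).
hommel <- p.adjust(p, method = "hommel")

print(hochberg)
print(hommel)
print(isTRUE(all.equal(hommel, hochberg)))
```

If the last line prints FALSE, the installed p.adjust() still has the n=2 problem described above.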

> GS> I can't see how the new treatment of NAs can be
> GS> justified. One needs to distinguish between NAs which
> GS> represent missing p-values and NAs which represent
> GS> unknown p-values. In virtually all applications giving
> GS> rise to NAs, the NAs represent missing p-values which
> GS> could not be computed because of missing data. In such
> GS> cases, the observed p-values should definitely be
> GS> adjusted as if the NAs weren't there, because NAs
> GS> represent p-values which genuinely don't exist.
>
> hmm, "definitely" being a bit strong. One could argue that
> one should use multiple imputation of the underlying missing
> data, or .. other scenarios.

Well, I'm sticking with "definitely" because it seems clear-cut. The purpose of adjustment methods is to maximise power while controlling a chosen error rate (typically familywise error rate FWER or false discovery rate FDR). When the NAs represent missing p-values, it means that those null hypotheses have zero probability of being rejected. Hence the NA cases cannot add to FWER or FDR. Suppose you have p-values c(0.05,NA) corresponding to null hypotheses H1 and H2 and you want to control the FWER at 0.05. Then it is quite correct to reject H1 (and fail to reject H2). If H2 is TRUE then the FWER is exactly 0.05. If H2 is FALSE, then the FWER is lower. Hence the FWER is controlled at the desired level with no adjustment of the p-values. Doing any adjustment can only decrease power.
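
To make the c(0.05,NA) example concrete, here is a minimal sketch. The helper adjust_observed() is hypothetical (it is not part of R); it simply adjusts the observed p-values as if the NAs weren't there. For contrast, the second call counts the NA as an extra hypothesis via the n argument of p.adjust():

```r
# P-values for H1 and H2; the second could not be computed.
p <- c(0.05, NA)

# Hypothetical helper: adjust only the observed p-values and
# leave the NAs in place, so they do not count as hypotheses.
adjust_observed <- function(p, method = "holm") {
  out <- p
  ok <- !is.na(p)
  out[ok] <- p.adjust(p[ok], method = method)
  out
}

# NA ignored: one observed p-value, so no adjustment is made.
ignoring <- adjust_observed(p)                 # 0.05 NA

# NA counted as a second hypothesis (n = 2): Holm doubles the
# smallest p-value, so H1 is no longer rejected at level 0.05.
counting <- p
counting[!is.na(p)] <- p.adjust(p[!is.na(p)], method = "holm",
                                n = length(p)) # 0.10 NA

print(ignoring)
print(counting)
```

Counting the NA costs the rejection of H1 at the 0.05 level, although H2 can never be rejected and so cannot contribute to the error rate.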

While imputation is a useful tool for making computations easier in some applications, I don't see how any good argument could be made for imputation or similar in the context of p.adjust(). Imputing data that agrees with the null hypotheses is equivalent to ignoring the null hypotheses. Imputing random data which rejects null hypotheses can only increase error rates.

Gordon

> I'll reply to your other, later, more detailed message
> separately and take the liberty to drop the other points here...
>
> Martin



R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue Jan 18 08:18:23 2005