Re: [Rd] p.adjust(<NA>s), was "Re: [BioC] limma and p-values"

From: Martin Maechler <>
Date: Sun 09 Jan 2005 - 07:29:39 EST

I've thought more about this and experimented with several versions of the R code, and have just now committed a new version of p.adjust() to R-devel which does sensible NA handling by default and *additionally* has an "na.rm" argument (set to FALSE by default). The extended 'Examples' section on the help page shows how the new NA handling is typically much more sensible than using "na.rm = TRUE".


>>>>> "MM" == Martin Maechler <>
>>>>> on Sat, 8 Jan 2005 17:19:23 +0100 writes:

>>>>> "GS" == Gordon K Smyth <>
>>>>> on Sat, 8 Jan 2005 01:11:30 +1100 (EST) writes:     

    MM> <.............>

    GS> p.adjust() unfortunately gives incorrect results when
    GS> 'p' includes NAs.  The results from topTable are
    GS> correct.  topTable() takes care to remove NAs before
    GS> passing the values to p.adjust().

    MM> There's at least one bug in p.adjust(): The "hommel"
    MM> method currently does not work at all with NAs (and I
    MM> have an uncommitted fix ready for this bug).  OTOH, the
    MM> current version of p.adjust() ``works'' with NA's, apart
    MM> from Hommel's method, but by using "n = length(p)" in
    MM> the correction formulae, i.e. *including* the NAs for
    MM> determining sample size `n' {my fix to "hommel" would do
    MM> this as well}.

    MM> My question is what p.adjust() should do when there are
    MM> NA's more generally, or more specifically which `n' to
    MM> use in the correction formula. Your proposal amounts to
    MM> ``drop NA's and forget about them till the very end''
    MM> (where they are wanted in the result), i.e., your sample
    MM> size `n' would be sum(!is.na(p)) instead of length(p).
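
To make the two candidate choices of `n' concrete, here is a minimal sketch that writes the Bonferroni correction out by hand rather than calling p.adjust(). The example vector and the variable names are assumptions made up for illustration; the count of non-NA values is written as sum(!is.na(p)).

```r
# Illustrative P-values with two missing entries (made-up data).
p <- c(0.01, 0.02, NA, 0.03, NA)

n_all   <- length(p)        # 5: the NAs are counted in the sample size
n_nonNA <- sum(!is.na(p))   # 3: the NAs are dropped from the sample size

# Bonferroni written out by hand: multiply by n and cap at 1.
# pmin() keeps NA where p is NA, so the NAs reappear in the result.
bonf_all   <- pmin(1, n_all   * p)
bonf_nonNA <- pmin(1, n_nonNA * p)

print(rbind(bonf_all, bonf_nonNA))
```

The non-NA entries differ by the factor n_all / n_nonNA, which is exactly the point under discussion.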

    MM> To me it doesn't seem obvious that this setting "n =
    MM> #{non-NA observations}" is desirable for all P-value
    MM> adjustment methods. One argument for keeping ``n = #{all
    MM> observations}'' at least for some correction methods is
    MM> the following "continuity" one:

    MM> If only a few ``irrelevant'' (let's say > 0.5) P-values
    MM> are replaced by NA, the adjusted relevant small P-values
    MM> shouldn't change much, ideally not at all.  I'm really
    MM> no scholar on this topic, but e.g. for "holm" I think I
    MM> would want to keep ``full n'' because of the above
    MM> continuity argument.  BTW, for "fdr", I don't see a
    MM> straightforward way to achieve the desired continuity.
    MM> Of course, p.adjust() could adopt the possibility of
    MM> choosing how NA's should be treated e.g. by another
    MM> argument ``na.rm = TRUE/FALSE'' and hence allow both
    MM> versions.
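
The "continuity" argument for "holm" can be checked numerically. The sketch below is not the p.adjust() source: holm_n() is a hypothetical helper that takes the sample size `n' as an explicit argument, and the P-values are made up. It replaces the large P-values by NA and compares the adjusted small ones under both choices of `n'.

```r
# Hypothetical helper (not the p.adjust() source): Holm's step-down
# adjustment with an explicit sample size 'n'; NAs in 'p' stay NA.
holm_n <- function(p, n) {
  ok  <- !is.na(p)                    # positions of observed P-values
  o   <- order(p[ok])                 # ascending order of observed values
  adj <- pmin(1, cummax((n - seq_along(o) + 1) * p[ok][o]))
  out <- rep(NA_real_, length(p))
  out[ok] <- adj[order(o)]            # undo the sorting
  out
}

p_full <- c(0.001, 0.004, 0.6, 0.7, 0.9)     # made-up data
p_na   <- replace(p_full, p_full > 0.5, NA)  # "irrelevant" values become NA

ref    <- holm_n(p_full, n = length(p_full))    # no NAs present
keep_n <- holm_n(p_na,   n = length(p_na))      # full n = 5
drop_n <- holm_n(p_na,   n = sum(!is.na(p_na))) # n = 2

small <- 1:2
print(rbind(ref = ref[small], keep_n = keep_n[small], drop_n = drop_n[small]))
```

With the full `n' the two small adjusted P-values are the same as without any NAs; with `n' reduced to the non-NA count they shrink, here by a factor of 2.5 and 4.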

    MM> Feedback very welcome, particularly from ``P-value
    MM> experts'' ;-)

    MM> Martin Maechler, ETH Zurich

Received on Sun Jan 09 06:59:35 2005
