From: Martin Maechler <maechler_at_stat.math.ethz.ch>

Date: Sun 09 Jan 2005 - 03:19:23 EST

R-devel@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sun Jan 09 02:27:56 2005

Date: Sun 09 Jan 2005 - 03:19:23 EST

>>>>> "GS" == Gordon K Smyth <smyth@wehi.edu.au>

>>>>> on Sat, 8 Jan 2005 01:11:30 +1100 (EST) writes:

<.............>

GS> p.adjust() unfortunately gives incorrect results when GS> 'p' includes NAs. The results from topTable are GS> correct. topTable() takes care to remove NAs before GS> passing the values to p.adjust().

There's at least one bug in p.adjust(): The "hommel" method
currently does not work at all with NAs (and I have an
uncommitted fix ready for this bug).

OTOH, the current version of p.adjust() ``works'' with NA's,
apart from Hommel's method, but by using "n = length(p)" in the
correction formulae, i.e. *including* the NAs for determining
sample size `n' {my fix to "hommel" would do this as well}.

My question is what p.adjust() should do when there are NA's more generally, or more specifically which `n' to use in the correction formula. Your proposal amounts to ``drop NA's and forget about them till the very end'' (where they are wanted in the result), i.e., your sample size `n' would be sum(!is.na(p)) instead of length(p).

To me it doesn't seem obvious that this setting "n = #{non-NA observations}" is desirable for all P-value adjustment methods. One argument for keeping ``n = #{all observations}'' at least for some correction methods is the following "continuity" one:

If only a few ``irrelevant'' (let's say > 0.5) P-values are
replaced by NA, the adjusted relevant small P-values shouldn't
change much, ideally not at all. I'm really no scholar on this
topic, but e.g. for "holm" I think I would want to keep ``full
n'' because of the above continuity argument.
BTW, for "fdr", I don't see a straightforward way to achieve the
desired continuity.

5D

Of course, p.adjust() could adopt the possibility of chosing how
NA's should be treated e.g. by another argument ``use.na = TRUE/FALSE''
and hence allow both versions.

Feedback very welcome, particularly from ``P-value experts'' ;-)

Martin Maechler, ETH Zurich

R-devel@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Sun Jan 09 02:27:56 2005

*
This archive was generated by hypermail 2.1.8
: Fri 18 Mar 2005 - 09:02:30 EST
*