From: Martin Maechler <maechler_at_stat.math.ethz.ch>

Date: Wed, 16 Jul 2008 18:02:47 +0200


>>>>> "MC" == Mark Cowley <m.cowley_at_garvan.org.au>
>>>>>     on Wed, 16 Jul 2008 15:32:30 +1000 writes:

MC> Dear list,

MC> I am analysing a set of quantitative proteomics data
MC> from 16 patients which has a large number of missing
MC> data, thus some proteins are only detected once, up to a
MC> maximum of 16. I want to test each protein for
MC> normality by the Shapiro-Wilk test (function
MC> shapiro.test in package stats), which can only be
MC> applied to data with at least 3 measurements, which is
MC> fine. In the case where I have only 3 observations, and
MC> two of those observations are identical, then
MC> shapiro.test produces negative P-values, which should
MC> never happen. This occurs for all of the situations
MC> that I have tried for 3 values, where 2 are the same.

Yes. Since all such tests are location- and scale-invariant, you can reproduce it with

shapiro.test(c(0,0,1))
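
A minimal sketch of that invariance, comparing only the W statistic so that the result does not depend on how the P-value is computed. The helper name w_stat is made up for this illustration; the samples are the ones from the original post, each written as an affine transformation of a canonical sample with one tie.

```r
# W statistic only (hypothetical helper name, not part of package stats)
w_stat <- function(x) unname(shapiro.test(x)$statistic)

x0 <- c(0, 0, 1)                                  # canonical sample with one tie
w0       <- w_stat(x0)
w_shift  <- w_stat(1 + x0)                        # c(1, 1, 2)
w_affine <- w_stat(-1 + 3 * x0)                   # c(-1, -1, 2)
w_flip   <- w_stat(0.0566 - 0.7006 * c(1, 0, 0))  # c(-0.644, 0.0566, 0.0566)

# All four values should agree, at about 0.75
print(c(w0, w_shift, w_affine, w_flip))
```

The last sample uses a negative scale factor; W does not change under a sign flip either, so it belongs to the same family.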

The irony is that the original papers by Royston and the R help page all assert that the P-value for n = 3 is exact!

OTOH, the paper [Royston (1982), Appl. Statist. 31, 115-124] clearly states that

X(1) < X(2) < X(3) ... < X(n)

i.e., does not allow "ties" (two equal values).
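
For the use case in the original post (many short samples, some with repeated values), one way to see which samples violate that strict ordering is to check for ties before testing. This is only a sketch; has_ties is a hypothetical helper, and dropping NAs first is an assumption about how the missing measurements are coded.

```r
# TRUE if the non-missing values of x contain at least one repeated value
has_ties <- function(x) {
    x <- x[!is.na(x)]          # assumption: missing measurements are coded as NA
    anyDuplicated(x) > 0
}

tie_flags <- c(has_ties(c(0, 0, 1)),                  # tie present
               has_ties(c(-0.644, 0.0566, 0.0566)),   # tie present
               has_ties(c(0.1, 0.2, 0.3)))            # strictly ordered
print(tie_flags)
```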

If the exact formula in the paper were evaluated exactly (instead of with a value rounded to about 6 digits), the "exact P-value" would be exactly 0.
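
A small sketch of the effect, assuming the n = 3 formula in its usually quoted form, P = (6/pi) * (asin(sqrt(W)) - asin(sqrt(3/4))). The function name p3 and the six-digit constants below are illustrative assumptions, not a copy of the actual implementation.

```r
# Assumed n = 3 formula; both constants can be overridden to mimic rounding
p3 <- function(W, six_over_pi = 6 / pi, asin_ref = asin(sqrt(0.75))) {
    six_over_pi * (asin(sqrt(W)) - asin_ref)
}

W <- 0.75                        # value of W for n = 3 with two tied observations

p_full    <- p3(W)               # constants at full double precision
p_rounded <- p3(W, six_over_pi = 1.909859, asin_ref = 1.047198)  # assumed rounding

print(c(full = p_full, rounded = p_rounded))
```

With full-precision constants the two asin terms cancel and the result is 0. With asin(sqrt(3/4)) rounded up to 1.047198, the difference is slightly negative, giving a P-value of the same order of magnitude as the negative ones reported.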

Now that would count as a bug in the paper I think. More about this tomorrow or so.

Martin Maechler, ETH Zurich

MC> Reproducible code below:

MC> # these are the data points that raised the problem

>> shapiro.test(c(-0.644, 0.0566, 0.0566))

MC> Shapiro-Wilk normality test

MC> data:  c(-0.644, 0.0566, 0.0566)
MC> W = 0.75, p-value < 2.2e-16

>> shapiro.test(c(-0.644, 0.0566, 0.0566))$p.value

MC> [1] -7.69e-07
MC> # note the verbose output shows a small, but positive P-value, but
MC> when you extract that P using $p.value, it becomes negative
MC> # various other tests

>> shapiro.test(c(1,1,2))$p.value

MC> [1] -8.35e-07

>> shapiro.test(c(-1,-1,2))$p.value

MC> [1] -1.03e-06

MC> cheers,

MC> Mark

