Re: [R] negative P-values with shapiro.test

From: Martin Maechler <>
Date: Wed, 16 Jul 2008 18:02:47 +0200

>>>>> "MC" == Mark Cowley <> >>>>> on Wed, 16 Jul 2008 15:32:30 +1000 writes:

    MC> Dear list,

    MC> I am analysing a set of quantitative proteomics data
    MC> from 16 patients which has a large numbers of missing
    MC> data, thus some proteins are only detected once, upto a
    MC> maximum of 16.  I want to test each protein for
    MC> normality by the Shapiro Wilk test (function
    MC> shapiro.test in package stats), which can only be
    MC> applied to data with at least 3 measurements, which is
    MC> fine. In the case where I have only 3 observations, and
    MC> two of those observations are identical, then the
    MC> shapiro.test produces negative P-values, which should
    MC> never happen.  This occurs for all of the situations
    MC> that I have tried for 3 values, where 2 are the same.

Yes. Since all such tests are location- and scale-invariant, you can reproduce it with


The irony is that the original papers by Roydon and the R help page all assert that the P-value for n = 3 is exact !

OTOH, the paper [Roydon (1982), Appl.Stat 31, p115-124] clearly states that

        X(1) < X(2) < X(3) ... < X(n)

i.e., does not allow "ties" (two equal values).

If the exact formula in the paper were evaluated exactly (instead with a rounded value of about 6 digits), the "exact P-value" would be exactly 0.

Now that would count as a bug in the paper I think. More about this tomorrow or so.

Martin Maechler, ETH Zurich

    MC> Reproducible code below:
    MC> # these are the data points that raised the problem
>> shapiro.test(c(-0.644, 0.0566, 0.0566))

    MC> Shapiro-Wilk normality test

    MC> data: c(-0.644, 0.0566, 0.0566)     MC> W = 0.75, p-value < 2.2e-16

>> shapiro.test(c(-0.644, 0.0566, 0.0566))$p.value

    MC> [1] -7.69e-07
    MC> # note the verbose output shows a small, but positive P-value, but  
    MC> when you extract that P using $p.value, it becomes negative
    MC> # various other tests

>> shapiro.test(c(1,1,2))$p.value
MC> [1] -8.35e-07

>> shapiro.test(c(-1,-1,2))$p.value
    MC> [1] -1.03e-06

    MC> cheers,

    MC> Mark

>> sessionInfo()

    MC> R version 2.6.1 (2007-11-26)
    MC> i386-apple-darwin8.10.1

    MC> locale:
    MC> en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

    MC> attached base packages:
    MC> [1] tcltk     graphics  grDevices datasets  utils     stats      
    MC> methods   base

    MC> other attached packages:
    MC> [1] qvalue_1.12.0    Cairo_1.3-5      RSvgDevice_0.6.3  
    MC> SparseM_0.74     pwbc_0.1
    MC> [6] mjcdev_0.1       tigrmev_0.1      slfa_0.1          
    MC> sage_0.1         qtlreaper_0.1
    MC> [11] pajek_0.1        mjcstats_0.1     mjcspot_0.1       
    MC> mjcgraphics_0.1  mjcaffy_0.1
    MC> [16] haselst_0.1      geomi_0.1        geo_0.1           
    MC> genomics_0.1     cor_0.1
    MC> [21] bootstrap_0.1    blat_0.1         bitops_1.0-4      
    MC> mjcbase_0.1      gdata_2.3.1

    MC> [26] gtools_2.4.0
    MC> -----------------------------------------------------
    MC> Mark Cowley, BSc (Bioinformatics)(Hons)

    MC> Peter Wills Bioinformatics Centre     MC> Garvan Institute of Medical Research, Sydney, Australia

    MC> ______________________________________________
    MC> mailing list
    MC> PLEASE do read the posting guide     MC> and provide commented, minimal, self-contained, reproducible code. mailing list PLEASE do read the posting guide and provide commented, minimal, self-contained, reproducible code. Received on Wed 16 Jul 2008 - 16:07:25 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 17 Jul 2008 - 17:31:49 GMT.

Mailing list information is available at Please read the posting guide before posting to the list.

list of date sections of archive