From: Mark Cowley <m.cowley_at_garvan.org.au>

Date: Wed, 16 Jul 2008 15:32:30 +1000


Dear list,

I am analysing a set of quantitative proteomics data from 16 patients
which has a large amount of missing data, so some proteins are detected
in only one patient, and others in up to all 16.

I want to test each protein for normality with the Shapiro-Wilk test
(function shapiro.test in package stats), which can only be applied to
data with at least 3 measurements, which is fine. In the case where I
have only 3 observations, and two of those observations are identical,
shapiro.test produces negative P-values, which should never
happen.
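
[Editor's sketch] One way to set up the per-protein testing described above, guarding against the cases mentioned. The matrix prot and the wrapper safe_shapiro_p are hypothetical names invented for illustration, and the data are simulated; only shapiro.test comes from the stats package. Clamping the P-value at zero is an assumed workaround, not a fix confirmed anywhere in this thread.

```r
# Hypothetical example data: 4 proteins (rows) by 16 patients (columns).
set.seed(1)
prot <- matrix(rnorm(4 * 16), nrow = 4,
               dimnames = list(paste("protein", 1:4, sep = ""), NULL))
prot[1, -1] <- NA                           # detected in one patient only
prot[2, -(1:3)] <- NA                       # detected in three patients
prot[3, -(1:3)] <- NA
prot[3, 1:3] <- c(-0.644, 0.0566, 0.0566)   # three values, two identical

# safe_shapiro_p is a hypothetical wrapper, not a function from any package.
safe_shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  # shapiro.test needs at least 3 values that are not all identical
  if (length(x) < 3 || diff(range(x)) == 0) return(NA_real_)
  # a P-value cannot be negative, so clamp at zero
  max(0, shapiro.test(x)$p.value)
}

# One P-value per protein; NA where the test could not be applied.
pvals <- apply(prot, 1, safe_shapiro_p)
print(pvals)
```

Proteins with too few observations come back as NA rather than stopping the loop, so they can be filtered out afterwards with !is.na(pvals).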

This occurs for all of the situations that I have tried for 3 values,
where 2 are the same.

Reproducible code below:

# these are the data points that raised the problem

> shapiro.test(c(-0.644, 0.0566, 0.0566))

Shapiro-Wilk normality test

data: c(-0.644, 0.0566, 0.0566)

W = 0.75, p-value < 2.2e-16

> shapiro.test(c(-0.644, 0.0566, 0.0566))$p.value

[1] -7.69e-07

# note: the printed output shows a small but positive P-value, but when
# you extract that P-value using $p.value, it is negative
# various other tests

> shapiro.test(c(1,1,2))$p.value

[1] -8.35e-07

> shapiro.test(c(-1,-1,2))$p.value

[1] -1.03e-06
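
[Editor's sketch] For exactly three observations the Shapiro-Wilk statistic has a known closed form: the weights are (-1/sqrt(2), 0, 1/sqrt(2)), W cannot fall below 0.75, and the P-value is (6/pi) * (asin(sqrt(W)) - asin(sqrt(0.75))). Two tied values out of three give exactly W = 0.75, which matches the W printed above and puts the P-value at its lower limit of zero. The small negative numbers are consistent with rounding around that boundary, although that explanation is an assumption and is not confirmed in this post. The helper shapiro_p_n3 below is hypothetical and only illustrates the arithmetic.

```r
# Sketch only: closed-form Shapiro-Wilk P-value for exactly three observations.
# shapiro_p_n3 is a hypothetical helper, not part of the stats package.
shapiro_p_n3 <- function(x) {
  x <- sort(x[!is.na(x)])
  stopifnot(length(x) == 3, x[3] > x[1])
  # With weights (-1/sqrt(2), 0, 1/sqrt(2)):
  # W = (x[3] - x[1])^2 / (2 * sum of squared deviations from the mean)
  w <- (x[3] - x[1])^2 / (2 * sum((x - mean(x))^2))
  # Rounding can push W just outside its theoretical range [0.75, 1].
  w <- min(max(w, 0.75), 1)
  p <- (6 / pi) * (asin(sqrt(w)) - asin(sqrt(0.75)))
  c(W = w, p.value = p)
}

tied   <- shapiro_p_n3(c(1, 1, 2))                  # two identical values
posted <- shapiro_p_n3(c(-0.644, 0.0566, 0.0566))   # data from this post
spaced <- shapiro_p_n3(c(1, 2, 3))                  # equally spaced values
print(rbind(tied, posted, spaced))
```

Clamping W to [0.75, 1] before taking asin keeps the result inside [0, 1] whatever the rounding error does.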

cheers,

Mark

> sessionInfo()

R version 2.6.1 (2007-11-26)

i386-apple-darwin8.10.1

locale:

en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:

[1] tcltk graphics grDevices datasets utils stats methods base

other attached packages:

[1] qvalue_1.12.0 Cairo_1.3-5 RSvgDevice_0.6.3 SparseM_0.74 pwbc_0.1
[6] mjcdev_0.1 tigrmev_0.1 slfa_0.1 sage_0.1 qtlreaper_0.1
[11] pajek_0.1 mjcstats_0.1 mjcspot_0.1 mjcgraphics_0.1 mjcaffy_0.1
[16] haselst_0.1 geomi_0.1 geo_0.1 genomics_0.1 cor_0.1
[21] bootstrap_0.1 blat_0.1 bitops_1.0-4 mjcbase_0.1 gdata_2.3.1
[26] gtools_2.4.0

Mark Cowley, BSc (Bioinformatics)(Hons)

Peter Wills Bioinformatics Centre

Garvan Institute of Medical Research, Sydney, Australia

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 16 Jul 2008 - 05:37:55 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Wed 16 Jul 2008 - 16:32:06 GMT.
