Re: [R] Wilcoxon signed rank test and its requirements

From: Daniel Malter <daniel_at_umd.edu>
Date: Fri, 25 Jun 2010 11:44:49 -0700 (PDT)

Atte, I would not wonder if you got lost and confused by the certainly interesting methodological discussion that has been going on in this thread.

Since the helpers do not seem to converge/agree, I propose to you to use a different nonparametric approach: The bootstrap. The important thing about the bootstrap is that you do not have to be concerned with the questions that have been discussed in this thread.

In the bootstrap you draw repeatedly samples with replacement from your data and compute the statistic you are interested in (for you this is the mean). The beauty of this approach is i) that the bootstrap distribution is normal and ii) that you can directly compare the quantiles/confidence intervals of the bootstrap distribution.

Let's say you have x and y, which both come from Poisson distributions with relatively low means. Note that this resembles your data in that the distributions are asymmetric, but contain a considerable number of ties.

#set seed for random number generation

set.seed(123)

#simulate x and y (these would be your data)
x=rpois(100,3)
y=rpois(100,4)

#plot histograms for x and y

par(mfcol=c(1,2))
hist(x,breaks=length(unique(x)))
hist(y,breaks=length(unique(y)))

Now we sample with replacement from x and y (i.e., we draw one observation from x and one from y, and afterwards we put the drawn observation back into x and y, respectively). For each bootstrap of x and y, respectively, we sample exactly as many observations as there are in x and y, respectively (here 100). We then compute the statistic of interest of this bootstrap (here the mean). We repeat this process many times (here 1000).

n=1000 #number of bootstraps to draw
x.boot1=numeric(n)
y.boot1=numeric(n)
for(i in 1:1000){
  x.boot1[i]=mean(sample(x,length(x),replace=T))   y.boot1[i]=mean(sample(y,length(y),replace=T)) }

Doing this, we draw the bootstrap distribution of the mean of x and y, respectively. Note that the bootstrap distribution is normally distributed and unbiased (the latter automatically because we bootstrap the mean):

par(mfcol=c(1,2))
hist(x.boot1)
hist(y.boot1)

The simple(st) way of comparing these distributions is by checking whether their confidence intervals overlap or not. You get the 95-percent confidence intervals by

quantile(x.boot1,p=c(0.025,0.975))
quantile(y.boot1,p=c(0.025,0.975))

If they do not overlap, you would conclude that they are significantly different. In the one-sample case, you would just compare whether value of interest is within or outside the confidence interval.

Finally, note that the little loop that we have programmed to draw the bootstraps are already implemented in an R package. Using the bootstrap package, you could draw the bootstraps analogously by:

library(bootstrap)
x.boot2=bootstrap(x,nboot=1000,mean)
y.boot2=bootstrap(y,nboot=1000,mean)

The bootstrapped means are then stored in x.boot2$thetastar and y.boot2$thetastar.

Hope that helps,
Daniel

This process we repeatAnd now we draw many bootstraps, r

-- 
View this message in context: http://r.789695.n4.nabble.com/Wilcoxon-signed-rank-test-and-its-requirements-tp2266165p2268801.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 25 Jun 2010 - 18:47:49 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 25 Jun 2010 - 18:50:36 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive