# Re: [R] Boostrap p-value in regression [indirectly related to R]

From: John Fox <jfox_at_mcmaster.ca>
Date: Tue, 22 May 2007 09:11:54 -0400

Dear Wolfgang,

I agree that it's preferable to compute the two-sided p-value without assuming symmetry. Another, equivalent, way of thinking about this is to use t^2 for the two-sided test in place of t.

BTW, the formula used in my appendix (for the one-sided p-value) is from Davison and Hinkley, I believe, and differs trivially from the one in Efron and Tibshirani.

Regards,
John

John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-bounces_at_stat.math.ethz.ch
> [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of
> Viechtbauer Wolfgang (STAT)
> Sent: Monday, May 21, 2007 10:41 AM
> To: r-help_at_stat.math.ethz.ch
> Subject: [R] Boostrap p-value in regression [indirectly related to R]
>
> Hello All,
>
> Despite my preference for reporting confidence intervals, I
> need to obtain a p-value for a hypothesis test in the context
> of regression using bootstrapping. I have read John Fox's
> chapter on bootstrapping regression models and have consulted
> Efron & Tibshirani's "An Introduction to the Bootstrap" but I
> just wanted to ask the experts here for some feedback to make
> sure that I am not doing something wrong.
>
> Let's take a simplified example where the model includes one
> independent variable and the idea is to test H0: beta1 = 0
> versus Ha: beta1 != 0.
>
> ########################################################
>
> ### generate some sample data
>
> n <- 50
> xi <- runif(n, min=1, max=5)
> yi <- 0 + 0.2 * xi + rnorm(n, mean=0, sd=1)
>
> ### fit simple regression model
>
> mod <- lm(yi ~ xi)
> summary(mod)
> b1 <- coef(mod)[2]
> t1 <- coef(mod)[2] / coef(summary(mod))[2,2]
>
> ### 1000 bootstrap replications using (X,Y)-pair resampling
>
> t1.star <- rep(NA,1000)
>
> for (i in 1:1000) {
>
> ids <- sample(1:n, replace=TRUE)
> newyi <- yi[ids]
> newxi <- xi[ids]
> mod <- lm(newyi ~ newxi)
> t1.star[i] <- ( coef(mod)[2] - b1) / coef(summary(mod))[2,2]
>
> }
>
> ### get bootstrap p-value
>
> hist(t1.star, nclass=40)
> abline(v=t1, lwd=3)
> abline(v=-1*t1, lwd=3)
> 2 * mean( t1.star > abs(t1) )
>
> ########################################################
>
> As suggested in the chapter on bootstrapping regression
> models by John Fox, the bootstrap p-value is 2 times the
> proportion of bootstrap t-values (with b1 subtracted so that
> we get the distribution under H0) larger than the absolute
> value of the actual t-value observed in the data.
>
> Doesn't this assume that the bootstrap sampling distribution
> is symmetric? And if yes, would it then not be more reasonable to
> calculate:
>
> mean( abs(t1.star) > abs(t1) )
>
> or in words: the number of bootstrap t-values that are more
> extreme on either side of the bootstrap distribution than the
> actual t-value observed?
>
> Any suggestions or comments would be appreciated!
>
> --
> Wolfgang Viechtbauer
> Department of Methodology and Statistics University of
> Maastricht, The Netherlands http://www.wvbauer.com
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help