# RE: [R] Interpreting Results of Bootstrapping

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Sun 11 Jul 2004 - 19:40:34 EST

Hi!

Simply plot(x1,x2): you will see that there is one point (number 23) at (x1,x2) = (25.24,6.744) which is a very long way from all the other points (which, among themselves, form a somewhat diffuse cluster with some suggestion of further structure).

When you bootstrap, the correlation you obtain in any sample will depend on whether or not this outlying point is included in the sample. If it is included, this single point will generate a relatively high value of the correlation coefficient simply because it is such a long way from all the others (i.e. it is highly influential).

If it is not included, then the diffuse character of the other points will generate a very low value of the correlation coefficient.

> cor(x1,x2)
[1] 0.7471931
> cor(x1[-23],x2[-23])
[1] 0.03914653
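
If you want to confirm that no other point behaves like number 23, a leave-one-out check along the following lines may help. This is only a sketch: the names loo, shift and most_influential are my own, and the data are copied from the code quoted below.

```r
# Data as given in the quoted message (51 paired observations).
x1 <- c(-2.612,-0.7859,-0.5229,-1.246,1.647,1.647,0.1811,
        -0.07097,0.8711,0.4323,0.1721,2.143,4.33,0.5002,
        0.4015,-0.5225,2.538,0.07959,-0.6645,4.521,-1.371,
        0.3327,25.24,-0.5417,2.094,0.6064,-0.4476,-0.5891,
        -0.08879,-0.9487,-2.459e-05,-0.03887,0.2116,-0.0625,1.555,
        0.2069,-0.2142,-0.807,-0.6499,2.384,-0.02063,1.179,
        -0.0003586,-1.408,0.6928,0.689,0.1854,0.4351,0.5663,
        0.07171,-0.07004)
x2 <- c(0.08742,0.2555,-0.00337,0.03995,-1.208,-1.208,-0.001374,
        -1.282,1.341,-0.9069,-0.2011,1.557,0.4517,-0.4376,
        0.4747,0.04965,-0.1668,-0.6811,-0.7011,-1.457,0.04652,
        -1.117,6.744,-1.332,0.1327,-0.1479,-2.303,0.1235,
        0.5916,0.05018,-0.7811,0.5869,-0.02608,0.9594,-0.1392,
        0.4089,0.1468,-1.507,-0.6882,-0.1781,0.5434,-0.4957,
        0.02557,-1.406,-0.5053,-0.7345,-1.314,0.3178,-0.2108,
        0.4186,-0.03347)

# Correlation recomputed with each point left out in turn.
loo <- sapply(seq_along(x1), function(k) cor(x1[-k], x2[-k]))

# How far each leave-one-out value moves from the full-data correlation.
shift <- abs(loo - cor(x1, x2))

# Index of the point whose removal changes the correlation the most.
most_influential <- which.max(shift)
print(most_influential)
print(loo[most_influential])
```

A point whose removal shifts the correlation far more than any other is the one driving the result; plotting shift against the index makes this visible at a glance.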

Therefore your bootstrap distribution will have two peaks: one peak, around 0.75, corresponding to the bootstrap samples which include this outlying point, and the other, around 0, corresponding to the bootstrap samples which do not include it.
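
The relative sizes of the two peaks follow from how often a given point is missed when you draw n indices with replacement from n. A rough sketch, assuming n = 51 (the length of x1 and x2) and 2000 replicates as in the quoted code; the names p_out, p_in and has23 are mine:

```r
n <- 51     # number of observations
R <- 2000   # number of bootstrap replicates, as in the quoted boot() call

# Probability that one particular point is absent from a resample of size n.
p_out <- (1 - 1/n)^n
p_in  <- 1 - p_out
print(round(c(excluded = p_out, included = p_in), 3))

# Empirical check: resample the indices only and record whether 23 appears.
set.seed(1)
has23 <- replicate(R, any(sample.int(n, n, replace = TRUE) == 23))
print(mean(has23))
```

So roughly a third of the bootstrap samples should fall in the peak near 0 and about two thirds in the peak near 0.75.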

This is the explanation and, at the same time, the interpretation.

Best wishes,
Ted.

On 11-Jul-04 Y C Tao wrote:
> I tried to bootstrap the correlation between two
> variables x1 and x2. The resulting distribution has
> two distinct peaks. How should I interpret it?
>
> The original code is attached.
>
> Y. C. Tao
>
> ----------------
>
> library(boot);
>
> my.correl<-function(d, i) cor(d[i,1], d[i,2])
>
> x1<-c(-2.612,-0.7859,-0.5229,-1.246,1.647,1.647,0.1811,
> -0.07097,0.8711,0.4323,0.1721,2.143,4.33,0.5002,
> 0.4015,-0.5225,2.538,0.07959,-0.6645,4.521,-1.371,
> 0.3327,25.24,-0.5417,2.094,0.6064,-0.4476,-0.5891,
> -0.08879,-0.9487,-2.459e-05,-0.03887,0.2116,-0.0625,1.555,
> 0.2069,-0.2142,-0.807,-0.6499,2.384,-0.02063,1.179,
> -0.0003586,-1.408,0.6928,0.689,0.1854,0.4351,0.5663,
> 0.07171,-0.07004);
>
> x2<-c( 0.08742,0.2555,-0.00337,0.03995,-1.208,-1.208,-0.001374,
> -1.282,1.341,-0.9069,-0.2011,1.557,0.4517,-0.4376,
> 0.4747,0.04965,-0.1668,-0.6811,-0.7011,-1.457,0.04652,
> -1.117,6.744,-1.332,0.1327,-0.1479,-2.303,0.1235,
> 0.5916,0.05018,-0.7811,0.5869,-0.02608,0.9594,-0.1392,
> 0.4089,0.1468,-1.507,-0.6882,-0.1781,0.5434,-0.4957,
> 0.02557,-1.406,-0.5053,-0.7345,-1.314,0.3178,-0.2108,
> 0.4186,-0.03347);
>
> b<-boot(cbind(x1, x2), my.correl, 2000)
> hist(b$t, breaks=50)

[The above rearranged to have 7 values in each complete line]

E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 11-Jul-04                                       Time: 10:40:34
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list