RE: [R] Interpreting Results of Bootstrapping

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Sun 11 Jul 2004 - 19:40:34 EST


Hi!

Simply plot(x1,x2): you will see that there is one point (number 23) at (x1,x2) = (25.24,6.744) which lies a very long way from all the other points (which, among themselves, form a somewhat diffuse cluster with some suggestion of further structure).

When you bootstrap, the correlation you obtain in any sample will depend on whether or not this outlying point is included in the sample. If it is included, this single point will generate a relatively high value of the correlation coefficient simply because it is such a long way from all the others (i.e. it is highly influential).

If it is not included, then the diffuse character of the other points will generate a very low value of the correlation coefficient.

  > cor(x1,x2)
  [1] 0.7471931
  > cor(x1[-23],x2[-23])
  [1] 0.03914653
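
The same check can be made for every point in turn, not only number 23. The following is a minimal sketch (the names full, loo and change are mine, not from the original post) which recomputes the correlation with each point left out and reports which omission moves it most. The data are copied from the post quoted below.

```r
# Data as given in the original post
x1 <- c(-2.612,-0.7859,-0.5229,-1.246,1.647,1.647,0.1811,
        -0.07097,0.8711,0.4323,0.1721,2.143,4.33,0.5002,
        0.4015,-0.5225,2.538,0.07959,-0.6645,4.521,-1.371,
        0.3327,25.24,-0.5417,2.094,0.6064,-0.4476,-0.5891,
        -0.08879,-0.9487,-2.459e-05,-0.03887,0.2116,-0.0625,1.555,
        0.2069,-0.2142,-0.807,-0.6499,2.384,-0.02063,1.179,
        -0.0003586,-1.408,0.6928,0.689,0.1854,0.4351,0.5663,
        0.07171,-0.07004)
x2 <- c(0.08742,0.2555,-0.00337,0.03995,-1.208,-1.208,-0.001374,
        -1.282,1.341,-0.9069,-0.2011,1.557,0.4517,-0.4376,
        0.4747,0.04965,-0.1668,-0.6811,-0.7011,-1.457,0.04652,
        -1.117,6.744,-1.332,0.1327,-0.1479,-2.303,0.1235,
        0.5916,0.05018,-0.7811,0.5869,-0.02608,0.9594,-0.1392,
        0.4089,0.1468,-1.507,-0.6882,-0.1781,0.5434,-0.4957,
        0.02557,-1.406,-0.5053,-0.7345,-1.314,0.3178,-0.2108,
        0.4186,-0.03347)

# Correlation using all points
full <- cor(x1, x2)

# Correlation recomputed with each point left out in turn
loo <- sapply(seq_along(x1), function(i) cor(x1[-i], x2[-i]))

# Absolute change in the correlation caused by dropping each point
change <- abs(loo - full)

which.max(change)                     # index of the most influential point
head(order(change, decreasing = TRUE)) # indices ranked by influence
```

If one index stands far above the rest in this ranking, the overall correlation is essentially the work of that single point.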

Therefore your bootstrap distribution will have two peaks: one peak, around 0.75, corresponding to the bootstrap samples which include this outlying point, and the other, around 0, corresponding to the bootstrap samples which do not include it.
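
The two groups can be separated explicitly. A given point is absent from a bootstrap sample of size n with probability (1 - 1/n)^n, which for n = 51 is about 0.364, so roughly a third of the replicates should fall in the lower peak. The sketch below illustrates the mechanism; to keep it self-contained it uses simulated data of the same shape (unrelated normal values plus one distant point placed at row 23), which is my assumption and not the data from the post. The seed and the names freq, has23 and p.excl are likewise illustrative. boot.array() is from the boot package and returns, for each replicate, how many times each observation was drawn.

```r
library(boot)

set.seed(1)                       # arbitrary seed, for reproducibility only
n  <- 51
x1 <- rnorm(n)                    # simulated: unrelated by construction
x2 <- rnorm(n)
x1[23] <- 25; x2[23] <- 7         # one distant point, placed at row 23

my.correl <- function(d, i) cor(d[i, 1], d[i, 2])
b <- boot(cbind(x1, x2), my.correl, R = 2000)

# Frequency array: one row per replicate, one column per observation
freq  <- boot.array(b)
has23 <- freq[, 23] > 0           # TRUE if the distant point was drawn

# Chance that a given point is absent from a resample of size n
p.excl <- (1 - 1/n)^n
c(expected = p.excl, observed = mean(!has23))

# Median bootstrap correlation without / with the distant point
tapply(b$t[, 1], has23, median)

# One histogram per group, instead of the single two-peaked one
par(mfrow = c(1, 2))
hist(b$t[!has23], breaks = 30, main = "point 23 absent",  xlab = "cor")
hist(b$t[has23],  breaks = 30, main = "point 23 present", xlab = "cor")
```

Replacing the simulated x1 and x2 by the values from the post should show the same split, with each histogram having a single peak.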

This is the explanation and, at the same time, the interpretation.

Best wishes,
Ted.

On 11-Jul-04 Y C Tao wrote:
> I tried to bootstrap the correlation between two
> variables x1 and x2. The resulting distribution has
> two distinct peaks. How should I interpret it?
>
> The original code is attached.
>
> Y. C. Tao
>
> ----------------
>
> library(boot);
>
> my.correl<-function(d, i) cor(d[i,1], d[i,2])
>
> x1<-c(-2.612,-0.7859,-0.5229,-1.246,1.647,1.647,0.1811,
> -0.07097,0.8711,0.4323,0.1721,2.143,4.33,0.5002,
> 0.4015,-0.5225,2.538,0.07959,-0.6645,4.521,-1.371,
> 0.3327,25.24,-0.5417,2.094,0.6064,-0.4476,-0.5891,
> -0.08879,-0.9487,-2.459e-05,-0.03887,0.2116,-0.0625,1.555,
> 0.2069,-0.2142,-0.807,-0.6499,2.384,-0.02063,1.179,
> -0.0003586,-1.408,0.6928,0.689,0.1854,0.4351,0.5663,
> 0.07171,-0.07004);
>
> x2<-c( 0.08742,0.2555,-0.00337,0.03995,-1.208,-1.208,-0.001374,
> -1.282,1.341,-0.9069,-0.2011,1.557,0.4517,-0.4376,
> 0.4747,0.04965,-0.1668,-0.6811,-0.7011,-1.457,0.04652,
> -1.117,6.744,-1.332,0.1327,-0.1479,-2.303,0.1235,
> 0.5916,0.05018,-0.7811,0.5869,-0.02608,0.9594,-0.1392,
> 0.4089,0.1468,-1.507,-0.6882,-0.1781,0.5434,-0.4957,
> 0.02557,-1.406,-0.5053,-0.7345,-1.314,0.3178,-0.2108,
> 0.4186,-0.03347);
>
> b<-boot(cbind(x1, x2), my.correl, 2000)
> hist(b$t, breaks=50)

[The above rearranged to have 7 values in each complete line]



E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 167 1972
Date: 11-Jul-04                                       Time: 10:40:34
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sun Jul 11 20:03:34 2004
