
From: Y C Tao <nov_tao_at_yahoo.com>

Date: Sun 11 Jul 2004 - 23:55:50 EST

R-help@stat.math.ethz.ch mailing list

https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Mon Jul 12 00:07:47 2004


You are right, the outlier caused the problem. Using
Spearman or Kendall's correlation seems to solve the
problem. Thanks!

Y. C. Tao
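
A minimal sketch of that check, assuming the x1 and x2 vectors posted earlier in the thread (copied here so the snippet runs on its own). It compares the Pearson, Spearman and Kendall correlations with and without pair 23, then bootstraps Spearman's rho by hand in base R rather than through boot(). The seed and the 2000 replicates are arbitrary choices, not part of the original post.

```r
# Data copied from the quoted message (51 pairs; pair 23 is the outlier).
x1 <- c(-2.612,-0.7859,-0.5229,-1.246,1.647,1.647,0.1811,
        -0.07097,0.8711,0.4323,0.1721,2.143,4.33,0.5002,
        0.4015,-0.5225,2.538,0.07959,-0.6645,4.521,-1.371,
        0.3327,25.24,-0.5417,2.094,0.6064,-0.4476,-0.5891,
        -0.08879,-0.9487,-2.459e-05,-0.03887,0.2116,-0.0625,1.555,
        0.2069,-0.2142,-0.807,-0.6499,2.384,-0.02063,1.179,
        -0.0003586,-1.408,0.6928,0.689,0.1854,0.4351,0.5663,
        0.07171,-0.07004)
x2 <- c(0.08742,0.2555,-0.00337,0.03995,-1.208,-1.208,-0.001374,
        -1.282,1.341,-0.9069,-0.2011,1.557,0.4517,-0.4376,
        0.4747,0.04965,-0.1668,-0.6811,-0.7011,-1.457,0.04652,
        -1.117,6.744,-1.332,0.1327,-0.1479,-2.303,0.1235,
        0.5916,0.05018,-0.7811,0.5869,-0.02608,0.9594,-0.1392,
        0.4089,0.1468,-1.507,-0.6882,-0.1781,0.5434,-0.4957,
        0.02557,-1.406,-0.5053,-0.7345,-1.314,0.3178,-0.2108,
        0.4186,-0.03347)

# Pearson uses the raw values, so one extreme pair can dominate it.
# Spearman and Kendall use ranks, so that pair only counts as "the largest".
methods <- c("pearson", "spearman", "kendall")
full    <- sapply(methods, function(m) cor(x1, x2, method = m))
dropped <- sapply(methods, function(m) cor(x1[-23], x2[-23], method = m))
print(round(rbind(full, dropped), 3))

# Hand-rolled bootstrap of Spearman's rho (seed chosen arbitrarily).
set.seed(1)
n   <- length(x1)
rho <- replicate(2000, {
  i <- sample.int(n, replace = TRUE)      # resample pairs with replacement
  cor(x1[i], x2[i], method = "spearman")
})
hist(rho, breaks = 50, main = "Bootstrap of Spearman's rho", xlab = "rho")
```

If the rank-based histogram shows a single peak while the Pearson one shows two, that is consistent with the outlier being the cause.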

- Ted.Harding@nessie.mcc.ac.uk wrote:

> Hi!
>
> Simply plot(x1,x2): you will see that there is one point
> (number 23) at (x1,x2) = (25.34,6.744) which is a very
> long way from all the other points (which, among themselves,
> form a somewhat diffuse cluster with some suggestion of
> further structure).
>
> When you bootstrap, the correlation you obtain in any sample
> will depend on whether or not this outlying point is included
> in the sample. If it is included, this single point will generate
> a relatively high value of the correlation coefficient simply
> because it is such a long way from all the others (i.e. it is
> highly influential).
>
> If it is not included, then the diffuse character of the other
> points will generate a very low value of the correlation
> coefficient.
>
>   > cor(x1,x2)
>   [1] 0.7471931
>   > cor(x1[-23],x2[-23])
>   [1] 0.03914653
>
> Therefore your bootstrap distribution will have two peaks: one
> peak, around 0.75, corresponding to the bootstrap samples which
> include this outlying point, and the other, around 0, corresponding
> to the bootstrap samples which do not include it.
>
> This is the explanation and, at the same time, the interpretation.
>
> Best wishes,
> Ted.
>
> On 11-Jul-04 Y C Tao wrote:
> > I tried to bootstrap the correlation between two
> > variables x1 and x2. The resulting distribution has
> > two distinct peaks, how should I interpret it?
> >
> > The original code is attached.
> >
> > Y. C. Tao
> >
> > ----------------
> >
> > library(boot);
> >
> > my.correl<-function(d, i) cor(d[i,1], d[i,2])
> >
> > x1<-c(-2.612,-0.7859,-0.5229,-1.246,1.647,1.647,0.1811,
> >       -0.07097,0.8711,0.4323,0.1721,2.143,4.33,0.5002,
> >       0.4015,-0.5225,2.538,0.07959,-0.6645,4.521,-1.371,
> >       0.3327,25.24,-0.5417,2.094,0.6064,-0.4476,-0.5891,
> >       -0.08879,-0.9487,-2.459e-05,-0.03887,0.2116,-0.0625,1.555,
> >       0.2069,-0.2142,-0.807,-0.6499,2.384,-0.02063,1.179,
> >       -0.0003586,-1.408,0.6928,0.689,0.1854,0.4351,0.5663,
> >       0.07171,-0.07004);
> >
> > x2<-c(0.08742,0.2555,-0.00337,0.03995,-1.208,-1.208,-0.001374,
> >       -1.282,1.341,-0.9069,-0.2011,1.557,0.4517,-0.4376,
> >       0.4747,0.04965,-0.1668,-0.6811,-0.7011,-1.457,0.04652,
> >       -1.117,6.744,-1.332,0.1327,-0.1479,-2.303,0.1235,
> >       0.5916,0.05018,-0.7811,0.5869,-0.02608,0.9594,-0.1392,
> >       0.4089,0.1468,-1.507,-0.6882,-0.1781,0.5434,-0.4957,
> >       0.02557,-1.406,-0.5053,-0.7345,-1.314,0.3178,-0.2108,
> >       0.4186,-0.03347);
> >
> > b<-boot(cbind(x1, x2), my.correl, 2000)
> > hist(b$t, breaks=50)
>
> [The above rearranged to have 7 values in each complete line]

>
> E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 167 1972
> Date: 11-Jul-04                                       Time: 10:40:34
> ------------------------------ XFMail ------------------------------
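
The two-peak explanation above can be checked directly. The sketch below is one way to do it, not the code used in the thread: it resamples indices with base R's sample.int() instead of boot(), records which replicates contain pair 23, and summarises the correlation separately for the two groups. The seed, B and the variable names are illustrative assumptions. A bootstrap sample of size n = 51 misses a given pair with probability (1 - 1/51)^51, about 0.36, so roughly a third of the replicates should land in the lower peak.

```r
# Data copied from the quoted message (51 pairs; pair 23 is the outlier).
x1 <- c(-2.612,-0.7859,-0.5229,-1.246,1.647,1.647,0.1811,
        -0.07097,0.8711,0.4323,0.1721,2.143,4.33,0.5002,
        0.4015,-0.5225,2.538,0.07959,-0.6645,4.521,-1.371,
        0.3327,25.24,-0.5417,2.094,0.6064,-0.4476,-0.5891,
        -0.08879,-0.9487,-2.459e-05,-0.03887,0.2116,-0.0625,1.555,
        0.2069,-0.2142,-0.807,-0.6499,2.384,-0.02063,1.179,
        -0.0003586,-1.408,0.6928,0.689,0.1854,0.4351,0.5663,
        0.07171,-0.07004)
x2 <- c(0.08742,0.2555,-0.00337,0.03995,-1.208,-1.208,-0.001374,
        -1.282,1.341,-0.9069,-0.2011,1.557,0.4517,-0.4376,
        0.4747,0.04965,-0.1668,-0.6811,-0.7011,-1.457,0.04652,
        -1.117,6.744,-1.332,0.1327,-0.1479,-2.303,0.1235,
        0.5916,0.05018,-0.7811,0.5869,-0.02608,0.9594,-0.1392,
        0.4089,0.1468,-1.507,-0.6882,-0.1781,0.5434,-0.4957,
        0.02557,-1.406,-0.5053,-0.7345,-1.314,0.3178,-0.2108,
        0.4186,-0.03347)

set.seed(1)                 # arbitrary seed, for reproducibility only
n <- length(x1)
B <- 2000                   # number of bootstrap replicates

# One column of resampled indices per replicate (n x B matrix).
idx <- replicate(B, sample.int(n, replace = TRUE))

# Pearson correlation for each replicate.
r <- apply(idx, 2, function(i) cor(x1[i], x2[i]))

# TRUE where the replicate contains the outlying pair at least once.
has23 <- colSums(idx == 23) > 0

# Share of replicates without pair 23, against the theoretical value.
print(c(observed = mean(!has23), expected = (1 - 1/n)^n))

# Median correlation in each group: FALSE = without pair 23, TRUE = with it.
print(tapply(r, has23, median))

# Plot the two groups side by side; together they form the bimodal histogram.
op <- par(mfrow = c(1, 2))
hist(r[has23],  breaks = 30, main = "With pair 23",    xlab = "r")
hist(r[!has23], breaks = 30, main = "Without pair 23", xlab = "r")
par(op)
```

Splitting on has23 should separate the two peaks if the single influential pair is what drives them.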


This archive was generated by hypermail 2.1.8 : Wed 03 Nov 2004 - 22:54:50 EST