# Re: [R] Why are lagged correlations typically negative?

From: Thomas Lumley <tlumley_at_u.washington.edu>
Date: Fri 25 Aug 2006 - 01:27:04 EST

On Thu, 24 Aug 2006, Bliese, Paul D LTC USAMH wrote:

> Recently, I was working with some lagged designs where a vector of
> observations at one time was used to predict a vector of observations at
> another time using a lag 1 design. In the work, I noticed a lot of
> negative correlations, so I ran a simple simulation with 2 matched
> points. The crude simulation example below shows that the correlation
> can be -1 or +1, but interestingly if you do this basic simulation
> thousands of times, you get negative correlations 66 to 67% of the time.
> If you simulate three matched observations instead of two you get
> negative correlations about 74% of the time and then as you simulate 4
> and more observations the number of negative correlations asymptotically
> approaches an equal 50% for negative versus positive correlations
> (though then with 100 observations one has 54% negative correlations).
> Creating T1 and T2 so they are related (and not correlated 1 as in the
> crude simulation) attenuates the effect. A more advanced simulation is
> provided below for those interested.
>
> Can anyone explain why this occurs in a way a non-mathematician is
> likely to understand?

Consider the simplest case, two matched points drawn from three observations, from the viewpoint of the middle observation. The correlation is positive if the previous point is lower and the following point is higher, or vice versa. It is negative if the previous and following points are both higher or both lower.

Now, if the middle point is higher than the first point it is probably higher than average, and so it has a more than 50% chance of also being higher than the third point. Similarly, if it is lower than the first point it is likely to be lower than the third point.

So negative correlation is more likely than positive.
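The argument above can be checked with a small simulation. This is only a sketch in Python (not the original poster's R code), and it assumes the three observations are independent standard normal draws; the function name `fraction_negative` is made up for illustration. With only two matched points the correlation is exactly -1 or +1, so it is enough to look at the sign of (Y - X) * (Z - Y). The correlation is negative exactly when the middle point is the largest or the smallest of the three, which for independent continuous draws happens with probability 2/3.

```python
import random

def fraction_negative(trials=100_000, seed=0):
    """Estimate how often the two lag-1 pairs from three points correlate negatively."""
    rng = random.Random(seed)
    negative = 0
    for _ in range(trials):
        # Three independent draws (an assumption of this sketch).
        # The lag-1 pairs are (x, y) and (y, z).
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # With two pairs the correlation is -1 or +1; its sign is the
        # sign of (y - x) * (z - y).
        if (y - x) * (z - y) < 0:
            negative += 1
    return negative / trials

frac = fraction_negative()
print(f"share of negative correlations: {frac:.3f}")
```

The printed share should land close to 2/3, in line with the 66 to 67% reported in the question.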

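Working out the covariance may be useful even for non-mathematicians. Call the three points X, Y, Z and take them to be independent, so the covariance between any two different points is 0.

cov(X-Y, Y-Z) = cov(X,Y) - cov(Y,Y) - cov(X,Z) + cov(Y,Z)

= 0 - var(Y) - 0 + 0 = -var(Y)

which is always negative.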
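A numerical check of this covariance calculation, as a rough sketch in Python rather than R. It assumes X, Y and Z are independent standard normal draws; the helper `sample_cov` and the variable names are made up for illustration. Under that assumption the covariance of the successive differences should come out near -var(Y), and since var(X-Y) = var(Y-Z) = 2 var(Y), the correlation of the differences should come out near -1/2.

```python
import random

def sample_cov(a, b):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)

rng = random.Random(1)
n = 200_000
# Independent draws for the three points (an assumption of this sketch).
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]

d1 = [a - b for a, b in zip(x, y)]  # X - Y
d2 = [b - c for b, c in zip(y, z)]  # Y - Z

cov_d = sample_cov(d1, d2)
var_y = sample_cov(y, y)
corr_d = cov_d / (sample_cov(d1, d1) * sample_cov(d2, d2)) ** 0.5

print(f"cov(X-Y, Y-Z) = {cov_d:.3f}, -var(Y) = {-var_y:.3f}")
print(f"cor(X-Y, Y-Z) = {corr_d:.3f}")
```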

-thomas

```
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley@u.washington.edu	University of Washington, Seattle
```

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Fri Aug 25 01:39:00 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 25 Aug 2006 - 02:24:36 EST.