[R] Correlations by group

From: Peter J. Lee <peterjl_at_bilkent.edu.tr>
Date: Mon 24 Jul 2006 - 20:34:11 EST

I'm aware that S N Krishna asked the same question, but I have been unable to get the posted solution to work for running rank-order
correlations on multiple subsets of data with the by() function.

Here is my problem:

Take a set of data from two subjects, who provided numerical infant mortality (IM) estimates for five countries:

         sub <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2) #grouping variable = 5 rows x 2 subjects

         est <- c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12) #response variable = 5 estimates x 2 subjects

         im <- c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10) #actual IM values x 2 subjects
         data <- cbind(sub, est, im)

Using the by() function:

         by(data, sub, function(x) cor(est, im, method = "spearman"))

does return two correlation coefficients. However, instead of one correlation per subject, the est x im correlation for the entire data set is computed and then reported for both subjects. This can be checked using:

         cor(est, im, method = "spearman")

The correct per-subject coefficients and p-values should instead be:

         sub[1] cor.coef = 0.1 p > .1
         sub[2] cor.coef = 0.9 p < .05

I find it peculiar that running a simple regression by groups does work:

         by(data, sub, function(x) lm(est ~ im, data = x))

indicating that perhaps I'm using the wrong grouping function for correlations. I'm using a fairly standard Pentium 4 running Windows XP.
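
A possible explanation for the difference, offered as a sketch rather than a definitive answer: in the lm() call the per-group subset is passed on through data = x, whereas the function in the cor() call never uses x, so est and im resolve to the full-length vectors in the workspace. Assuming that reading is right, referring to the columns of x should give one coefficient per subject. The names dat and rho below are illustrative and not from the original post; a data frame is used in place of the cbind() matrix so that columns can be accessed with $.

```r
# Data from the post, held in a data frame (dat is an illustrative name)
sub <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
est <- c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12)
im  <- c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10)
dat <- data.frame(sub, est, im)

# x is the subset of rows for one subject; use its columns, not the globals
rho <- by(dat, dat$sub, function(x) cor(x$est, x$im, method = "spearman"))
rho
```

With these data the two coefficients should match the 0.1 and 0.9 stated above.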

On occasion I am required to calculate up to a quarter of a million individual correlations, so any help would be very much appreciated.
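
For a large number of groups, one option (a sketch only, not timed at the scale mentioned above) is to split the row indices by subject once and compute each coefficient with vapply(), which returns a plain named numeric vector. The names dat, groups and rho are illustrative. This gives coefficients only; p-values would need something like cor.test() per group.

```r
# Data from the post (dat, groups and rho are illustrative names)
dat <- data.frame(
  sub = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  est = c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12),
  im  = c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10)
)

# List of row indices, one element per subject
groups <- split(seq_len(nrow(dat)), dat$sub)

# One Spearman coefficient per subject, returned as a named numeric vector
rho <- vapply(
  groups,
  function(i) cor(dat$est[i], dat$im[i], method = "spearman"),
  numeric(1)
)
rho
```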

Best wishes,

Peter James Lee
Assistant Professor

Psikoloji Bölümü
Bilkent University

e-mail: peterjl@bilkent.edu.tr
office: (90) 312 290 1807
home: (90) 312 290 3447


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon Jul 24 21:02:24 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 25 Jul 2006 - 00:19:53 EST.
