Re: [R] Correlations by group

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Mon 24 Jul 2006 - 22:32:04 EST

On 7/24/06, Peter J. Lee <peterjl@bilkent.edu.tr> wrote:
> I'm aware that S N Krishna asked the same
> question. However, I have failed to implement the
> posted solution for running rank order
> correlations on multiple subsets of data using the by() function.
>
> Here is my problem:
>
> Take a set of data from two subjects, who
> provided numerical infant mortality (IM) estimates for five countries:
>
> sub <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2) #grouping variable = 5 rows x 2 subjects
> est <- c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12) #response variable = 5 estimates x 2 subjects
> im <- c(4, 5, 7, 8, 10, 4, 5, 7, 8, 10) #actual IM values x 2 subjects
> data <- cbind(sub, est, im)
> data
>
> Using the by() function:
>
> by(data, sub, function(x) cor(est, im, method = "spearman"))

The calculation in your function does not depend on x, so est and im are taken from the workspace rather than from the subset, and it's giving a constant return value. Try:

  by(data, sub, function(x) cor(x[,2], x[,3], method = "spearman"))

or

   tapply(seq_along(sub), sub, function(i) cor(est[i], im[i], method = "spearman"))

or either of the following, which return correlation matrices instead of single correlations:

   by(data[,2:3], sub, function(x) cor(x, method = "spearman"))
   by(data[,2:3], sub, cor, method = "spearman")
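
As a quick check, here is a minimal sketch that puts the posted data in a data frame and computes one Spearman correlation per subject. The data frame name `dat` and the use of split() with sapply() are my own choices for illustration, not something taken from the original post.

```r
# Data from the original post, stored in a data frame so that columns
# can be referred to by name (the name `dat` is an assumption).
dat <- data.frame(
  sub = rep(1:2, each = 5),
  est = c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12),
  im  = rep(c(4, 5, 7, 8, 10), times = 2)
)

# One Spearman correlation per subject. The function only uses its
# argument x, so each call sees just that subject's rows.
rho <- sapply(split(dat, dat$sub),
              function(x) cor(x$est, x$im, method = "spearman"))
print(rho)
```

With these data the two values should come out as 0.1 and 0.9, matching the coefficients expected in the original post.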

>
> does result in two correlation coefficients. But
> instead of by subject, the est x im correlation
> for the entire set is reported, and then assigned
> to both subjects. This can be checked using:
>
> cor(est, im, method = "spearman")
>
> Nevertheless, the true coeff's and p-values should be:
>
> sub[1] cor.coef = 0.1 p > .1
> sub[2] cor.coef = 0.9 p < .05
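
If p-values are wanted as well, cor.test() can be applied per subject in the same way. The following is only a sketch: the names `dat` and `res` are illustrative, and cor.test() is left at its default two-sided alternative, which may not be the test behind the p-values quoted above (see the `alternative` argument of cor.test()).

```r
# Data from the original post (the data frame name `dat` is an assumption).
dat <- data.frame(
  sub = rep(1:2, each = 5),
  est = c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12),
  im  = rep(c(4, 5, 7, 8, 10), times = 2)
)

# For each subject, run a Spearman test and keep the estimate and p-value.
# sapply() returns one column per subject; t() turns that into one row each.
res <- t(sapply(split(dat, dat$sub), function(x) {
  ct <- cor.test(x$est, x$im, method = "spearman")
  c(rho = unname(ct$estimate), p.value = ct$p.value)
}))
print(res)
```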
>
> I find it peculiar that running a simple regression by groups does work:
>
> by(data, sub, function(x) lm(est ~ im, data = x))
>
> indicating that perhaps I'm using the wrong
> grouping function for correlations. I'm using a
> fairly standard Pentium 4 running Windows XP.
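
The regression works because lm() is told to look up est and im in the subset via data = x, whereas cor() has no data argument. A rough sketch of the parallel, assuming the data are held in a data frame (here called `dat`, my name for it), uses with() to get the same lookup for cor():

```r
# Data from the original post (the data frame name `dat` is an assumption).
dat <- data.frame(
  sub = rep(1:2, each = 5),
  est = c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12),
  im  = rep(c(4, 5, 7, 8, 10), times = 2)
)

# lm() resolves est and im inside x because of data = x,
# so each subject gets its own fit.
fits <- by(dat, dat$sub, function(x) lm(est ~ im, data = x))
slopes <- sapply(fits, function(f) unname(coef(f)["im"]))

# with() evaluates the cor() call inside x, giving the same per-subject lookup.
rho_by <- by(dat, dat$sub,
             function(x) with(x, cor(est, im, method = "spearman")))

print(slopes)
print(rho_by)
```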
>
> On occasion I am required to calculate up to a
> quarter of a million individual correlations, so
> any help would be very much appreciated.
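
For a very large number of groups, calling a function once per group can become slow. One possible alternative, offered only as an untested-for-speed sketch, is to rank within groups using ave() and then compute the Pearson correlation of the ranks from grouped sums via rowsum(). The helper name `group_spearman` is hypothetical, and the sketch assumes no missing values and at least two distinct values of each variable in every group.

```r
# Hypothetical helper: Spearman correlation of x and y within each level of g,
# computed as the Pearson correlation of within-group ranks.
group_spearman <- function(x, y, g) {
  rx <- ave(x, g, FUN = rank)          # ranks of x within each group
  ry <- ave(y, g, FUN = rank)          # ranks of y within each group
  n   <- rowsum(rep(1, length(g)), g)  # group sizes
  sx  <- rowsum(rx, g)                 # grouped sums of ranks
  sy  <- rowsum(ry, g)
  sxy <- rowsum(rx * ry, g)            # grouped sums of cross-products
  sxx <- rowsum(rx^2, g)               # grouped sums of squares
  syy <- rowsum(ry^2, g)
  num <- sxy - sx * sy / n
  den <- sqrt((sxx - sx^2 / n) * (syy - sy^2 / n))
  drop(num / den)                      # named vector, one value per group
}

# Data from the original post.
sub <- rep(1:2, each = 5)
est <- c(60, 20, 260, 160, 42, 2, 1, 3, 7, 12)
im  <- rep(c(4, 5, 7, 8, 10), times = 2)

rho_fast <- group_spearman(est, im, sub)
print(rho_fast)
```

On the posted data this should agree with the per-subject cor(..., method = "spearman") results; whether it is actually faster on a quarter of a million groups would need to be timed on the real data.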
>
> Best wishes,
>
> Peter James Lee
> _________________________
>
> Peter James Lee
> Assistant Professor
>
> Psikoloji Bölümü
> Bilkent University
> Bilkent
> Ankara
> Turkey
> 06800
>
> e-mail: peterjl@bilkent.edu.tr
> office: (90) 312 290 1807
> home: (90) 312 290 3447
> website: http://www.bilkent.edu.tr/~peterjl/index.html
> _________________________
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>



Received on Mon Jul 24 22:43:35 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 25 Jul 2006 - 00:19:54 EST.
