# Re: [R] need help on computing double summation

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Thu 16 Jun 2005 - 22:22:53 EST

If I understood correctly, the following might be simpler (dat is the data frame holding the data):

> sum(ave(dat$x, dat$id, FUN=scale, scale=FALSE) *
+     ave(dat$y, dat$id, FUN=scale, scale=FALSE))
[1] 6.229377
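
For anyone who wants to try the idea on something small, here is a minimal sketch. The data frame values are made up for illustration and are not the original poster's data. One caveat: going by the documented signature `ave(x, ..., FUN = mean)`, everything in `...` is treated as a grouping variable, so this sketch centers with an explicit helper (`center`, a name chosen here) instead of relying on `scale = FALSE` being forwarded to `scale()`.

```r
# Toy data (hypothetical values, not the original poster's data set)
dat <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  x  = c(1, 2, 4, 3, 5, 7),
  y  = c(2, 1, 5, 4, 8, 6)
)

# Subtract each id's own mean from its observations
center <- function(v) v - mean(v)

xc <- ave(dat$x, dat$id, FUN = center)
yc <- ave(dat$y, dat$id, FUN = center)

# Double sum: cross-products of within-id deviations, added over all rows
total <- sum(xc * yc)
print(total)
```

An id with a single observation is centered to zero, so it adds nothing to the total and no special case is needed.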

Andy

> From: Huntsinger, Reid
>
> You could do something like
>
> ids <- unique(mydata$id)
> ans <- vector(length=length(ids), mode="list")
> for (i in ids) {
>   g <- which(mydata$id == i)
>   ans[[i]] <- (length(g) - 1)*cov(mydata$x[g], mydata$y[g])
> }
> ans
>
> but cov() returns NA for length 1 vectors, so you'd want an
> if (length(g) == 1) ans[i] <- 0 else ans[i] <- ... construction.
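
One way the guarded loop might look, as a sketch rather than a definitive version. The toy `mydata` values are invented for illustration, and the results are stored by position in a numeric vector (an assumption made here so that ids need not be the integers 1..k).

```r
# Toy data (hypothetical values); id 3 has a single observation
mydata <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  x  = c(1, 2, 4, 3, 5, 7),
  y  = c(2, 1, 5, 4, 8, 6)
)

ids <- unique(mydata$id)
ans <- numeric(length(ids))
names(ans) <- ids  # label each slot with its id (coerced to character)

for (k in seq_along(ids)) {
  g <- which(mydata$id == ids[k])
  # cov() gives NA for a single observation, so contribute 0 in that case
  ans[k] <- if (length(g) == 1) 0 else
    (length(g) - 1) * cov(mydata$x[g], mydata$y[g])
}

loop_total <- sum(ans)
print(ans)
print(loop_total)
```

Multiplying the sample covariance by (n_i - 1) undoes its denominator, which leaves the inner sum of cross-products for that id.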
>
> This is almost brute force; you could also use tapply, as follows:
>
> sx <- tapply(mydata$x, INDEX=mydata$id, FUN=sum)
> sy <- tapply(mydata$y, INDEX=mydata$id, FUN=sum)
> sxy <- tapply(mydata$x*mydata$y, INDEX=mydata$id, FUN=sum)
> n <- tapply(mydata$id, INDEX=mydata$id, FUN=length) # or use table()!
>
> and now your inner sum is
>
> sxy - 2*sx*(sy/n) + n*(sx/n)*(sy/n) = sxy - sx*sy/n
>
> so
>
> sum(sxy - sx*sy/n) should do.
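
A small sketch of the tapply() route on invented data (the `mydata` values are hypothetical), with a brute-force recomputation of the centered cross-products as a check on the algebraic shortcut sxy - sx*sy/n.

```r
# Toy data (hypothetical values); id 3 has a single observation
mydata <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  x  = c(1, 2, 4, 3, 5, 7),
  y  = c(2, 1, 5, 4, 8, 6)
)

# Per-id sums and counts
sx  <- tapply(mydata$x, mydata$id, sum)
sy  <- tapply(mydata$y, mydata$id, sum)
sxy <- tapply(mydata$x * mydata$y, mydata$id, sum)
n   <- tapply(mydata$id, mydata$id, length)

# Inner sum for each id via the shortcut, then the outer sum over ids
inner <- sxy - sx * sy / n
tapply_total <- sum(inner)

# Brute-force check: center within each id, multiply, and add
direct_total <- sum(sapply(split(mydata, mydata$id), function(z) {
  sum((z$x - mean(z$x)) * (z$y - mean(z$y)))
}))

print(inner)
print(c(shortcut = tapply_total, direct = direct_total))
```

Ids with one observation come out as zero here without any special handling, since sxy equals sx*sy when n is 1.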
>
> One more approach is to make your dataset into a list of data frames,
> one for each id, then use lapply(). The list can be created by split().
> In one line,
>
> lapply(split(mydata, f=mydata$id),
>        function(z) (length(z$x) - 1)*cov(z$x, z$y))
>
> and take sum(,na.rm=TRUE) to remove the NAs due to single ids that you
> want to be zeros.
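
A sketch of the split() approach on invented data (the `mydata` values are hypothetical). It uses sapply() instead of lapply(), a substitution made here because sum() needs an atomic vector and lapply() returns a list.

```r
# Toy data (hypothetical values); id 3 has a single observation
mydata <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  x  = c(1, 2, 4, 3, 5, 7),
  y  = c(2, 1, 5, 4, 8, 6)
)

# One data frame per id, then (n_i - 1) * cov for each piece
per_id <- sapply(split(mydata, mydata$id),
                 function(z) (nrow(z) - 1) * cov(z$x, z$y))

# Single-observation ids yield NA; na.rm = TRUE treats them as zero
split_total <- sum(per_id, na.rm = TRUE)

print(per_id)
print(split_total)
```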
>
> Reid Huntsinger
>
>
> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Kerry Bush
> Sent: Wednesday, June 15, 2005 11:41 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] need help on computing double summation
>
>
> Dear helpers in this forum,
>
> This is a clarified version of my previous
> questions in this forum. I really need your generous
> help on this issue.
>
> > Suppose I have the following data set:
> >
> >
> > ......
> >
>
> Now I want to compute the following double summation:
>
> sum_{i=1}^k sum_{j=1}^{n_i} (x_{ij}-mean(x_i))*(y_{ij}-mean(y_i))
>
> i is from 1 to k, indexing the ith subject id; and j is from 1 to n_i,
> indexing the jth observation for the ith subject.
>
> In the above expression, mean(x_i) is the mean of x values for the ith
> subject, and mean(y_i) is the mean of y values for the ith subject.
>
> Is there a simple way to do this in R?
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html