From: Peter Dalgaard <P.Dalgaard_at_biostat.ku.dk>

Date: Wed 24 Jan 2007 - 14:35:15 GMT

Date: Wed 24 Jan 2007 - 14:35:15 GMT

Jose Quesada wrote:

> Thanks Charles, Martin,

*>
**> Substantial improvement with the vectorized solution. Here is a quick benchmark:
**>
**> # The loop-based solution:
**> nestedCos = function (x) {
**> if (is(x, "Matrix") ) {
**> cos = array(NA, c(ncol(x), ncol(x)))
**> for (i in 2:ncol(x)) {
**> for (j in 1:(i - 1)) {
**> cos[i, j] = cosine(x[, i], x[, j])
**> }
**> }
**> }
**> return(cos)
**> }
**> # Charles C. Berry's vectorized approach
**> flatCos = function (x) {
**> res = crossprod( x , x )
**> diagnl = Diagonal( ncol(x), 1 / sqrt( diag( res )))
**> cos = diagnl %*% res %*% diagnl
**> return(cos)
**> }
**>
**> Benchmarking:
**>
**>
**>> system.time(for(i in 1:10)nestedCos(x))
**>>
**> (I stopped because it was taking too long)
**> Timing stopped at: 139.37 3.82 188.76 NA NA
**>
**>> system.time(for(i in 1:10)flatCos(x))
**>>
**> [1] 0.43 0.00 0.48 NA NA
**>
**> #------------------------------------------------------
**> As much as I like to have faster code, I'm still wondering WHY flatCos gets the same results; i.e., why multiplying the inverse sqrt root of the diagonal of x BY x, then BY the diagonal again produces the expected result. I checked the wikipedia page for crossprod and other sources, but it still eludes me. I can see that scaling by the sqrt of the diagonal once makes sense with 'res <- crossprod( x , x ) gives your result up to scale factors of sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply by the diagonal again.
**>
**>
*

Didn't follow this thread too closely, but the point would seem to be
that the inner product of two normalized vectors is the cosine of the
angle. So basically, you want

crossprod(X%*%diagnl, X%*%diagnl) == t(diagnl) %*% t(X) %*% X %*% diagnl

I think, BTW, that another version not requiring Matrix is

Cr <- crossprod(X)

D <- sqrt(diag(C))

Cr/outer(D, D)

> Maybe trying to attack a simpler problem might help my understanding: e.g., calculating the cos of a column to all other colums of x (that is, the inner part of the nested loop). How would that work in a vectorized way? I'm trying to get some general technique that I can reuse later from this excellent answer.

*>
**> Thanks,
**> -Jose
**>
**>
**>> I am rusty on 'Matrix', but I see there are crossprod methods for those
**>> classes.
**>>
**>> res <- crossprod( x , x )
**>>
**>> gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so
**>> something like
**>>
**>> diagnl <- Diagonal( ncol(x), sqrt( diag( res ) )
**>>
**>>
**>
**> OOPS! Better make that
**>
**> diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) )
**>
**>
**>> final.res <- diagnl %*% res %*% diagnl
**>>
**>> should do it.
**>>
**>>
**>
**>
*

-- O__ ---- Peter Dalgaard Ă˜ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907 ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Thu Jan 25 13:09:40 2007

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.1.8, at Thu 25 Jan 2007 - 02:30:30 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*