From: Bob Green <bgreen_at_dyson.brisnet.org.au>

Date: Tue 25 Oct 2005 - 06:51:51 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Tue Oct 25 06:53:12 2005

Hello,

I am hoping for some advice on using R - my experience with statistical programs has been limited to SPSS.

I have been using a textual analysis program and wanted to add some rigour to making a choice between two models of self-reported cannabis effects. To do this, I need to compare the two resulting word co-occurrence matrices. The program itself doesn't offer this as an option, and the person who wrote the program told me where the numerical data was located and offered this advice:

"For two vectors a and b, the cosine similarity is: cos theta = a . b / (magn(a) * magn(b)), and the formula is really identical for matrices. The dot product (or inner product) is calculated by multiplying each pair of corresponding elements from the two matrices, and summing these products. Calculating the magnitude of a matrix is really the same as a vector: square each element of the matrix, sum the squares, then take the square root of the sum."
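The quoted recipe can be sketched in R as follows. This is only an illustration under stated assumptions: the function name cosine_similarity and the small matrices A, B, C and D are made up for the example and are not part of the textual analysis program or its data. Both matrices are assumed to have the same dimensions.

```r
# Cosine similarity between two matrices of identical dimensions,
# treating each matrix as one long vector (hypothetical helper).
cosine_similarity <- function(A, B) {
  stopifnot(identical(dim(A), dim(B)))
  a <- as.numeric(A)  # flatten to a double vector
  b <- as.numeric(B)
  dot <- sum(a * b)           # multiply corresponding elements, then sum
  mag_a <- sqrt(sum(a * a))   # square, sum, square root
  mag_b <- sqrt(sum(b * b))
  dot / (mag_a * mag_b)
}

# Made-up example matrices
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- 2 * A                             # same direction as A
C <- matrix(c(1, 0, 0, 0), nrow = 2)
D <- matrix(c(0, 0, 0, 1), nrow = 2)   # no overlapping non-zero cells

cosine_similarity(A, B)  # approximately 1
cosine_similarity(C, D)  # 0
```

A value near 1 indicates that the two matrices have nearly proportional entries, and a value of 0 indicates that no cell is non-zero in both.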

I have been advised that to multiply matrices I should use %*%, whereas for a pointwise (element-by-element) product I omit the % signs.
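The difference between the two operators can be seen on a small made-up example. The matrices below are hypothetical and only illustrate the dimension rules: `*` needs both matrices to have the same shape, while `%*%` needs the number of columns of the first to equal the number of rows of the second.

```r
# Two hypothetical 2 x 3 matrices
A <- matrix(1:6, nrow = 2)
B <- matrix(6:1, nrow = 2)

# Element-by-element product: same shape in, same shape out
elementwise <- A * B
dim(elementwise)

# Matrix product of a 2 x 3 by a 2 x 3 is not defined, so R raises an error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(A %*% B, error = function(e) conditionMessage(e))
msg

# Transposing B makes the shapes compatible (2 x 3 times 3 x 2)
product <- A %*% t(B)
dim(product)
```

For the recipe quoted above, the element-by-element form is the one that matches "multiplying each pair of corresponding elements".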

I have tried to run the syntax with and without the %, however my attempts with both syntax (a) and syntax (b) below have been unsuccessful.

With (a) I obtain the message - Warning message: Error in A %*% B : non-conformable arguments

With (b) I obtain the message - Warning message: NAs produced by integer overflow in: sum(A * A) * sum(B * B)

(a) Matrix

testA <- read.table("c:\\matrixA.txt", header=T)

testB <- read.table("c:\\matrixB.txt", header=T)

A<-as.matrix(testA)

B<-as.matrix(testB)

cosineDissimilarity <- sum(A%*%B)/sqrt(sum(A%*%A)*sum(B%*%B))

(b) pointwise

testA <- read.table("c:\\matrixA.txt", header=T)

testB <- read.table("c:\\matrixB.txt", header=T)

A<-as.matrix(testA)

B<-as.matrix(testB)

cosineDissimilarity <- sum(A*B)/sqrt(sum(A*A)*sum(B*B))
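The overflow warning in (b) is consistent with A and B holding whole-number counts stored as R integers, in which case the product of the two sums can exceed the largest representable integer. A minimal sketch of one possible workaround, assuming that is the cause, is to switch the matrices to double storage before computing. The matrices below are made-up stand-ins for the files, and the result is named cosineSimilarity here because the formula yields a similarity rather than a dissimilarity.

```r
# Hypothetical integer-valued stand-ins for the matrices read from file
A <- matrix(c(100L, 200L, 300L, 400L), nrow = 2)
B <- matrix(c(400L, 300L, 200L, 100L), nrow = 2)

# In integer arithmetic the product of the two sums exceeds
# .Machine$integer.max, so R returns NA (with a warning, suppressed here)
overflowed <- suppressWarnings(sum(A * A) * sum(B * B))
is.na(overflowed)

# Switch storage to double so the same expression no longer overflows
storage.mode(A) <- "double"
storage.mode(B) <- "double"

cosineSimilarity <- sum(A * B) / sqrt(sum(A * A) * sum(B * B))
cosineSimilarity  # approximately 0.667 for these made-up values
```

If the real matrices contain non-numeric columns, as.matrix() would return a character matrix instead, so checking is.numeric(A) before the calculation may be worthwhile.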

Any suggestions are appreciated, regarding either the above logic about analysis selection or the necessary syntax.

regards

Bob

This archive was generated by hypermail 2.1.8 : Tue 25 Oct 2005 - 08:43:19 EST