From: Jose Claudio Faria <joseclaudio.faria_at_terra.com.br>

Date: Sat 09 Jul 2005 - 03:57:15 EST

ms = matrix(unlist(by(x, tra, mean)), byrow=T, ncol=ncol(x))

D2.13 = (ms[1,] - ms[3,])%*%InvS%*%(ms[1,] - ms[3,]) print(D2.13)

D2.23 = (ms[2,] - ms[3,])%*%InvS%*%(ms[2,] - ms[3,]) print(D2.23)

*#
*

# --- End R script 1 ---

*#
*

}

}

Date: Sat 09 Jul 2005 - 03:57:15 EST

I'm trying to calculate Mahalanobis distances for 'Species' of 'iris' data as obtained below:

Squared Distance to Species From Species:

Setosa Versicolor Virginica Setosa 0 89.86419 179.38471 Versicolor 89.86419 0 17.20107 Virginica 179.38471 17.20107 0

These distances were obtained with proc 'CANDISC' of SAS, please, see Output 21.1.2: Iris Data: Squared Mahalanobis Distances from http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap21/sect19.htm

From these distances my intention is to make a cluster analysis as below, using the package 'mclust':

In prior mail, my basic question was: how to obtain this matrix with R from 'iris' data?

Well, I think that the basic soluction to calculate this distances is:

*#
*

# --- Begin R script 1 ---

*#
*

x = as.matrix(iris[,1:4])

tra = iris[,5]

man = manova(x ~ tra)

# Mahalanobis

E = summary(man)$SS[2] #Matrix E S = as.matrix(E$Residuals)/man$df.residual InvS = solve(S)

ms = matrix(unlist(by(x, tra, mean)), byrow=T, ncol=ncol(x))

colnames(ms) = names(iris[1:4]) rownames(ms) = c('Set', 'Ver', 'Vir') D2.12 = (ms[1,] - ms[2,])%*%InvS%*%(ms[1,] - ms[2,])print(D2.12)

D2.13 = (ms[1,] - ms[3,])%*%InvS%*%(ms[1,] - ms[3,]) print(D2.13)

D2.23 = (ms[2,] - ms[3,])%*%InvS%*%(ms[2,] - ms[3,]) print(D2.23)

# --- End R script 1 ---

Well, I would like to generalize a soluction to obtain the matrices like 'Mah' (below) or a complete matrix like in the Output 21.1.2. Somebody could help me?

*#
*

# --- Begin R script 2 ---

*#
*

Mah = c( 0, 89.86419, 0, 179.38471, 17.20107, 0)

n = 3

D = matrix(0, n, n)

nam = c('Set', 'Ver', 'Vir')

rownames(D) = nam

colnames(D) = nam

k = 0

for (i in 1:n) {

for (j in 1:i) {

k = k+1 D[i,j] = Mah[k] D[j,i] = Mah[k]

}

}

D=sqrt(D) #D2 -> D

library(mclust)

dendroS = hclust(as.dist(D), method='single')
dendroC = hclust(as.dist(D), method='complete')

win.graph(w = 3.5, h = 6)

split.screen(c(2, 1))

screen(1)

plot(dendroS, main='Single', sub='', xlab='', ylab='', col='blue')

screen(2)

plot(dendroC, main='Complete', sub='', xlab='', col='red')

*#
*

# --- End R script 2 ---

*#
*

I always need of this type of analysis and I'm not founding how to make it in the CRAN documentation (Archives, packages: mclust, cluster, fpc and mva).

Regards,

-- Jose Claudio Faria Brasil/Bahia/UESC/DCET Estatistica Experimental/Prof. Adjunto mails: joseclaudio.faria@terra.com.br ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.htmlReceived on Sat Jul 09 04:02:15 2005

*
This archive was generated by hypermail 2.1.8
: Fri 03 Mar 2006 - 03:33:26 EST
*