Re: [R] Help with Mahalanobis

From: Jose Claudio Faria <joseclaudio.faria_at_terra.com.br>
Date: Sat 09 Jul 2005 - 20:46:39 EST

Christian Hennig wrote:

> Dear Jose,
> 
> normal mixture clustering (mclust) operates on points times variables data
> and not on a distance matrix. Therefore
> it doesn't make sense to compute Mahalanobis distances before using
> mclust.
> Furthermore, cluster analysis based on distance matrices (hclust or pam,
> say) operates on a point by point distance matrix (be it Mahalanobis or
> something else). You show a group by group matrix below, for which I don't
> see any purpose in cluster analysis.
> Have you looked at function mahalanobis?
> 
> Christian
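
The point-by-point distance matrix mentioned above can be sketched as follows. This is only an illustration: it assumes the pooled within-group covariance is the metric of interest, and the names `grp`, `centred`, `z` and `d_points` are made up for this example.

```r
# Point-by-point Mahalanobis distances for distance-based clustering.
# Assumption: the pooled within-group covariance S defines the metric.
x   <- as.matrix(iris[, 1:4])
grp <- iris$Species

# Pooled within-group covariance: centre each variable within its group.
centred <- x - apply(x, 2, function(v) ave(v, grp))
S <- crossprod(centred) / (nrow(x) - nlevels(grp))

# Whitening: with S = t(R) %*% R, Euclidean distance on x %*% solve(R)
# equals Mahalanobis distance on x.
z <- x %*% solve(chol(S))
d_points <- dist(z)                 # 150 x 150, one entry per pair of points

# Any distance-based method can now be applied, e.g. hclust.
hc <- hclust(d_points, method = "average")
print(table(cutree(hc, k = 3), grp))
```

Whitening the data first lets `dist()` do the pairwise work, so `mahalanobis()` need not be called once per point.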

Dear Christian,

First of all, thanks for the reply!

Multivariate analysis is not my field; I'm studying it because I need it in my work.

I'm using 'iris' only as an example of my real problem: I normally work with many response variables (5 or more), with replicates (10 or more), for many groups (20 or more). In these cases, I think, the final dendrogram obtained using the 'mclust' package is not very clear.

I have learned that, in these cases, the generalized (Mahalanobis) distance, obtained as in the prior example (see script), is one of the best choices for studying the similarity between the groups. Do you agree?
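
A minimal sketch of that computation for all pairs of groups at once, assuming the pooled within-group covariance is used (as in the script quoted below). The names `grp`, `ms`, `S` and `D2` are only illustrative. On 'iris' it should reproduce the squared distances reported by SAS.

```r
# Squared Mahalanobis distances between group means, for any number of groups.
x   <- as.matrix(iris[, 1:4])
grp <- iris$Species

# Mean vector of each group, one row per group.
ms <- t(sapply(split(as.data.frame(x), grp), colMeans))

# Pooled within-group covariance: sum of (n_g - 1) * cov_g over the groups,
# divided by (N - number of groups).
S <- Reduce(`+`, lapply(split(as.data.frame(x), grp),
                        function(g) (nrow(g) - 1) * cov(g))) /
     (nrow(x) - nlevels(grp))

# mahalanobis() returns squared distances from every row of 'ms' to 'center';
# looping over the rows of 'ms' fills the full group-by-group matrix.
D2 <- apply(ms, 1, function(m) mahalanobis(ms, center = m, cov = S))
print(round(D2, 5))
```

Nothing in the sketch depends on there being three groups or four variables, so the same lines apply to larger designs.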

If so, I need to cluster the groups using this matrix of between-group distances. I chose the 'mclust' package only because I am also studying it, and I think it works well for this purpose.

Could you suggest another (and simpler) choice of analysis?
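
For reference, a minimal sketch of one simple option: hierarchical clustering of the groups with `hclust()` from the 'stats' package, applied directly to the group-by-group distances. The squared distances are the SAS values quoted below. Average linkage and the cut into two clusters are assumptions made only for illustration.

```r
# Squared Mahalanobis distances between the three species (SAS output).
nam <- c("Set", "Ver", "Vir")
D2 <- matrix(c(  0.00000,  89.86419, 179.38471,
                89.86419,   0.00000,  17.20107,
               179.38471,  17.20107,   0.00000),
             nrow = 3, byrow = TRUE, dimnames = list(nam, nam))

# hclust() expects a 'dist' object holding distances, not squared distances.
d  <- as.dist(sqrt(D2))
hc <- hclust(d, method = "average")   # linkage chosen for illustration

plot(hc, main = "Average linkage", sub = "", xlab = "")

# Cut the tree into two clusters and show the membership of each group.
groups <- cutree(hc, k = 2)
print(groups)
```

With only three groups the tree is trivial, but the same calls work unchanged for 20 or more groups.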

JCFaria

> On Fri, 8 Jul 2005, Jose Claudio Faria wrote:
> 
> 

>>Dear R list,
>>
>>I'm trying to calculate Mahalanobis distances for 'Species' of 'iris' data
>>as obtained below:
>>
>>Squared Distance to Species From Species:
>>
>>               Setosa Versicolor  Virginica
>>Setosa              0   89.86419  179.38471
>>Versicolor   89.86419          0   17.20107
>>Virginica   179.38471   17.20107          0
>>
>>These distances were obtained with proc 'CANDISC' of SAS, please,
>>see Output 21.1.2: Iris Data: Squared Mahalanobis Distances from
>>http://www.id.unizh.ch/software/unix/statmath/sas/sasdoc/stat/chap21/sect19.htm
>>
>>From these distances I intend to run a cluster analysis, as below, using
>>the package 'mclust'.
>>
>>In my previous mail, my basic question was: how can I obtain this matrix in R
>>from the 'iris' data?
>>
>>Well, I think that the basic solution for calculating these distances is:
>>
>>#
>># --- Begin R script 1 ---
>>#
>>x = as.matrix(iris[,1:4])
>>tra = iris[,5]
>>
>>man = manova(x ~ tra)
>>
>># Mahalanobis
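>>E = summary(man)$SS[2] # list holding the residual SSCP matrix E
>>S = as.matrix(E$Residuals)/man$df.residual # pooled covariance
>>InvS = solve(S)
>># colMeans, not mean: mean() of a data frame does not return column means in current R
>>ms = matrix(unlist(by(x, tra, colMeans)), byrow=TRUE, ncol=ncol(x))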
>>colnames(ms) = names(iris[1:4])
>>rownames(ms) = c('Set', 'Ver', 'Vir')
>>D2.12 = (ms[1,] - ms[2,])%*%InvS%*%(ms[1,] - ms[2,])
>>print(D2.12)
>>D2.13 = (ms[1,] - ms[3,])%*%InvS%*%(ms[1,] - ms[3,])
>>print(D2.13)
>>D2.23 = (ms[2,] - ms[3,])%*%InvS%*%(ms[2,] - ms[3,])
>>print(D2.23)
>>#
>># --- End R script 1 ---
>>#
>>
>>Well, I would like to generalize this solution to obtain
>>a matrix like 'Mah' (below) or a complete matrix like the one in
>>Output 21.1.2. Could somebody help me?
>>
>>#
>># --- Begin R script 2 ---
>>#
>>
>>Mah = c( 0,
>> 89.86419, 0,
>> 179.38471, 17.20107, 0)
>>
>>n = 3
>>D = matrix(0, n, n)
>>
>>nam = c('Set', 'Ver', 'Vir')
>>rownames(D) = nam
>>colnames(D) = nam
>>
>>k = 0
>>for (i in 1:n) {
>>  for (j in 1:i) {
>>    k = k+1
>>    D[i,j] = Mah[k]
>>    D[j,i] = Mah[k]
>>  }
>>}
>>
>>D=sqrt(D) #D2 -> D
>>
>>library(mclust)
>>dendroS = hclust(as.dist(D), method='single')
>>dendroC = hclust(as.dist(D), method='complete')
>>
>>dev.new(width = 3.5, height = 6) # portable replacement for the Windows-only win.graph()
>>split.screen(c(2, 1))
>>screen(1)
>>plot(dendroS, main='Single', sub='', xlab='', ylab='', col='blue')
>>
>>screen(2)
>>plot(dendroC, main='Complete', sub='', xlab='', col='red')
>>#
>># --- End R script 2 ---
>>#
>>
>>I often need this type of analysis, and I cannot find how to do it in
>>the CRAN documentation (Archives, packages: mclust, cluster, fpc and mva).
>>
>>Regards,
>>--
>>Jose Claudio Faria
>>Brasil/Bahia/UESC/DCET
>>Estatistica Experimental/Prof. Adjunto
>>mails:
>> joseclaudio.faria@terra.com.br
>>
>>______________________________________________
>>R-help@stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>
> 
> 
> *** NEW ADDRESS! ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> chrish@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
> 


-- 
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
  joseclaudio.faria@terra.com.br
  jc_faria@uesc.br
  jc_faria@uol.com.br
tel: 73-3634.2779

Received on Sat Jul 09 21:26:19 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:33:26 EST