# Re: [R] Finding distance matrix for categorical data

From: kapil mahant <kapil_mahant_at_yahoo.com>
Date: Mon, 14 Jun 2010 12:11:11 +0530 (IST)

Thanks, guys.
I am now able to generate the distance matrix for mixed column types (categorical and ordinal) using the daisy function from the cluster package.

But can anyone tell me how to generate clusters from it? The catch is that I don't know the number of clusters beforehand.

Let me give an overview of the problem I am trying to solve.

Given a dataset like the one below:

```
            var1   var2  var3     Size
element1-1  yes    x     present  100
element1-2  no     y     absent   294
element1-3  maybe  x     absent   45
```

The first three variables are categorical and the last one (Size) is ordinal.
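For reference, the dissimilarity daisy uses when the columns are of mixed type is Gower's coefficient: each categorical variable contributes 0 for a match and 1 for a mismatch, each ordinal or numeric variable contributes its absolute difference divided by its range, and the contributions are averaged over the variables. The Python sketch below re-implements that idea for the three rows above. It is not the cluster package's code, and replacing Size by its ranks before scaling is an assumption (daisy's own handling of ordinal columns may differ in detail).

```python
from typing import Dict, List, Sequence


def rank_map(values: Sequence[float]) -> Dict[float, int]:
    """Map each distinct value to its 1-based rank (ties share a rank)."""
    return {v: i + 1 for i, v in enumerate(sorted(set(values)))}


def gower_matrix(rows: Sequence[Sequence], categorical: Sequence[int],
                 ordinal: Sequence[int]) -> List[List[float]]:
    """Gower-style dissimilarity matrix for mixed categorical/ordinal rows.

    categorical: column indices compared by simple matching (0 if equal, else 1).
    ordinal: column indices replaced by ranks (an assumption), then scaled as
             |r_i - r_j| / (max rank - min rank).
    Each pairwise dissimilarity is the mean of the per-column contributions.
    """
    n = len(rows)
    ranks, spans = {}, {}
    for c in ordinal:
        rm = rank_map([row[c] for row in rows])
        ranks[c] = [rm[row[c]] for row in rows]
        spans[c] = max(ranks[c]) - min(ranks[c])
    p = len(categorical) + len(ordinal)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for c in categorical:
                total += 0.0 if rows[i][c] == rows[j][c] else 1.0
            for c in ordinal:
                if spans[c] > 0:  # a constant column contributes 0 here (assumption)
                    total += abs(ranks[c][i] - ranks[c][j]) / spans[c]
            d[i][j] = d[j][i] = total / p
    return d


# The three example rows: var1, var2, var3 (categorical) and Size (ordinal).
data = [
    ("yes", "x", "present", 100),
    ("no", "y", "absent", 294),
    ("maybe", "x", "absent", 45),
]
D = gower_matrix(data, categorical=[0, 1, 2], ordinal=[3])
for row in D:
    print(["%.3f" % v for v in row])
# ['0.000', '0.875', '0.625']
# ['0.875', '0.000', '0.750']
# ['0.625', '0.750', '0.000']
```

A matrix like this is what any clustering step that accepts a precomputed dissimilarity would then work from.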

I need to do the following:

1) Generate clusters from it (let's call these the "training clusters"). I am able to compute the distance matrix (using daisy), but I'm not sure how to create an unknown number of clusters. Does dbscan work on a distance matrix?
2) Once that is done, I want to spread some new data points over the same space (let's call these the "test points").
3) Find out which test points lie within the boundary of any of the training clusters discovered above.
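Steps 1 to 3 can be illustrated with a small hand-written DBSCAN in Python that runs directly on a precomputed dissimilarity matrix, so the number of clusters follows from the density parameters `eps` and `min_pts` instead of being fixed in advance. This is a sketch of the algorithm, not the API of R's dbscan or fpc packages; the function names, the 1-D toy positions, and the `eps`/`min_pts` values are hypothetical. For the test points it uses one possible rule, which is an assumption rather than a standard: a test point belongs to a cluster if it lies within `eps` of one of that cluster's core points, and is otherwise outside every cluster.

```python
from typing import List, Sequence, Tuple


def dbscan_precomputed(D: Sequence[Sequence[float]], eps: float,
                       min_pts: int) -> Tuple[List[int], List[bool]]:
    """DBSCAN on a precomputed dissimilarity matrix D.

    Returns (labels, core): labels[i] is a cluster id 0, 1, ... or -1 for noise;
    core[i] is True if point i has at least min_pts points (itself included)
    within distance eps.
    """
    n = len(D)

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if D[i][j] <= eps]

    core = [len(neighbours(i)) >= min_pts for i in range(n)]
    labels: List[int] = [-2] * n  # -2 = not yet assigned
    cluster = 0
    for i in range(n):
        if labels[i] != -2 or not core[i]:
            continue
        # Grow a new cluster outward from an unassigned core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbours(p):
                if labels[q] == -2:
                    labels[q] = cluster
                    if core[q]:  # only core points extend the cluster further
                        stack.append(q)
        cluster += 1
    # Points never reached from any core point are noise.
    return [-1 if lab == -2 else lab for lab in labels], core


def assign_test_point(dist_to_train: Sequence[float], labels: Sequence[int],
                      core: Sequence[bool], eps: float) -> int:
    """Cluster id of the nearest training core point within eps, else -1."""
    best, best_d = -1, float("inf")
    for j, d in enumerate(dist_to_train):
        if core[j] and d <= eps and d < best_d:
            best, best_d = labels[j], d
    return best


# Hypothetical training data: 1-D positions standing in for real records, so the
# dissimilarity is |a - b|. With mixed data, D would come from daisy/Gower instead.
train = [0.0, 0.1, 0.2, 1.0, 1.1, 3.0]
D = [[abs(a - b) for b in train] for a in train]
eps, min_pts = 0.15, 2  # hypothetical tuning values
labels, core = dbscan_precomputed(D, eps, min_pts)
print(labels)  # [0, 0, 0, 1, 1, -1]: two clusters and one noise point

# Test points need their dissimilarities to every training point (same measure).
tests = [0.05, 2.0, 1.2]
found = [assign_test_point([abs(t - b) for b in train], labels, core, eps) for t in tests]
print(found)  # [0, -1, 1]: the point at 2.0 is outside both clusters
```

The design choice here is that `eps` doubles as the cluster "boundary": a test point that would have been a border point had it been in the training data is counted as inside that cluster.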

If anyone knows how to get this done, please let me know. It's for an academic project and I am unable to make any progress.

Thanks and Regards
K

From: Ingmar Visser <i.visser_at_uva.nl>

Sent: Fri, 11 June, 2010 2:19:33 PM
Subject: Re: [R] Finding distance matrix for categorical data

Latent class analysis may be more appropriate, depending on your hypotheses.

Best,
Ingmar


> All,
>
> How can we find a distance matrix for categorical data?
> For example, given the CSV below:
>
>                 var1   var2  var3  var4
>     element1-1  yes    x     a     k
>     element1-2  no     y     b     l
>     element1-3  maybe  y     c     m
>
> How can I compute the distance matrix between all the elements?
> I actually need it to create clusters on top of it.
>
> Thanks & Regards,
> Kapil
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
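For the all-categorical example in the quoted question, one common choice is the simple matching dissimilarity: the fraction of variables on which two elements disagree (this is also what Gower's coefficient reduces to when every column is nominal). A minimal Python sketch, not tied to any R function; the function name `simple_matching` is hypothetical:

```python
from itertools import combinations


def simple_matching(rows):
    """Dissimilarity = share of variables on which two rows differ."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        mismatches = sum(a != b for a, b in zip(rows[i], rows[j]))
        d[i][j] = d[j][i] = mismatches / len(rows[i])
    return d


# var1..var4 for element1-1, element1-2 and element1-3 from the quoted CSV
rows = [
    ("yes", "x", "a", "k"),
    ("no", "y", "b", "l"),
    ("maybe", "y", "c", "m"),
]
D = simple_matching(rows)
print(D)  # [[0.0, 1.0, 1.0], [1.0, 0.0, 0.75], [1.0, 0.75, 0.0]]
```

Only element1-2 and element1-3 share a value (var2 = y), so theirs is the only pair below the maximum dissimilarity of 1.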


Received on Mon 14 Jun 2010 - 06:44:25 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 14 Jun 2010 - 08:00:32 GMT.
