# Re: [R] Finding distance matrix for categorical data

From: kapil mahant <kapil_mahant_at_yahoo.com>
Date: Mon, 14 Jun 2010 12:11:11 +0530 (IST)

Thanks, guys.
I am able to generate the distance matrix for mixed column values (categorical and ordinal) using the daisy function.

But can anyone tell me how to generate clusters out of it? The point is that I don't know the number of clusters beforehand.

Let me give an overview of the problem I am trying to solve.

Given a dataset, something like the one below:

```
             var1    var2   var3      Size
element1-1   yes     x      present   100
element1-2   no      y      absent    294
element1-3   maybe   x      absent    45
```

The first three variables are categorical and the last one is ordinal.
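For reference, the distance-matrix step could be sketched roughly as below. This assumes the cluster package and the Gower metric. The data frame name `train` is hypothetical, and its rows simply mirror the table above. Size is kept numeric here. Wrapping it in `ordered()` would make daisy treat it as ordinal instead.

```r
library(cluster)  # provides daisy()

# Hypothetical data frame mirroring the example table above
train <- data.frame(
  var1 = factor(c("yes", "no", "maybe")),
  var2 = factor(c("x", "y", "x")),
  var3 = factor(c("present", "absent", "absent")),
  Size = c(100, 294, 45),  # assumption: treated as numeric, not ordinal
  row.names = c("element1-1", "element1-2", "element1-3")
)

# Gower dissimilarity copes with mixed column types;
# unordered factors are treated as nominal variables
d <- daisy(train, metric = "gower")
print(as.matrix(d))
```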

I need to do the following:
1 ) Generate clusters out of it (let's say they are "training clusters"). I am able to compute the distance matrix (using daisy), but I am not sure how to create an unknown number of clusters; dbscan can work on a distance matrix.
2 ) Once that is done, I want to spread some new data points in the above plot space (let's say these are "test points").
3 ) Find out which "test points" lie within the boundary of any of the training clusters discovered above.
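One possible way to approach the three steps is sketched below, purely as an illustration. It swaps in PAM from the cluster package (not dbscan) and picks the number of clusters by average silhouette width. The data, the `make_group` helper, and the definition of a cluster "boundary" (the largest distance from a medoid to one of its own training members) are all assumptions made for this sketch.

```r
library(cluster)  # daisy(), pam()

# Hypothetical helper: rows sharing the same categories, with given sizes
make_group <- function(sizes, v1, v2, v3) {
  n <- length(sizes)
  data.frame(
    var1 = factor(rep(v1, n), levels = c("yes", "no", "maybe")),
    var2 = factor(rep(v2, n), levels = c("x", "y")),
    var3 = factor(rep(v3, n), levels = c("present", "absent")),
    Size = sizes
  )
}

# Made-up training and test data with the same columns as the table above
train <- rbind(make_group(seq(90, 108, by = 2), "yes", "x", "present"),
               make_group(seq(290, 308, by = 2), "no", "y", "absent"))
test <- rbind(make_group(101, "yes", "x", "present"),
              make_group(295, "no", "y", "absent"),
              make_group(900, "maybe", "y", "present"))

# Step 1: choose k by average silhouette width, then cluster with PAM
d_train <- daisy(train, metric = "gower")
ks <- 2:5
avg_sil <- sapply(ks, function(k) pam(d_train, k = k)$silinfo$avg.width)
best_k <- ks[which.max(avg_sil)]
fit <- pam(d_train, k = best_k)

# Step 2: dissimilarities of every point (train + test) to the medoids.
# Note: Gower rescales Size over the combined data, so values differ from d_train.
n_train <- nrow(train)
d_all <- as.matrix(daisy(rbind(train, test), metric = "gower"))
to_medoid <- d_all[, fit$id.med, drop = FALSE]

# Assumed "boundary": largest medoid distance among a cluster's training members
radius <- sapply(seq_len(best_k), function(j)
  max(to_medoid[which(fit$clustering == j), j]))

# Step 3: nearest medoid per test point, and whether it falls inside that radius
test_rows <- n_train + seq_len(nrow(test))
nearest <- unname(apply(to_medoid[test_rows, , drop = FALSE], 1, which.min))
dist_nearest <- to_medoid[cbind(test_rows, nearest)]
inside <- unname(dist_nearest <= radius[nearest])

print(data.frame(test, cluster = nearest, inside = inside))
```

With this made-up data, the first two test points fall inside a training cluster and the third does not. A density-based method may define boundaries differently, so the radius rule here is only one of several reasonable choices.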

If anyone knows how to get this done, please let me know. It's for an academic project and I am unable to make any progress.

Thanks and Regards
K

