Re: [R] Help in kmeans

From: raji sankaran <raji.sankaran_at_gmail.com>
Date: Thu, 07 Apr 2011 04:06:31 +0530

Hi,

I have pasted the results of the two commands below.

> set.seed(1234)
> kmeans_model<-kmeans((SepalLength+SepalWidth+PetalLength+PetalWidth),centers=3)
> kmeans_model$cluster
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 2 3 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 1 2
[101] 3 2 3 3 3 3 2 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 2 3 3 2 2 3 3 3 3 3 2 2 3 3 3 2 3 3 3 2 3 3 3 2 3 3 2
> kmeansM<-kmeans(dataFrame[,c(1,2,3,4)],centers=3)
> kmeansM$cluster
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3
103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
2 2 2 2 3 2 2 2 2 2 2 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2 2 2 3 2 2 3

We can see that the first result is less accurate than the second. Can you please let me know how I can write the first command with column names so that it gives the same result as the second one?

Many thanks.

Regards,
Raji
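
A likely explanation for the difference: SepalLength+SepalWidth+PetalLength+PetalWidth adds the four columns element by element, so kmeans receives a single numeric vector (one total per flower) and clusters on that one variable rather than on four. Similarly, as.matrix() called with several arguments keeps only its first argument (the rest go into "..." and are ignored by the default method), so the second variant clusters on SepalLength alone. Selecting the columns by name, as in dataFrame[, c("SepalLength", ...)], should pass the same four columns as dataFrame[,c(1,2,3,4)]. The R sketch below assumes dataFrame is R's built-in iris data renamed to the dotless column names used in this thread; that construction and the object names summed, m, cols, byIndex and byName are illustrative, not taken from the original application.

```r
# Illustrative stand-in for the poster's data (assumption): iris with dotless names
dataFrame <- iris
names(dataFrame) <- c("SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species")

# What the first command actually clusters: one summed value per row
summed <- with(dataFrame, SepalLength + SepalWidth + PetalLength + PetalWidth)
print(is.vector(summed))   # a plain numeric vector, not a four-column matrix
print(length(summed))      # one value per flower, i.e. a single variable

# as.matrix() with several arguments builds a matrix from the first one only
m <- with(dataFrame, as.matrix(SepalLength, SepalWidth, PetalLength, PetalWidth))
print(dim(m))              # one column

# Selecting columns by name gives the same data frame as selecting by index
cols <- c("SepalLength", "SepalWidth", "PetalLength", "PetalWidth")
print(identical(dataFrame[, cols], dataFrame[, c(1, 2, 3, 4)]))

# With the same seed, name-based and index-based selection cluster identically
set.seed(1234)
byIndex <- kmeans(dataFrame[, c(1, 2, 3, 4)], centers = 3)
set.seed(1234)
byName <- kmeans(dataFrame[, cols], centers = 3)
print(identical(byIndex$cluster, byName$cluster))
```

Because cols is an ordinary character vector, an application that generates commands dynamically could paste the requested column names into it. Inside with(dataFrame, ...), cbind(SepalLength, SepalWidth, PetalLength, PetalWidth) is another way to build the same four-column matrix.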

On Thu, Apr 7, 2011 at 4:02 AM, raji sankaran <raji.sankaran_at_gmail.com> wrote:

> Hi,
>
> Thanks for the information. But I am already using set.seed(). My problem
> is that when I use column names instead of column indices, the result is
> consistently less accurate. Hence, we wanted to understand how kmeans
> differentiates between column names and column indices. Is there any way
> to bridge the gap so that we get the same result for column names and
> column indices?
>
> Regards,
> Raji
>
> On Wed, Apr 6, 2011 at 5:30 PM, Christian Hennig <chrish_at_stats.ucl.ac.uk> wrote:
>
>> I'm not going to comment on column names, but this is just to make you
>> aware that the results of k-means depend on random initialisation.
>>
>> This means that it is possible that you get different results if you run
>> it several times. It basically gives you a local optimum and there may be
>> more than one of these.
>> Use set.seed to see whether this explains your problem.
>>
>> Best regards,
>> Christian
>>
>>
>> On Wed, 6 Apr 2011, Raji wrote:
>>
>>> Hi All,
>>>
>>> I was using the following command to perform k-means on the Iris
>>> dataset:
>>>
>>> Kmeans_model<-kmeans(dataFrame[,c(1,2,3,4)],centers=3)
>>>
>>> This was giving proper results for me. But in my application we generate
>>> the R commands dynamically, and there was a requirement that column
>>> names, rather than column indices, be sent to the R commands. Hence, to
>>> incorporate this, I tried using the R commands in the following way:
>>>
>>>
>>> kmeans_model<-kmeans((SepalLength+SepalWidth+PetalLength+PetalWidth),centers=3)
>>>
>>> or
>>>
>>>
>>> kmeans_model<-kmeans(as.matrix(SepalLength,SepalWidth,PetalLength,PetalWidth),centers=3)
>>>
>>> In both cases, we found that the results are different from what we saw
>>> with the first command (with column indices).
>>>
>>> Can you please let us know what is going wrong here, and how the column
>>> names can be used in kmeans to obtain the correct results?
>>>
>>> Many thanks,
>>> Raji
>>>
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> *** --- ***
>> Christian Hennig
>> University College London, Department of Statistical Science
>> Gower St., London WC1E 6BT, phone +44 207 679 1698
>> chrish_at_stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
>>
>
>
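
Christian's point about random initialisation can be illustrated with a short R sketch on the built-in iris data (the object names x, run1, run2, run3 and best are illustrative). Resetting the same seed before each call makes kmeans repeat its random starting centres, so the partitions match; a run without a reset may converge to a different local optimum, which would show up in tot.withinss, the total within-cluster sum of squares. The nstart argument asks kmeans to try several random starts and keep the one with the lowest tot.withinss, which reduces, but does not remove, the dependence on the starting point.

```r
# Illustrative data: the four numeric columns of R's built-in iris data
x <- iris[, 1:4]

# Same seed, same random starting centres, hence the same partition
set.seed(1234)
run1 <- kmeans(x, centers = 3)
set.seed(1234)
run2 <- kmeans(x, centers = 3)
print(identical(run1$cluster, run2$cluster))

# Without resetting the seed, another run may start elsewhere and end in a
# different local optimum; compare the total within-cluster sums of squares
run3 <- kmeans(x, centers = 3)
print(c(run1$tot.withinss, run3$tot.withinss))

# nstart = 25 tries 25 random starts and keeps the one with the lowest
# total within-cluster sum of squares
set.seed(1234)
best <- kmeans(x, centers = 3, nstart = 25)
print(best$tot.withinss)
```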




Received on Wed 06 Apr 2011 - 22:38:16 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 06 Apr 2011 - 22:40:28 GMT.

