[R] question regarding using weights in the hierarchical/ kmeans clustering process

From: eugen pircalabelu <eugen_pircalabelu_at_yahoo.com>
Date: Thu, 28 Feb 2008 12:02:10 -0800 (PST)


Hi R users!

I have a bit of a problem with using a hierarchical clustering algorithm:

 a <- 1:15
 b <- rep(1:3, 5)
 c <- rnorm(15, 0, 1)                     # intended as the weighting variable
 d <- sample(1:100, 15, replace = TRUE)
 e <- sample(1:100, 15, replace = TRUE)
 f <- sample(1:100, 15, replace = TRUE)

 data <- data.frame(a, b, c, d, e, f)
 q <- data.frame(data$d, data$e, data$f)  # the variables to cluster on
 q <- scale(q)

What I want to do is run a hierarchical cluster analysis on the q data.frame, using data$c as a weighting variable. Can it be done? Or is there a package that would let me use my weights in a hierarchical clustering process?

Another question:
Say I wanted to run t.test on data$d and data$e, again with data$c as weights. How could it be done?

And the last 2 questions:
1. How can I weight a whole data frame so that my weights are kept for a specific analysis, such as clustering, t.test, or any other analysis that does not offer a "weight" option? I am looking for something like SPSS, where I can weight a whole data frame and use it in a subsequent analysis, or something like the survey package in R, but with the flexibility to use any analysis I want (I saw that the survey package offers limited connectivity to such analyses).
2. Why does a kmeans cluster analysis give a multitude of different results? I tried both of these several times:
 cclust(scale(q), 3, verbose = TRUE)  # cclust() is from the cclust package
 kmeans(scale(q), 3)

 and they both seem very unstable, even with this small data.frame, with respect to the cluster sizes, and I don't know why. Does it always behave like this?
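
To make the comparison reproducible, here is a minimal sketch. It assumes the run-to-run variation comes from kmeans choosing its initial centers at random, which is how stats::kmeans behaves when centers is given as a number. The seed values and nstart = 25 are arbitrary choices for illustration, and the sketch rebuilds a small stand-in for q so it runs on its own.

```r
# Stand-in for the scaled q matrix above (assumed seed, any fixed value works).
set.seed(42)
q <- scale(data.frame(d = sample(1:100, 15, replace = TRUE),
                      e = sample(1:100, 15, replace = TRUE),
                      f = sample(1:100, 15, replace = TRUE)))

# Fixing the seed before each call fixes the random initial centers,
# so two runs give the same partition.
set.seed(1)
fit1 <- kmeans(q, centers = 3)
set.seed(1)
fit2 <- kmeans(q, centers = 3)
identical(fit1$cluster, fit2$cluster)

# nstart runs several random starts and keeps the solution with the
# smallest total within-cluster sum of squares.
fit_multi <- kmeans(q, centers = 3, nstart = 25)
sort(fit_multi$size)
fit_multi$tot.withinss
```

If the cluster sizes still change between calls that use a large nstart, that may suggest the data have no clear 3-cluster structure, which would not be surprising for 15 uniformly sampled points.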

Thank you and have a great day!!        





R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Thu 28 Feb 2008 - 20:13:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 28 Feb 2008 - 20:30:17 GMT.
