Re: [R] About clustering techniques

From: Christian Hennig <chrish_at_stats.ucl.ac.uk>
Date: Tue, 29 Jul 2008 15:27:54 +0100 (BST)

A quick comment on this: imputation is an option to make things technically work, but it is not
necessarily good. Imputation always introduces some noise, i.e., it fakes information that is not really there.

Whether it is good depends strongly on the data, the situation and the imputation method ("random" often not being a very sensible choice).
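To make the point above concrete, here is a minimal sketch in R. It is not taken from the thread: the data are simulated, the helper impute_random() is a hypothetical base-R stand-in for "random" imputation, and the layout (60 sites, 12 monthly temperatures, two artificial regimes) is an assumption chosen only for illustration. It measures how far randomly imputed values land from the values that were removed, and then shows one way to cluster without imputing at all, using daisy() and pam() from the cluster package.

```r
library(cluster)  # recommended package shipped with R; provides daisy() and pam()

set.seed(1)

# Hypothetical stand-in for one annual file: 60 sites x 12 monthly temperatures
# from two artificial climate regimes (simulated, not the poster's data).
n <- 60
seasonal <- 10 * sin((1:12 - 4) * pi / 6)
regime <- rep(c(8, 20), each = n / 2)
temps <- outer(regime, rep(1, 12)) + outer(rep(1, n), seasonal) +
  matrix(rnorm(n * 12, sd = 1.5), n, 12)
colnames(temps) <- paste0("t", 1:12)

# Remove 15% of the values at random.
holes <- temps
holes[sample(length(holes), round(0.15 * length(holes)))] <- NA

# "Random" imputation sketched in base R: each NA is replaced by a value
# sampled from the observed entries of the same column.
impute_random <- function(x) {
  apply(x, 2, function(col) {
    miss <- is.na(col)
    col[miss] <- sample(col[!miss], sum(miss), replace = TRUE)
    col
  })
}
filled <- impute_random(holes)

# How far are the imputed values from the values that were removed?
was_missing <- is.na(holes)
rmse_imputed <- sqrt(mean((filled[was_missing] - temps[was_missing])^2))
print(rmse_imputed)

# Alternative without imputation: daisy() computes each pairwise dissimilarity
# from the columns observed in both rows, and pam() accepts the result.
# This only works if every pair of rows shares at least one observed column.
d <- daisy(holes, metric = "euclidean")
fit <- pam(d, k = 2)
print(table(cluster = fit$clustering, regime = regime))
```

Because the imputed values are drawn from the whole column, a cold site can receive a warm site's temperature, which is exactly the kind of information that is not really in the data. Whether the dissimilarity-based route is preferable still depends on the data and on why values are missing.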

Christian

On Tue, 29 Jul 2008, ctu_at_bigred.unl.edu wrote:

> Hi Paco,
> I had the same problem as you before, so I just imputed the missing values.
> For example:
>
> library(Hmisc)  # assumed source of impute(); the original post does not name the package
> newdata <- as.matrix(impute(olddata, fun="random"))
> Then I believe you can analyze your data.
>
> Hopefully it helps.
> Chunhao
>
>
> Quoting pacomet <pacomet_at_gmail.com>:
>
>> Hello R users
>>
>> I have been working with a data set for some time, trying to do some
>> cluster analysis. The data set consists of 14 columns, holding
>> geographical coordinates and monthly temperatures, in annual files:
>>
>> latitude - longitude - temperature 1 - ... - temperature 12
>>
>> I have some missing values in some cases; at some points there may be 8
>> valid monthly values and 4 invalid ones. I don't want to suppress the
>> whole row with 8 good/4 bad values, as I want to try annual and monthly
>> analysis.
>>
>> I first tried kmeans but ran into a problem with missing values. Without
>> omitting missing values kmeans gives an error, and when invalid data
>> are excluded, too many values are dropped in some years of the data
>> series.
>>
>> Now I have been reading about pam, pamk and clara; I think they can
>> handle missing values, but I can't find out how to perform the analysis
>> with these functions. As I'm neither a statistics nor an R expert, the
>> fpc and cluster package documentation is not enough for me. If you know
>> of a website or a tutorial explaining how to use those functions, with
>> examples to check if possible, please post them.
>>
>> Any other help or suggestion is greatly appreciated.
>>
>> Thanks in advance
>>
>> Paco
>>
>> --
>> _________________________
>> El ponent la mou, el llevant la plou
>> Usuari Linux registrat: 363952
>> -------
>> Fotos: http://picasaweb.google.es/pacomet
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


Received on Tue 29 Jul 2008 - 14:36:10 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 30 Jul 2008 - 18:33:03 GMT.

