Re: [R] cluster.stats

From: Laura Poggio <laura.poggio_at_gmail.com>
Date: Sat, 14 Jun 2008 22:04:43 +0100


Thank you very much for all the information and support. I have now managed to get it working on a small subset of the original data set. I think the first error message I got (Error in as.dist(dmat[clustering == i, clustering == i]) : (subscript) logical subscript too long) is generated when the two objects required by cluster.stats do not describe the same number of observations, i.e. when the clustering vector is longer than the number of objects covered by the dist object.
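
For anyone finding this thread later, here is a minimal sketch of the size arithmetic and the consistency check discussed in this thread. It is only an illustration under assumptions: the data are random numbers standing in for the image values, the subset size of 500 is arbitrary, and n_pairs is a hypothetical helper name, not part of fpc.

```r
# Number of pairwise distances that dist() stores for n objects.
# n_pairs is a hypothetical helper, not an fpc function.
n_pairs <- function(n) n * (n - 1) / 2

n_full <- 256 * 256      # 262144 pixels, as in the thread
n_pairs(n_full)          # about 3.4e10 values, far too many to hold in memory
n_pairs(512)             # 130816, the length reported by str(ddata)

# Made-up stand-in for the one-column image data frame (assumption).
set.seed(1)
x <- data.frame(value = rnorm(n_full))

# Work on a random subset that dist() can handle.
idx <- sample(nrow(x), 500)
sub <- x[idx, , drop = FALSE]
kl_sub <- kmeans(sub, 5)
d_sub <- dist(sub)

# The dist object and the clustering vector must describe the same objects.
stopifnot(attr(d_sub, "Size") == length(kl_sub$cluster))

# With fpc installed, the statistics could then be computed as:
# fpc::cluster.stats(d_sub, kl_sub$cluster)
```

The "Size" attribute of a dist object gives the number of objects it covers, so comparing it with the length of the clustering vector before calling cluster.stats should catch the mismatch early.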

Thanks!
Laura

> Dear Laura,
>
> I have R 2.6.0. I tried dist on a vector of length 200,000 and it told me
> that it is too long. Theoretically, if you have 260,000 observations, the
> length of the dist object should be 260,000*259,999/2, which is too large
> for our computers, I guess. Which means that unfortunately cluster.stats
> won't work for such a large data set, because it needs the full casewise
> dissimilarity information.
>
> I don't understand how you managed to produce a dist object of length
> of only 130,000 out of your data, but it certainly doesn't give all
> pairwise distance information for 260,000 points and therefore cannot be
> used in cluster.stats with a clustering vector of length 260,000 or so.
>
> Sorry,
> Christian
>
> On Sat, 14 Jun 2008, Laura Poggio wrote:
>
> > Thanks. See below.
> >
> > Laura
> >
> > 2008/6/14 Christian Hennig <chrish_at_stats.ucl.ac.uk>:
> >
> >> What does str(ddata) give?
> >
> >
> > Class 'dist' atomic [1:130816] 69.2 117.1 145.6 179.9 195.6 ...
> >
> >
> >>
> >> dcent doesn't make sense as input for cluster.stats, because you need a
> >> dissimilarity matrix between all objects.
> >>
> >
> > Yes, I know ... I was simply trying to see whether anything changed with a
> > different data structure.
> >
> >
> >
> >>
> >> Christian
> >>
> >> On Sat, 14 Jun 2008, Laura Poggio wrote:
> >>
> >>> I am sorry I did not provide enough information.
> >>> I am not using img later, but data, which is a data.frame.
> >>> I wrote that img is an "image" just to explain what kind of data it comes
> >>> from, but the object I am using is data and it is a data.frame (checked many
> >>> times).
> >>>
> >>> I am not using as.dist, but dist in order to calculate the distance matrix
> >>> among the data I have. Then the whole code I am using is:
> >>>
> >>> data <- as(img, "data.frame")[1:1] # img is an image of 256 x 256 px
> >>> kl <- kmeans(data, 5)
> >>> library(fpc)
> >>> ddata <- dist(data)
> >>> dcent <- dist(kl$centers)
> >>>
> >>> cluster.stats(ddata, kl$cluster)
> >>> cluster.stats(dcent, kl$cluster)
> >>>
> >>> In both cases I got the same error:
> >>> Error in as.dist(dmat[clustering == i, clustering == i]) : (subscript)
> >>> logical subscript too long
> >>>
> >>> The structure of the different objects is detailed below:
> >>> data is "'data.frame': 262144 obs. of 1 variable"
> >>> kl$centers is "num [1:5, 1]"
> >>> kl$cluster is "Named int [1:262144]"
> >>>
> >>> I hope this is more informative. I am sorry, but I did not find any
> >>> explanation for the error message I am getting.
> >>>
> >>> Thank you very much in advance
> >>>
> >>> Laura
> >>>
> >>>
> >>>
> >>> 2008/6/14 Christian Hennig <chrish_at_stats.ucl.ac.uk>:
> >>>
> >>>> The given information is not enough to tell you what's going on. as.dist
> >>>> doesn't appear in the given code and it's not clear to me what kind of
> >>>> object img is ("a small image" doesn't tell me what R makes of it).
> >>>> Also, try to read the help pages first and find out whether img is of the
> >>>> format that is required by the functions. And check (using str, for example)
> >>>> whether "data" is what you expect it to be.
> >>>>
> >>>> Christian
> >>>>
> >>>>
> >>>> On Sat, 14 Jun 2008, Laura Poggio wrote:
> >>>>
> >>>>> Thank you very much for your answer.
> >>>>>
> >>>>> I tried to run the function on my data and now I am getting this error
> >>>>> message:
> >>>>> Error in as.dist(dmat[clustering == i, clustering == i]) : (subscript)
> >>>>> logical subscript too long
> >>>>>
> >>>>> Below is the code I am using (version 2.7.0 of R with all packages updated):
> >>>>>
> >>>>> data <- as(img, "data.frame")[1:1] # img is a small image, 256 px x 256 px
> >>>>> kl <- kmeans(data, 5)
> >>>>> library(fpc)
> >>>>> cluster.stats(data, kl$cluster)
> >>>>>
> >>>>> Thank you for any hints on the reasons and meaning of the error!
> >>>>>
> >>>>> Laura
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> 2008/6/13 Christian Hennig <chrish_at_stats.ucl.ac.uk>:
> >>>>>
> >>>>>> Dear Laura,
> >>>>>>
> >>>>>>> Dear list,
> >>>>>>>
> >>>>>>> I just tried to use the function cluster.stats in the package fpc.
> >>>>>>> I just have a couple of questions about the syntax:
> >>>>>>>
> >>>>>>> cluster.stats(d,clustering,alt.clustering=NULL,
> >>>>>>> silhouette=TRUE,G2=FALSE,G3=FALSE)
> >>>>>>>
> >>>>>>> 1) the distance object (d) is an object obtained by the function dist()
> >>>>>>> on my own original matrix?
> >>>>>>>
> >>>>>>>
> >>>>>> d is allowed to be an object of class dist or a dissimilarity matrix.
> >>>>>> The answer to your question depends on what your "original matrix" is. If
> >>>>>> it is something on which you can compute a distance by dist(), you're right,
> >>>>>> at least if dist() delivers the distance you are interested in.
> >>>>>>
> >>>>>>> 2) clustering is the cluster vector resulting from one of the many
> >>>>>>> clustering methods?
> >>>>>>>
> >>>>>>>
> >>>>>> The help page tells you what clustering can be. So it could be the
> >>>>>> clustering/partition vector of a clustering method or it could be something
> >>>>>> else. Note that cluster.stats doesn't depend on any particular clustering
> >>>>>> method. It computes the statistics regardless of where the clustering vector
> >>>>>> comes from.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Christian
> >>>>>>
> >>>>>>
> >>>>>>> Thank you very much in advance, and sorry for such a basic question, but I
> >>>>>>> did not manage to work this out myself.
> >>>>>>>
> >>>>>>> Laura
> >>>>>>>
> >>>>>>> [[alternative HTML version deleted]]
> >>>>>>>
> >>>>>>> ______________________________________________
> >>>>>>> R-help_at_r-project.org mailing list
> >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>> PLEASE do read the posting guide
> >>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>>
> >>>>>>>
> >>>>>>> *** --- ***
> >>>>>>>
> >>>>>> Christian Hennig
> >>>>>> University College London, Department of Statistical Science
> >>>>>> Gower St., London WC1E 6BT, phone +44 207 679 1698
> >>>>>> chrish@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
> >
> >
>
> *** --- ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> chrish_at_stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche



Received on Sat 14 Jun 2008 - 20:09:29 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 14 Jun 2008 - 22:30:56 GMT.
