From: Laura Poggio <laura.poggio_at_gmail.com>

Date: Sat, 14 Jun 2008 22:04:43 +0100

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Sat 14 Jun 2008 - 20:09:29 GMT

Thank you very much for all the info and support.
I have now managed to make it work on a small subset of the original data set.
I think the first error message I got (Error in as.dist(dmat[clustering == i, clustering == i]) : (subscript) logical subscript too long)
is raised when the two objects passed to cluster.stats (the dissimilarity object and the clustering vector) do not describe the same number of observations.
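
The diagnosis above can be turned into a quick check to run before calling cluster.stats. This is a minimal sketch in base R; the helper name `check_dist_clustering` is hypothetical, the data are simulated rather than taken from the image, and it relies on the fact that a `dist` object records its number of observations in the `"Size"` attribute.

```r
# Hypothetical helper (not part of fpc): verify that a dissimilarity object and
# a clustering vector describe the same number of observations.
check_dist_clustering <- function(d, clustering) {
  # A "dist" object stores the number of observations in attr(d, "Size");
  # a full dissimilarity matrix has one row per observation.
  n_d <- if (inherits(d, "dist")) attr(d, "Size") else nrow(as.matrix(d))
  n_cl <- length(clustering)
  if (n_d != n_cl) {
    stop(sprintf("dissimilarities cover %d observations, clustering has %d",
                 n_d, n_cl))
  }
  invisible(TRUE)
}

# Simulated example (assumption: 200 one-dimensional values, 5 clusters)
set.seed(1)
x <- matrix(rnorm(200), ncol = 1)
kl <- kmeans(x, centers = 5)

check_dist_clustering(dist(x), kl$cluster)   # sizes agree, no error

# Distances between the 5 centres cannot be matched with 200 labels,
# which mirrors the dcent attempt later in the thread
res <- tryCatch(check_dist_clustering(dist(kl$centers), kl$cluster),
                error = conditionMessage)
print(res)
```

Failing early with an explicit size message is easier to interpret than the "logical subscript too long" error that surfaces from inside cluster.stats.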

Thanks!

Laura

- Original message ------- From: Christian Hennig <chrish_at_stats.ucl.ac.uk> Sent: 14.6.'08, 20:46

> Dear Laura,

>
> I have R 2.6.0. I tried dist on a vector of length 200,000 and it told me
> that it is too long. Theoretically, if you have 260,000 observations, the
> length of the dist object should be 260,000*259,999/2, which is too large
> for our computers, I guess. This means that, unfortunately, cluster.stats
> won't work for such a large data set, because it needs the full casewise
> dissimilarity information.
>
> I don't understand how you managed to produce a dist object of length
> only 130,000 from your data, but it certainly doesn't give all
> pairwise distance information for 260,000 points and therefore cannot be
> used in cluster.stats with a clustering vector of length 260,000 or so.
>
> Sorry,
> Christian
>
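
The size argument above can be checked with a few lines of arithmetic. A `dist` object on n observations stores n(n-1)/2 values. The back-of-the-envelope sketch below evaluates this for n = 262,144 (the row count of `data` reported later in the thread), estimates the memory needed at 8 bytes per double, compares the length with the 2^31 - 1 element limit that applied to R vectors before R 3.0.0, and inverts the formula to see how many observations a `dist` of length 130,816 (the `str(ddata)` output quoted below) actually covers. The helper names are illustrative only.

```r
# Number of pairwise dissimilarities in a dist object on n observations
dist_length <- function(n) n * (n - 1) / 2

# Invert L = n(n - 1)/2 for n: positive root of n^2 - n - 2L = 0
n_from_dist_length <- function(L) (1 + sqrt(1 + 8 * L)) / 2

n_obs <- 262144                         # rows of 'data' as reported by str()
len <- dist_length(n_obs)               # about 3.4e10 dissimilarities
gib <- len * 8 / 1024^3                 # memory as doubles, about 256 GiB

print(len)
print(gib)
print(len > .Machine$integer.max)       # more elements than 2^31 - 1

# How many observations does a dist object of length 130,816 describe?
print(n_from_dist_length(130816))
```

The inverse shows that a `dist` of length 130,816 describes only a few hundred observations, so it cannot be paired with a clustering vector covering the whole image.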
> On Sat, 14 Jun 2008, Laura Poggio wrote:
>
> > Thanks. See below.
> >
> > Laura
> >
> > 2008/6/14 Christian Hennig <chrish_at_stats.ucl.ac.uk>:
> >
> >> What does str(ddata) give?
> >
> > Class 'dist' atomic [1:130816] 69.2 117.1 145.6 179.9 195.6 ...
> >
> >>
> >> dcent doesn't make sense as input for cluster.stats, because you need a
> >> dissimilarity matrix between all objects.
> >>
> >
> > Yes, I know... I simply tried to see whether anything changed with a
> > different data structure.
> >
> >>
> >> Christian
> >>
> >> On Sat, 14 Jun 2008, Laura Poggio wrote:
> >>
> >>> I am sorry I did not provide enough information.
> >>> I am not using img later, but data, which is a data.frame.
> >>> I wrote that img is an "image" just to explain where the data come
> >>> from, but the object I am using is data and it is a data.frame (checked
> >>> many times).
> >>>
> >>> I am not using as.dist, but dist, in order to calculate the distance
> >>> matrix among the data I have. The whole code I am using is:
> >>>
> >>> data <- as(img, "data.frame")[1:1]  # (where img is an image of 256x256 px)
> >>> kl <- kmeans(data, 5)
> >>> library(fpc)
> >>> ddata <- dist(data)
> >>> dcent <- dist(kl$centers)
> >>>
> >>> cluster.stats(ddata, kl$cluster)
> >>> cluster.stats(dcent, kl$cluster)
> >>>
> >>> In both cases I got the same error:
> >>> Error in as.dist(dmat[clustering == i, clustering == i]) : (subscript)
> >>> logical subscript too long
> >>>
> >>> The structure of the different objects is as follows:
> >>> data is "'data.frame': 262144 obs. of 1 variable"
> >>> kl$centers is "num [1:5, 1]"
> >>> kl$cluster is "Named int [1:262144]"
> >>>
> >>> I hope this is more informative. I am sorry, but I could not find any
> >>> explanation for the error message I am getting.
> >>>
> >>> Thank you very much in advance
> >>>
> >>> Laura
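
For reference, here is a minimal sketch of the workflow in the message above, with the doubled assignment removed and the computation run on a small subset so that the full dist object fits in memory. Simulated values stand in for the image pixels (img is not available here), the column name `band1` and the subset size of 500 are arbitrary assumptions, and the fpc call is only attempted if the package is installed.

```r
set.seed(42)
# Stand-in for as(img, "data.frame")[1:1]: one column of simulated pixel values
data <- data.frame(band1 = c(rnorm(2000, 50, 5), rnorm(2000, 120, 10)))

# Random subset small enough for a full dist object (500 * 499 / 2 values)
sub <- data[sample(nrow(data), 500), , drop = FALSE]

kl <- kmeans(sub, centers = 5)
ddata <- dist(sub)              # all pairwise distances within the subset

# The dissimilarities and the clustering vector must cover the same points
stopifnot(attr(ddata, "Size") == length(kl$cluster))

if (requireNamespace("fpc", quietly = TRUE)) {
  stats <- fpc::cluster.stats(ddata, kl$cluster)
  print(stats$cluster.size)     # number of points in each cluster
}
```

Clustering and validation are both done on the same subset here; clustering the full image and validating on a subset would require subsetting `kl$cluster` with the same row indices.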
> >>>
> >>> 2008/6/14 Christian Hennig <chrish_at_stats.ucl.ac.uk>:
> >>>
> >>>> The given information is not enough to tell you what's going on. as.dist
> >>>> doesn't appear in the given code and it's not clear to me what kind of
> >>>> object img is ("a small image" doesn't tell me what R makes of it).
> >>>> Also, try to read the help pages first and find out whether img is in the
> >>>> format required by the functions. And check (using str, for example)
> >>>> whether "data" is what you expect it to be.
> >>>>
> >>>> Christian
> >>>>
> >>>> On Sat, 14 Jun 2008, Laura Poggio wrote:
> >>>>
> >>>>> Thank you very much for your answer.
> >>>>>
> >>>>> I tried to run the function on my data and now I am getting this error
> >>>>> message:
> >>>>> Error in as.dist(dmat[clustering == i, clustering == i]) : (subscript)
> >>>>> logical subscript too long
> >>>>>
> >>>>> Below is the code I am using (R version 2.7.0 with all packages updated):
> >>>>>
> >>>>> data <- as(img, "data.frame")[1:1]  # (where img is a small image of 256 px x 256 px)
> >>>>> kl <- kmeans(data, 5)
> >>>>> library(fpc)
> >>>>> cluster.stats(data, kl$cluster)
> >>>>>
> >>>>> Thank you for any hints on the reasons for and meaning of the error!
> >>>>>
> >>>>> Laura
> >>>>>
> >>>>> 2008/6/13 Christian Hennig <chrish_at_stats.ucl.ac.uk>:
> >>>>>
> >>>>>> Dear Laura,
> >>>>>>
> >>>>>>> Dear list,
> >>>>>>>
> >>>>>>> I just tried to use the function cluster.stats in the package fpc.
> >>>>>>> I just have a couple of questions about the syntax:
> >>>>>>>
> >>>>>>> cluster.stats(d, clustering, alt.clustering=NULL,
> >>>>>>>               silhouette=TRUE, G2=FALSE, G3=FALSE)
> >>>>>>>
> >>>>>>> 1) Is the distance object (d) an object obtained by the function
> >>>>>>> dist() on my own original matrix?
> >>>>>>>
> >>>>>> d is allowed to be an object of class dist or a dissimilarity matrix.
> >>>>>> The answer to your question depends on what your "original matrix" is.
> >>>>>> If it is something on which you can compute a distance by dist(), you're
> >>>>>> right, at least if dist() delivers the distance you are interested in.
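
As noted just above, d may be either a dist object or a full dissimilarity matrix, and dist() only helps if it computes the distance you actually want. The small base-R sketch below uses simulated data to show that the two representations carry the same information, and that the `method` argument of dist() selects a different metric (Manhattan here, chosen purely for illustration).

```r
set.seed(3)
x <- matrix(rnorm(20), nrow = 10, ncol = 2)   # 10 observations, 2 variables

d_euc <- dist(x)                      # class "dist": lower triangle only (45 values)
m_euc <- as.matrix(d_euc)             # full 10 x 10 symmetric matrix
d_man <- dist(x, method = "manhattan")

# Converting the full matrix back to "dist" recovers the same values
same <- isTRUE(all.equal(as.vector(as.dist(m_euc)), as.vector(d_euc)))
print(same)

# A different metric gives different dissimilarities for the same data
print(head(cbind(euclidean = as.vector(d_euc), manhattan = as.vector(d_man))))
```

The dist form stores only n(n-1)/2 values, while the full matrix stores n^2, which matters for memory once n is large.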
> >>>>>>
> >>>>>>> 2) Is clustering the vector of cluster memberships resulting from one of
> >>>>>>> the many clustering methods?
> >>>>>>>
> >>>>>> The help page tells you what clustering can be. So it could be the
> >>>>>> clustering/partition vector of a clustering method or it could be
> >>>>>> something else. Note that cluster.stats doesn't depend on any particular
> >>>>>> clustering method. It computes the statistics regardless of where the
> >>>>>> clustering vector comes from.
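
To illustrate that cluster.stats does not care where the clustering vector comes from, this sketch builds two partitions of the same simulated data, one with kmeans() and one with hierarchical clustering via hclust() and cutree(), and checks that both are plain label vectors of the right length. The data are simulated, the choice of average linkage and two clusters is arbitrary, and the fpc call (using the alt.clustering argument from the signature quoted above) runs only if the package is installed.

```r
set.seed(7)
x <- matrix(c(rnorm(30, 0), rnorm(30, 5)), ncol = 1)   # 60 simulated values
d <- dist(x)

# Two clustering vectors from different methods on the same observations
cl_km <- kmeans(x, centers = 2)$cluster
cl_hc <- cutree(hclust(d, method = "average"), k = 2)

# Each is just one integer label per observation, matching the size of d
stopifnot(length(cl_km) == attr(d, "Size"), length(cl_hc) == attr(d, "Size"))
print(table(kmeans = cl_km, hclust = cl_hc))

if (requireNamespace("fpc", quietly = TRUE)) {
  s <- fpc::cluster.stats(d, cl_hc, alt.clustering = cl_km)
  print(s$corrected.rand)   # agreement between the two partitions
}
```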
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Christian
> >>>>>>
> >>>>>>> Thank you very much in advance, and sorry for such a basic question, but
> >>>>>>> I could not get it clear in my mind.
> >>>>>>>
> >>>>>>> Laura
> >>>>>>>
> >>>>>> *** --- ***
> >>>>>> Christian Hennig
> >>>>>> University College London, Department of Statistical Science
> >>>>>> Gower St., London WC1E 6BT, phone +44 207 679 1698
> >>>>>> chrish@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Sat 14 Jun 2008 - 22:30:56 GMT.
