[R] pam() clustering for large data sets

From: Lilia Nedialkova <lbravewo_at_princeton.edu>
Date: Mon, 16 May 2011 18:26:25 -0400


Hello everyone,

I need to do k-medoids clustering on a data set of 50,000 observations. I have computed the distances between the observations separately and tried to use them with pam().
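Roughly what I tried (the variable names and the value of k are placeholders; my actual distance computation is more involved than dist()):

    library(cluster)

    ## x: my 50,000-observation data matrix (illustrative)
    d <- dist(x)                        # stand-in for my separately computed distances
    fit <- pam(d, k = 10, diss = TRUE)  # k = 10 is just a placeholder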

I got the "cannot allocate vector of length" error and realize this job is too memory-intensive. I am at a bit of a loss about what to do at this point.
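If I am doing the arithmetic right, the dissimilarity object alone is already huge (assuming 8-byte doubles), before pam() allocates any working copies:

    n <- 50000
    n * (n - 1) / 2 * 8 / 2^30   # ~9.3 GiB just for the lower triangle of distances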

I can't use clara(), because I want to use the distances I have already computed.
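As I understand it, clara() works on the raw data and computes its own Euclidean or Manhattan distances on subsamples, i.e. something like the following (the samples/sampsize values are illustrative):

    fit <- clara(x, k = 10, metric = "euclidean", samples = 50, sampsize = 1000)

so there seems to be no way to hand it my precomputed dissimilarities.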

What do people do to cluster such large data sets?

I would greatly appreciate any suggestions.

Thank you very much in advance.


