Re: [R] Information criteria for kmeans

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Wed, 5 Dec 2007 11:24:38 +0000 (GMT)

This is not primarily an R question: if you tell us how you want to define it, we may be able to help you compute it. I presume you are talking about Schwarz (1978), which is not billed as an 'information criterion'.

AFAIK, all Gideon Schwarz did was to define a criterion for linear regression. People have applied it to other situations with a vector space of parameters. However, in many clustering methods (including kmeans, and similarly in classification trees) there is also a combinatorial part of the fit: you optimize over both the cluster centres and the allocation of units to clusters. That does not come close to fitting the Schwarz framework.
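
To make the two parts of the fit concrete, here is a minimal sketch. It is an illustration only, not a definition of any criterion. It assumes simulated data like that in the quoted question; the seed, the nstart value and the names centres_from_alloc, d2 and nearest are choices made here for the example.

```r
# Illustration only: the continuous and combinatorial parts of a kmeans fit.
set.seed(1)                       # assumption: fixed seed for reproducibility
k <- 2
vars <- 4
dat <- rbind(matrix(rnorm(100, sd = 0.3), ncol = vars),
             matrix(rnorm(100, mean = 1, sd = 0.3), ncol = vars))

cl <- kmeans(dat, centers = k, nstart = 10)

# Continuous part: k * vars real-valued coordinates of the centres.
dim(cl$centers)

# Combinatorial part: one discrete label per unit. For k = 2 the number of
# ways to split n units into two non-empty groups is 2^(n - 1) - 1.
table(cl$cluster)
n_splits <- 2^(nrow(dat) - 1) - 1

# Given the allocation, the best centres are simply the cluster means ...
centres_from_alloc <- rowsum(dat, cl$cluster) / as.vector(table(cl$cluster))
centres_ok <- isTRUE(all.equal(unname(centres_from_alloc), unname(cl$centers)))

# ... and given the centres, each unit is allocated to its nearest centre.
d2 <- sapply(seq_len(k),
             function(j) rowSums(sweep(dat, 2, cl$centers[j, ])^2))
nearest <- apply(d2, 1, which.min)
alloc_ok <- all(nearest == cl$cluster)
```

The fitted object thus carries k * vars continuous quantities plus a label for every unit, and it is the search over the labels that has no counterpart in a criterion derived for a vector space of parameters.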

Nor does clustering fit into the information framework of Akaike (1973, 1974).

There is discussion in Banfield & Raftery (1993) of a Schwarz-like criterion for clustering, but with a rather different derivation, and I don't think it should be attributed to Schwarz.

On Wed, 5 Dec 2007, Serguei Kaniovski wrote:

>
> Hello,
>
> how, for example, is the Schwarz criterion defined for kmeans? It should
> be something like:
>
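> k <- 2        # number of clusters
> vars <- 4     # number of variables
> nobs <- 100   # values drawn per group, i.e. nobs / vars = 25 rows each
>
> dat <- rbind(matrix(rnorm(nobs, sd = 0.3), ncol = vars),
>              matrix(rnorm(nobs, mean = 1, sd = 0.3), ncol = vars))
>
> colnames(dat) <- paste("var", 1:vars)
>
> (cl <- kmeans(dat, k))
>
> # nrow(dat) is the number of units actually clustered (50 here, not nobs)
> schwarz <- sum(cl$withinss) + vars * k * log(nrow(dat))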
>
> Thanks for your help,
> Serguei
> ________________________________________
> Austrian Institute of Economic Research (WIFO)
>
> P.O.Box 91 Tel.: +43-1-7982601-231
> 1103 Vienna, Austria Fax: +43-1-7989386
>
> Mail: Serguei.Kaniovski_at_wifo.ac.at
> http://www.wifo.ac.at/Serguei.Kaniovski
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Wed 05 Dec 2007 - 11:32:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 05 Dec 2007 - 12:30:17 GMT.
