Re: [R] Large determinant problem

From: Ravi Varadhan <rvaradhan_at_jhmi.edu>
Date: Mon, 10 Dec 2007 10:08:32 -0500

It is evident that you do not have enough information in the data to estimate 9 mixture components. This is clearly indicated by a positive semi-definite information matrix, S, that is less than full rank. You can monitor the rank of the information matrix, as you increase the number of components, and stop when you suspect rank-deficiency.

Ravi.



Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvaradhan_at_jhmi.edu

Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html  



-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of maj_at_stats.waikato.ac.nz
Sent: Sunday, December 09, 2007 2:45 AM
To: Prof Brian Ripley
Cc: r-help_at_r-project.org
Subject: Re: [R] Large determinant problem

I tried crossprod(S) but the results were identical. The term -0.5*log(det(S)) is a complexity penalty meant to make it unattractive to include too many components in a finite mixture model. This case was for a 9-component mixture. At least up to 6 components the determinant behaved as expected and increased with the number of components.

Thanks for your comments.

> Hmm, S'S is numerically singular. crossprod(S) would be a better way to
> compute it than crossprod(S,S) (it does use a different algorithm), but
> look at the singular values of S, which I suspect will show that S is
> numerically singular.
>
> Looks like the answer is 0.
>
>
> On Sun, 9 Dec 2007, maj_at_stats.waikato.ac.nz wrote:
>
>> I thought I would have another try at explaining my problem. I think
>> that
>> last time I may have buried it in irrelevant detail.
>>
>> This output should explain my dilemma:
>>
>>> dim(S)
>> [1] 1455 269
>>> summary(as.vector(S))
>> Min. 1st Qu. Median Mean 3rd Qu. Max.
>> -1.160e+04 0.000e+00 0.000e+00 -4.132e-08 0.000e+00 8.636e+03
>>> sum(as.vector(S)==0)/(1455*269)
>> [1] 0.8451794
>> # S is a large moderately sparse matrix with some large elements
>>> SS <- crossprod(S,S)
>>> (eigen(SS,only.values = TRUE)$values)[250:269]
>> [1] 9.264883e+04 5.819672e+04 5.695073e+04 1.948626e+04
>> 1.500891e+04
>> [6] 1.177034e+04 9.696327e+03 8.037049e+03 7.134058e+03
>> 1.316449e-07
>> [11] 9.077244e-08 6.417276e-08 5.046411e-08 1.998775e-08
>> -1.268081e-09
>> [16] -3.140881e-08 -4.478184e-08 -5.370730e-08 -8.507492e-08
>> -9.496699e-08
>> # S'S fails to be non-negative definite.
>>
>> I can't show you how to produce S easily but below I attempt at a
>> reproducible version of the problem:
>>
>>> set.seed(091207)
>>> X <- runif(1455*269,-1e4,1e4)
>>> p <- rbinom(1455*269,1,0.845)
>>> Y <- p*X
>>> dim(Y) <- c(1455,269)
>>> YY <- crossprod(Y,Y)
>>> (eigen(YY,only.values = TRUE)$values)[250:269]
>> [1] 17951634238 17928076223 17725528630 17647734206 17218470634
>> 16947982383
>> [7] 16728385887 16569501198 16498812174 16211312750 16127786747
>> 16006841514
>> [13] 15641955527 15472400630 15433931889 15083894866 14794357643
>> 14586969350
>> [19] 14297854542 13986819627
>> # No sign of negative eigenvalues; phenomenon must be due
>> # to special structure of S.
>> # S is a matrix of empirical parameter scores at an approximate
>> # mle for a model with 269 paramters fitted to 1455 observations.
>> # Thus, for example, its column sums are approximately zero:
>>> summary(apply(S,2,sum))
>> Min. 1st Qu. Median Mean 3rd Qu. Max.
>> -1.148e-03 -2.227e-04 -7.496e-06 -6.011e-05 7.967e-05 8.254e-04
>>
>> I'm starting to think that it may not be a good idea to attempt to
>> compute
>> large information matrices and their determinants!
>>
>> Murray Jorgensen
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Brian D. Ripley, ripley_at_stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
>



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon 10 Dec 2007 - 15:15:57 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 10 Dec 2007 - 15:30:18 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.