Re: [R] Large determinant problem

From: <maj_at_stats.waikato.ac.nz>
Date: Sun, 9 Dec 2007 23:34:27 +1300 (NZDT)

What you say about mixture models is true in general, however this fit was the best of 100 random EM starts. Unbounded likelihoods I believe are only a problem for continuous data mixture models and mine was discrete. Anyway it's nearly midnight now here so I'd better sleep. Before I go, here are the singular values:

> svd(S)$d

  [1] 1.207593e+05 1.049068e+05 9.308082e+04 8.332758e+04 6.929102e+04
  [6] 6.323142e+04 5.977638e+04 5.723191e+04 4.375631e+04 2.723792e+04
 [11] 2.592586e+04 2.411705e+04 2.392963e+04 2.196578e+04 2.169200e+04
 [16] 2.123290e+04 2.054479e+04 1.948157e+04 1.927687e+04 1.777423e+04
 [21] 1.768510e+04 1.754492e+04 1.735954e+04 1.643881e+04 1.600038e+04
 [26] 1.588009e+04 1.584179e+04 1.419902e+04 1.401829e+04 1.332706e+04
 [31] 1.310741e+04 1.282854e+04 1.240196e+04 1.229453e+04 1.198187e+04
 [36] 1.168831e+04 1.069801e+04 1.063407e+04 1.060623e+04 1.056741e+04
 [41] 1.037193e+04 1.018307e+04 9.954778e+03 9.691297e+03 9.544900e+03
 [46] 9.353932e+03 9.084223e+03 9.023719e+03 8.538460e+03 8.260557e+03
 [51] 7.789166e+03 7.624562e+03 7.552246e+03 7.371003e+03 7.249892e+03
 [56] 7.170754e+03 7.143712e+03 7.041465e+03 7.019497e+03 6.918243e+03
 [61] 6.725985e+03 6.635220e+03 6.610919e+03 6.600485e+03 6.378983e+03
 [66] 6.255341e+03 6.252620e+03 5.944109e+03 5.890990e+03 5.875790e+03
 [71] 5.812950e+03 5.786653e+03 5.754739e+03 5.743921e+03 5.729494e+03
 [76] 5.588519e+03 5.558093e+03 5.511866e+03 5.447340e+03 5.436718e+03
 [81] 5.390440e+03 5.389862e+03 5.351446e+03 5.323460e+03 5.231327e+03
 [86] 5.154886e+03 5.146495e+03 5.103094e+03 5.062339e+03 5.016310e+03
 [91] 5.007371e+03 5.003195e+03 4.987950e+03 4.984937e+03 4.971855e+03
 [96] 4.963557e+03 4.913927e+03 4.891866e+03 4.845879e+03 4.841233e+03

[101] 4.807681e+03 4.789150e+03 4.768244e+03 4.752387e+03 4.685244e+03
[106] 4.667949e+03 4.662146e+03 4.655817e+03 4.615451e+03 4.542832e+03
[111] 4.463354e+03 4.448647e+03 4.420757e+03 4.393323e+03 4.368262e+03
[116] 4.330368e+03 4.322231e+03 4.280486e+03 4.269604e+03 4.266072e+03
[121] 4.227934e+03 4.210416e+03 4.197196e+03 4.169111e+03 4.168029e+03
[126] 4.145750e+03 4.137148e+03 4.117092e+03 4.102093e+03 4.031528e+03
[131] 3.997150e+03 3.989493e+03 3.960800e+03 3.954143e+03 3.921214e+03
[136] 3.892764e+03 3.861505e+03 3.831798e+03 3.821399e+03 3.816648e+03
[141] 3.813275e+03 3.797050e+03 3.788435e+03 3.765362e+03 3.753526e+03
[146] 3.750739e+03 3.717638e+03 3.704314e+03 3.700483e+03 3.683338e+03
[151] 3.669548e+03 3.651310e+03 3.645356e+03 3.636891e+03 3.634490e+03
[156] 3.631998e+03 3.598744e+03 3.578298e+03 3.577353e+03 3.492344e+03
[161] 3.457991e+03 3.438116e+03 3.401560e+03 3.398088e+03 3.390086e+03
[166] 3.362965e+03 3.328079e+03 3.306448e+03 3.289258e+03 3.283123e+03
[171] 3.268046e+03 3.254232e+03 3.238759e+03 3.176306e+03 3.173192e+03
[176] 3.145273e+03 3.132647e+03 3.124703e+03 3.116454e+03 3.028187e+03
[181] 3.026404e+03 3.003130e+03 2.985991e+03 2.952215e+03 2.946402e+03
[186] 2.937366e+03 2.902973e+03 2.867319e+03 2.855981e+03 2.843939e+03
[191] 2.830485e+03 2.788518e+03 2.761445e+03 2.753757e+03 2.752846e+03
[196] 2.725580e+03 2.723263e+03 2.669216e+03 2.640574e+03 2.545404e+03
[201] 2.543216e+03 2.508090e+03 2.486351e+03 2.465191e+03 2.447437e+03
[206] 2.431466e+03 2.424620e+03 2.423907e+03 2.399220e+03 2.369538e+03
[211] 2.305238e+03 2.261185e+03 2.252992e+03 2.171784e+03 2.169940e+03
[216] 2.127546e+03 2.094436e+03 2.074605e+03 2.056932e+03 2.053942e+03
[221] 2.011659e+03 1.993672e+03 1.934327e+03 1.893751e+03 1.848455e+03
[226] 1.838315e+03 1.763492e+03 1.728018e+03 1.726965e+03 1.623798e+03
[231] 1.617925e+03 1.554590e+03 1.498835e+03 1.421876e+03 1.256465e+03
[236] 1.200904e+03 1.118300e+03 1.101870e+03 1.055408e+03 9.238208e+02
[241] 8.125509e+02 7.031272e+02 6.943645e+02 6.338677e+02 5.772709e+02
[246] 5.077392e+02 4.566595e+02 4.025622e+02 3.118065e+02 3.043827e+02
[251] 2.412400e+02 2.386435e+02 1.395932e+02 1.225108e+02 1.084912e+02
[256] 9.846993e+01 8.964959e+01 8.446336e+01 2.486490e-05 5.362792e-11
[261] 9.161356e-12 9.161356e-12 9.161356e-12 9.161356e-12 9.161356e-12
[266] 9.161356e-12 9.161356e-12 9.161356e-12 9.161356e-12

Murray

> On Sun, 9 Dec 2007, maj_at_stats.waikato.ac.nz wrote:
>
>> I tried crossprod(S) but the results were identical. The term
>> -0.5*log(det(S)) is  a complexity penalty meant to make it unattractive
>> to
>> include too many components in a finite mixture model. This case was for
>> a
>> 9-component mixture. At least up to 6 components the determinant behaved
>> as expected and increased with the number of components.
>
> And the singular values were?
>
> I am not surprised at this: if you have too many components some of them
> may not be contributing to the fit or duplicating others: both lead to
> numerically singular information matrices.  In many mixture-fitting
> problems the log-likelihood is unbounded but with many local maxima: it is
> very easy to find a poor one.
>
>>
>> Thanks for your comments.
>>
>>> Hmm, S'S is numerically singular.  crossprod(S) would be a better way
>>> to
>>> compute it than crossprod(S,S) (it does use a different algorithm), but
>>> look at the singular values of S, which I suspect will show that S is
>>> numerically singular.
>>>
>>> Looks like the answer is 0.
>>>
>>>
>>> On Sun, 9 Dec 2007, maj_at_stats.waikato.ac.nz wrote:
>>>
>>>> I thought I would have another try at explaining my problem. I think
>>>> that
>>>> last time I may have buried it in irrelevant detail.
>>>>
>>>> This output should explain my dilemma:
>>>>
>>>>> dim(S)
>>>> [1] 1455  269
>>>>> summary(as.vector(S))
>>>>      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
>>>> -1.160e+04  0.000e+00  0.000e+00 -4.132e-08  0.000e+00  8.636e+03
>>>>> sum(as.vector(S)==0)/(1455*269)
>>>> [1] 0.8451794
>>>> # S is a large moderately sparse matrix with some large elements
>>>>> SS <- crossprod(S,S)
>>>>> (eigen(SS,only.values = TRUE)$values)[250:269]
>>>> [1]  9.264883e+04  5.819672e+04  5.695073e+04  1.948626e+04
>>>> 1.500891e+04
>>>> [6]  1.177034e+04  9.696327e+03  8.037049e+03  7.134058e+03
>>>> 1.316449e-07
>>>> [11]  9.077244e-08  6.417276e-08  5.046411e-08  1.998775e-08
>>>> -1.268081e-09
>>>> [16] -3.140881e-08 -4.478184e-08 -5.370730e-08 -8.507492e-08
>>>> -9.496699e-08
>>>> # S'S fails to be non-negative definite.
>>>>
>>>> I can't show you how to produce S easily but below I attempt at a
>>>> reproducible version of the problem:
>>>>
>>>>> set.seed(091207)
>>>>> X <- runif(1455*269,-1e4,1e4)
>>>>> p <- rbinom(1455*269,1,0.845)
>>>>> Y <- p*X
>>>>> dim(Y) <- c(1455,269)
>>>>> YY <- crossprod(Y,Y)
>>>>> (eigen(YY,only.values = TRUE)$values)[250:269]
>>>> [1] 17951634238 17928076223 17725528630 17647734206 17218470634
>>>> 16947982383
>>>> [7] 16728385887 16569501198 16498812174 16211312750 16127786747
>>>> 16006841514
>>>> [13] 15641955527 15472400630 15433931889 15083894866 14794357643
>>>> 14586969350
>>>> [19] 14297854542 13986819627
>>>> # No sign of negative eigenvalues; phenomenon must be due
>>>> # to special structure of S.
>>>> # S is a matrix of empirical parameter scores at an approximate
>>>> # mle for a model with 269 paramters fitted to 1455 observations.
>>>> # Thus, for example, its column sums are approximately zero:
>>>>> summary(apply(S,2,sum))
>>>>      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
>>>> -1.148e-03 -2.227e-04 -7.496e-06 -6.011e-05  7.967e-05  8.254e-04
>>>>
>>>> I'm starting to think that it may not be a good idea to attempt to
>>>> compute
>>>> large information matrices and their determinants!
>>>>
>>>> Murray Jorgensen
>>>>
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> --
>>> Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
>>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford,             Tel:  +44 1865 272861 (self)
>>> 1 South Parks Road,                     +44 1865 272866 (PA)
>>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>>
>>>
>>
>>
>
> --
> Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
>

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun 09 Dec 2007 - 10:42:16 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 09 Dec 2007 - 13:30:18 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.