[R] Loss of numerical precision from conversion to list ?

From: Fabian Scheipl <f.abian_at_gmx.net>
Date: Fri 21 Jul 2006 - 07:09:19 EST

I´m working on an R-implementation of the simulation-based finite-sample null-distribution of (R)LR-Test in Mixed Models (i.e. testing for Var(RandomEffect)=0) derived by C. M. Crainiceanu and D. Ruppert.

I'm in the beginning stages of this project and while comparing quick and dirty grid-search-methods and more exact optim()/optimize()-based methods to find the maximum of a part of the RLR-Test-Statistic i stumbled upon the following problem:

It seems to me that R produces different results depending on whether originally identical numbers involved in the exact same computations are read from a matrix or a list. (I need both in order to do quick vectorized computation for the grid-search with matrices and "list-based" computation so that i can put the function to be maximized in something like mapply(...,optim(foo),...)- I can elaborate if desired)

However, the problem goes away once a number involved in the computation is set from almost zero (e-15) to 4. I'm completely mystified by this; especially since this number that I change is NOT one of the numbers that are switched from matrix to list.

Here's the code:

library(nlme)
data(Orthodont) #108 dental measurements on 27 subjects
# m1<-lme(distance~age,random=~1|Subject,data=Orthodont)
# summary(m1)
# ...
# Random effects:
# Formula: ~1 | Subject
# (Intercept) Residual
# StdDev: 2.114724 1.431592 -> lambda.REML=2.114^2/1.431^2 = 2.182382

#DesignMatrix for fixed Effects

X<-cbind(rep(1,108),Orthodont$age)
#DesignMatrix of RandomEffects

Z<-matrix(data=c(rep(1,4),rep(0,108)),nrow=108,ncol=27)

#Corr(RanEf)^0.5 = 27 x 27 Identity, since RandomIntercepts are independent
sqrt.Sigma<-diag(27)

K<-27 #number of subjects/ random intercepts
n<-nrow(X)
p<-ncol(X)

lambda0 <- 2.182382 #actually not a sensible choice as Null-Hypothesis, but that doesn't pertain to the problem

#Projection-Matrix for Fixed-Effects-Model: Y -> errors
P0=diag(n)-X%*%solve((t(X)%*%X))%*%t(X)

mu<-eigen(sqrt.Sigma%*%t(Z)%*%P0%*%Z%*%sqrt.Sigma)$values
# mu
# [1] 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00
#[11] 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00
#[21] 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 5.77316e-15
# ! Notice the last (27th) value very close to 0

nsim<-10
set.seed(10)
#nsim x K array of ChiSq(1)-variates

w.k.sq.mat<-matrix(rchisq(nsim*K,1),nrow=nsim)
#nsim x 1 array of ChiSq(n-p-K)-variates
w.sum2<-rchisq(nsim,n-p-K)

### vectorized computation of nsim=10 realizations
### of a part of the RLR-statistic under the Null:
w.k.sq<- cbind(w.k.sq.mat,w.sum2) #nsim x (K+1)
#vector-based results:

num.v<- rowSums(((lambda-lambda0)*mu*w.k.sq[,-(K+1)])/(1+lambda*mu)) den.v<- rowSums(((1+lambda0*mu)*w.k.sq[,-(K+1)]) / (1+lambda*mu)) + w.k.sq[,K+1]

### list-based computation of nsim=10 realizations
### of a part of the RLR-statistic under the Null:
w.k.sq<-list()
length(w.k.sq)<-nsim
#put the nsim rows into list-slots:

for(i in 1:nsim) w.k.sq[[i]]<-c(w.k.sq.mat[i,],w.sum2[i]) num.l<-numeric(0)
den.l<-numeric(0)
for(i in 1:nsim)
{
num.l[i]<-sum(((lambda-lambda0)*mu*w.k.sq[[i]][-(K+1)])/(1+lambda*mu))
#exactly analogous to num.v & den.v, except list-elements instead of vector
den.l[i]<-sum(((1+lambda0*mu)*w.k.sq[[i]][-(K+1)]) / (1+lambda*mu)) + w.k.sq[[i]][K+1] }

# Now the actual problem:
# notice the discrepancies between the results from vectorized computation
# and the results from list-based computation
# Since discrepancies disappear if mu[27] is changed
# from 5.77316e-15 to 4, i'm guessing somewhere in the conversion to
# "list" there must be a loss of precision or is there an entirely
# different problem?

num.l
# [1] -25.93322 -17.65486 -18.80239 -19.49974 ....
num.v
# [1] -23.84733 -17.62233 -27.22975 -19.50294 ....

den.l
# [1] 117.30246 92.59041 92.91491 112.90113 ...
den.v
# [1] 115.21657 92.55789 101.34228 112.90433 ...

#now i set

mu[27]<-4
#and reran the computation of num.l /.v and den.l /.v from above:

num.l
# [1] -26.25565 -17.67423 -27.47259 -20.97961 ...
num.v
# [1] -26.25565 -17.67423 -27.47259 -20.97961 ...
den.l
# [1] 117.62489 92.60979 101.58511 114.38100 ...
den.v
# [1] 117.62489 92.60979 101.58511 114.38100 ...

what i would like to know now is:

  1. which of the two calculations yields a more precise result? or rather:
  2. how can i avoid these discrepancies in the future since i need to be able to compare these two methods? and, most importantly,
  3. what in R.A.Fisher's name is happening here?

version information:

Version 2.3.1 (2006-06-01)
i386-pc-mingw32
.Machine$double.eps is 2.220446e-16 (does it matter?)

thanks for your time,

-- 
Fabian Scheipl

f.abian@gmx.net

"Feel free" – 10 GB Mailbox, 100 FreeSMS/Monat ...

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri Jul 21 08:24:05 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 21 Jul 2006 - 10:19:36 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.