[R] CORR! confidence interval for the logit - predict.glm

From: Christine Sonvilla <csonvilla_at_gmx.at>
Date: Thu, 29 May 2008 04:09:27 +0200

Sorry, in my previous email I forgot two commands for the prediction:

upperlogit <- predictionlogit$fit + 2*predictionlogit$se.fit lowerlogit <- predictionlogit$fit - 2*predictionlogit$se.fit

I've included them in the email below again!

Dear Brian and list members,

Thanks very much for your response Brian! I applied the adjusted calculation that you advised me to use [1/(1+exp(-upperlogit))] and as a result I don't get any more NA values in my upper confidence interval values.

Yet, some outcomes are very akward, since for very small values such as 1.98E-10, the lower values very reasonably turns out to be 0, yet the upper value is simply set to 1, which kind of makes the confidence interval redundant and doesn't give any proper idea how good of a predicted value this 1.98E-10 value and other similar ones are.

In the following I try to retrace what excatly I am doing, yet I suppose the principle of "reproducible code" will be difficult to accomplish, since I can't add my input files here, I try my best and hope it kind of fits the "criteria":

My input txt.files are:
testfishes.txt (column names are the fish species, row names are the planning units, where fish surveys where conducted, so the matrix is binary, presence/absence data, 0s and 1s)

predy.txt (column names are the predictor variables, numerical, one is categorical, row names are again the planning units where fish surveys were conducted)

predallx.txt (column names are the predictor variables, row names now are ALL the planning units, also those where no fish survey was conducted)

# I do a glm analysis for my fish models, loading the files first:

predallx<-read.table("predallx.txt",header=TRUE,row.names=1,sep="\t") predallx$exposure<-factor(predallx$exposure) predallx$exposure<-ordered(predallx$exposure,levels=c("Sheltered","Medium","Medexposed","Exposed")) attach(predallx)

predy<-read.table("predy.txt",header=TRUE, row.names=1,sep="\t") predy$exposure<-factor(predy$exposure)

testfishes<-read.table("testfishes.txt",header=TRUE,row.names=1,sep="\t") attach(testfishes)

# Creating a table for my glm output, models are compared on the basis of AIC values:

cat('fish',' ','model',' ','AIC',' ','df.residual',' ','deviance',"\n",


# Creating a matrix for the predicted values (fishoccur) and a matrix for the upper and lower ci:

fishoccur<-matrix(0,nrow(predallx),ncol(testfishes)) colnames(fishoccur)<-colnames(testfishes) rownames(fishoccur)<-rownames(predallx)

upperresponse<-matrix(0,nrow(predallx),ncol(testfishes)) colnames(upperresponse)<-colnames(testfishes) rownames(upperresponse)<-rownames(predallx)

lowerresponse<-matrix(0,nrow(predallx),ncol(testfishes)) colnames(lowerresponse)<-colnames(testfishes) rownames(lowerresponse)<-rownames(predallx)

# Now I run the anlysis in a loop + stepwise approach and enter results in the tables created above, "i" stands for every single fish that in turn should run through the loop:

for (i in 1:5)
  predy$eachfish <- testfishes[,i]
modelfish <- glm(eachfish~depth500m + exposure + nearest.est + island + mangroves + seagrass,

                 family=binomial, data=predy)
stepmodfi <- step(modelfish, trace= 0)

 cat(i, make.names(c(stepmodfi$call),unique=TRUE), AIC(stepmodfi),

    df.residual(stepmodfi),deviance(stepmodfi),"\n", file="AICfish.txt",append=TRUE)

# Eventually I want to get predicted values for all those planning units where no fish surveys were conducted:

# First I get my predicted values from the response, this yields a predicted value for every single fish and every single planning unit, and these results will be fed into the previously constructed matrix "fishoccur":

predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE) fishoccur[,i] <- predictionresponse

# Now I want to get confidence intervals for these predicted values. I figured out that I would need to get these from the logit scale first and only thereafter transfer them to the response scale. So, first I get again predicted values, but on the logit scale. This time though I get just ONE result for every planning unit, when actually I also expected to get a result for every fish and every planning unit. So, it seems that there is just one value for each planning unit, but I thought it would be nessesary to get one value for each fish and planning unit combination(?)

predictionlogit <- predict(stepmodfi, predallx, type="link", se.fit=TRUE)

# ! here the previously missing commands, sorry for that again, so here the predicted values and standard errors of the logit scale are used to calculate the upper and lower logit ci which after that are being transferred to the response scale:

upperlogit <- predictionlogit$fit + 2*predictionlogit$se.fit lowerlogit <- predictionlogit$fit - 2*predictionlogit$se.fit

# Then I calculated the upper and lower ci values on the reponse scale and putting them into the previously created matrices of upperresponse and lowerresponse, the "i" stands for the number of fishes used in this run, so that I get one value for every combination of fish and planning unit. Yet, I wonder whether this can be done, if first I only get one predicted value from the logit scale:

upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit)) lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))

# Or I use your adjusted formula:

upperresponse[,i] <- 1/(1+exp(-upperlogit)) lowerresponse[,i] <- 1/(1+exp(-lowerlogit))

# Close the loop:


# Finally extracting tables:


When I use the formula exp(upperlogit)/(1+exp(upperlogit)) I get quite many NAs in my upper ci values, the lower though seems fine. Using your formular eliminated the NAs but produces some confidence intervals which - particularly for very small predicted values - stretch from 0 to 1, which of course doesn't qualify as a confidence interval.

I hope that the description here is more stringent and I would be very grateful if you had any further idea, why I either get NAs or for some values 0-1 confidence intervals and whether the prediction on the logit scale in the first place is conducted correctly.

Thank you very much and sorry again for the double-posting!



Your response:

Possibly your calculation overflows: exp(upperlogit)/(1+exp(upperlogit)) could be replaced by 1/(1+exp(-upperlogit)), or even better by plogis(upperlogit). This could happen via the Hauck-Donner effect: the fitted probabilities are very near one and the standard errors are very large.

As for your other points, please follow the posting guide and 'provide commented, minimal, self-contained, reproducible code'.

Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten 


R-help_at_r-project.org mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 29 May 2008 - 03:28:31 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 29 May 2008 - 03:30:48 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive