From: Christine Sonvilla <csonvilla_at_gmx.at>

Date: Thu, 29 May 2008 04:09:27 +0200

Date: Thu, 29 May 2008 04:09:27 +0200

Sorry, in my previous email I forgot two commands for the prediction:

upperlogit <- predictionlogit$fit + 2*predictionlogit$se.fit lowerlogit <- predictionlogit$fit - 2*predictionlogit$se.fit

I've included them in the email below again!

Dear Brian and list members,

Thanks very much for your response Brian! I applied the adjusted calculation that you advised me to use [1/(1+exp(-upperlogit))] and as a result I don't get any more NA values in my upper confidence interval values.

Yet, some outcomes are very akward, since for very small values such as 1.98E-10, the lower values very reasonably turns out to be 0, yet the upper value is simply set to 1, which kind of makes the confidence interval redundant and doesn't give any proper idea how good of a predicted value this 1.98E-10 value and other similar ones are.

In the following I try to retrace what excatly I am doing, yet I suppose the principle of "reproducible code" will be difficult to accomplish, since I can't add my input files here, I try my best and hope it kind of fits the "criteria":

My input txt.files are:

testfishes.txt (column names are the fish species, row names are the planning units, where fish surveys where conducted, so the matrix is binary, presence/absence data, 0s and 1s)

predy.txt (column names are the predictor variables, numerical, one is categorical, row names are again the planning units where fish surveys were conducted)

predallx.txt (column names are the predictor variables, row names now are ALL the planning units, also those where no fish survey was conducted)

# I do a glm analysis for my fish models, loading the files first:

predallx<-read.table("predallx.txt",header=TRUE,row.names=1,sep="\t") predallx$exposure<-factor(predallx$exposure) predallx$exposure<-ordered(predallx$exposure,levels=c("Sheltered","Medium","Medexposed","Exposed")) attach(predallx)

predy<-read.table("predy.txt",header=TRUE, row.names=1,sep="\t")
predy$exposure<-factor(predy$exposure)

attach(predy)

testfishes<-read.table("testfishes.txt",header=TRUE,row.names=1,sep="\t") attach(testfishes)

# Creating a table for my glm output, models are compared on the basis of AIC values:

cat('fish',' ','model',' ','AIC',' ','df.residual',' ','deviance',"\n",

file="AICfish.txt",append=FALSE)

# Creating a matrix for the predicted values (fishoccur) and a matrix for the upper and lower ci:

fishoccur<-matrix(0,nrow(predallx),ncol(testfishes)) colnames(fishoccur)<-colnames(testfishes) rownames(fishoccur)<-rownames(predallx)

upperresponse<-matrix(0,nrow(predallx),ncol(testfishes)) colnames(upperresponse)<-colnames(testfishes) rownames(upperresponse)<-rownames(predallx)

lowerresponse<-matrix(0,nrow(predallx),ncol(testfishes)) colnames(lowerresponse)<-colnames(testfishes) rownames(lowerresponse)<-rownames(predallx)

for (i in 1:5)

{

predy$eachfish <- testfishes[,i]

modelfish <- glm(eachfish~depth500m + exposure + nearest.est + island + mangroves + seagrass,

family=binomial, data=predy)stepmodfi <- step(modelfish, trace= 0)

cat(i, make.names(c(stepmodfi$call),unique=TRUE), AIC(stepmodfi),

df.residual(stepmodfi),deviance(stepmodfi),"\n", file="AICfish.txt",append=TRUE)

predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE) fishoccur[,i] <- predictionresponse

predictionlogit <- predict(stepmodfi, predallx, type="link", se.fit=TRUE)

# ! here the previously missing commands, sorry for that again, so here the predicted values and standard errors of the logit scale are used to calculate the upper and lower logit ci which after that are being transferred to the response scale:

upperlogit <- predictionlogit$fit + 2*predictionlogit$se.fit lowerlogit <- predictionlogit$fit - 2*predictionlogit$se.fit

# Then I calculated the upper and lower ci values on the reponse scale and putting them into the previously created matrices of upperresponse and lowerresponse, the "i" stands for the number of fishes used in this run, so that I get one value for every combination of fish and planning unit. Yet, I wonder whether this can be done, if first I only get one predicted value from the logit scale:

upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit)) lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))

# Or I use your adjusted formula:

upperresponse[,i] <- 1/(1+exp(-upperlogit)) lowerresponse[,i] <- 1/(1+exp(-lowerlogit))

# Close the loop:

}

write.table(fishoccur,file="fishoccur.txt",row.names=TRUE,sep=",") write.table(upperresponse,file="upperresponse.txt",row.names=TRUE,sep=",") write.table(lowerresponse,file="lowerresponse.txt",row.names=TRUE,sep=",")

When I use the formula exp(upperlogit)/(1+exp(upperlogit)) I get quite many NAs in my upper ci values, the lower though seems fine. Using your formular eliminated the NAs but produces some confidence intervals which - particularly for very small predicted values - stretch from 0 to 1, which of course doesn't qualify as a confidence interval.

I hope that the description here is more stringent and I would be very grateful if you had any further idea, why I either get NAs or for some values 0-1 confidence intervals and whether the prediction on the logit scale in the first place is conducted correctly.

Thank you very much and sorry again for the double-posting!

Regards,

Christine

Your response:

Possibly your calculation overflows: exp(upperlogit)/(1+exp(upperlogit)) could be replaced by 1/(1+exp(-upperlogit)), or even better by plogis(upperlogit). This could happen via the Hauck-Donner effect: the fitted probabilities are very near one and the standard errors are very large.

As for your other points, please follow the posting guide and 'provide commented, minimal, self-contained, reproducible code'.

-- Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten -- ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Thu 29 May 2008 - 03:28:31 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Thu 29 May 2008 - 03:30:48 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*