From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>

Date: Fri, 20 Jun 2008 07:52:17 +0100 (BST)

})

Date: Fri, 20 Jun 2008 07:52:17 +0100 (BST)

On Thu, 19 Jun 2008, Bryan Hanson wrote:

> Thanks so much to all who offered assistance. I have to say it would have

*> taken me a long time to figure this out, so I am most grateful. Plus,
**> studying your examples greatly improves my understanding.
**>
**> As a follow up, the fit process gives the following error:
**>
**> Warning messages:
**> 1: In eval(expr, envir, enclos) :
**> non-integer #successes in a binomial glm!
**>
**> Is this something I should worry about? It doesn't arise from glm or
**> glm.fit
*

It does arise from glm(), which runs the initialization code for the family. Specifically

*> binomial()$initialize
*

expression({

if (NCOL(y) == 1) { if (is.factor(y)) y <- y != levels(y)[1] n <- rep.int(1, nobs) if (any(y < 0 | y > 1)) stop("y values must be 0 <= y <= 1") mustart <- (weights * y + 0.5)/(weights + 1) m <- weights * y if (any(abs(m - round(m)) > 0.001)) warning("non-integer #successes in a binomial glm!") } else if (NCOL(y) == 2) { if (any(abs(y - round(y)) > 0.001)) warning("non-integer counts in a binomial glm!") n <- y[, 1] + y[, 2] y <- ifelse(n == 0, 0, y[, 1]/n) weights <- weights * n mustart <- (n * y + 0.5)/(n + 1) } else stop("for the binomial family, y must be a vector of 0 and 1's\n", "or a 2 column matrix where col 1 is no. successes and col 2 isno. failures")

})

You specified a binomial family with responses 0.1 and 0.9. That makes no sense -- the binomial distribution is for integers. Perhaps you mean 1/10 and 9/10, in which case you needed a weight of 10, but I'm forced to guess.

*>
*

> Thanks, Bryan

*>
**> For the record, a final complete working code is appended below.
**>
**> x <- c(1:40)
**> y1 <- c(rep(0.1,10), rep(NA, 10), rep(0.9,20))
**> y2 <- c(rep(0.1,15), rep(NA, 10), rep(0.9,15))
**> data <- as.data.frame(cbind(x,y1,y2))
**> plot(x, y1, pch = 1, ylim = c(0,1), col = "red", main = "Logistic Regression
**> w/Confidence Bounds", ylab = "y values", xlab = "x values")
**> points(x, y2, pch = 3, col = "blue")
**> abline(h = 0.5, col = "gray")
**> fit1 <- glm(y1~x, family = binomial(link = "logit"), data, na.action =
**> na.omit)
**> fit2 <- glm(y2~x, family = binomial(link = "logit"), data, na.action =
**> na.omit)
**> lines(fit1$model$x, fit1$fitted.values, col = "red")
**> lines(fit2$model$x, fit2$fitted.values, col = "blue")
**>
**> ## predictions on scale of link function
**> pred1 <- predict(fit1, se.fit = TRUE)
**> pred2 <- predict(fit2, se.fit = TRUE)
**>
**> ## constant for 95% confidence bands
**> ## getting two t values is redundant here as fit1 and fit2
**> ## have same residual df, but the real world may be different
**> res.df <- c(fit1$df.residual, fit2$df.residual)
**> ## 0.975 because we want 2.5% in upper and lower tail
**> const <- qt(0.975, df = res.df)
**>
**> ## confidence bands on scale of link function
**> upper1 <- pred1$fit + (const[1] * pred1$se.fit)
**> lower1 <- pred1$fit - (const[1] * pred1$se.fit)
**> upper2 <- pred2$fit + (const[2] * pred2$se.fit)
**> lower2 <- pred2$fit - (const[2] * pred2$se.fit)
**>
**> ## bind together into a data frame
**> bands <- data.frame(upper1, lower1, upper2, lower2)
**>
**> ## transform on to scale of response
**> bands <- data.frame(lapply(bands, binomial(link = "logit")$linkinv))
**>
**> ## plot confidence bands
**> lines(fit1$model$x, bands$upper1, col = "pink")
**> lines(fit1$model$x, bands$lower1, col = "pink")
**> lines(fit2$model$x, bands$upper2, col = "lightblue")
**> lines(fit2$model$x, bands$lower2, col = "lightblue")
**>
**>
**>
**> On 6/19/08 12:28 PM, "Gavin Simpson" <gavin.simpson_at_ucl.ac.uk> wrote:
**>
**>> On Thu, 2008-06-19 at 10:42 -0400, Bryan Hanson wrote:
**>>> [I've ommitted some of the conversation so far...]
**>>>
**>>>> E.g. in a logistic model, with (say) eta = beta_0 + beta_1*x one may
**>>>> find, on the
**>>>> linear predictor scale, A and B (say) such that P(A <= eta <= B) = 0.95.
**>>>>
**>>>> Then P(expit(A) <= expit(eta) <= expit(B)) = 0.95, which is exactly
**>>>> what is wanted.
**>>>
**>>> I think I follow the above conceptually, but I don't know how to implement
**>>> it, though I fooled around (unsuccessfully) with some of the variations on
**>>> predict().
**>>>
**>>> I'm trying to learn this in response to a biology colleague who did
**>>> something similar in SigmaPlot. I can already tell that SigmaPlot did a lot
**>>> of stuff for him in the background. The responses are 0/1 of a particular
**>>> observation by date. The following code simulates what's going on (note
**>>> that I didn't use 0/1 since this leads to a recognized condition/warning of
**>>> fitting 1's and 0's. I've requested Brian's Pattern Recognition book so I
**>>> know what the problem is and how to solve it). My colleague is looking at
**>>> two populations in which the "LD50" would differ. I'd like to be able to
**>>> put the "pointwise confidence bounds" on each curve so he can tell if the
**>>> populations are really different.
**>>>
**>>> By the way, this code does give a (minor?) error from glm (which you will
**>>> see).
**>>>
**>>> Can you make a suggestion about how to get those "confidence bounds" on
**>>> there?
**>>>
**>>> Also, is a probit link more appropriate here?
**>>>
**>>> Thanks, Bryan
**>>>
**>>> x <- c(1:40)
**>>> y1 <- c(rep(0.1,10), rep(NA, 10), rep(0.9,20))
**>>> y2 <- c(rep(0.1,15), rep(NA, 10), rep(0.9,15))
**>>> data <- as.data.frame(cbind(x,y1,y2))
**>>> plot(x, y1, pch = 1, ylim = c(0,1), col = "red")
**>>> points(x, y2, pch = 3, col = "blue")
**>>> abline(h = 0.5, col = "gray")
**>>> fit1 <- glm(y1~x, family = binomial(link = "logit"), data, na.action =
**>>> na.omit)
**>>> fit2 <- glm(y2~x, family = binomial(link = "logit"), data, na.action =
**>>> na.omit)
**>>> lines(fit1$model$x, fit1$fitted.values, col = "red")
**>>> lines(fit2$model$x, fit2$fitted.values, col = "blue")
**>>
**>> The point is to get predictions on the scale of the link function,
**>> generate 95% confidence bands in the normal way and then transform the
**>> confidence bands onto the scale of the response using the inverse of the
**>> link function used to fit the model.
**>>
**>> [note, am doing this from memory, so best to check this is right -- I'm
**>> sure someone will tell me very quickly if I have gone wrong anywhere!]
**>>
**>> ## predictions on scale of link function
**>> pred1 <- predict(fit1, se.fit = TRUE)
**>> pred2 <- predict(fit2, se.fit = TRUE)
**>>
**>> ## constant for 95% confidence bands
**>> ## getting two t values is redundant here as fit1 and fit2
**>> ## have same residual df, but the real world may be different
**>> res.df <- c(fit1$df.residual, fit2$df.residual)
**>> ## 0.975 because we want 2.5% in upper and lower tail
**>> const <- qt(0.975, df = res.df)
**>>
**>> ## confidence bands on scale of link function
**>> upper1 <- pred1$fit + (const[1] * pred1$se.fit)
**>> lower1 <- pred1$fit - (const[1] * pred1$se.fit)
**>> upper2 <- pred2$fit + (const[2] * pred2$se.fit)
**>> lower2 <- pred2$fit - (const[2] * pred2$se.fit)
**>>
**>> ## bind together into a data frame
**>> bands <- data.frame(upper1, lower1, upper2, lower2)
**>>
**>> ## transform on to scale of response
**>> bands <- data.frame(lapply(bands, binomial(link = "logit")$linkinv))
**>>
**>> ## plot confidence bands
**>> lines(fit1$model$x, bands$upper1, col = "red", lty = "dotted")
**>> lines(fit1$model$x, bands$lower1, col = "red", lty = "dotted")
**>> lines(fit2$model$x, bands$upper2, col = "blue", lty = "dotted")
**>> lines(fit2$model$x, bands$lower2, col = "blue", lty = "dotted")
**>>
**>> HTH
**>>
**>> G
**>>
**>>> ______________________________________________
**>>> R-help_at_r-project.org mailing list
**>>> https://stat.ethz.ch/mailman/listinfo/r-help
**>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
**>>> and provide commented, minimal, self-contained, reproducible code.
**>
**>
*

-- Brian D. Ripley, ripley_at_stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Fri 20 Jun 2008 - 09:20:26 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Fri 20 Jun 2008 - 10:30:45 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*