# Re: [R] modeling binary response variables

From: Daniel Malter <daniel_at_umd.edu>
Date: Mon, 14 Jul 2008 18:07:15 -0700 (PDT)

This sounds like a good application for a binomial model. A linear model may give you fitted values outside the interval (0,1) in which your proportions must lie. The binomial model with a logit (or probit, or cloglog) link fixes that issue, because the link maps every value of the linear predictor back into (0,1).
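To illustrate the point, here is a minimal sketch in R on hypothetical, noise-free data (the vectors `x` and `p` are made up for illustration). A straight line fitted to a sigmoid pattern of proportions produces fitted values below 0 and above 1 at the ends, whereas the inverse logit `plogis()` keeps any linear predictor inside (0,1).

```r
# Hypothetical noise-free sigmoid: the true proportion rises from ~0 to ~1 with x.
x <- 1:20
p <- plogis(x - 10.5)          # inverse logit, always strictly in (0, 1)

# Straight-line (ordinary least squares) fit to the proportions.
lin <- lm(p ~ x)
range(fitted(lin))             # extends below 0 and above 1 at the ends

# The inverse logit maps any linear predictor eta back into (0, 1).
eta <- seq(-20, 20, by = 0.5)
range(plogis(eta))             # strictly between 0 and 1
```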

Since you have a proportion (the probability of success), your response lies between 0 and 1. I suggest you transform it by multiplying the proportion by, say, 100 (or 1000) and then rounding the result to the nearest integer. If Y is currently your proportion, do new.Y=round(Y*100). Then create the number of observations that make up the complementary outcome (the failures): counter.Y=100-new.Y.
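A minimal sketch of this transformation in R, using made-up proportions in a hypothetical vector `Y`. The failure count is computed from the rounded count `new.Y`, so each row sums to exactly 100. Keep in mind that the 100 pseudo-trials are an arbitrary choice and the binomial standard errors depend on that number; if the actual number of individuals behind each proportion is known, using those real success and failure counts instead is generally preferable.

```r
# Hypothetical observed proportions of success (one per observation).
Y <- c(0.02, 0.15, 0.48, 0.83, 0.97)

# Rescale each proportion to a count out of 100 and round to the nearest integer.
new.Y <- round(Y * 100)

# Failures are the remainder of the 100 pseudo-trials.
# Use the rounded count new.Y here, not the raw proportion Y.
counter.Y <- 100 - new.Y

cbind(new.Y, counter.Y)
```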

Then you can run the binomial as follows:

    reg=glm(cbind(new.Y,counter.Y)~predictors,family=binomial)  ## runs the regression; replace predictors with your own right-hand side
    summary(reg)  ## shows the summary output of your regression
    fitted(reg)   ## shows the predicted values given your data matrix and your estimated model
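The same call can be sketched end to end on simulated data. The data frame `d`, the single predictor `x`, the 100 trials per row, and the assumed true curve are all hypothetical; in practice the right-hand side would be your own descriptor variables. The coefficients are on the logit scale, and `predict(..., type = "response")` returns fitted probabilities.

```r
set.seed(1)  # reproducible simulated data

# Hypothetical data: one continuous predictor x and 100 trials per row.
d <- data.frame(x = 1:20)
n.trials <- 100
p.true <- plogis(-5 + 0.5 * d$x)                            # assumed true sigmoid
d$new.Y <- rbinom(nrow(d), size = n.trials, prob = p.true)  # successes
d$counter.Y <- n.trials - d$new.Y                           # failures

# Two-column response: successes and failures.
reg <- glm(cbind(new.Y, counter.Y) ~ x, family = binomial, data = d)
summary(reg)       # coefficients are on the logit scale
head(fitted(reg))  # fitted proportions, always in (0, 1)

# Predicted probability of success at new predictor values.
predict(reg, newdata = data.frame(x = c(5, 15)), type = "response")
```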

You will want to check a.) whether you need a binomial model at all (if your proportions actually fall in a much narrower range well inside (0,1), then you may be okay with a linear model), and b.) if a binomial is more appropriate, whether your data are overdispersed. Compare the residual deviance in the summary of your model with its residual degrees of freedom; for a well-fitting binomial model they should be about equal. If the deviance is much larger, use family=quasibinomial instead of family=binomial when fitting the model.
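A sketch of the overdispersion check in R, on simulated data that are deliberately overdispersed (the extra logit-scale noise in the hypothetical `p.row` is an assumption made for illustration). A common rule of thumb compares the residual deviance, or the Pearson chi-square statistic, with the residual degrees of freedom; ratios well above 1 suggest overdispersion. The quasibinomial refit gives the same coefficient estimates as the binomial fit but multiplies the standard errors by the square root of the estimated dispersion.

```r
set.seed(2)  # reproducible simulated data

# Hypothetical overdispersed data: each row's success probability gets extra
# random variation on the logit scale, beyond what the binomial allows.
d <- data.frame(x = 1:20)
n.trials <- 100
p.row <- plogis(-5 + 0.5 * d$x + rnorm(nrow(d), sd = 1))
d$new.Y <- rbinom(nrow(d), size = n.trials, prob = p.row)
d$counter.Y <- n.trials - d$new.Y

fit.bin <- glm(cbind(new.Y, counter.Y) ~ x, family = binomial, data = d)

# Both ratios should be near 1 for a well-specified binomial model.
disp.deviance <- deviance(fit.bin) / df.residual(fit.bin)
disp.pearson <- sum(residuals(fit.bin, type = "pearson")^2) / df.residual(fit.bin)
c(deviance = disp.deviance, pearson = disp.pearson)

# Ratios well above 1: refit with quasibinomial. Same coefficient estimates,
# standard errors scaled by the square root of the estimated dispersion.
fit.quasi <- glm(cbind(new.Y, counter.Y) ~ x, family = quasibinomial, data = d)
summary(fit.quasi)$dispersion
```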

Best,
Daniel

Kevin J Emerson wrote:
>
> R-devotees,
>
> I have a question about modeling in the case where the response variable
> is binary.
>
> I have a case where I have a response variable that is the probability of
> success, and four descriptor variables. The response has a sigmoid
> response with one of the variables. I would like to test for the effect of
> the various descriptor variables on the percentage success of the binary
> trait. I have looked at glm with family = "binomial" but am not sure I
> totally understand its use (and therefore am not sure it is the
> appropriate test), and am looking for two things: (1) is glm with
> family = 'binomial' the right way to do this, and (2) are there any good
> references on how it works? I have posted a plot of a sample of the data I
> am looking at as well as the sample data used to generate the plots.
>
> Sample Plot: http://www.uoregon.edu/~kemerson/tmp/plot.pdf
> Sample Data: http://www.uoregon.edu/~kemerson/tmp/data.csv
>
> Response variable is percent.dev (se2.dev are the errors from binomial
> estimates given probability and number of samples).
>
> Descriptor variables are num.days, ppd, temp, and pop.
>
> Any help would be greatly appreciated.
>
> Cheers,
> Kevin Emerson
>
>
> ====================================
> Kevin J. Emerson
> 1210 University of Oregon
> Eugene, OR, 97403
> email: kemerson_at_uoregon.edu
> web: http://evodevo.uoregon.edu/people/emerson.html
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

--
View this message in context: http://www.nabble.com/modeling-binary-response-variables-tp18456116p18456275.html
Sent from the R help mailing list archive at Nabble.com.