Re: [R] Is it possible to use glm() with 30 observations?

From: David Firth <d.firth_at_warwick.ac.uk>
Date: Sat 02 Jul 2005 - 19:01:06 EST

On 2 Jul 2005, at 06:01, Spencer Graves wrote:

> The issue is not 30 observations but whether it is possible to
> perfectly separate the two possible outcomes. Consider the following:
>
> # a single 1 lying between two 0s
> tst.glm <- data.frame(x=1:3, y=c(0, 1, 0))
> glm(y~x, family=binomial, data=tst.glm)
>
> # y = 0 for every x <= 500 and y = 1 for every x > 500
> tst2.glm <- data.frame(x=1:1000,
> y=rep(0:1, each=500))
> glm(y~x, family=binomial, data=tst2.glm)
>
> The algorithm fits y~x to tst.glm without complaining,
> but issues warnings for tst2.glm. This is called the Hauck-Donner
> effect, and RSiteSearch("Hauck-Donner") just now produced 8 hits. For
> more information, look for "Hauck-Donner" in the index of Venables, W.
> N. and Ripley, B. D. (2002) _Modern Applied Statistics with S._ New
> York: Springer.

Not exactly. The phenomenon that causes the warning for tst2.glm above is more commonly known as "complete separation": the 0s and 1s can be split perfectly by a linear function of the predictors, so the maximum-likelihood estimates do not exist as finite values. For some comments on its implications you might look at another work by B. D. Ripley, the 1996 book "Pattern Recognition and Neural Networks". There are some further references in the help files of the "brlr" package on CRAN.
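For a single numeric predictor, complete separation can be checked directly: it holds when every x with y = 0 lies strictly on one side of every x with y = 1. The helper `completely_separated_1d` below is hypothetical (not part of base R or any package) and is only a sketch; with several predictors, separation can come from a linear combination of them, which this one-variable check cannot detect.

```r
# Hypothetical helper, not from any package: TRUE if the numeric predictor x
# completely separates the 0/1 response y, i.e. all x for y == 0 lie strictly
# below (or strictly above) all x for y == 1.
# Assumes both outcome values occur at least once.
completely_separated_1d <- function(x, y) {
  x0 <- x[y == 0]
  x1 <- x[y == 1]
  max(x0) < min(x1) || max(x1) < min(x0)
}

# The two data sets from Spencer's example
tst.glm  <- data.frame(x = 1:3, y = c(0, 1, 0))
tst2.glm <- data.frame(x = 1:1000, y = rep(0:1, each = 500))

completely_separated_1d(tst.glm$x, tst.glm$y)    # FALSE: the lone 1 sits between two 0s
completely_separated_1d(tst2.glm$x, tst2.glm$y)  # TRUE: all 0s at x <= 500, all 1s at x > 500
```

Only the second data set is separated, which matches which of the two glm() fits produces the warning.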

The problem noted by Hauck and Donner (1977, JASA), in which the Wald statistic for a coefficient can decrease as the estimate moves further from its null value, is slightly related, but not the same. See the aforementioned book by Venables and Ripley, for example. The glm function does not routinely warn us about the "Hauck-Donner effect", afaik.
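A small sketch of the Hauck-Donner effect, on data that are not separated. The counts are hypothetical, invented for illustration: group "a" has k successes out of 20 and group "b" has 20 - k. `hauck_donner_demo` is likewise a made-up helper. As k falls from 4 to 1, the estimated log odds ratio and the likelihood-ratio statistic both increase. By hand calculation (assuming glm converges to the exact MLE), however, the Wald z for the group coefficient peaks near k = 2 at about 4.17 and drops to about 4.06 at k = 1.

```r
# Hypothetical helper: fit a two-group binomial GLM to made-up counts and
# return the slope, its Wald z, and the likelihood-ratio statistic.
hauck_donner_demo <- function(k, n = 20) {
  d <- data.frame(group = factor(c("a", "b")),
                  succ  = c(k, n - k))
  d$fail <- n - d$succ
  fit <- glm(cbind(succ, fail) ~ group, family = binomial, data = d)
  # Wald statistic: estimate / standard error, from the summary table
  wald_z <- coef(summary(fit))["groupb", "z value"]
  # Likelihood-ratio statistic for dropping 'group'
  lr <- fit$null.deviance - fit$deviance
  c(k = k, slope = unname(coef(fit)["groupb"]), wald_z = wald_z, lr = lr)
}

# One row per k; smaller k means a larger true effect
res <- t(sapply(4:1, hauck_donner_demo))
print(round(res, 2))
```

The likelihood-ratio column rises steadily with the effect. The Wald column does not, which is why a Wald test can understate a large effect even when glm gives no warning at all.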

The original poster did not say what the purpose of the logistic regression was, so it is hard to advise. Depending on the purpose, the separation that was detected may or may not be a problem.

Regards,
David

> (If you don't already have this book, I recommend you
> give serious consideration to purchasing a copy. It is excellent on
> many issues relating to statistical analysis and R.)
>
> Spencer Graves
>
> Kerry Bush wrote:
>
>> I have a very simple problem. When using glm to fit
>> binary logistic regression model, sometimes I receive
>> the following warning:
>>
>> Warning messages:
>> 1: fitted probabilities numerically 0 or 1 occurred
>> in: glm.fit(x = X, y = Y, weights = weights, start =
>> start, etastart = etastart,
>> 2: fitted probabilities numerically 0 or 1 occurred
>> in: glm.fit(x = X, y = Y, weights = weights, start =
>> start, etastart = etastart,
>>
>> What does this output tell me? Since I only have 30
>> observations, I assume this is a small-sample problem.
>>
>> Is it possible to fit this model in R with only 30
>> observations? Could any expert provide suggestions to
>> avoid the warning?
>>
>> ______________________________________________
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>
> --
> Spencer Graves, PhD
> Senior Development Engineer
> PDF Solutions, Inc.
> 333 West San Carlos Street Suite 700
> San Jose, CA 95110, USA
>
> spencer.graves@pdf.com
> www.pdf.com
> Tel: 408-938-4420
> Fax: 408-280-7915
>



Received on Sat Jul 02 19:04:19 2005
