Re: [R] "fitted probabilities numerically 0 or 1 occurr

From: Ted Harding <>
Date: Tue, 11 Mar 2008 10:08:09 +0000 (GMT)

On 11-Mar-08 08:58:55, Werner Wernersen wrote:
> Hi,
> could anyone explain to me what this warning message
> exactly means and what the consequences are?
> Is it due to the fact that there are very extreme
> observations / outliers included or what is the reason
> for it?
> Thanks so much,
> Werner

What it means is exactly what it says. How it arises will probably be some variant of the following kind of data (I'm guessing that your application of glm() was to data with 0/1 responses, as in a logistic regression):

  X = 0.0 0.5 1.0 1.5 2.0 2.5 3.0 ...
  Y =  0   0   0   1   1   1   1  ...

i.e. all the 0's occur on one side of a value (say 1.25) of X, and all the 1's occur on the other side.

If you take a model (e.g. logistic):

  P(Y=1 | X) = exp((X-a)*b)/(1 + exp((X-a)*b))

then, for any finite values of a and b, the formula will give a value >0 for P(Y=1 | X) where X < 1.25 (i.e. where Y=0) so P(Y=0 | X) < 1; and a value <1 for P(Y=1 | X) where X > 1.25 (i.e. Y=1).

However, if you take, say, a=1.25 (a value which separates the 0's from the 1's), and then let b -> infinity, then you will find that

  P(Y=0 | X) -> 1, P(Y=1 | X) -> 0, for X < 1.25
  P(Y=0 | X) -> 0, P(Y=1 | X) -> 1, for X > 1.25

so the limit as b -> inf perfectly predicts the observed outcome.
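To see this limit numerically, here is a small sketch in Python (standard library only). It evaluates the logistic formula above on the example data with a fixed at 1.25 and b increasing. The function name p_y1 and the particular values of b are my own choices for illustration, not anything taken from glm().

```python
import math

def p_y1(x, a, b):
    """P(Y=1 | X=x) under the model exp((x-a)*b) / (1 + exp((x-a)*b))."""
    z = (x - a) * b
    # Numerically stable evaluation of exp(z) / (1 + exp(z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
Y = [0, 0, 0, 1, 1, 1, 1]
a = 1.25  # any value strictly between 1.0 and 1.5 behaves the same way

for b in (1, 10, 100, 1000):
    probs = [p_y1(x, a, b) for x in X]
    # Fitted P(Y=1) drifts towards 0 where Y=0 and towards 1 where Y=1.
    print(b, [round(p, 4) for p in probs])
```

For any finite b the probabilities stay strictly between 0 and 1 mathematically, but in floating point they become indistinguishable from 0 or 1 once b is large, which is what "numerically 0 or 1" refers to.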

However, the value of a is indeterminate so long as it is between the largest X for the Y=0 observations, and the smallest X for the Y=1 observations.

This situation cannot arise with data where the largest X for which Y=0 is greater than the smallest X for which Y=1, e.g.

  X = 0.0 0.5 1.0 1.5 2.0 2.5 3.0 ...
  Y =  0   0   1   0   1   1   1  ...
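The single-covariate condition can be checked directly. The following Python sketch (the helper name separated_1d is hypothetical) returns the open interval in which a separating value of a may lie, or None when the 0's and 1's overlap. It also allows the mirror-image case where the 1's sit below the 0's.

```python
def separated_1d(X, Y):
    """Return (lo, hi) such that any a with lo < a < hi separates the
    Y=0 cases from the Y=1 cases on X, or None if no such a exists."""
    x0 = [x for x, y in zip(X, Y) if y == 0]
    x1 = [x for x, y in zip(X, Y) if y == 1]
    if not x0 or not x1:
        raise ValueError("need at least one Y=0 and one Y=1 observation")
    if max(x0) < min(x1):   # all 0's below all 1's
        return (max(x0), min(x1))
    if max(x1) < min(x0):   # all 1's below all 0's
        return (max(x1), min(x0))
    return None             # the two groups overlap on X

X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
print(separated_1d(X, [0, 0, 0, 1, 1, 1, 1]))  # first example: separated
print(separated_1d(X, [0, 0, 1, 0, 1, 1, 1]))  # second example: overlapping
```

For the first data set the interval is (1.0, 1.5), which contains the value 1.25 used above; for the second there is no such interval.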

The above example is a very simple example of what is called "linear separation". It arises more generally when there are several covariates X1, X2, ... , Xk and there is a linear function

  L = a1*X1 + a2*X2 + ... + ak*Xk

for which (with the data as observed) there is a value L0 such that

  Y = 0 for all the data such that L < L0
  Y = 1 for all the data such that L > L0
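As an illustration of this definition, here is a Python sketch that tests whether one given set of coefficients a1, ..., ak separates the data, and if so returns one admissible L0. The function name and the two-covariate data are made up for the example. Note that it only checks a candidate direction you supply; it does not search for one.

```python
def separating_threshold(coefs, rows, y):
    """Given coefficients (a1, ..., ak), return a value L0 with
    L < L0 for every Y=0 case and L > L0 for every Y=1 case,
    or None if this particular linear function does not separate them."""
    L = [sum(a * x for a, x in zip(coefs, row)) for row in rows]
    l0 = [v for v, lab in zip(L, y) if lab == 0]
    l1 = [v for v, lab in zip(L, y) if lab == 1]
    if l0 and l1 and max(l0) < min(l1):
        return (max(l0) + min(l1)) / 2  # midpoint of the gap; any point in it works
    return None

# Hypothetical data with k = 2 covariates (X1, X2).
rows = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 1), (1, 3)]
y = [0, 0, 0, 1, 1, 1]

print(separating_threshold((1, 1), rows, y))   # L = X1 + X2 separates the groups
print(separating_threshold((1, -1), rows, y))  # L = X1 - X2 does not
```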

In particular, if the number of covariates (k) is greater than (n-2), where n is the number of cases in your data, then you have (k+1) or fewer points in k dimensions, and (provided the points are in general position) there will be a hyperplane, of dimension k-1 and of the form L = L0 with L as above, which separates the (X1,...,Xk)-points where Y=0 from the (X1,...,Xk)-points where Y=1. Regardless of how you assign the labels "Y=0" and "Y=1" to (k+1) or fewer such points, they will be linearly separable.
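A small check of this claim for k = 2, sketched in Python: take three points that are not collinear, and for every one of the 8 possible 0/1 labellings solve exactly for coefficients and a threshold that put each point on its required side. The points, the +1/-1 targets and the helper solve() are my own illustrative choices; the construction assumes the points are in general position, so that the linear system has a unique solution.

```python
from fractions import Fraction
from itertools import product

def solve(A, t):
    """Solve the square system A w = t by Gauss-Jordan elimination (exact)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, t)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("points are not in general position")
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

points = [(0, 0), (1, 0), (0, 1)]      # k = 2 covariates, n = k + 1 = 3 cases
A = [list(p) + [1] for p in points]    # unknowns: a1, a2 and c = -L0

results = []
for labels in product([0, 1], repeat=len(points)):
    # Ask for L - L0 = +1 where Y=1 and L - L0 = -1 where Y=0.
    targets = [1 if lab == 1 else -1 for lab in labels]
    *coefs, c = solve(A, targets)
    L0 = -c
    L = [sum(a * x for a, x in zip(coefs, p)) for p in points]
    ok = all((l > L0) == (lab == 1) for l, lab in zip(L, labels))
    results.append(ok)
    print(labels, "separated:", ok)
```

The labellings that are all 0 or all 1 are included for completeness; they are separated trivially.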

Even if k < n-1, so that they are not *in general* linearly separable, there is still a positive probability that you can get data for which they are linearly separated; and then the same situation arises. This probability increases as the number of covariates (k) increases.

What the warning message is telling you is that a perfect fit is possible within the parametrisation of the model: a probability P(Y=1)=0 is fitted to cases where the observed Y = 0; and a probability P(Y=1)=1 is fitted to cases where the observed Y = 1.
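One way to see the "perfect fit" is through the log-likelihood of the simple logistic model above. The Python sketch below (standard library only; the function names are mine, and this is not a description of how glm() computes anything) evaluates the Bernoulli log-likelihood on the separated example data with a = 1.25 and increasing b.

```python
import math

def softplus(z):
    """log(1 + exp(z)), computed without overflow for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def loglik(a, b, X, Y):
    """Bernoulli log-likelihood for P(Y=1 | X) = exp((X-a)*b) / (1 + exp((X-a)*b))."""
    total = 0.0
    for x, y in zip(X, Y):
        z = (x - a) * b
        # log P(Y=1) = -softplus(-z),  log P(Y=0) = -softplus(z)
        total -= softplus(-z) if y == 1 else softplus(z)
    return total

X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
Y = [0, 0, 0, 1, 1, 1, 1]

values = [loglik(1.25, b, X, Y) for b in (1, 10, 100, 1000)]
for b, v in zip((1, 10, 100, 1000), values):
    print(b, v)
```

The log-likelihood keeps increasing towards its upper bound of 0 as b grows, without attaining it at any finite b, so there is no finite maximising value of b for these data.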

Best wishes,

Ted Harding
