From: Werner Wernersen <pensterfuzzer_at_yahoo.de>

Date: Wed, 12 Mar 2008 20:33:35 +0100 (CET)


R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 12 Mar 2008 - 19:38:35 GMT


Thanks Ted and Professor Ripley for the very helpful
answers! Now I know what the problem is in my case.

All the best,

Werner

--- Ted.Harding_at_manchester.ac.uk wrote:

> On 11-Mar-08 08:58:55, Werner Wernersen wrote:
> > Hi,
> >
> > could anyone explain to me what this warning message
> > exactly means and what the consequences are?
> > Is it due to the fact that there are very extreme
> > observations / outliers included or what is the reason
> > for it?
> >
> > Thanks so much,
> > Werner
>
> What it means is exactly what it says. How it arises will
> probably be some variant of the following kind of data
> (I'm guessing that your application of glm() was to data
> with 0/1 responses, as in a logistic regression):
>
>   X = 0.0 0.5 1.0 1.5 2.0 2.5 3.0 ...
>   Y = 0   0   0   1   1   1   1   ...
>
> i.e. all the 0's occur on one side of a value (say 1.25)
> of X, and all the 1's occur on the other side.
>
> If you take a model (e.g. logistic):
>
>   P(Y=1 | X) = exp((X-a)*b)/(1 + exp((X-a)*b))
>
> then, for any finite values of a and b, the formula will
> give a value >0 for P(Y=1 | X) where X < 1.25 (i.e. where
> Y=0), so P(Y=0 | X) < 1; and a value <1 for P(Y=1 | X)
> where X > 1.25 (i.e. where Y=1).
>
> However, if you take say a=1.25 (a value which separates the
> 0's from the 1's), and then let b -> infinity, then you will
> find that
>
>   P(Y=0 | X) -> 1, P(Y=1 | X) -> 0, for X < 1.25
>   P(Y=0 | X) -> 0, P(Y=1 | X) -> 1, for X > 1.25
>
> so the limit as b -> inf perfectly predicts the observed outcome.
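
The limit described above can be checked numerically. The following is a minimal sketch in plain Python (it does not call R's glm(); the function name p_y1 is made up for illustration). It assumes only the seven X and Y values shown in the example and a = 1.25, and evaluates the logistic formula for increasing b:

```python
import math

def p_y1(x, a, b):
    """P(Y=1 | X) = exp((X-a)*b) / (1 + exp((X-a)*b)), the model in the text."""
    z = (x - a) * b
    # Algebraically identical forms, chosen so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# The seven values shown in the example (the "..." is not filled in).
X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
Y = [0, 0, 0, 1, 1, 1, 1]
a = 1.25  # a value separating the 0's from the 1's

for b in (1, 10, 100, 1000):
    probs = [p_y1(x, a, b) for x in X]
    print(b, [round(p, 4) for p in probs])
```

For every finite b the printed probabilities lie strictly between 0 and 1, but as b grows they move towards the observed 0/1 pattern of Y.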
>
> However, the value of a is indeterminate so long as it is
> between the largest X for the Y=0 observations, and the smallest
> X for the Y=1 observations.
>
> This situation cannot arise with data where the largest X for
> which Y=0 is greater than the smallest X for which Y=1, e.g.
>
>   X = 0.0 0.5 1.0 1.5 2.0 2.5 3.0 ...
>   Y = 0   0   1   0   1   1   1   ...
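
For a single covariate, the condition above reduces to comparing the largest X in one group with the smallest X in the other. A small sketch in plain Python follows; separating_interval is a hypothetical helper written for this illustration, not a function from R or any library:

```python
def separating_interval(xs, ys):
    """Return the open interval of values 'a' that split the 0's from the 1's,
    or None if the two groups overlap in X."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    if not x0 or not x1:
        return None  # only one outcome observed; not handled in this sketch
    if max(x0) < min(x1):          # all 0's to the left of all 1's
        return (max(x0), min(x1))
    if max(x1) < min(x0):          # all 1's to the left of all 0's
        return (max(x1), min(x0))
    return None                    # groups overlap: no separating value

X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
Y_separated = [0, 0, 0, 1, 1, 1, 1]   # first example in the text
Y_overlap   = [0, 0, 1, 0, 1, 1, 1]   # second example in the text

print(separating_interval(X, Y_separated))
print(separating_interval(X, Y_overlap))
```

Any a inside the interval returned for the first data set gives the same perfect split, which is the indeterminacy mentioned above.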
>
> The above example is a very simple example of what is called
> "linear separation". It arises more generally when there are
> several covariates X1, X2, ... , Xk and there is a linear
> function
>
>   L = a1*X1 + a2*X2 + ... + ak*Xk
>
> for which (with the data as observed) there is a value L0
> such that
>
>   Y = 0 for all the data such that L < L0
>   Y = 1 for all the data such that L > L0
>
> In particular, if ever the number of covariates (k) is greater
> than (n-2), where n is the number of cases in your data, then
> you have (k+1) or fewer points in k dimensions, and there will
> be a hyperplane (L = L0, with L as given above) which will
> separate the (X1,...,Xk)-points where Y=0 from the
> (X1,...,Xk)-points where Y=1. Regardless of how you assign labels
> "Y=0" and "Y=1" to (k+1) or fewer points in general position,
> they will be linearly separable.
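
The claim about (k+1) points in k dimensions can be illustrated for k = 2. The sketch below, in plain Python, uses three made-up points that are not collinear (an assumption; collinear points can fail to be separable) and, for each of the 8 possible labellings, solves for a1, a2 and L0 such that L - L0 is -1 where Y=0 and +1 where Y=1. The helper solve is a hypothetical name for an ordinary Gaussian elimination routine:

```python
from itertools import product

def solve(A, rhs):
    """Solve the square system A z = rhs by Gaussian elimination with pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("points are not in general position")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical data: n = 3 cases, k = 2 covariates, so k > n - 2.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

results = []
for labels in product([0, 1], repeat=len(points)):
    # Unknowns are (a1, a2, L0); each row encodes a1*X1 + a2*X2 - L0 = target.
    A = [[x1, x2, -1.0] for x1, x2 in points]
    target = [1.0 if y == 1 else -1.0 for y in labels]
    a1, a2, L0 = solve(A, target)
    separated = all((a1 * x1 + a2 * x2 > L0) == (y == 1)
                    for (x1, x2), y in zip(points, labels))
    results.append(separated)
    print(labels, "a1=%.1f a2=%.1f L0=%.1f" % (a1, a2, L0), separated)
```

With three unknowns and three points the system always has a solution here, so every labelling is separated; with more cases than k+1 the system is overdetermined and separation is no longer guaranteed.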
>
> Even if k < n-1, so that they are not *in general* linearly
> separated, there is still a positive probability that you
> can get data for which they are linearly separated; and
> then the same situation arises. This probability increases
> as the number of covariates (k) increases.
>
> What the warning message is telling you is that a perfect
> fit is possible within the parametrisation of the model:
> a probability P(Y=1)=0 is fitted to cases where the observed
> Y = 0; and a probability P(Y=1)=1 is fitted to cases where
> the observed Y = 1.
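
One way to see why a fitting routine ends up with probabilities at 0 and 1 is to look at the binomial log-likelihood of the separated example as b grows. This is a rough sketch in plain Python, not a description of what glm() does internally; log_lik is a hypothetical helper, and the data are the seven values from the first example with a fixed at 1.25:

```python
import math

def log_lik(xs, ys, a, b):
    """Binomial log-likelihood of the logistic model P(Y=1|X) from the text."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = (x - a) * b
        # log P(Y=1|X) = -log(1 + exp(-z));  log P(Y=0|X) = -log(1 + exp(z))
        s = -z if y == 1 else z
        # Stable evaluation of log(1 + exp(s)).
        total -= max(s, 0.0) + math.log1p(math.exp(-abs(s)))
    return total

X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
Y = [0, 0, 0, 1, 1, 1, 1]

b_values = (1, 10, 100, 1000)
lls = [log_lik(X, Y, 1.25, b) for b in b_values]
for b, ll in zip(b_values, lls):
    print(b, ll)
```

The log-likelihood keeps increasing towards 0 (a perfect fit) as b increases, so no finite b maximises it, and the fitted probabilities are pushed to the boundary values.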
>
> Best wishes,
> Ted.

> E-Mail: (Ted Harding) <Ted.Harding@manchester.ac.uk>


