Re: [R] Logistic regression model + precision/recall

From: Frank E Harrell Jr <>
Date: Wed 24 Jan 2007 - 14:59:44 GMT

nitin jindal wrote:
> On 1/24/07, Frank E Harrell Jr <> wrote:

>> Why 0.5?

> The probability cutoff has to be adjusted by trial and error. I just
> mentioned it as an example.

Using a cutoff is not a good idea unless the utility (loss) function is discontinuous and is the same for every subject (in the medical field utilities are almost never constant). And if you are using the data to find the cutoff, this will require bootstrapping to penalize for the cutoff not being pre-specified.
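To make the point about subject-specific utilities concrete, here is a minimal sketch in Python. It assumes the simplest possible loss structure: a correct decision costs nothing, a false positive costs `cost_fp`, and a false negative costs `cost_fn`. Under that assumption, acting has the lower expected cost when the predicted risk exceeds `cost_fp / (cost_fp + cost_fn)`. The function name, the two subjects, and all the numbers are hypothetical and were not part of the original exchange.

```python
def action_threshold(cost_fp, cost_fn):
    """Risk above which acting has lower expected cost than not acting.

    Assumes correct decisions cost 0, so we act when
    risk * cost_fn > (1 - risk) * cost_fp.
    """
    return cost_fp / (cost_fp + cost_fn)


# Two hypothetical subjects with the SAME predicted risk but different costs.
subjects = [
    {"id": "A", "risk": 0.15, "cost_fp": 1.0, "cost_fn": 9.0},
    {"id": "B", "risk": 0.15, "cost_fp": 1.0, "cost_fn": 1.0},
]

decisions = {}
for s in subjects:
    threshold = action_threshold(s["cost_fp"], s["cost_fn"])
    # The decision compares the risk with a threshold specific to this subject.
    decisions[s["id"]] = s["risk"] > threshold
    print(s["id"], "threshold:", threshold, "act:", decisions[s["id"]])
```

With these made-up costs the two subjects get different decisions despite identical predicted risk, which is why a single cutoff such as 0.5 only makes sense when the loss is the same for everyone. The sketch does not address the second point above, that a cutoff chosen from the data needs a bootstrap penalty.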


>> Those are improper scoring rules that can be tricked.  If the outcome is
>> rare (say 0.02 incidence) you could just predict that no one will have
>> the outcome and be correct 0.98 of the time.  I suggest validating the
>> model for discrimination (e.g., AUC) and calibration.
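As a rough illustration of the suggestion quoted above, the following Python sketch computes a rank-based AUC (discrimination) and the Brier score for a made-up sample of 100 subjects with 0.02 incidence. The Brier score is my choice of a proper scoring rule for the illustration; the quoted message names only AUC and calibration. All data and function names are hypothetical.

```python
def auc(y_true, scores):
    """Probability that a random positive scores above a random negative
    (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


# Hypothetical sample: 2 events among 100 subjects.
y = [1] * 2 + [0] * 98

always_zero = [0.0] * 100   # "no one will have the outcome"
base_rate = [0.02] * 100    # constant prediction equal to the incidence

print("AUC, always zero:", auc(y, always_zero))
print("AUC, base rate:  ", auc(y, base_rate))
print("Brier, always zero:", brier(y, always_zero))
print("Brier, base rate:  ", brier(y, base_rate))
```

Both constant predictions have an AUC of 0.5, so neither discriminates at all, even though the first is "correct" 98% of the time when thresholded. The Brier score is lower for the honest base-rate prediction than for the all-zero one.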

> I just have to calculate precision/recall for a rare outcome. If the positive
> outcome is rare (say 0.02 incidence) and I predict it to be negative all
> the time, my recall would be 0, which is bad. So, precision and recall can
> take care of skewed data.

No, that is not clear. The overall classification error would only be 0.02 in that case. It is true though that one of the two conditional probabilities would not be good.
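The arithmetic behind this exchange can be checked with a small Python sketch. It assumes a hypothetical cohort of 1000 subjects, 20 of whom have the outcome (0.02 incidence), and a rule that always predicts "negative". The cohort size and variable names are illustrative only.

```python
# Hypothetical cohort: 1000 subjects, 20 with the outcome (0.02 incidence).
n_subjects = 1000
n_positive = 20

y_true = [1] * n_positive + [0] * (n_subjects - n_positive)
y_pred = [0] * n_subjects  # always predict "no outcome"

# Confusion-matrix counts.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

error_rate = (fp + fn) / n_subjects
recall = tp / (tp + fn)
# Precision is undefined when nothing is predicted positive.
precision = tp / (tp + fp) if (tp + fp) > 0 else None

print("classification error:", error_rate)
print("recall:", recall)
print("precision:", precision)
```

The overall classification error is 0.02 while recall is 0 and precision is undefined, which matches both sides of the exchange: the summary error rate looks excellent, and one of the conditional measures exposes the problem.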

Frank

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Received on Thu Jan 25 13:03:22 2007