Re: [Rd] informal conventions/checklist for new predictive modeling packages

From: Paul Johnson
Date: Thu, 05 Jan 2012 14:44:41 -0600

I agree with almost all of it, except the last point. Having participated in some wheel-reinvention lately myself, I agree with the bulk of your comment. I don't think the fix is as easy as you suspect, though: RSiteSearch won't help me find a function I need when I don't know the magic words. Some R functions have such unexpected names that only a fastidious source-code reader would find them ("pretty", for example). But I agree with your concern.

But, as far as the last one is concerned, I think you are mistaken. Explanation below.

On Wed, Jan 4, 2012 at 8:19 AM, Max Kuhn wrote:
> (14) [OCD] For binary classification models, model the probability of
> the first level of a factor as the event of interest (again, for
> consistency) Note that glm() does not do this but most others use the
> first level.

When the DV is thought of as 0 and 1, where 1 is an "event", "success", or "win" and 0 is a "non-event", "failure", or "loss", and there is to be a single predicted probability, I want it to be the probability of the higher outcome.

glm is doing the thing I want, and I don't know of others that go the other way, except PROC LOGISTIC in SAS. And that has a long history of causing confusion and despair.
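
To make the glm behavior concrete, here is a minimal sketch on simulated data. The variable names and the data-generating process are assumptions made up for illustration. With a factor response and family = binomial, glm treats the first level as "failure" and models the probability of the other level.

```r
# Simulated data; every name here is hypothetical.
set.seed(42)
x <- rnorm(200)
# By construction, P(y == "win") rises with x.
y <- factor(ifelse(runif(200) < plogis(1.5 * x), "win", "loss"),
            levels = c("loss", "win"))

fit <- glm(y ~ x, family = binomial)

# The first level ("loss") is treated as failure, so the fitted
# values are estimates of P(y == "win").
p_win <- predict(fit, type = "response")

# Reversing the level order makes "loss" the modeled event, which
# flips the sign of every coefficient.
y_rev <- factor(y, levels = c("win", "loss"))
fit_rev <- glm(y_rev ~ x, family = binomial)

coef(fit)
coef(fit_rev)
```

A package that instead models the first level would report coefficients like those of fit_rev for the same data, which is the source of the confusion discussed here.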

I'd like to consider adding one thing to your list, though. I have wished (on this list and elsewhere) that there were a more regular approach for calculating the "newdata" objects that are used in predict. Many packages have re-invented this (datadist in rms, the effects package), and almost nobody here agreed with my wish for a more standard approach. But if there were a standard approach, it would be much easier to hold up R as an alternative to Stata when users pop up with "marginal effects tables" from Stata that are very difficult to reproduce with R.
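
For readers unfamiliar with the "newdata" problem, the following is a minimal hand-rolled sketch using only base R. The data, the model, and the choice of quartiles are all assumptions for illustration. It is not the approach taken by rms or effects, just the kind of code each user ends up rewriting.

```r
# Hypothetical data and model, for illustration only.
set.seed(1)
n <- 300
dat <- data.frame(
  income = rnorm(n, mean = 50, sd = 10),
  educ   = factor(sample(c("hs", "college", "grad"), n, replace = TRUE),
                  levels = c("hs", "college", "grad"))
)
dat$vote <- rbinom(n, 1,
                   plogis(-3 + 0.05 * dat$income + 0.4 * (dat$educ == "grad")))

fit <- glm(vote ~ income + educ, data = dat, family = binomial)

# Build the grid by hand: income at its quartiles, crossed with
# every level of educ (keeping the original level order).
newdata <- expand.grid(
  income = unname(quantile(dat$income, c(0.25, 0.50, 0.75))),
  educ   = factor(levels(dat$educ), levels = levels(dat$educ))
)

# Predicted probability of vote == 1 for each row of the grid.
newdata$prob <- predict(fit, newdata = newdata, type = "response")
newdata
```

Every choice in the grid (which values of income, which variables to hold fixed and at what) is made ad hoc here, which is why a shared convention would help.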


Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

Received on Thu 05 Jan 2012 - 20:48:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.