Re: [Rd] informal conventions/checklist for new predictive modeling packages

From: Paul Johnson <pauljohn32_at_gmail.com>
Date: Thu, 05 Jan 2012 14:44:41 -0600

I agree with almost all, except the last point. Since I have participated in wheel-reinvention lately, I agree with the bulk of your comment. I don't think the fix is as easy as you suspect, RSiteSearch won't help me find a function I need when I don't know the magic words. Some R functions have such unexpected names that only a fastidious source-code reader would find them ("pretty", for example).  But I agree with your concern.

But, as far as the last one is concerned, I think you are mistaken. Explanation below.

On Wed, Jan 4, 2012 at 8:19 AM, Max Kuhn <mxkuhn_at_gmail.com> wrote:
>
> (14) [OCD] For binary classification models, model the probability of
> the first level of a factor as the event of interest (again, for
> consistency) Note that glm() does not do this but most others use the
> first level.
>
When the DV is thought of as 0 and 1, and 1 is an "event" "success" or "win" and 0 is a "non event" "failure" or "loss", if there is to be a single predicted probability, I want it to be the probability of the higher outcome.

glm is doing the thing I want, and I don't know of others that go the other way, except PROC LOGISTIC in SAS. And that has a long history of causing confusion and despair.

I'd like to consider adding one thing to your list, though. I have wished (in this list and elsewhere) that there were a more regular approach for calculating "newdata" objects that are used in predict. Many packages have re-invented this (datadist in rms, effects), and almost nobody here agreed with my wish for a more standard approach. But if there were a standard approach, it would be much easier to hold up R as an alternative to Stata when users pop up with "marginal effects tables" from Stata that are very difficult to reproduce with R.

Regards,
pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 05 Jan 2012 - 20:48:41 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 05 Jan 2012 - 21:30:06 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive