Re: [Rd] standard format for newdata objects

From: Terry Therneau <therneau_at_mayo.edu>
Date: Wed, 27 Apr 2011 08:27:20 -0500

On Wed, 2011-04-27 at 12:00 +0200, Peter Dalgaard wrote:
> Er... No, I don't think Paul is being particularly rude here (and he
> has been doing us some substantial favors in the past, notably his
> useful Rtips page). I know the kind of functionality he is looking
> for; e.g., SAS JMP has some rather nice interactive displays of
> regression effects for which you'll need to fill in "something" for
> the other variables.
>
> However, that being said, I agree with Duncan that we probably do not
> want to canonicalize any particular method of filling in "average"
> values for data frame variables. Whatever you do will be statistically
> dubious (in particular, using the mode of a factor variable gives me
> the creeps: Do a subgroup analysis and your "average person" switches
> from male to female?), so I think it is one of those cases where it is
> best to provide mechanism, not policy.
>

  I agree with Peter. There are two tasks in building newdata: deciding what the default reference levels should be, and constructing the data frame with those levels. It is the first part that is hard. For survival curves from a Cox model the historical default has been to use the mean of each covariate, which can be awful (sex coded as 0/1 leads to a prediction for a hermaphrodite?). Nevertheless, I have not been able to think of a strategy that would give sensible answers for most of the data I use, so coxph retains the flawed default for lack of a better idea. When teaching a class on this, I tell listeners to "bite the bullet" and build the newdata that makes clinical sense, because package defaults are always unwise for some of the variables. How can a package possibly know that it should use bilirubin = 1.0 (upper limit of normal) and AST = 45 when the data set is one of my liver transplant studies?
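
  To make the two tasks concrete, here is a minimal sketch of building newdata by hand. It assumes the pbc data set that ships with the survival package as a stand-in (it is not one of the liver transplant studies mentioned above), and the model formula and the age of 50 are illustrative choices only. The bilirubin and AST values are the ones quoted in the paragraph above.

```r
library(survival)

# Illustrative fit on the pbc data shipped with the survival package.
# Assumption for this sketch: pbc stands in for a real clinical data set,
# and the covariates chosen here are for demonstration only.
fit <- coxph(Surv(time, status == 2) ~ age + sex + bili + ast, data = pbc)

# Task 1 (the hard part): pick reference values that make clinical sense.
# Task 2 (the easy part): put them in a data frame whose column names and
# factor levels match the variables used in the model.
ref <- data.frame(
  age  = 50,                                            # hypothetical reference age
  sex  = factor(c("m", "f"), levels = levels(pbc$sex)), # one row per sex
  bili = 1.0,                                           # upper limit of normal
  ast  = 45
)

# One predicted survival curve per row of ref.
curves <- survfit(fit, newdata = ref)
```

  Plotting `curves` then shows one curve for each explicitly chosen reference subject, rather than a single curve at the covariate means.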

   Frank Harrell would argue, though, that his "sometimes misguided" default in cph is better than the "almost always wrong" one in coxph, and there is certainly some strength in that position.

Terry Therneau



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Wed 27 Apr 2011 - 13:34:54 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 27 Apr 2011 - 16:40:53 GMT.
