Re: [Rd] Wish R Core had a standard format (or generic function) for "newdata" objects

From: Christophe Dutang <dutangc_at_gmail.com>
Date: Wed, 27 Apr 2011 18:53:59 +0200

Among many possible solutions, I generally use the following code. It avoids the fictitious "average individual" by taking the mean of the predicted values across the observations:

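averagingpredict <- function(model, varname, varseq, type = "response",
                             subset = NULL) {
    # Start from the data used to fit the model (glm fits store it in
    # model$data), optionally restricted to a subset of rows.
    mydata <- model$data
    if (!is.null(subset))
        mydata <- mydata[subset, , drop = FALSE]

    # For one value x: give every observation varname = x, predict,
    # and average the predictions over the sample.
    f <- function(x) {
        mydata[, varname] <- x
        mean(predict(model, newdata = mydata, type = type), na.rm = TRUE)
    }

    # Repeat for each value of interest.
    sapply(varseq, f)
}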

It is time-consuming (predict is called on the whole data set once per value), but it also handles non-numeric variables such as factors.
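To see why averaging the predictions differs from predicting for the average individual, here is a self-contained sketch using a logistic glm on the built-in mtcars data (an illustrative choice, not from the thread). It applies the same averaging idea as averagingpredict inline, so it runs on its own.

```r
# Illustrative logistic model on a built-in data set.
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

wt_values <- c(2.5, 3.0, 3.5)

# (a) "Average individual": hp fixed at its sample mean.
at_means <- predict(fit, type = "response",
                    newdata = data.frame(wt = wt_values, hp = mean(mtcars$hp)))

# (b) Averaging: give every car the same wt, predict, then average.
averaged <- sapply(wt_values, function(w) {
  d <- mtcars
  d$wt <- w
  mean(predict(fit, newdata = d, type = "response"))
})

cbind(wt = wt_values, at_means = unname(at_means), averaged)
```

Because the logistic link is nonlinear, the two columns generally differ; the averaged version reflects the whole distribution of hp rather than a single typical car.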

Christophe

2011/4/26 Paul Johnson <pauljohn32_at_gmail.com>

> Is anybody working on a way to standardize the creation of "newdata"
> objects for predict methods?
>
> When using predict, I find it difficult/tedious to create newdata data
> frames when there are many variables. It is necessary to set all
> variables at the mean/mode/median, and then for some variables of
> interest, one has to insert values for which predictions are desired.
> I attended a presentation by Scott Long last week, in which he
> discussed the increasing emphasis in Stata on calculating marginal
> predictions with "SPost" and several other packages. Coincidentally,
> a student who is learning to use polr from R's MASS package (by W.
> Venables and B. Ripley) visited me, and we wrestled for quite a while
> to reproduce the calculations that Stata makes automatically: it
> reports predicted probabilities for each independent variable while
> keeping the other variables at a reference level.
>
> I've found R packages that aim to do essentially the same thing.
>
> In Frank Harrell's Design/rms framework, he uses a "datadist"
> function that generates an object that the user has to register in the
> R options. I think many users trip over the use of "options" there. If
> I don't use that for a month or two, I completely forget the fine
> points and have to fight with it. But it does "work" to give plots
> and predict functions the information they require.
>
> In Zelig ( by Kosuke Imai, Gary King, and Olivia Lau), a function
> "setx" does the work of creating "newdata" objects. That appears to be
> about right as a candidate for a generic "newdata" function. Perhaps
> it could directly generalize to all R regression functions, but right
> now it is tailored to the models in Zelig. It has separate methods for
> the different types of models, and that is a bit confusing to me, since
> the "newdata" in one model should, I'm guessing, be the same as the
> newdata in another. But their code is all there; I'll keep looking.
>
> In the effects package (by John Fox), there are internal functions to
> create newdata and plot the marginal effects. If you load effects and run,
> for example, "effects:::effect.lm" you see Prof Fox has his own way of
> grabbing information from model columns and calculating predictions.
>
> I think it is time for the R Core Team to look at this and tell "us"
> the right way to do this. I think the interface to setx in Zelig is
> pretty easy to understand, at least for numeric variables.
>
> In R's termplot function, such a thing could be put to use. As far as
> I can tell now, termplot is doing most of the work of creating a
> newdata object, but not exactly.
>
> It seems like it would be a shame to proliferate more functions that
> do the same thing, when it is such a common task.
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
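The typical-values approach Paul describes (numeric variables at their means, factors at their modes, and a set of values for the variables of interest) can be sketched in base R. typical_newdata is a hypothetical helper written for illustration, not an existing R function, and it assumes every column is numeric, factor, or character.

```r
# Hypothetical helper (not part of base R): one row of "typical" values,
# expanded over the values supplied in ... for the variables of interest.
typical_newdata <- function(data, ...) {
  typical <- lapply(data, function(x) {
    if (is.numeric(x)) {
      mean(x, na.rm = TRUE)                      # numeric: mean
    } else {
      tab <- table(x)
      mode_val <- names(tab)[which.max(tab)]     # factor/character: mode
      if (is.factor(x)) factor(mode_val, levels = levels(x)) else mode_val
    }
  })
  focus <- list(...)
  typical[names(focus)] <- focus                 # override the focus variables
  expand.grid(typical, stringsAsFactors = FALSE) # cross all supplied values
}

# Illustrative usage: am converted to a factor so that it has a mode.
cars <- transform(mtcars, am = factor(am))
fit <- lm(mpg ~ wt + hp + am, data = cars)
nd <- typical_newdata(cars[c("wt", "hp", "am")], wt = c(2.5, 3.5))
pred <- predict(fit, newdata = nd)
```

Factor columns keep their full set of levels, so predict can match them against the levels seen when the model was fitted.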

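For the polr case Paul mentions, here is a minimal sketch of the Stata-style calculation: predicted probabilities over the levels of one predictor, with the other predictors held at their reference (first) levels. The model on the housing data shipped with MASS is an illustrative choice, not the student's model.

```r
library(MASS)  # polr() and the housing data

# Illustrative ordered logit model (frequency-weighted data).
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# Vary Infl over its levels; hold Type and Cont at their first levels.
nd <- data.frame(
  Infl = factor(levels(housing$Infl), levels = levels(housing$Infl)),
  Type = factor(levels(housing$Type)[1], levels = levels(housing$Type)),
  Cont = factor(levels(housing$Cont)[1], levels = levels(housing$Cont))
)

# One row per Infl level, one column per satisfaction category.
probs <- predict(fit, newdata = nd, type = "probs")
cbind(nd, round(probs, 3))
```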
-- 
Christophe DUTANG
Ph. D. student at ISFA, Lyon, France


Received on Wed 27 Apr 2011 - 16:57:23 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 27 Apr 2011 - 19:10:52 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.
