[Rd] termplot & predict.lm: some details about calculating predicted values with "other variables set at the mean"

From: Paul Johnson <pauljohn32_at_gmail.com>
Date: Wed, 14 Dec 2011 00:30:33 -0600

I'm writing some functions to illustrate regressions, and I have been staring at termplot, predict.lm, and residuals.lm to see how this is done. I've wondered who originally wrote predict.lm, because I think it is very clever.

I got interested because termplot doesn't work with models that include interaction terms:

> m1 <- lm(y ~ x1*x2)
> termplot(m1)

Error in `[.data.frame`(mf, , i) : undefined columns selected
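A minimal sketch of the same setting, using simulated data (the data, seed, and coefficients are hypothetical; the post supplies none). It shows that the term label x1:x2 has no matching column in the model frame, which is presumably why indexing mf by term name fails, while predict(type = "terms") still returns a centered column for the interaction. It deliberately does not call termplot, whose handling of interactions may differ across R versions.

```r
# Hypothetical simulated data; the post does not supply any.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 + x2 + 0.5 * x1 * x2 + rnorm(n)
m1 <- lm(y ~ x1 * x2)

# The terms object lists the interaction as its own term ...
attr(terms(m1), "term.labels")  # "x1" "x2" "x1:x2"
# ... but the model frame only holds the raw variables.
names(model.frame(m1))          # "y" "x1" "x2"

# predict(type = "terms") still returns one column per term.
tt <- predict(m1, type = "terms")
colnames(tt)                    # "x1" "x2" "x1:x2"

# Each column is coef * (model-matrix column minus its column mean).
b      <- coef(m1)
mm     <- model.matrix(m1)
manual <- b[["x1:x2"]] * (mm[, "x1:x2"] - mean(mm[, "x1:x2"]))
isTRUE(all.equal(unname(tt[, "x1:x2"]), unname(manual)))  # TRUE
```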

Digging into that, I realized some surprising implications of formulas that contain nonlinear transformations.

This issue arises when the regression formula contains math functions. The question is what we mean by the mean of "x" when we discuss predictions and deviations.

Suppose one fits:

m1 <- lm(y ~ x1 + log(x2), data = dat)

I had thought the partial residual was calculated with reference to the log of the mean of x2, but that's not right: it is calculated with reference to mean(log(x2)). That seems misleading, because termplot shows a graph of the effect of x2 with x2 (not "log(x2)") on the horizontal axis. Perhaps "misleading" is too strong; "unexpected" is better. I think users who want the reference value in the plot of x2 to be the mean of x2 have a legitimate concern here.
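The following sketch checks this numerically, assuming simulated positive x2 values (hypothetical data). The log(x2) column from predict(type = "terms") equals b * (log(x2) - mean(log(x2))), so on the x2 axis the term crosses zero at the geometric mean exp(mean(log(x2))) rather than at mean(x2). It also confirms that residuals(type = "partial"), which termplot uses when partial.resid = TRUE, is those centered terms plus the ordinary residuals. The final re-centering step is only an illustration of what a user might want, not something termplot does.

```r
# Hypothetical simulated data; x2 is kept positive so log(x2) is defined.
set.seed(2)
n   <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rexp(n) + 0.1)
dat$y <- 2 + dat$x1 + 3 * log(dat$x2) + rnorm(n)
m1 <- lm(y ~ x1 + log(x2), data = dat)

b  <- unname(coef(m1)["log(x2)"])
tt <- unname(predict(m1, type = "terms")[, "log(x2)"])

# The term is centered at mean(log(x2)), not at log(mean(x2)).
by_mean_log <- b * (log(dat$x2) - mean(log(dat$x2)))
by_log_mean <- b * (log(dat$x2) - log(mean(dat$x2)))
isTRUE(all.equal(tt, by_mean_log))  # TRUE
isTRUE(all.equal(tt, by_log_mean))  # FALSE: mean(log(x)) < log(mean(x))

# On the x2 axis, the term is zero at the geometric mean of x2.
c(geometric = exp(mean(log(dat$x2))), arithmetic = mean(dat$x2))

# Partial residuals are the centered terms plus the ordinary residuals.
pr <- unname(residuals(m1, type = "partial")[, "log(x2)"])
isTRUE(all.equal(pr, tt + unname(residuals(m1))))  # TRUE

# Hypothetical re-centering so the reference point is mean(x2) instead.
tt_ref_mean <- tt + b * (mean(log(dat$x2)) - log(mean(dat$x2)))
isTRUE(all.equal(tt_ref_mean, by_log_mean))  # TRUE
```

The shift between the two conventions is a constant, b * (mean(log(x2)) - log(mean(x2))), so re-centering moves the curve and the partial residuals up or down together without changing their shape.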

With a more elaborate formula, the mismatch gets more confusing. Suppose the regression formula is

m2 <- lm(y ~ x1 + poly(x2, 3), data = dat)

The model frame has these variables:

  y x1 poly(x2, 3).1 poly(x2, 3).2 poly(x2, 3).3

and the partial residual calculation for variable x1, which I had expected would be based on a polynomial transformation of mean(x2), instead uses the coefficient-weighted sum of the means of the 3 poly columns.
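A similar sketch for the poly() case, again with hypothetical simulated data. The model frame holds poly(x2, 3) as a single three-column matrix (printing displays it as poly(x2, 3).1 and so on), and the term from predict(type = "terms") is the poly block of the model matrix times its coefficients, minus the coefficient-weighted sum of the column means. Because poly() builds polynomials orthogonal to the constant, those means are essentially zero, which in general differs from the term evaluated at mean(x2).

```r
# Hypothetical simulated data.
set.seed(3)
n   <- 100
dat <- data.frame(x1 = rnorm(n), x2 = runif(n, 0, 4))
dat$y <- 1 + dat$x1 + dat$x2 - 0.5 * dat$x2^2 + rnorm(n)
m2 <- lm(y ~ x1 + poly(x2, 3), data = dat)

# The model frame stores poly(x2, 3) as one 3-column matrix.
mf <- model.frame(m2)
names(mf)                 # "y" "x1" "poly(x2, 3)"
dim(mf[["poly(x2, 3)"]])  # 100 3

# Model-matrix columns and coefficients that belong to the poly term
# (assign codes: 0 = intercept, 1 = x1, 2 = poly(x2, 3)).
mm   <- model.matrix(m2)
cols <- which(attr(mm, "assign") == 2)
b    <- coef(m2)[cols]

# Term = poly block %*% coefficients, minus the coefficient-weighted
# sum of the column means.
ref  <- sum(b * colMeans(mm[, cols]))
term <- drop(mm[, cols] %*% b) - ref
tt   <- predict(m2, type = "terms")[, "poly(x2, 3)"]
isTRUE(all.equal(unname(term), unname(tt)))  # TRUE

# poly() columns are orthogonal to the constant, so those means (and ref)
# are essentially zero ...
colMeans(mm[, cols])
# ... whereas the term evaluated at mean(x2) is, in general, not zero.
predict(m2, newdata = data.frame(x1 = 0, x2 = mean(dat$x2)),
        type = "terms")[, "poly(x2, 3)"]
```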

Can you help me see this more clearly? (Or less wrongly?)

Perhaps you think I don't understand partial residuals in termplot, but I am pretty sure I do. I made notes about it; see slides 54 and 55 here: http://pj.freefaculty.org/guides/Rcourse/regression-tableAndPlot-1/regression-tableAndPlot.pdf

Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

R-devel_at_r-project.org mailing list
Received on Wed 14 Dec 2011 - 06:32:56 GMT
