[Rd] using predict() with poly(x, raw=TRUE)

From: John Fox <jfox_at_mcmaster.ca>
Date: Wed, 14 Mar 2012 18:45:06 -0400


Dear r-devel list members,

I've recently encountered the following problem using predict() with a model that has raw-polynomial terms. (Actually, I encountered the problem using model.frame(), but the source of the error is the same.) The problem is technical and concerns the design of poly(), which is why I'm sending this message to r-devel rather than r-help.

To illustrate:

> mod.pres <- lm(prestige ~ log(income, 10) + poly(education, 3) +
poly(women, 2),

+                data=Prestige)  # Prestige data from car package

> predict(mod.pres, newdata=data.frame(education=10, income=6000, women=30))
# works

       1
39.66414

> model.frame(delete.response(terms(mod.pres)), data.frame(education=10,
income=6000, women=30))
  log(income, 10) poly(education, 3).1 poly(education, 3).2 poly(education, 3).3 poly(women, 2).1 poly(women, 2).2

1        3.778151          -0.02691558          -0.08720086
0.07199804      0.003202256     -0.138777837

> mod.pres.raw <- lm(prestige ~ log(income, 10) + poly(education, 3,
raw=TRUE) + poly(women, 2, raw=TRUE),

+                    data=Prestige)

> predict(mod.pres.raw, newdata=data.frame(education=10, income=6000,
women=30)) # doesn't work
Error in poly(education, 3, raw = TRUE) :   'degree' must be less than number of unique points

> model.frame(delete.response(terms(mod.pres.raw)), data.frame(education=10,
income=6000, women=30))
Error in poly(education, 3, raw = TRUE) :   'degree' must be less than number of unique points

I understand the source of the error, but what's unclear to me is why it's necessary for poly() to check the degree of the polynomial against the number of unique supplied points when raw=TRUE. For example, if I simply remove this check from poly(), then

> predict(mod.pres.raw, newdata=data.frame(education=10, income=6000,
women=30))

       1
39.66414

> model.frame(delete.response(terms(mod.pres.raw)), data.frame(education=10,
income=6000, women=30))
  log(income, 10) poly(education, 3, raw = TRUE).1 poly(education, 3, raw = TRUE).2 poly(education, 3, raw = TRUE).3 poly(women, 2, raw = TRUE).1 poly(women, 2, raw = TRUE).2

1        3.778151                               10
100                             1000                           30
900

Of course, if one then used the modified poly() in a regression with fewer unique Xs than the degree of the polynomial, the model matrix would be singular; but why not just let the error appear at that stage?

I could solve my problem by maintaining a local version of poly(), but why should it not be possible to get predictions under these circumstances?

Best,
 John



John Fox
Senator William McMaster
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox

R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel Received on Wed 14 Mar 2012 - 22:49:08 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 15 Mar 2012 - 05:10:29 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive