Re: [R] pulling items out of a lm() call

From: Peter Dalgaard <>
Date: Mon 01 May 2006 - 21:37:21 EST

Andrew Gelman <> writes:

> I want to write a function to standardize regression predictors, which
> will require me to do some character-string manipulation to parse the
> variables in a call to lm() or glm().
> For example, consider the call
> lm (y ~ female + I(age^2) + female:black + (age + education)*female).
> I want to be able to parse this to pick out the input variables
> ("female", "age", "black", "education"). Then I can transform these as
> appropriate (to get "z.female", "z.age", etc), feed them back into the
> lm() function, and go from there.
> Does anyone know an easy way to pull out the variables? I basically
> have to parse out the symbols "+", ":", "*", and " ", but there's also
> the problem of handling parentheses and the I() operator.

At which level of generality do you want this?

> attr(terms(y ~ female + I(age^2) + female:black + (age +
+ education)*female),"variables")

list(y, female, I(age^2), black, age, education)

> attr(delete.response(terms(y ~ female + I(age^2) + female:black +
+ (age + education)*female)),"variables")

list(female, I(age^2), black, age, education)

This gets you some of the way. However, there are complications: you can't just remove composite terms like "I(age^2)", because "age" is not guaranteed to be among the other terms:

> attr(terms( ~ I(speed^2)),"variables")

list(I(speed^2))

So you need some way to tease out the individual variables inside I().
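To make the complication concrete, here is a small sketch. The names tt and v are just illustrative, and all.vars() is a base R function that is not part of the suggestion in this message; it is shown only for comparison, since it looks inside calls such as I().

```r
# The "variables" attribute holds the whole call I(speed^2), not "speed".
tt <- terms(~ I(speed^2))
v <- attr(tt, "variables")

n_entries <- length(v) - 1      # number of entries after the leading `list`
first_entry <- deparse(v[[2]])  # the single entry, as text

# all.vars() descends into the call and returns the bare variable name.
inner <- all.vars(tt)
```

Here first_entry is the string "I(speed^2)", while inner is "speed".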

Here's a first cut.

l <- attr(delete.response(terms(y ~ female + I(age^2) + female:black
             + (age + education)*female)),"variables")

getterms <- function(e) {
    if (is.name(e)) e
    else if (is.call(e)) lapply(e[-1], getterms)
}

unique(c(lapply(l[-1],getterms), recursive=TRUE))

and possibly throw in an as.character() to get a vector of strings rather than a list of symbols. Notice that since anything can go inside I(), you can get in trouble if parts of the expression are not intended as variables (e.g., y^lambda, where lambda is a scalar). The getterms function above pragmatically assumes that at least function names need to be discarded.
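A variant along these lines that returns strings directly might look as follows. This is a sketch, not a tested general solution: getnames, f, vars, input_vars and caveat are illustrative names, and constants such as the 2 in age^2 are assumed to contribute nothing.

```r
f <- y ~ female + I(age^2) + female:black + (age + education)*female
vars <- attr(delete.response(terms(f)), "variables")

# Collect the names of all symbols in an expression as strings.
# The first element of each call (the function name) is dropped.
getnames <- function(e) {
    if (is.name(e)) as.character(e)
    else if (is.call(e)) unlist(lapply(as.list(e)[-1], getnames))
    else character(0)   # numeric or other constants
}

# Drop the leading `list` symbol, then gather and de-duplicate.
input_vars <- unique(unlist(lapply(as.list(vars)[-1], getnames)))

# The caveat from above: a scalar inside I() is picked up as well.
caveat <- getnames(quote(I(y^lambda)))
```

For the formula above, input_vars comes out as "female", "age", "black", "education". The last line returns both "y" and "lambda", so any such scalar would have to be filtered out by hand, for instance by comparing against the column names of the data frame.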

   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (                  FAX: (+45) 35327907

Received on Mon May 01 21:44:27 2006