Re: [R] using lm() with variable formula

From: Vladimir Eremeev <wl2776_at_gmail.com>
Date: Mon, 21 May 2007 04:17:31 -0700 (PDT)

I solved a similar problem some time ago. Here is my script.
I had a data frame containing a response and several other variables, which were assumed to be predictors.
I was trying to choose the best linear approximation. This approach now seems useless to me, so please don't blame me for it. However, the script might be useful to you.

<code>
library(forward)  # provides fwd.combn()

# dfr is a data.frame that contains everything.
# The response variable is named med5x.
# The following lines fit linear models for all possible formulas
# of the form
#   med5x~T+a+height
#   med5x~a+height+RH
# where T, a, RH, etc. are the names of possible predictors.

# dfr was a very large data frame containing a lot of variables;
# here we choose only a subset of them.
inputs <- names(dfr)[c(10:30, 1)]

for (nc in 11:length(inputs)) {  # the linear models were assumed to have at least 11 terms
  # Generate a character vector of formulas, one per combination of nc predictors.
  formulas <- paste("med5x", sep = "~",
                    fwd.combn(inputs, nc, fun = function(x) paste(x, collapse = "+")))

  # Then fit each one and append its residual sum of squares to a file.
  for (f in formulas) {
    lms <- lm(as.formula(f), data = dfr)  # as.formula() instead of eval(parse(text = f))
    cat(file = "linear_models.txt", f, sum(residuals(lms)^2), "\n",
        sep = "\t", append = TRUE)
  }
}

</code>

Hmm, looking back, I see that this is a rather inefficient script. For example, the inner loop can easily be replaced with a function from the apply family, such as sapply().
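A minimal sketch of that idea follows. It is not the script I actually ran: it assumes base R's combn() (from utils) in place of fwd.combn(), and the built-in mtcars data set in place of dfr. The names dat, response, inputs and rss_for_size are made up for illustration.

```r
# Stand-in data: the built-in mtcars data set (an assumption; the original used dfr).
dat <- mtcars
response <- "mpg"                           # hypothetical response column
inputs <- c("wt", "hp", "disp", "qsec")     # hypothetical candidate predictors

# Residual sum of squares for every model with exactly nc predictors.
rss_for_size <- function(nc) {
  # One right-hand side per combination of nc predictor names, e.g. "wt+hp".
  rhs <- combn(inputs, nc, FUN = paste, collapse = "+")
  formulas <- paste(response, rhs, sep = "~")
  # sapply() replaces the inner for loop; deviance() of an lm fit is its RSS.
  sapply(formulas, function(f) deviance(lm(as.formula(f), data = dat)))
}

# Model sizes 2 to 4 here; the original script started at 11 terms.
rss <- unlist(lapply(2:length(inputs), rss_for_size))

# Collect everything in one data frame instead of appending to a text file.
results <- data.frame(formula = names(rss), rss = unname(rss))
results <- results[order(results$rss), ]
print(head(results, 3))
```

To vary the response as well, the same function could be called inside a loop over the column names, taking each name in turn as the response and the remaining names as inputs.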

Chris Elsaesser wrote:
>
> New to R; please excuse me if this is a dumb question. I tried to RTFM;
> didn't help.
>
> I want to do a series of regressions over the columns in a data.frame,
> systematically varying the response variable and the terms; and not
> necessarily including all the non-response columns. In my case, the
> columns are time series. I don't know if that makes a difference; it
> does mean I have to call lag() to offset non-response terms. I can not
> assume a specific number of columns in the data.frame; might be 3, might
> be 20.
>
> My central problem is that the formula given to lm() is different each
> time. For example, say a data.frame had columns with the following
> headings: height, weight, BP (blood pressure), and Cals (calorie intake
> per time frame). In that case, I'd need something like the following:
>
> lm(height ~ weight + BP + Cals)
> lm(height ~ weight + BP)
> lm(height ~ weight + Cals)
> lm(height ~ BP + Cals)
> lm(weight ~ height + BP)
> lm(weight ~ height + Cals)
> etc.
>
> In general, I'll have to read the header to get the argument labels.
>
> Do I have to write several functions, each taking a different number of
> arguments? I'd like to construct a string or list representing the
> variables in the formula and apply lm(), so to say [I'm mainly a Lisp
> programmer where that part would be very simple. Anyone have a Lisp API
> for R? :-}]
>
>

______________________________________________
R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 21 May 2007 - 11:32:58 GMT
