From: Greg Snow <Greg.Snow_at_imail.org>

Date: Mon, 12 May 2008 10:03:18 -0600

I would have thought that:

> lm( C1 ~ M^2, data=DF )

would give the main effects and 2-way interaction(s), but a quick test did not match my expectation. A feature request may be in order if people plan to use this a lot.
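A minimal sketch of why the quick test may not match, and one possible workaround. It assumes the data frame layout from the quoted message below (columns named M.M1 and M.M2); the simulated values, the seed, and the sample size are invented for illustration only.

```r
# In a model formula, ^ means crossing rather than arithmetic, so for a
# single term M^2 expands to M + M:M, which reduces to just M.
attr(terms(C1 ~ M^2), "term.labels")

# Illustrative data only (made-up values, with more rows than the quoted
# example so the interaction model leaves residual degrees of freedom).
set.seed(1)
n  <- 20
M  <- cbind(M1 = rnorm(n), M2 = rnorm(n))
C1 <- 1 + M[, "M1"] + 0.5 * M[, "M2"] + rnorm(n, sd = 0.1)
DF <- data.frame(C1 = C1, M = M)   # columns are C1, M.M1, M.M2

# Crossing the individual columns gives the main effects plus the
# 2-way interaction.
fit <- lm(C1 ~ (M.M1 + M.M2)^2, data = DF)
names(coef(fit))
```

With only two columns this is the same model as C1 ~ M.M1*M.M2; the (a + b)^2 form is just shorter to write when there are more columns to cross.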

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow_at_imail.org
(801) 408-8111

Received on Mon 12 May 2008 - 17:23:13 GMT

> -----Original Message-----
> From: r-help-bounces_at_r-project.org
> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Ted Harding
> Sent: Sunday, May 11, 2008 2:07 PM
> To: Myers, Brent
> Cc: r-help_at_r-project.org
> Subject: Re: [R] Fundamental formula and dataframe question.
>
> On 11-May-08 18:58:45, Myers, Brent wrote:
> > There is a very useful and apparently fundamental feature of R (or of
> > the package pls) which I don't understand.
> >
> > For datasets with many independent (X) variables such as chemometric
> > datasets there is a convenient formula and dataframe construction that
> > allows one to access the entire X matrix with a single term.
> >
> > Consider the gasoline dataset available in the pls package. For the
> > model statement in the plsr function one can write: Octane ~ NIR
> >
> > NIR refers to a (wide) matrix which is a portion of a dataframe. The
> > naming of the columns is of the form: 'NIR.xxxx nm'
> >
> > names(gasoline) returns...
> >
> > $names
> > [1] "octane" "NIR"
> >
> > instead of...
> >
> > $names
> > [1] "octane" "NIR.1000 nm" "NIR.1001 nm" ...
> >
> > How do I construct and manipulate such dataframes and the column names
> > that go with?
> >
> > Does the use of these types of formulas and dataframes generalize to
> > other modeling functions?
> >
> > Some specific clues on a help search might be enough, I've tried many.
> >
> > Regards,
> > Brent
>
> I don't have the 'gasoline' dataset to hand, but I can
> produce something to which your description applies as follows:
>
> C1 <- c(1.1,1.2,1.3,1.4)
> C2 <- c(2.1,2.2,2.3,2.4)
> M <- cbind(M1=c(11.1,11.2,11.3,11.4),
>            M2=c(12.1,12.2,12.3,12.4))
> DF <- data.frame(C1=C1,C2=C2,M=M)
> DF
> #    C1  C2 M.M1 M.M2
> # 1 1.1 2.1 11.1 12.1
> # 2 1.2 2.2 11.2 12.2
> # 3 1.3 2.3 11.3 12.3
> # 4 1.4 2.4 11.4 12.4
>
> so the two columns C1 and C2 have gone in as named, and the
> matrix M (with named columns M1 and M2) has gone in with
> columns M.M1, M.M2
>
> Now let's fuzz the numbers a bit, so that the lm() fit makes sense:
>
> C1 <- C1 + round(0.1*runif(4),2)
> C1 <- C1 + round(0.1*runif(4),2)
> M <- cbind(M1=c(11.1,11.2,11.3,11.4),
>            M2=c(12.1,12.2,12.3,12.4)) +
>      round(0.1*runif(8),2)
> DF <- data.frame(C1=C1,C2=C2,M=M)
> DF
> #     C1  C2  M.M1  M.M2
> # 1 1.21 2.1 11.19 12.13
> # 2 1.34 2.2 11.23 12.23
> # 3 1.38 2.3 11.36 12.30
> # 4 1.50 2.4 11.43 12.48
>
> summary(lm(C1 ~ M),data=DF)
> # Call:
> # lm(formula = C1 ~ M)
> # Residuals:
> #        1        2        3        4
> # -0.02422  0.02448  0.01309 -0.01335
> # Coefficients:
> #             Estimate Std. Error t value Pr(>|t|)
> # (Intercept) -8.28435    2.48952  -3.328    0.186
> # MM1         -0.05411    0.66909  -0.081    0.949
> # MM2          0.83463    0.50687   1.647    0.347
> # Residual standard error: 0.03919 on 1 degrees of freedom
> # Multiple R-Squared: 0.9642, Adjusted R-squared: 0.8925
> # F-statistic: 13.46 on 2 and 1 DF, p-value: 0.1893
>
> In other words, a perfectly standard LM fit, equivalent to
>
> summary(lm(C1 ~ M[,1]+M[,2]))
>
> (as you can check). So all that looks straightforward.
>
> One thing, however, is not clear to me in this scenario.
> Suppose, for example, that the columns M1 and M2 of M were
> factors (and that you had more rows than I've used above, so
> that the fit is non-trivial).
>
> Then, in the standard specification of an LM, you could write
>
> summary(lm(C1 ~ M[,1]*M[,2]))
>
> and get the main effects and interactions. But how would you
> do that in the other type of specification:
>
> Where you used
>   summary(lm(C1 ~ M, data=DF))
> to get the equivalent of
>   summary(lm(C1 ~ M[,1]+M[,2]))
> what would you use to get the equivalent of
>   summary(lm(C1 ~ M[,1]*M[,2]))??
>
> Would you have to "spell out" the interaction term[s] in
> additional columns of M?
>
> Hmmm, interesting! I hadn't been aware of this aspect of
> formula and dataframe construction for modelling, until you
> pointed it out!
>
> Best wishes,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding_at_manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 11-May-08                                       Time: 21:06:49
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
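On Brent's original question of how to build a data frame in which names() shows a single entry for a whole matrix: a minimal sketch, assuming nothing about how the gasoline data itself was built. The names X, y and DF2, the column labels, and the random values are all made up for illustration.

```r
# Illustrative values only; X, y and DF2 are hypothetical names.
set.seed(2)
n <- 10
X <- matrix(rnorm(n * 3), nrow = n,
            dimnames = list(NULL, c("1000 nm", "1001 nm", "1002 nm")))
y <- rnorm(n)

# I() protects the matrix, so data.frame() keeps it as a single column
# instead of splitting it into one data frame column per matrix column.
DF2 <- data.frame(y = y, X = I(X))
names(DF2)          # "y" "X"
colnames(DF2$X)     # the matrix keeps its own column names

# The whole matrix can then be used as one term in a formula.
fit2 <- lm(y ~ X, data = DF2)
length(coef(fit2))  # intercept plus one coefficient per column of X
```

Assigning the matrix after the data frame exists (DF2$X <- X) should give the same single-column layout. Without I(), data.frame(y = y, X = X) splits the matrix into separate columns, which is what happened with M.M1 and M.M2 in the quoted example.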

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Mon 12 May 2008 - 18:30:36 GMT.
