[R] Fundamental formula and dataframe question.

From: Myers, Brent <MyersDB_at_missouri.edu>
Date: Sun, 11 May 2008 13:58:45 -0500


There is a very useful and apparently fundamental feature of R (or of the package pls) which I don't understand.

For datasets with many independent (X) variables, such as chemometric datasets, there is a convenient formula and dataframe construction that allows one to access the entire X matrix with a single term.

Consider the gasoline dataset available in the pls package. For the model statement in the plsr function one can write: octane ~ NIR

NIR refers to a (wide) matrix which is a portion of a dataframe. The naming of the columns is of the form: 'NIR.xxxx nm'

names(gasoline) returns...

$names

[1] "octane" "NIR"

instead of...

$names

[1] "octane" "NIR.1000 nm" "NIR.1001 nm" ...
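
A minimal sketch of the two layouts, assuming the matrix column is stored with base R's I(). The toy values and the object names X, y, nested and wide are made up for illustration and are not the real gasoline data:

```r
# Hypothetical toy data: 5 samples, 3 wavelengths (not the real gasoline data).
set.seed(1)
X <- matrix(rnorm(15), nrow = 5,
            dimnames = list(NULL, paste(1000:1002, "nm")))
y <- rnorm(5)

# I() protects the matrix, so it is stored as ONE column named "NIR".
nested <- data.frame(octane = y, NIR = I(X))
names(nested)          # two names: "octane" and "NIR"
dim(nested$NIR)        # the matrix keeps its own dimensions (5 rows, 3 columns)
colnames(nested$NIR)   # the wavelength labels live on the matrix, not the frame

# Without I(), data.frame() splits the matrix into separate columns,
# one per wavelength, each prefixed with "NIR.".
wide <- data.frame(octane = y, NIR = X, check.names = FALSE)
names(wide)
```

Printing a frame like nested shows the inner columns with a 'NIR.' prefix, which may be where the 'NIR.xxxx nm' naming comes from even though names() reports only two entries.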

How do I construct and manipulate such dataframes and the column names that go with them?
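
One possible way to build and edit such a frame in base R is sketched below. The octane values, the object names (spectra, df, first_two, one_band) and the new column labels are hypothetical; this is an illustration, not necessarily how the gasoline dataset itself was assembled:

```r
# Hypothetical spectra: 4 samples, 3 wavelengths.
set.seed(2)
spectra <- matrix(runif(12), nrow = 4,
                  dimnames = list(NULL, c("1000 nm", "1001 nm", "1002 nm")))

# Start from an ordinary data frame, then attach the whole matrix with $<-.
df <- data.frame(octane = c(85.3, 88.1, 87.4, 86.0))  # made-up responses
df$NIR <- spectra

# Row subsetting keeps the matrix column aligned with the response.
first_two <- df[1:2, ]

# Individual wavelengths are reached through the matrix itself.
one_band <- df$NIR[, "1001 nm"]

# Column names belong to the matrix, so rename them there.
colnames(df$NIR) <- c("w1000", "w1001", "w1002")

# Dropping a wavelength means replacing the matrix with a narrower one.
df$NIR <- df$NIR[, c("w1000", "w1002")]
```

The point of the design is that the frame only ever sees one column, so all per-wavelength work goes through df$NIR and its colnames().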

Does the use of these types of formulas and dataframes generalize to other modeling functions?
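
As one data point on generalization, base R's lm() accepts a matrix term in a formula. The sketch below uses simulated data with made-up coefficients; whether any other modeling function handles a matrix column is an assumption to check function by function:

```r
# Simulated data: 20 samples, 3 predictors held in one matrix column.
set.seed(3)
n <- 20
X <- matrix(rnorm(n * 3), nrow = n,
            dimnames = list(NULL, c("1000 nm", "1001 nm", "1002 nm")))
beta <- c(0.5, -0.2, 0.1)                      # made-up true coefficients
y <- 87 + drop(X %*% beta) + rnorm(n, sd = 0.1)

df <- data.frame(octane = y, NIR = I(X))

# The single term NIR expands to one coefficient per matrix column.
fit <- lm(octane ~ NIR, data = df)
coef(fit)   # intercept plus three slopes
```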

Some specific terms for a help search might be enough; I've tried many.

Regards,
Brent




R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sun 11 May 2008 - 19:01:28 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 11 May 2008 - 20:31:11 GMT.
