From: Liaw, Andy <andy_liaw_at_merck.com>

Date: Fri 22 Apr 2005 - 10:59:40 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Fri Apr 22 11:11:33 2005

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of McGehee, Robert
> Sent: Thursday, April 21, 2005 7:03 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Strange data frame
>
>
> Hello,
> I'm playing around with the PLS package and found a data set (NIR) whose
> structure I don't understand. Forgive me if this is a stupid question,
> as I feel like it must be since I am less experienced with aspects of
> modeling.
>
> My problem, the pls NIR data frame does not seem to be a typical data
> frame as, while it is a list, its variables are not of equal length.
> Furthermore, I have no idea how to reproduce such a structure.
>
> But, let's look at the NIR data...
>
> > require(pls)
> > data(NIR)
> > class(NIR)
> [1] "data.frame"
>
> > str(NIR)
> `data.frame': 28 obs. of 3 variables:
> $ X : num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.10 ...
> ..- attr(*, "dimnames")=List of 2
> .. ..$ : NULL
> .. ..$ : NULL
> $ y : num 100.0 80.2 79.5 60.8 60.0 ...
> $ train: logi TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ...
>
> > class(NIR$X)
> [1] "matrix"
> > class(NIR$y)
> [1] "numeric"
>
> > length(NIR$X)
> [1] 7504
> > length(NIR$y)
> [1] 28
>
> Ok, what this looks like to me is that NIR is a data frame (i.e. "a list
> of variables of the same length with unique row names"), with a matrix
> of length 7504 as one variable, and a numeric vector of length 28 as
> another variable, which seems to contradict the definition of a data
> frame.
>
> Moreover, despite my best efforts, I'm unable to put any of my own data
> in this structure, as the data.frame() and as.data.frame() functions
> removes the matrix structure i.e.
> > data.frame(y = NIR$y, X = NIR$X) ## or
> > as.data.frame(list(y = NIR$y, X = NIR$X))
> return a different animal altogether.

A variable in a data frame can be a matrix, as long as its number of rows matches that of the data frame. Here's one possible way to do that:

> dat <- data.frame(y=1:2)
> dat$x <- matrix(runif(4),2)
> str(dat)
`data.frame': 2 obs. of 2 variables:
$ y: int 1 2
$ x: num [1:2, 1:2] 0.562 0.670 0.738 0.903

If the number of rows doesn't match, you get:

> dat$x <- matrix(runif(6),3)
Error in "$<-.data.frame"(`*tmp*`, "x", value = c(0.669958727201447, 0.111689866287634, :
        replacement has 3 rows, data has 2
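
Another route, not mentioned in the original exchange, is to wrap the matrix in I() when calling data.frame(), which stops it from being split into one column per matrix column. The sketch below uses made-up data (the names X, y, flat and nested are purely illustrative) and also shows that what has to agree is the number of rows, not length():

```r
# Hypothetical data: 5 observations of a 3-column matrix predictor
set.seed(1)
X <- matrix(rnorm(15), nrow = 5)
y <- rnorm(5)

# Without I(), data.frame() splits the matrix into separate columns
flat <- data.frame(y = y, X = X)
print(names(flat))

# With I(), the matrix is kept as a single variable
nested <- data.frame(y = y, X = I(X))
print(names(nested))
print(dim(nested$X))

# length() counts matrix cells; the data frame only requires matching rows
print(length(nested$X))
print(NROW(nested$X) == nrow(nested))
```

The column stored this way carries the extra class "AsIs", which is usually harmless for modeling functions.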

> Lastly, this particular structure is useful, because the PLS authors are
> able to concisely write models such as,
>
> mvr(y ~ X, data = NIR[NIR$train, ])
>
> instead of what I imagine would be a more complicated alternative if
> they didn't have a data frame of a matrix and a vector as they do. Any
> pointers to something I overlooked is appreciated.

Many modeling functions will accept matrix predictors, including lm(), glm(), rpart(), etc.
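
As a minimal sketch of that point with lm(), using simulated data rather than NIR (the names dat, train, fit and pred are illustrative only), a matrix column survives row subsetting and can be used directly in a formula:

```r
# Simulated data: 10 observations, a 2-column matrix predictor, a train flag
set.seed(42)
dat <- data.frame(y = rnorm(10))
dat$X <- matrix(rnorm(20), nrow = 10)
dat$train <- rep(c(TRUE, FALSE), c(7, 3))

# Row subsetting keeps X as a matrix with the matching rows
train <- dat[dat$train, ]
print(dim(train$X))

# The matrix enters the formula as a single term, one coefficient per column
fit <- lm(y ~ X, data = train)
print(names(coef(fit)))

# Prediction on the held-out rows uses the same matrix column
pred <- predict(fit, newdata = dat[!dat$train, ])
print(length(pred))
```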

Andy

> Best,
> Robert


This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:31:22 EST