RE: [R] Strange data frame

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Fri 22 Apr 2005 - 10:59:40 EST

> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of McGehee, Robert
> Sent: Thursday, April 21, 2005 7:03 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Strange data frame
>
>
> Hello,
> I'm playing around with the PLS package and found a data set (NIR) whose
> structure I don't understand. Forgive me if this is a stupid question,
> as I feel like it must be since I am less experienced with aspects of
> modeling.
>
> My problem: the pls NIR data frame does not seem to be a typical data
> frame as, while it is a list, its variables are not of equal length.
> Furthermore, I have no idea how to reproduce such a structure.
>
> But, let's look at the NIR data...
>
> > require(pls)
> > data(NIR)
> > class(NIR)
> [1] "data.frame"
>
> > str(NIR)
> `data.frame': 28 obs. of 3 variables:
> $ X : num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.10 ...
> ..- attr(*, "dimnames")=List of 2
> .. ..$ : NULL
> .. ..$ : NULL
> $ y : num 100.0 80.2 79.5 60.8 60.0 ...
> $ train: logi TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> TRUE ...
>
> > class(NIR$X)
> [1] "matrix"
> > class(NIR$y)
> [1] "numeric"
>
> > length(NIR$X)
> [1] 7504
> > length(NIR$y)
> [1] 28
>
> Ok, what this looks like to me is that NIR is a data frame (i.e. "a list
> of variables of the same length with unique row names"), with a matrix
> of length 7504 as one variable, and a numeric vector of length 28 as
> another variable, which seems to contradict the definition of a data
> frame.
>
> Moreover, despite my best efforts, I'm unable to put any of my own data
> in this structure, as the data.frame() and as.data.frame() functions
> remove the matrix structure, i.e.
> > data.frame(y = NIR$y, X = NIR$X) ## or
> > as.data.frame(list(y = NIR$y, X = NIR$X))
> return a different animal altogether.

A variable in a data frame can be a matrix, as long as its number of rows matches that of the data frame. Here's one possible way to do that:

> dat <- data.frame(y=1:2)
> dat$x <- matrix(runif(4),2)
> str(dat)

`data.frame': 2 obs. of 2 variables:
 $ y: int 1 2
 $ x: num [1:2, 1:2] 0.562 0.670 0.738 0.903
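Another way to get the same structure in a single call is to wrap the matrix in I(), which tells data.frame() to keep it as one variable instead of splitting it into columns. The sketch below uses made-up toy data (the names X, y, flat and nested are only for illustration and are not from the pls package):

```r
# Hypothetical toy data: 4 observations, matrix with 3 columns
set.seed(1)
X <- matrix(runif(12), nrow = 4)
y <- c(10, 20, 30, 40)

# Without I(), data.frame() splits X into separate columns X.1, X.2, X.3
flat <- data.frame(y = y, X = X)

# With I(), X stays a single matrix-valued variable
nested <- data.frame(y = y, X = I(X))

dim(flat)      # 4 rows, 4 variables
dim(nested)    # 4 rows, 2 variables
dim(nested$X)  # the matrix itself: 4 rows, 3 columns
```

The second form is the one that matches the layout str(NIR) shows, apart from the "AsIs" class that I() adds to the matrix.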

If the number of rows doesn't match, you get:

> dat$x <- matrix(runif(6),3)

Error in "$<-.data.frame"(`*tmp*`, "x", value = c(0.669958727201447, 0.111689866287634, :
        replacement has 3 rows, data has 2

> Lastly, this particular structure is useful, because the PLS authors are
> able to concisely write models such as,
>
> mvr(y ~ X, data = NIR[NIR$train, ])
>
> instead of what I imagine would be a more complicated alternative if
> they didn't have a data frame of a matrix and a vector as they do. Any
> pointers to something I overlooked are appreciated.

Many modeling functions, including lm(), glm() and rpart(), will accept a matrix as a predictor in a formula.
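As a rough illustration with lm(), here is a small sketch on simulated data (the data frame dat and its columns are invented for this example, not taken from NIR). It also shows that row subsetting, as in NIR[NIR$train, ], carries the matrix rows along:

```r
# Simulated data: 10 observations, a 10 x 3 matrix predictor, a train flag
set.seed(42)
dat <- data.frame(y = rnorm(10))
dat$X <- matrix(rnorm(30), nrow = 10)
dat$train <- rep(c(TRUE, FALSE), c(7, 3))

# Row subsetting keeps X as a matrix, now with 7 rows
train <- dat[dat$train, ]
dim(train$X)

# The whole matrix enters the formula as a single term
fit <- lm(y ~ X, data = train)
names(coef(fit))   # intercept plus one coefficient per column of X
```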

Andy  

> Best,
> Robert
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>
>



Received on Fri Apr 22 11:11:33 2005