From: McGehee, Robert <Robert.McGehee_at_geodecapital.com>
Date: Fri 22 Apr 2005 - 09:02:32 EST

I'm playing around with the pls package and found a data set (NIR) whose structure I don't understand. Forgive me if this is a stupid question; I suspect it is, since I am less experienced with some aspects of modeling.

My problem: the NIR data frame in pls does not seem to be a typical data frame because, while it is a list, its variables are not of equal length. Furthermore, I have no idea how to reproduce such a structure.

But, let's look at the NIR data...

> require(pls)
> data(NIR)
> class(NIR)

[1] "data.frame"

> str(NIR)

`data.frame': 28 obs. of 3 variables:
 $ X    : num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.10 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ y    : num 100.0 80.2 79.5 60.8 60.0 ...
 $ train: logi TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ...

> class(NIR$X)

[1] "matrix"
> class(NIR$y)

[1] "numeric"

> length(NIR$X)

[1] 7504
> length(NIR$y)

[1] 28

Ok, what this looks like to me is that NIR is a data frame (i.e. "a list of variables of the same length with unique row names"), with a matrix of length 7504 as one variable, and a numeric vector of length 28 as another variable, which seems to contradict the definition of a data frame.

Moreover, despite my best efforts, I'm unable to put any of my own data in this structure, as the data.frame() and as.data.frame() functions remove the matrix structure, i.e. both
> data.frame(y = NIR$y, X = NIR$X) ## or
> as.data.frame(list(y = NIR$y, X = NIR$X))
return a different animal altogether.
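
To make the "different animal" concrete, here is a minimal sketch using small made-up data (the y and X below are hypothetical stand-ins, not the NIR values). As far as I can tell, both calls split the matrix into one data frame column per matrix column, so there is no longer a single variable named X:

```r
# Hypothetical stand-ins for NIR$y and NIR$X (made-up values, much smaller)
y <- c(1.5, 2.5, 3.5, 4.5)
X <- matrix(1:12, nrow = 4, ncol = 3)

# The two attempts from above, applied to the toy data
d1 <- data.frame(y = y, X = X)
d2 <- as.data.frame(list(y = y, X = X))

# The matrix has been flattened: y plus one column per column of X
ncol(d1)              # 4
ncol(d2)              # 4

# There is no longer a single variable called "X"
"X" %in% names(d1)    # FALSE
```

By contrast, str(NIR) above reports only 3 variables, with X still a 28 x 268 matrix.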

Lastly, this particular structure is useful, because the PLS authors are able to concisely write models such as,

mvr(y ~ X, data = NIR[NIR$train, ])

instead of what I imagine would be a more complicated alternative if they did not have a data frame holding a matrix and a vector. Any pointers to something I have overlooked are appreciated.


