

> Hello,
> I'm playing around with the PLS package and found a data set (NIR) whose
> structure I don't understand. Forgive me if this is a stupid question,
> as I feel like it must be since I am less experienced with aspects of
> modeling.

> My problem: the pls NIR data frame does not seem to be a typical data
> frame as, while it is a list, its variables are not of equal length.
> Furthermore, I have no idea how to reproduce such a structure.

> But, let's look at the NIR data...

> require(pls)
> data(NIR)
> class(NIR)
[1] "data.frame"

> str(NIR)
`data.frame': 28 obs. of 3 variables:
$ X : num [1:28, 1:268] 3.07 3.07 3.08 3.08 3.10 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : NULL
$ y : num 100.0 80.2 79.5 60.8 60.0 ...
$ train: logi TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ...


> class(NIR$X)
[1] "matrix"
> class(NIR$y)
[1] "numeric"

> length(NIR$X)
[1] 7504
> length(NIR$y)
[1] 28

> Ok, what this looks like to me is that NIR is a data frame (i.e. "a list
> [...] of length 7504 as one variable, and a numeric vector of length 28 as
> another variable, which seems to contradict the definition of a data
> frame.


> Moreover, despite my best efforts, I'm unable to put any of my own data
> [...] removes the matrix structure i.e.

> data.frame(y = NIR$y, X = NIR$X) ## or
> as.data.frame(list(y = NIR$y, X = NIR$X))
> return a different animal altogether.


A variable in a data frame can be a matrix, as long as its number of rows matches that of the data frame. Here's one possible way to do that:

> dat <- data.frame(y=1:2)
> dat$x <- matrix(runif(4),2)
> str(dat)
`data.frame': 2 obs. of 2 variables:
$ y: int 1 2
$ x: num [1:2, 1:2] 0.562 0.670 0.738 0.903

If the number of rows doesn't match, you get:

> dat$x <- matrix(runif(6),3)
Error in "$<-.data.frame"(`*tmp*`, "x", value = c(0.669958727201447,
0.111689866287634, :
replacement has 3 rows, data has 2
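
As a rough sketch of why data.frame(y = NIR$y, X = NIR$X) gives "a different animal": data.frame() splits an unprotected matrix into one column per matrix column, while wrapping it in I() keeps it as a single variable. The names m, y, flat and kept are made up for illustration, and the toy matrix stands in for NIR$X. It also shows that the data frame only cares about the number of rows, not length().

```r
m <- matrix(runif(6), nrow = 2)   # toy stand-in for a spectra matrix: 2 rows, 3 columns
y <- 1:2

# Without protection, data.frame() expands the matrix into separate columns
flat <- data.frame(y = y, X = m)
ncol(flat)        # y plus one column per matrix column

# I() tells data.frame() to keep the matrix as a single variable
kept <- data.frame(y = y, X = I(m))
ncol(kept)        # y and X only
dim(kept$X)       # the matrix structure survives

# length() counts matrix cells; the data frame checks rows instead
length(kept$X)
nrow(kept$X) == nrow(kept)
```

Assigning with dat$X <- m after creating the data frame, as above, should give a similar result without the "AsIs" class that I() adds.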

> Lastly, this particular structure is useful, because the PLS authors are
> [...] they didn't have a data frame of a matrix and a vector as they do. Any
> pointers to something I overlooked is appreciated.



Many modeling functions will accept matrix predictors, including lm()/glm()/rpart()/etc.
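
For instance, a minimal sketch with lm() on simulated data (the data and the names dat and fit are invented for illustration; nothing here comes from the NIR set):

```r
set.seed(1)
dat <- data.frame(y = rnorm(10))
dat$X <- matrix(rnorm(30), nrow = 10)   # a 10 x 3 matrix stored as one variable

# The formula refers to the matrix as a single term
fit <- lm(y ~ X, data = dat)

# One coefficient per matrix column, plus the intercept
names(coef(fit))
```

Each column of the matrix gets its own coefficient, so the single term X behaves like three ordinary predictors.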

Andy

> Best,
> Robert









