From: Daryl Morris <darylm_at_u.washington.edu>

Date: Tue, 12 Aug 2008 18:30:40 -0700


R-help_at_r-project.org mailing list

Received on Wed 13 Aug 2008 - 02:11:43 GMT


Hello,

Is this a bug or a feature? I am using R 2.7.1 on Apple OS X.

> y <- matrix(1:3,nrow=3) # y is a single-column matrix
> df <- data.frame(x=1:3,y=y)
> sapply(df,data.class)
        x         y
"numeric" "numeric"
> df$yy <- y
> sapply(df,data.class)
        x         y        yy
"numeric" "numeric"  "matrix"

I'm not sure why dataframes are allowed to have matrices as members. It's also weird to me that y & yy have different classes. It seems like there has been a blurring of the line between lists and dataframes. When did dataframes start taking members other than vectors?
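A minimal sketch of where the difference between y and yy appears to come from, assuming the behaviour of base R's data.frame() and `$<-`: data.frame() splits a matrix argument into ordinary vector columns, while assignment with `$<-` stores the matrix unchanged as a single column. The names y_flat and yy_kept are hypothetical and used only for illustration.

```r
y <- matrix(1:3, nrow = 3)          # single-column matrix, as in the post

df <- data.frame(x = 1:3, y = y)    # data.frame() converts the matrix to a plain column
df$yy <- y                          # `$<-` keeps the matrix as one matrix-valued column

y_flat  <- is.matrix(df$y)          # is the column built by data.frame() a matrix?
yy_kept <- is.matrix(df$yy)         # is the column assigned with `$<-` a matrix?
print(c(y = y_flat, yy = yy_kept))

# If a plain vector column is wanted, strip the dim attribute before assigning
df$yy <- as.vector(y)
print(sapply(df, data.class))
```

Whether the matrix column is a bug or a feature is a separate question; the sketch only shows that the two construction routes treat the same object differently.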

This is an issue if, for example, one builds a dataframe to fit a model and then wants to use predict. You have to work a bit to avoid a type mismatch error.

> df$out = df$x+df$y+df$yy + rnorm(3)
> df
  x y yy       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452

> glmout = glm(out~x+y+yy,data=df)
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=1:3))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=matrix(1:3)))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied

> predict(glmout,newdata=df[,-4])
        1         2         3
 2.548387  6.551939 10.555491
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
  prediction from a rank-deficient fit may be misleading
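One way the type mismatch could be avoided is sketched below: build newdata so that the yy column has the same type it had at fit time, by assigning the matrix with `$<-` instead of passing it to data.frame(). The data here are hypothetical (random and non-collinear, not the values from the post), and the names d, fit, new_bad and new_ok are assumptions made for the illustration.

```r
set.seed(1)                                   # hypothetical data, not the post's values
n <- 10
d <- data.frame(x = rnorm(n), y = rnorm(n))
d$yy <- matrix(rnorm(n), ncol = 1)            # matrix column stored via `$<-`
d$out <- d$x + d$y + drop(d$yy) + rnorm(n)    # drop() keeps the response a plain vector

fit <- glm(out ~ x + y + yy, data = d)

# data.frame() turns the matrix into a numeric column, so the types no longer match
new_bad <- data.frame(x = 1:3, y = 1:3, yy = matrix(1:3))
res_bad <- try(predict(fit, newdata = new_bad), silent = TRUE)

# Assigning the matrix afterwards keeps yy a one-column matrix, as at fit time
new_ok <- data.frame(x = 1:3, y = 1:3)
new_ok$yy <- matrix(1:3, ncol = 1)
pred <- predict(fit, newdata = new_ok)
print(pred)
```

The other obvious route is to convert yy to a plain vector before fitting, in which case an ordinary data.frame() call for newdata should be enough.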

I'm not really looking for a "solution", as I can already identify several workarounds. I guess I'm mainly trying to figure out what the philosophy is here.

This is also weird to me:

> df$yy <- as.data.frame(y)
> df
  x y V1       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+V1,data=df)
Error in eval(expr, envir, enclos) : object "V1" not found
> glmout = glm(out~x+y+yy,data=df)
Error in model.frame.default(formula = out ~ x + y + yy, data = df, drop.unused.levels = TRUE) :
  invalid type (list) for variable 'yy'
> glmout = glm(out~x+y+yy$VI,data=df)
Error in model.frame.default(formula = out ~ x + y + yy$VI, data = df, : invalid type (NULL) for variable 'yy$VI'

Is it impossible to build a model from a dataframe built this way?
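A hedged sketch of one possible way around this, assuming the nested column was created from an unnamed matrix so that its inner column is called V1: replace the nested data frame with its inner vector before fitting. The data are hypothetical, and the names d, was_nested and fit are introduced only for the example.

```r
set.seed(1)                                # hypothetical data, not the post's values
n <- 10
y <- matrix(rnorm(n), ncol = 1)

d <- data.frame(x = rnorm(n))
d$yy <- as.data.frame(y)                   # nested data frame column, as in the post
d$out <- d$x + d$yy$V1 + rnorm(n)

was_nested <- is.data.frame(d$yy)          # the column is itself a data frame (a list)
print(names(d$yy))                         # name of the inner column

# Flatten: keep only the inner vector so the formula sees a numeric variable
d$yy <- d$yy[["V1"]]
fit <- glm(out ~ x + yy, data = d)
print(coef(fit))
```

The model formula looks up variables by the top-level column names of the data frame, which is why V1 on its own is not found while the column is still nested.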

thanks, Daryl Morris

(Biostatistics, Univ. of Washington)
