[R] issue building dataframes with matrices.

From: Daryl Morris <darylm_at_u.washington.edu>
Date: Tue, 12 Aug 2008 18:30:40 -0700


Hello,
Is this a bug or a feature? I am using R 2.7.1 on Apple OS X.

> y <- matrix(1:3,nrow=3) # y is a single-column matrix
> df <-data.frame(x=1:3,y=y)
> sapply(df,data.class)
        x         y
"numeric" "numeric"
> df$yy <- y
> sapply(df,data.class)
        x         y        yy
"numeric" "numeric"  "matrix"

I'm not sure why dataframes are allowed to have matrices as members. It also seems odd that y and yy end up with different classes, even though both were built from the same matrix. There seems to be a blurring of the line between lists and dataframes. When did dataframes start accepting members other than vectors?
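The difference between y and yy comes from the two construction paths. Here is a minimal sketch, assuming the same toy matrix as above. data.frame() converts a matrix argument column by column, so y becomes an ordinary vector, while `$<-` stores the assigned value unchanged after checking its row count, so yy stays a 3 x 1 matrix. drop() is one way to turn it back into a plain vector.

```r
# Sketch: data.frame() vs `$<-` on a one-column matrix (toy data from the post).
y <- matrix(1:3, nrow = 3)          # one-column matrix

# data.frame() splits matrix arguments into separate columns,
# so column y ends up as a plain integer vector.
df <- data.frame(x = 1:3, y = y)
print(is.matrix(df$y))              # FALSE

# `$<-` keeps the value as given (after a row-count check),
# so yy is stored as a 3 x 1 matrix inside the data frame.
df$yy <- y
print(is.matrix(df$yy))             # TRUE
print(dim(df$yy))                   # 3 1

# drop() removes the length-one dimension; the column is now a vector.
df$yy <- drop(y)
print(is.matrix(df$yy))             # FALSE
```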

This becomes a problem if, for example, one builds a dataframe to fit a model and then wants to use predict(): you have to work a bit to avoid a type-mismatch error.

> df$out = df$x+df$y+df$yy + rnorm(3)
> df
  x y yy       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452

> glmout = glm(out~x+y+yy,data=df)
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=1:3))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
>
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=matrix(1:3)))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
> predict(glmout,newdata=df[,-4])
        1         2         3
 2.548387  6.551939 10.555491
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
  prediction from a rank-deficient fit may be misleading

I'm not really looking for a "solution", as I can already identify several workarounds. I guess I'm mainly trying to figure out what the philosophy is here.
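For readers who do want one of those workarounds, here is a minimal sketch, assuming the same toy data. The names nd, fit, fit2, df2, p1 and p2 are illustrative only. predict() compares the class recorded for each variable at fit time (here "nmatrix.1" for yy) with what it finds in newdata. Writing data.frame(yy = matrix(1:3)) flattens the matrix back to a vector, which is why the second predict() call above fails in the same way. Two options are shown: give newdata a matching matrix column via `$<-`, or flatten yy before fitting. Because x, y and yy are identical here, both fits are rank-deficient as in the post, so predict() may still warn.

```r
# Sketch of two possible workarounds (illustrative names, not an official recipe).
set.seed(1)                          # only to make rnorm() reproducible
y <- matrix(1:3, nrow = 3)
df <- data.frame(x = 1:3, y = y)
df$yy <- y                           # matrix column: recorded as "nmatrix.1"
df$out <- df$x + df$y + drop(df$yy) + rnorm(3)

fit <- glm(out ~ x + y + yy, data = df)

# Workaround 1: build newdata with a matching one-column matrix column.
# `$<-` keeps it a matrix; data.frame(yy = matrix(...)) would flatten it.
nd <- data.frame(x = 1:3, y = 1:3)
nd$yy <- matrix(1:3, nrow = 3)
p1 <- predict(fit, newdata = nd)

# Workaround 2: flatten yy before fitting, so plain vectors work later.
df2 <- df
df2$yy <- drop(df2$yy)
fit2 <- glm(out ~ x + y + yy, data = df2)
p2 <- predict(fit2, newdata = data.frame(x = 1:3, y = 1:3, yy = 1:3))
```

The second option is probably the less surprising one, because every later newdata can then be an ordinary dataframe of vectors.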

This is also weird to me:

> df$yy <- as.data.frame(y)
> df
  x y V1       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452

> glmout = glm(out~x+y+V1,data=df)
Error in eval(expr, envir, enclos) : object "V1" not found
> glmout = glm(out~x+y+yy,data=df)
Error in model.frame.default(formula = out ~ x + y + yy, data = df, drop.unused.levels = TRUE) :
  invalid type (list) for variable 'yy'
> glmout = glm(out~x+y+yy$VI,data=df)
Error in model.frame.default(formula = out ~ x + y + yy$VI, data = df, :
  invalid type (NULL) for variable 'yy$VI'

Is it impossible to build a model from a dataframe built this way?
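The last transcript stores a dataframe inside a dataframe. The printed header shows V1, but the only top-level column is yy. That is why V1 is not found, and why yy itself is rejected as a list. The final call also uses `yy$VI` (letter I), while the inner column is `V1` (digit one). `yy$VI` evaluates to NULL, which matches the last error message. Here is a minimal sketch, assuming the same toy data (fit, fit2 and df2 are illustrative names): referring to `yy$V1` explicitly, or replacing the nested dataframe by its single column, should let the model be built.

```r
# Sketch: a one-column data frame stored as a column of another data frame.
set.seed(1)                          # only to make rnorm() reproducible
y <- matrix(1:3, nrow = 3)
df <- data.frame(x = 1:3, y = y)
df$yy <- as.data.frame(y)            # yy is itself a data frame with column V1

print(names(df))                     # "x" "y" "yy": no top-level V1
print(names(df$yy))                  # "V1" (digit one, not the letter I)

# Option 1: name the inner column explicitly; model.frame() evaluates
# yy$V1 inside df, giving an ordinary vector.
df$out <- df$x + df$y + df$yy$V1 + rnorm(3)
fit <- glm(out ~ x + y + yy$V1, data = df)

# Option 2 (illustrative): replace the nested data frame by its single column.
df2 <- df
df2$yy <- df2$yy$V1
fit2 <- glm(out ~ x + y + yy, data = df2)
```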

thanks, Daryl Morris
(Biostatistics, Univ. of Washington)



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Wed 13 Aug 2008 - 02:11:43 GMT
