Re: [R] issue building dataframes with matrices.

From: <Bill.Venables_at_csiro.au>
Date: Wed, 13 Aug 2008 15:03:36 +1000

It's a feature and it's been there forever. (It's even present in another system not unlike R.)

Suppose you set

y <- matrix(1:3)

and construct

dfr <- data.frame(x=1:3, y)

Then you invoke the constructor function, data.frame, which by default breaks things like matrices up into single (vector) columns, naming them as necessary.
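Here is a small sketch of that splitting and naming, using a made-up two-column matrix m with no column names. The names in the comments are what current versions of data.frame() produce, and very old versions may differ:

```r
m <- matrix(1:6, ncol = 2)        # hypothetical matrix: 3 rows, 2 columns, no colnames
d <- data.frame(x = 1:3, y = m)   # goes through the constructor
names(d)                          # "x" "y.1" "y.2": one vector column per matrix column
sapply(d, is.matrix)              # all FALSE: no matrix survives the constructor
```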

Now if you directly modify dfr by adding another component, like

dfr$yy <- y

You bypass the constructor function and its default simplifications, but you do not bypass the structure tests. This is, in fact, the simplest way to put a matrix inside a data frame intact, but the matrix must have the same number of rows as the data frame itself.
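A minimal sketch of the difference follows, reusing y and dfr from above. The extra column name zz is hypothetical and is only there to show that the row-count check still applies to direct assignment:

```r
y <- matrix(1:3)                 # a one-column matrix
dfr <- data.frame(x = 1:3, y)    # constructor: y becomes a plain vector column
is.matrix(dfr$y)                 # FALSE
dfr$yy <- y                      # direct assignment: no simplification
is.matrix(dfr$yy)                # TRUE
dim(dfr$yy)                      # 3 1
## the structure test is not bypassed: a 4-row matrix is rejected for a 3-row frame
res <- tryCatch({ dfr$zz <- matrix(1:4); "accepted" },
                error = function(e) conditionMessage(e))
res                              # the error message, not "accepted"
```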

There are other ways of getting a matrix into a data frame intact, and sometimes it is mildly useful to do this. Consider, for example, the following:

dfr <- within(data.frame(x = 1:5), {
    y <- rbinom(5, 100, plogis((x-3)/2))
    SF <- cbind(S = y, F = 100-y)
    rm(y)
  })

names(dfr)  ### Note the apparent discrepancy
dfr         ### with the printed version.

(fm <- glm(SF ~ x, binomial, dfr))
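The same example can be checked a little further. In this sketch, set.seed(1) is an addition so that the random draws are reproducible, and the new x values 6:7 used for prediction are arbitrary. It confirms that SF is stored as a single two-column matrix component and that predicting from the fit needs only x:

```r
set.seed(1)   # added for reproducibility; not part of the original example
dfr <- within(data.frame(x = 1:5), {
    y <- rbinom(5, 100, plogis((x - 3)/2))
    SF <- cbind(S = y, F = 100 - y)   # successes / failures as a 5 x 2 matrix
    rm(y)
})
names(dfr)          # "x" "SF": two components, although printing shows SF.S and SF.F
is.matrix(dfr$SF)   # TRUE
colnames(dfr$SF)    # "S" "F"
fm <- glm(SF ~ x, binomial, dfr)
## the response is not needed in newdata, so only x has to be supplied
p <- predict(fm, newdata = data.frame(x = 6:7), type = "response")
p
```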

Bill Venables
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Daryl Morris
Sent: Wednesday, 13 August 2008 11:31 AM
To: r-help_at_r-project.org
Subject: [R] issue building dataframes with matrices.

Hello,
Is this a bug or a feature? I am using R 2.7.1 on Apple OS X.

> y <- matrix(1:3,nrow=3) # y is a single-column matrix
> df <-data.frame(x=1:3,y=y)
> sapply(df,data.class)
        x         y
"numeric" "numeric"
> df$yy <- y
> sapply(df,data.class)
        x         y        yy
"numeric" "numeric"  "matrix"

I'm not sure why dataframes are allowed to have matrices as members. It's also weird to me that y & yy have different classes. It seems like there has been a blurring of the line between lists and dataframes. When did dataframes start taking members other than vectors?

This is an issue if, for example, one builds a dataframe to fit a model and subsequently wants to use predict: you have to work a bit to avoid a type-mismatch error.

> df$out = df$x+df$y+df$yy + rnorm(3)
> df
  x y yy       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+yy,data=df)
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=1:3))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
>
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=matrix(1:3)))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
> predict(glmout,newdata=df[,-4])
        1         2         3
 2.548387  6.551939 10.555491
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
  prediction from a rank-deficient fit may be misleading
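One possible workaround can be sketched with made-up data, assuming current R applies the same type check that produced the errors above. Fit with a one-column matrix column, then build newdata the same way, adding the matrix with $<- so the constructor does not turn it back into a plain vector. Distinct values are used for x and yy here so the fit is not rank-deficient:

```r
set.seed(2)                                   # made-up data for illustration
df <- data.frame(x = 1:6)
df$yy <- matrix(c(2, 1, 4, 3, 6, 5))          # one-column matrix, added with $<-
df$out <- df$x + df$yy[, 1] + rnorm(6)        # [, 1] keeps 'out' a plain vector
fit <- glm(out ~ x + yy, data = df)           # yy is recorded as "nmatrix.1"
## data.frame(x = 7:8, yy = matrix(c(7, 8))) would split yy back into a vector,
## so add the matrix column to newdata with $<- as well
nd <- data.frame(x = 7:8)
nd$yy <- matrix(c(7, 8))                      # also "nmatrix.1", so the types match
p <- predict(fit, newdata = nd)
p
```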

I'm not really looking for a "solution", as I can already identify several workarounds. I guess I'm mainly trying to figure out what the philosophy is here.

This is also weird to me:

> df$yy <- as.data.frame(y)
> df
  x y V1       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452

> glmout = glm(out~x+y+V1,data=df)
Error in eval(expr, envir, enclos) : object "V1" not found
> glmout = glm(out~x+y+yy,data=df)
Error in model.frame.default(formula = out ~ x + y + yy, data = df, drop.unused.levels = TRUE) :
  invalid type (list) for variable 'yy'
> glmout = glm(out~x+y+yy$VI,data=df)
Error in model.frame.default(formula = out ~ x + y + yy$VI, data = df, :
  invalid type (NULL) for variable 'yy$VI'

Is it impossible to build a model from a dataframe built this way?
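A model can still be fitted from such a data frame. The last attempt above spells the nested column yy$VI (capital I), whereas as.data.frame(y) names it V1, which would explain the NULL. The following sketch uses made-up data and assumes the nested column is called V1. It shows two ways of fitting the model, one that refers to the nested column directly and one that flattens it first:

```r
set.seed(3)                                          # made-up data for illustration
df <- data.frame(x = 1:6)
df$yy <- as.data.frame(matrix(c(2, 1, 4, 3, 6, 5)))  # nested data frame; its column is V1
df$out <- df$x + df$yy$V1 + rnorm(6)
## 1. refer to the nested column explicitly in the formula
fit1 <- glm(out ~ x + yy$V1, data = df)
## 2. or lift it to an ordinary top-level column and drop the nested one
df$V1 <- df$yy$V1
df$yy <- NULL
fit2 <- glm(out ~ x + V1, data = df)
## both fits use the same design matrix, so the coefficients agree
all.equal(unname(coef(fit1)), unname(coef(fit2)))
```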

thanks, Daryl Morris
(Biostatistics, Univ. of Washington)



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Wed 13 Aug 2008 - 05:09:58 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 13 Aug 2008 - 06:33:49 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
