From: <Bill.Venables_at_csiro.au>

Date: Wed, 13 Aug 2008 15:03:36 +1000


It's a feature and it's been there forever. (It's even present in another system not unlike R.)

Suppose you set

y <- matrix(1:3)

and construct

dfr <- data.frame(x=1:3, y)

then you invoke the constructor function, data.frame, which by default simplifies things like matrices by splitting them into single columns, naming them as necessary.

Now if you directly modify dfr by adding another component, like

dfr$yy <- y

you bypass the constructor function and its default simplifications, but you do not bypass the structure tests. This is, in fact, the simplest way to put a matrix inside a data frame intact, but the matrix must have the same number of rows as the data frame itself.
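A minimal sketch of the difference, for anyone who wants to check it at the console. The names `bad` and `kept` are made up for the illustration, and the use of I() is just one of the "other ways" I know of to protect a matrix from the constructor, not necessarily the only one.

```r
y <- matrix(1:3)                 # a single-column matrix

dfr <- data.frame(x = 1:3, y)    # constructor: the matrix is simplified
is.matrix(dfr$y)                 # FALSE, y is now a plain column

dfr$yy <- y                      # direct assignment: the matrix goes in intact
is.matrix(dfr$yy)                # TRUE
dim(dfr)                         # still counted as 3 rows, 3 components

## The structure tests still apply: a matrix with too many rows is refused.
bad <- tryCatch({ dfr$zz <- matrix(1:4); "no error" },
                error = function(e) e)
if (inherits(bad, "error")) cat("refused:", conditionMessage(bad), "\n")

## I() asks the constructor itself to leave the matrix alone.
kept <- data.frame(x = 1:3, yy = I(y))
is.matrix(kept$yy)               # TRUE
```

Either route gives a data frame with a matrix component, so which one to use is mostly a matter of whether you are building the data frame or modifying an existing one.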

There are other ways of getting a matrix into a data frame intact, and sometimes it is mildly useful to do this. Consider, for example, the following:

dfr <- within(data.frame(x = 1:5), {
    y <- rbinom(5, 100, plogis((x-3)/2))
    SF <- cbind(S = y, F = 100-y)
    rm(y)
})

names(dfr)  ### Note the apparent discrepancy
dfr         ### with the printed version.

(fm <- glm(SF ~ x, binomial, dfr))
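To make the "apparent discrepancy" concrete, here is a hedged sketch of the same construction with a fixed seed (the seed and the new-data values are my own choices for the illustration). The data frame has two components, x and SF, even though the print-out shows a column for each column of the matrix.

```r
set.seed(42)                     # arbitrary seed, only for reproducibility

dfr <- within(data.frame(x = 1:5), {
    y  <- rbinom(5, 100, plogis((x - 3) / 2))
    SF <- cbind(S = y, F = 100 - y)   # successes and failures as one matrix
    rm(y)
})

names(dfr)          # two components only: "x" and "SF"
dim(dfr$SF)         # the SF component is a 5 x 2 matrix
colnames(dfr$SF)    # "S" "F"

## The two-column matrix serves directly as a binomial response.
fm <- glm(SF ~ x, family = binomial, data = dfr)

## Prediction needs only x, since the response is not used for newdata.
p <- predict(fm, newdata = data.frame(x = c(2, 4)), type = "response")
p
```

Keeping successes and failures together in one component means the response travels with the data frame and the model formula stays short.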

Bill Venables

http://www.cmis.csiro.au/bill.venables/

-----Original Message-----

From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org]
On Behalf Of Daryl Morris

Sent: Wednesday, 13 August 2008 11:31 AM
To: r-help_at_r-project.org

Subject: [R] issue building dataframes with matrices.

Hello,

Is this a bug or a feature? I am using R 2.7.1 on Apple OS X.

> y <- matrix(1:3,nrow=3) # y is a single-column matrix
> df <-data.frame(x=1:3,y=y)
> sapply(df,data.class)
        x         y
"numeric" "numeric"
> df$yy <- y
> sapply(df,data.class)
        x         y        yy
"numeric" "numeric"  "matrix"

I'm not sure why dataframes are allowed to have matrices as members. It's also weird to me that y & yy have different classes. It seems like there has been a blurring of the line between lists and dataframes. When did dataframes start taking members other than vectors?

This is an issue if one for example builds a dataframe to fit a model, and then subsequently wants to use predict. You have to work a bit to avoid a type mismatch error.

> df$out = df$x+df$y+df$yy + rnorm(3)
> df
  x y yy       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+yy,data=df)
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=1:3))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
> predict(glmout,newdata=data.frame(x=1:3,y=1:3,yy=matrix(1:3)))
Error: variable 'yy' was fitted with type "nmatrix.1" but type "numeric" was supplied
> predict(glmout,newdata=df[,-4])
        1         2         3
 2.548387  6.551939 10.555491
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
  prediction from a rank-deficient fit may be misleading

I'm not really looking for a "solution", as I can already identify several workarounds. I guess I'm mainly trying to figure out what the philosophy is here.
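For readers who do want the workarounds spelled out, here is a rough sketch of two of them, using made-up data that is not collinear so the rank-deficiency warning does not get in the way. The variable names (`d`, `nd_flat`, `nd_mat`, `fit_vec`) are invented for the example, and whether the type check fires may depend on the R version, so the failing call is wrapped in tryCatch rather than assumed.

```r
set.seed(1)                                  # arbitrary seed
n <- 10
d <- data.frame(x = 1:n, y = rnorm(n))
d$yy  <- matrix(rnorm(n))                    # matrix column, added directly
d$out <- d$x + d$y + d$yy[, 1] + rnorm(n)

fit <- glm(out ~ x + y + yy, data = d)

## New data built by the constructor: yy is flattened to a plain column,
## which is the situation that triggered the error in the transcript.
nd_flat <- data.frame(x = 1:3, y = c(0, 1, 2), yy = matrix(c(0.5, 1, 1.5)))
res_flat <- tryCatch(predict(fit, newdata = nd_flat),
                     error = function(e) conditionMessage(e))
print(res_flat)

## Workaround 1: give the new data a matrix column in the same way.
nd_mat <- data.frame(x = 1:3, y = c(0, 1, 2))
nd_mat$yy <- matrix(c(0.5, 1, 1.5))
p_mat <- predict(fit, newdata = nd_mat)

## Workaround 2: strip the matrix structure inside the formula, so that
## either kind of new data is acceptable.
fit_vec <- glm(out ~ x + y + as.vector(yy), data = d)
p_vec <- predict(fit_vec, newdata = nd_flat)
```

Both fits describe the same model, so the two sets of predictions should agree; the only difference is where the matrix-versus-vector question gets settled.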

This is also weird to me:

> df$yy <- as.data.frame(y)
> df
  x y V1       out
1 1 1  1  3.066348
2 2 2  2  5.516017
3 3 3  3 11.073452
> glmout = glm(out~x+y+V1,data=df)
Error in eval(expr, envir, enclos) : object "V1" not found
> glmout = glm(out~x+y+yy,data=df)
Error in model.frame.default(formula = out ~ x + y + yy, data = df, drop.unused.levels = TRUE) :
  invalid type (list) for variable 'yy'
> glmout = glm(out~x+y+yy$VI,data=df)
Error in model.frame.default(formula = out ~ x + y + yy$VI, data = df, :
  invalid type (NULL) for variable 'yy$VI'

Is it impossible to build a model from a dataframe built this way?
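As far as I can tell it is not impossible. The component is still called yy, and it holds a one-column data frame whose only column is V1, so the column has to be reached through yy. The NULL in the last error seems to come from asking for `yy$VI` (capital I) rather than `yy$V1`. A small sketch under those assumptions, with invented data and names (`d`, `m`, `fit`, `flat`):

```r
set.seed(1)                                  # arbitrary seed
d <- data.frame(x = 1:10, y = rnorm(10))
m <- matrix(rnorm(10))

d$yy  <- as.data.frame(m)                    # a data frame nested in a column
d$out <- d$x + d$y + d$yy$V1 + rnorm(10)

names(d)                                     # "x" "y" "yy" "out", no "V1"
is.data.frame(d$yy)                          # TRUE, hence "invalid type (list)"
is.null(d$yy$VI)                             # TRUE, there is no column VI

## Reaching into the nested data frame from the formula works.
fit <- glm(out ~ x + y + yy$V1, data = d)
coef(fit)

## Alternatively, flatten the nested data frame before fitting.
flat <- cbind(d[c("x", "y", "out")], d$yy)
names(flat)                                  # "x" "y" "out" "V1"
```

Flattening first is probably the less surprising option if the data frame is going to be passed to predict later.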

thanks, Daryl Morris

(Biostatistics, Univ. of Washington)

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Wed 13 Aug 2008 - 05:09:58 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Wed 13 Aug 2008 - 06:33:49 GMT.
