Re: [R] as.data.frame(cbind()) transforming numeric to factor?

From: Marc Schwartz (via MN) <mschwartz_at_mn.rr.com>
Date: Sat 19 Aug 2006 - 00:55:44 EST

On Fri, 2006-08-18 at 10:41 -0400, Tom Boonen wrote:
> Dear List,
>
> why does as.data.frame(cbind()) transform numeric variables to
> factors, once one of the other variables used is a character vector?
>
> #
> x.1 <- rnorm(10)
> x.2 <- c(rep("Test",10))
> Foo <- as.data.frame(cbind(x.1))
> is.factor(Foo$x.1)
>
> Foo <- as.data.frame(cbind(x.1,x.2))
> is.factor(Foo$x.1)
> #
>
> I assume there is a good reason for this, can somebody explain? Thanks.
>
> Best,
> Tom

See the Note section of ?cbind, which states:

The method dispatching is not done via UseMethod(), but by C-internal dispatching. Therefore, there is no need for, e.g., rbind.default.

The dispatch algorithm is described in the source file ('.../src/main/bind.c') as

  1. For each argument we get the list of possible class memberships from the class attribute.
  2. We inspect each class in turn to see if there is an applicable method.
  3. If we find an applicable method we make sure that it is identical to any method determined for prior arguments. If it is identical, we proceed, otherwise we immediately drop through to the default code.

If you want to combine other objects with data frames, it may be necessary to coerce them to data frames first. (Note that this algorithm can result in calling the data frame method if the arguments are all either data frames or vectors, and this will result in the coercion of character vectors to factors.)
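
The effect of this dispatch rule can be checked directly. What follows is a minimal sketch, assuming a reasonably recent R; the names m and d are only illustrative. The class of x.2 in the second result is deliberately not shown, because it depends on the R version in use.

```r
x.1 <- rnorm(10)
x.2 <- rep("Test", 10)

# Only plain vectors: no argument carries a class attribute, so no method
# is found and the default code builds a matrix.
m <- cbind(x.1, x.2)
is.matrix(m)       # TRUE
typeof(m)          # "character"

# One argument is a data frame: the data frame method is called instead,
# so the numeric column keeps its type.
d <- cbind(data.frame(x.1), x.2)
is.data.frame(d)   # TRUE
is.numeric(d$x.1)  # TRUE
```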

Thus, note the result of:

> str(cbind(x.1, x.2))

 chr [1:10, 1:2] "-0.265756038510064" "2.13220714034528" ...

Since a matrix can only contain a single data type, the numeric vector is coerced to character.

Calling as.data.frame() on that character matrix then converts each column to a factor, which is the default behavior.
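
The two coercion steps can be followed one at a time. This is only a sketch: the name recovered is illustrative, and stringsAsFactors = TRUE is passed explicitly on the assumption that you may be running R 4.0.0 or later, where the default is FALSE.

```r
x.1 <- rnorm(10)
x.2 <- rep("Test", 10)

# Step 1: cbind() builds a character matrix, so x.1 is already text here.
m <- cbind(x.1, x.2)
is.character(m)      # TRUE

# Step 2: each character column becomes a factor.
Foo <- as.data.frame(m, stringsAsFactors = TRUE)
is.factor(Foo$x.1)   # TRUE

# as.numeric() on a factor returns the internal level codes, not the
# original values, so convert through as.character() first.
recovered <- as.numeric(as.character(Foo$x.1))
all.equal(recovered, x.1)  # equal up to the precision kept in the text form
```

The round trip through text is a repair, not a recommendation; building the data frame directly avoids the problem.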

If you want to create a data frame, do it this way:

> str(data.frame(x.1, x.2))
`data.frame':	10 obs. of  2 variables:
 $ x.1: num  -0.266 2.132 2.096 -0.128 -0.466 ...
 $ x.2: Factor w/ 1 level "Test": 1 1 1 1 1 1 1 1 1 1

or if you want to retain the character vector, use I():

> str(data.frame(x.1, I(x.2)))
`data.frame':	10 obs. of  2 variables:
 $ x.1: num  -0.266 2.132 2.096 -0.128 -0.466 ...
 $ x.2:Class 'AsIs'  chr [1:10] "Test" "Test" "Test" "Test" ...

See ?data.frame for more information.
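
In later versions of R, data.frame() also accepts a stringsAsFactors argument, and since R 4.0.0 its default is FALSE. Assuming such a version, the following sketch keeps the character vector without I(); the name d is illustrative.

```r
x.1 <- rnorm(10)
x.2 <- rep("Test", 10)

# Passing the argument explicitly gives the same result whatever the
# default happens to be in the running R version.
d <- data.frame(x.1, x.2, stringsAsFactors = FALSE)
str(d)

# x.1 stays numeric; x.2 stays a plain character vector with no 'AsIs' class.
is.character(d$x.2)
```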

HTH, Marc Schwartz



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sat Aug 19 01:12:46 2006

