Re: [R] many chr2factors ?

From: Christoph Buser <buser_at_stat.math.ethz.ch>
Date: Thu 02 Jun 2005 - 01:23:03 EST

Dear Christian

If you create your data frame with data.frame(), all character vectors are automatically converted to factors unless you force them to stay character. Maybe that solves your problem easily.

dat <- data.frame(a=1:10, b=letters[1:10])
str(dat)
  `data.frame':	10 obs. of  2 variables:
  $ a: int 1 2 3 4 5 6 7 8 9 10
  $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
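If you prefer not to rely on the default, the conversion can be requested or suppressed explicitly. The following is only a small sketch; the object names dat_fac, dat_chr and dat_mix are made up for illustration. Note that from R 4.0.0 on the default of stringsAsFactors is FALSE, so the explicit argument is the version-independent way to write it.

```r
# Ask for the character -> factor conversion explicitly
dat_fac <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = TRUE)

# Suppress the conversion for the whole data frame
dat_chr <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)

# Protect a single column with I(), even when conversion is switched on
dat_mix <- data.frame(a = 1:10, b = I(letters[1:10]), stringsAsFactors = TRUE)

str(dat_fac)
str(dat_chr)
str(dat_mix)
```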

If that does not solve your problem because of the way your data frames are created, you can do the conversion afterwards.

There are two problems with your code.

First (and this is what causes the error): in your repeat loop you use

if(!is.character(df[,i]))
  next

Imagine that the last column of your data frame is not a character. You then jump to the next cycle before the break condition is tested, so the condition is never checked for that column, and on the following cycle the index is outside the range of your data frame.
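To see the effect in isolation, here is a small sketch with a made-up two-column data frame d whose last column is not a character. The names d, visited and res are only for illustration, and tryCatch() is used so that the error message is captured rather than stopping the script.

```r
# Hypothetical data frame: the LAST column is not a character
d <- data.frame(a = letters[1:3], b = 1:3, stringsAsFactors = FALSE)

visited <- integer(0)   # records every index the loop tries
i <- 0
res <- tryCatch(
  repeat {
    i <- i + 1
    visited <- c(visited, i)
    if (!is.character(d[, i]))
      next                      # skips the break test below
    if (i == length(d))
      break
  },
  error = function(e) conditionMessage(e)
)

print(visited)   # indices tried, including the one beyond the last column
print(res)       # the captured error message
```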

Second: you change your data frame inside a function. Variables that are created or changed within a function are local; their life ends with the end of the function. Therefore the changes you make have no effect on the global data frame you want to change. See the example:

dat1 <- structure(list(a = 1:10, b = letters[1:10]), .Names = c("a", "b"),
                  row.names = as.character(1:10), class = "data.frame")
str(data.frame(dat1))
  `data.frame':	10 obs. of  2 variables:
  $ a: int 1 2 3 4 5 6 7 8 9 10
  $ b: chr "a" "b" "c" "d" ...
tofac(dat1)
  [1] 2
str(data.frame(dat1))
  `data.frame':	10 obs. of  2 variables:
  $ a: int 1 2 3 4 5 6 7 8 9 10
  $ b: chr "a" "b" "c" "d" ...

You can use the following code instead

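tofac <- function(x){
  ## seq_along(x) is also safe for a data frame without columns
  for(i in seq_along(x)) {
    ## convert only the character columns
    if(is.character(x[[i]]))
      x[[i]] <- factor(x[[i]])
  }
  ## return the modified copy
  x
}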

dat1 <- tofac(dat1)
str(dat1)
  `data.frame':	10 obs. of  2 variables:
  $ a: int 1 2 3 4 5 6 7 8 9 10
  $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10

The for loop avoids the problem with the index, so it also works in examples that have a non-character variable in the last column. By returning x at the end and assigning the result, you make sure that the changed object keeps existing.
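If you prefer to avoid the explicit loop, the same idea can be written with lapply(). This is only a sketch of an alternative, not something taken from your code; the names chr_to_factor and dat2 are made up. Unlike apply(), which first converts the data frame to a matrix, lapply() works on the columns one by one, so each column keeps its own type.

```r
# Hypothetical helper: convert every character column to a factor
chr_to_factor <- function(x) {
  # x[] <- ... replaces the columns but keeps x a data frame
  x[] <- lapply(x, function(col) if (is.character(col)) factor(col) else col)
  x
}

dat2 <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)
dat2 <- chr_to_factor(dat2)   # assign the result, as with tofac()
str(dat2)
```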

Regards,

Christoph



Christoph Buser <buser@stat.math.ethz.ch> Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)	8092 Zurich	 SWITZERLAND
phone: x-41-44-632-4673		fax: 632-1228

http://stat.ethz.ch/~buser/

christian schulz writes:
> Hi,
>
> I would like to transform
> characters in a data.frame to factors automatically.
>
> > tofac <- function(df){
> + i=0
> + repeat{
> + i <- i+1
> + if(!is.character(df[,i]))
> + next
> + df[,i] <- as.factor(df[,i])
> + print(i)
> + if(i == length(df))
> + break }
> + }
> >
> > tofac(abrdat)
> [1] 7
> [1] 8
> [1] 9
> [1] 11
> [1] 13
> [1] 15
> Error in "[.data.frame"(df, , i) : undefined columns selected
>
> These are the correct columns. I had the idea of putting an empty matrix
> with the same dimensions as df into the loop and returning it!?
>
> Another check?
> abrdat2 <- apply(abrdat,2,function(x)
> ifelse(is.character(x),as.factor(x),x))
>
>
> many thanks & regards,
> christian
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



Received on Thu Jun 02 01:34:30 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:32:20 EST