Re: [R] many chr2factors ?

From: christian schulz <ozric_at_web.de>
Date: Thu 02 Jun 2005 - 04:59:26 EST

...many thanks for clarifying some things for me! christian

>Dear Christian
>
>If you create your data frame by using data.frame all character
>vectors are automatically converted to factors unless you force
>them to stay character. Maybe that can solve your problem easily.
>
>dat <- data.frame(a=1:10, b=letters[1:10])
>str(dat)
> `data.frame': 10 obs. of 2 variables:
> $ a: int 1 2 3 4 5 6 7 8 9 10
> $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
>
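As a small sketch of how the conversion can be controlled: the stringsAsFactors argument of data.frame() decides whether character vectors become factors. Its default was TRUE in older versions of R and is FALSE from R 4.0.0 onwards, so setting it explicitly is the safer choice. The object names dat_fac, dat_chr and dat_mix are made up for illustration.

```r
# Set stringsAsFactors explicitly instead of relying on the version default
dat_fac <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = TRUE)
dat_chr <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)

is.factor(dat_fac$b)     # TRUE: converted to a factor
is.character(dat_chr$b)  # TRUE: left as character

# I() protects a single column from conversion even if stringsAsFactors = TRUE
dat_mix <- data.frame(a = 1:10, b = I(letters[1:10]), stringsAsFactors = TRUE)
is.character(dat_mix$b)  # TRUE
```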
>If that doesn't solve your problem because of the way your
>data frame is created, you can convert the columns afterwards.
>
>There are two problems with your code.
>
>First (and this causes the error): in your repeat loop you use
>
>if(!is.character(df[,i]))
> next
>
>Imagine that the last column of your data frame is not a
>character: you jump to the next cycle, the break condition is
>skipped, and i runs past the last column of your data frame.
>
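If one wanted to keep the repeat loop, a possible repair (only a sketch, not the code from the original post) is to test the break condition before next can skip it. The function name tofac_repeat is hypothetical, and unlike the original it also returns the modified data frame.

```r
# Sketch: repeat loop with the range check placed before `next`
tofac_repeat <- function(df) {
  i <- 0
  repeat {
    i <- i + 1
    if (i > length(df)) break          # checked first, so it is never skipped
    if (!is.character(df[[i]])) next   # non-character columns are left alone
    df[[i]] <- factor(df[[i]])
  }
  df                                   # return the modified copy
}

# The last column is not character, which broke the original loop
dat <- data.frame(b = c("x", "y"), a = 1:2, stringsAsFactors = FALSE)
res <- tofac_repeat(dat)
str(res)
```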
>Second: You change your data frame inside a
>function. Variables that are created or changed within a
>function are local. Their life ends with the end of the
>function. Therefore none of the changes you make will affect
>the global data frame you want to change. See the example:
>
>dat1 <- structure(list(a = 1:10, b = letters[1:10]), .Names = c("a", "b"),
> row.names = as.character(1:10), class = "data.frame")
>str(data.frame(dat1))
> `data.frame': 10 obs. of 2 variables:
> $ a: int 1 2 3 4 5 6 7 8 9 10
> $ b: chr "a" "b" "c" "d" ...
>tofac(dat1)
> [1] 2
>str(data.frame(dat1))
> `data.frame': 10 obs. of 2 variables:
> $ a: int 1 2 3 4 5 6 7 8 9 10
> $ b: chr "a" "b" "c" "d" ...
>
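The same effect can be shown in isolation. In this sketch, convert_local and convert_return are hypothetical helpers: the first changes only its local copy, the second returns the copy so the caller can reassign it.

```r
dat1 <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

# Changes only the local copy and returns nothing useful
convert_local <- function(df) {
  df$b <- factor(df$b)
  invisible(NULL)
}

convert_local(dat1)
unchanged <- is.character(dat1$b)  # TRUE: the caller's object is untouched

# Returns the modified copy; reassigning makes the change stick
convert_return <- function(df) {
  df$b <- factor(df$b)
  df
}

dat1 <- convert_return(dat1)
is.factor(dat1$b)                  # TRUE
```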
>You can use the following code instead
>
>tofac <- function(x){
>  # convert every character column of x to a factor
>  for(i in seq_along(x)) {
>    if(is.character(x[[i]]))
>      x[[i]] <- factor(x[[i]])
>  }
>  x  # return the modified copy
>}
>
>dat1 <- tofac(dat1)
>str(dat1)
> `data.frame': 10 obs. of 2 variables:
> $ a: int 1 2 3 4 5 6 7 8 9 10
> $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
>
>The for loop avoids the problem with the index. Therefore it
>also works in examples that have a non-character variable in the
>last column, and by returning x at the end you make sure that
>your modified object is kept.
>
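A more compact variant of the same idea, offered only as an alternative sketch, uses lapply() over the columns. Assigning into dat1[] keeps the data frame structure instead of replacing it with a plain list.

```r
dat1 <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)

# Convert character columns to factors, leave all other columns as they are
dat1[] <- lapply(dat1, function(col) {
  if (is.character(col)) factor(col) else col
})

str(dat1)
```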
>Regards,
>
>Christoph
>
>--------------------------------------------------------------
>Christoph Buser <buser@stat.math.ethz.ch>
>Seminar fuer Statistik, LEO C13
>ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
>phone: x-41-44-632-4673 fax: 632-1228
>http://stat.ethz.ch/~buser/
>--------------------------------------------------------------
>
>christian schulz writes:
> > Hi,
> >
 > > I would like to transform the character columns
 > > of a data.frame to factors automatically.
> >
> > > tofac <- function(df){
> > + i=0
> > + repeat{
> > + i <- i+1
> > + if(!is.character(df[,i]))
> > + next
> > + df[,i] <- as.factor(df[,i])
> > + print(i)
> > + if(i == length(df))
> > + break }
> > + }
> > >
> > > tofac(abrdat)
> > [1] 7
> > [1] 8
> > [1] 9
> > [1] 11
> > [1] 13
> > [1] 15
> > Error in "[.data.frame"(df, , i) : undefined columns selected
> >
 > > These are the correct columns. My idea is to fill, inside the loop,
 > > an empty matrix with the same dimensions as df and return it!?
> >
> > Another check?
> > abrdat2 <- apply(abrdat,2,function(x)
> > ifelse(is.character(x),as.factor(x),x))
> >
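One likely reason the apply() attempt does not behave as hoped: apply() first turns the data frame into a matrix, and a matrix built from mixed columns is character throughout, so is.character() is TRUE for every column. A small sketch with made-up data (dat, via_apply and via_sapply are illustrative names) shows the difference to a column-wise check:

```r
dat <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

# apply() works on as.matrix(dat), which is a character matrix here
via_apply <- apply(dat, 2, is.character)

# sapply() visits the columns of the data frame with their own types
via_sapply <- sapply(dat, is.character)

via_apply   # a: TRUE,  b: TRUE
via_sapply  # a: FALSE, b: TRUE
```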
> >
> > many thanks & regards,
> > christian
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>



Received on Thu Jun 02 05:04:45 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:32:20 EST