Re: [R] Refactor all factors in a data frame

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Tue, 05 Jun 2007 15:22:02 +0100 (BST)

On Tue, 5 Jun 2007, John Fox wrote:

> Dear Hilmar,
>
> You could use something like
>
> DF <- as.data.frame(lapply(DF, function (x) if (is.factor(x)) factor(x) else
> x))
>
> Where DF is the data frame.

I think DF[] <- lapply(DF, "[", drop=TRUE) is more likely to be what is wanted. That drops unused factor levels without reordering the remaining levels, and would appear to be harmless for other variables. But if one prefers to touch only the factors:

ind <- sapply(DF, is.factor)
DF[ind] <- lapply(DF[ind], "[", drop=TRUE)

Note the use of DF[] <- to preserve other attributes of DF, notably row names.
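
For illustration, here is a minimal sketch of the factor-only variant on a small made-up data frame. The data, the column names g and x, and the row names are assumptions invented for this example; they are not from the original question.

```r
# Hypothetical data: a factor 'g' whose levels are deliberately not in
# alphabetical order, plus a numeric column and explicit row names.
DF <- data.frame(
  g = factor(c("low", "high", "mid", "high"),
             levels = c("low", "mid", "high")),
  x = c(1.5, 2.5, 3.5, 4.5),
  row.names = c("r1", "r2", "r3", "r4")
)

# Subsetting rows leaves the now-unused level "mid" in place.
sub <- DF[DF$g != "mid", ]
levels(sub$g)    # "low" "mid" "high"

# Drop unused levels, touching only the factor columns.
ind <- sapply(sub, is.factor)
sub[ind] <- lapply(sub[ind], "[", drop = TRUE)

levels(sub$g)    # "low" "high"  (original order of the remaining levels kept)
rownames(sub)    # "r1" "r2" "r4"  (row names untouched)
```

Assigning into sub[ind] rather than rebuilding the data frame is what leaves the row names and the numeric column alone. In later versions of R (2.12.0 onwards), droplevels(sub) should give the same result in one call.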

>
> I hope this helps,
> John
>
> --------------------------------
> John Fox, Professor
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
> --------------------------------
>
>> -----Original Message-----
>> From: r-help-bounces_at_stat.math.ethz.ch
>> [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of Hilmar Berger
>> Sent: Tuesday, June 05, 2007 8:20 AM
>> To: r-help_at_stat.math.ethz.ch
>> Subject: [R] Refactor all factors in a data frame
>>
>> Hi all,
>>
>> Assume I have a data frame with numerical and factor
>> variables that I got through merging various other data
>> frames and subsetting the resulting data frame afterwards.
>> The number of levels of the factors seems to be the same as in
>> the original data frames, probably because subset() calls
>> [.factor without drop = TRUE (that's what I gather from
>> scanning the mailing lists).
>>
>> I wonder if there is an easy way to refactor all factors in
>> the data frame at once. I noted that fix(data_frame) does the
>> trick, however, this needs user interaction, which I'd like
>> to avoid. Subsequent write.table / read.table would be
>> another option but I'm not sure if R can guess the
>> factor/char/numeric-type correctly when reading the table.
>>
>> So, is there any way to drop the unused factor levels from
>> *all* factors of a data frame without import/export ?

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 05 Jun 2007 - 14:28:43 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 05 Jun 2007 - 14:31:33 GMT.