Re: [R] Factor function

From: peter dalgaard
Date: Tue, 26 Apr 2011 19:59:22 +0200

On Apr 26, 2011, at 18:52 , Petr Savicky wrote:

> On Tue, Apr 26, 2011 at 10:51:33AM +0200, Petr PIKAL wrote:

>> Hi
>> d<-data.frame(matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"), 
>> ncol=3, byrow=TRUE))
>> Change character value "NA" to missing value <NA>
>> d[d[,3]=="NA",3]<-NA
>> If you want to drop any unused levels of a factor, just use
>> factor(d[,3])
>> [1] xx   yy   <NA>
>> Levels: xx yy

> An explicit NA is a good idea. If the NA is introduced before
> creating the data frame, then the data frame will not contain
> the unwanted level either.
> a<-matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"),
> ncol=3, byrow=TRUE)
> a[a[,3]=="NA",3]<-NA
> d<-data.frame(a)
> d[,3]
> [1] xx yy <NA>
> Levels: xx yy
> If the replacement should be done in the whole matrix, then
> a[a=="NA"]<-NA
> may be used.
> Petr Savicky.
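
The quoted steps can be collected into one self-contained sketch. It assumes that the columns should end up as factors, so stringsAsFactors = TRUE is passed explicitly (an addition not in the quoted code), because the default of that argument depends on the R version.

```r
# Character matrix in which the string "NA" stands in for a missing value.
a <- matrix(c("ww", "ww", "xx", "yy", "ww", "yy", "xx", "yy", "NA"),
            ncol = 3, byrow = TRUE)

# Replace every literal "NA" string in the matrix with a real missing value.
a[a == "NA"] <- NA

# Assumption: factor columns are wanted, so this is requested explicitly
# instead of relying on the version-dependent default.
d <- data.frame(a, stringsAsFactors = TRUE)

# The third column now has only the levels that actually occur.
levels(d[, 3])
```

Because the replacement happens before data.frame() is called, the "NA" string never becomes a level in the first place.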

I think there's a buglet in here. According to the docs, "If exclude is used it should also be a factor with the same level set as x or a set of codes for the levels to be excluded". However, that plainly doesn't work:

> cc <- c("x","y","NA")
> ff <- factor(cc)
> factor(ff,exclude=1)
[1] x y NA
Levels: NA x y
> factor(ff,exclude=ff[3])
[1] x y NA
Levels: NA x y
> factor(ff,exclude=ff[2])
[1] x y NA
Levels: NA x y

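In these cases, the internal logic converts exclude to integer and then calls match(levels, exclude), where levels is unique(x), i.e., a factor. This cannot work, because match() compares a factor by its _character_ representation, so the integer codes in exclude are compared with the level labels rather than with the underlying codes.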
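
The mismatch can be reproduced in isolation. The following is only a sketch of the matching step described above, not the actual source of factor(); the variable names lev, by_code, by_label and by_int are made up for illustration.

```r
ff  <- factor(c("x", "y", "NA"))
lev <- unique(ff)   # still a factor, as in the description above

# match() compares a factor by its labels, so the integer code 1
# (the code of level "NA") finds nothing.
by_code <- match(lev, 1L)

# Matching against the label itself does find the "NA" level.
by_label <- match(lev, "NA")

# Matching on the underlying integer codes is what a
# "set of codes" for exclude would need.
by_int <- match(as.integer(lev), 1L)
```

Here by_code contains only missing values, while by_label and by_int both pick out the third element.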

The cleanest version that I can think of for the original problem is

> factor(ff, levels=setdiff(levels(ff), "NA"))
[1] x y <NA>
Levels: x y     
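
If this is needed in more than one place, the same idea could be wrapped in a small function. This is a minimal sketch; drop_levels_by_label() is a hypothetical helper name, not something in base R.

```r
# Hypothetical helper: remove the levels named in 'drop' from factor 'f'.
drop_levels_by_label <- function(f, drop) {
  stopifnot(is.factor(f))
  # Values whose level is removed become real missing values (<NA>).
  factor(f, levels = setdiff(levels(f), drop))
}

ff  <- factor(c("x", "y", "NA"))
ff2 <- drop_levels_by_label(ff, "NA")
levels(ff2)
```

Going through levels= rather than exclude= avoids the matching problem entirely, because the levels are compared as character strings on both sides.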

Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501

Received on Tue 26 Apr 2011 - 18:01:53 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.