# Re: [R] Factor function

From: peter dalgaard <pdalgd_at_gmail.com>
Date: Tue, 26 Apr 2011 19:59:22 +0200

On Apr 26, 2011, at 18:52 , Petr Savicky wrote:

```
>> Hi
>>
>>
>> d<-data.frame(matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"),
>> ncol=3, byrow=TRUE))
>>
>> Change character value "NA" to missing value <NA>
>> d[d[,3]=="NA",3]<-NA
>>
>> If you want to drop any unused levels of a factor, just use
>>
>> factor(d[,3])
>> [1] xx   yy   <NA>
>> Levels: xx yy
```

>
> An explicit NA is a good idea. If the NA is introduced before
> creating the data frame, then the data frame will not
> contain the unwanted level either.
>
> a<-matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"),
> ncol=3, byrow=TRUE)
> a[a[,3]=="NA",3]<-NA
> d<-data.frame(a)
> d[,3]
> [1] xx   yy   <NA>
> Levels: xx yy
>
> If the replacement should be done in the whole matrix, then
>
> a[a=="NA"]<-NA
>
> may be used.
>
> Petr Savicky.

I think there's a buglet in here. According to the docs, "If exclude is used it should also be a factor with the same level set as x or a set of codes for the levels to be excluded". However, that plainly doesn't work:

> cc <- c("x","y","NA")
> ff <- factor(cc)
> factor(ff,exclude=1)
[1] x  y  NA
Levels: NA x y
> factor(ff,exclude=ff)
[1] x  y  NA
Levels: NA x y

In these cases, the internal logic converts exclude to integer, and then uses match(levels, exclude) where levels is unique(x), i.e., a factor. This won't work because match() matches on the _character_ representation of x.
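
A small sketch of the match() behaviour described above. It only illustrates match() itself, not the internals of factor(), which may differ between R versions; `by_code` and `by_label` are names made up for the illustration.

```r
cc <- c("x", "y", "NA")
ff <- factor(cc)

# match() converts a factor to its character labels before matching,
# so the integer code 1 is compared as "1" and matches none of the labels.
by_code <- match(unique(ff), 1L)

# Matching against the label itself does find the "NA" level.
by_label <- match(unique(ff), "NA")

print(by_code)
print(by_label)
```

So a set of integer codes can only exclude a level if it is first translated back to labels, e.g. via levels(ff)[codes].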

The cleanest version that I can think of for the original problem is

> factor(ff, levels=setdiff(levels(ff), "NA"))
[1] x    y    <NA>
Levels: x y
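
For a whole data frame, the same idea could be wrapped in a small helper. This is only a sketch: `drop_na_label` and its `na_label` argument are hypothetical names, and `stringsAsFactors = TRUE` is spelled out on the assumption that the installed R may not convert strings to factors by default.

```r
# Hypothetical helper: rebuild a factor without the literal "NA" label,
# so that entries carrying that label become real missing values.
drop_na_label <- function(f, na_label = "NA") {
  stopifnot(is.factor(f))
  factor(f, levels = setdiff(levels(f), na_label))
}

d <- data.frame(matrix(c("ww","ww","xx","yy","ww","yy","xx","yy","NA"),
                       ncol = 3, byrow = TRUE),
                stringsAsFactors = TRUE)

# Apply the helper to every column; assigning into d[] keeps the
# data frame structure and column names.
d[] <- lapply(d, drop_na_label)

levels(d[, 3])
is.na(d[, 3])
```

Columns that never contained the "NA" label come back with the same levels and values as before.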

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes_at_cbs.dk  Priv: PDalgd_at_gmail.com

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help