Re: [Rd] Dropping unused levels of a factor that has "NA" as a level

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Tue 11 Jul 2006 - 21:58:51 GMT

"J. Hosking" <jh910@juno.com> writes:

> Is this a bug?
>
> > f1 <- factor(c("a", NA), levels = c("a", "NA") )
> > f2 <- f1[, drop = TRUE]
> > f2
> [1] a <NA>
> Levels: a <NA>
>
> I would have expected f2 to have only one level, "a". It seems
> to me that the code in [.factor does not follow the advice in
> help("factor") on how to set factor codes to be missing when
> "NA" is a level of the factor.

Something odd is going on, that's for sure...

The problem is also there with factor(f1). And the logic in as.character.factor seems to be at the root of it:

> as.character.factor

function (x, ...)
{
    cx <- levels(x)[x]
    if ("NA" %in% levels(x))
        cx[is.na(x)] <- "<NA>"
    cx
}

This looks like a leftover from before we had character NA values. I wonder whether it is a mistake or whether there could actually be a reason to keep it.
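To make the suspected mechanism concrete, here is a minimal sketch. It copies the function quoted above under a hypothetical name, old_as_character_factor, so that the real as.character.factor is not masked, and it assumes that factor() builds its levels from that character conversion. Both are assumptions made for illustration, not a statement about what any particular R version does.

```r
# Hypothetical copy of the function quoted above; the name is made up so the
# real as.character.factor is left untouched.
old_as_character_factor <- function(x, ...) {
    cx <- levels(x)[x]
    if ("NA" %in% levels(x))
        cx[is.na(x)] <- "<NA>"   # missing codes become the literal string "<NA>"
    cx
}

f1 <- factor(c("a", NA), levels = c("a", "NA"))

# With the special case: the missing value is turned into an ordinary string.
cx <- old_as_character_factor(f1)
cx                      # "a" "<NA>"

# Rebuilding a factor from that string vector makes "<NA>" a real level.
f2 <- factor(cx)
nlevels(f2)             # 2

# Without the special case, the missing value stays a character NA,
# and the rebuilt factor keeps only the level "a".
plain <- levels(f1)[f1]
f3 <- factor(plain)
nlevels(f3)             # 1
```

If that reading is right, dropping the special case would give the single level "a" that the original poster expected.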

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed Jul 12 08:02:05 2006
