Re: [Rd] Dropping unused levels of a factor that has "NA" as a level

From: Brahm, David <David.Brahm_at_geodecapital.com>
Date: Tue 11 Jul 2006 - 22:19:40 GMT


I mentioned this in R-help on April 28:
<https://stat.ethz.ch/pipermail/r-help/2006-April/104595.html>

| as.character.factor contains this line (where cx=levels(x)[x]):
| if ("NA" %in% levels(x)) cx[is.na(x)] <- "<NA>"
|
| Is it possible that this is no longer the desired behavior? These
| two results don't seem very consistent:
|
| > as.character(as.factor(c("AB", "CD", NA)))
| [1] "AB" "CD" NA
| > is.na(.Last.value)[3]
| [1] TRUE
|
| > as.character(as.factor(c("NA", "CD", NA)))
| [1] "NA" "CD" "<NA>"
| > is.na(.Last.value)[3]
| [1] FALSE
|
| I'm using R-2.3.0 on Redhat Linux, but I don't think the behavior
| is new (maybe since character NA's were introduced?).
|
| -- David Brahm (brahm@alum.mit.edu)

-----Original Message-----
From: r-devel-bounces@r-project.org [mailto:r-devel-bounces@r-project.org] On Behalf Of Peter Dalgaard
Sent: Tuesday, July 11, 2006 5:59 PM
To: J. Hosking
Cc: r-devel@stat.math.ethz.ch
Subject: Re: [Rd] Dropping unused levels of a factor that has "NA" as a level

"J. Hosking" <jh910@juno.com> writes:

> Is this a bug?
> 
>    > f1 <- factor(c("a", NA), levels = c("a", "NA") )
>    > f2 <- f1[, drop = TRUE]
>    > f2
>    [1] a    <NA>
>    Levels: a <NA>
> 
> I would have expected f2 to have only one level, "a".  It seems
> to me that the code in [.factor does not follow the advice in
> help("factor") on how to set factor codes to be missing when
> "NA" is a level of the factor.


Something odd is going on, that's for sure...

The problem is also there with factor(f1). And the logic in as.character.factor seems to be at the root of it:

> as.character.factor
function (x, ...)
{
    cx <- levels(x)[x]
    if ("NA" %in% levels(x))
        cx[is.na(x)] <- "<NA>"
    cx
}
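
To see the mechanism in isolation, here is a minimal sketch that copies the listing above under a hypothetical name (old_as_character_factor is not part of R), so that it behaves the same whatever the installed as.character.factor does. It only illustrates how the string "<NA>" can end up as a level once the result is passed to factor() again; it is not meant as a description of what [.factor does internally.

```r
# Copy of the listed function under an illustrative, made-up name
old_as_character_factor <- function(x, ...) {
    cx <- levels(x)[x]
    if ("NA" %in% levels(x))
        cx[is.na(x)] <- "<NA>"
    cx
}

f1 <- factor(c("a", NA), levels = c("a", "NA"))
cx <- old_as_character_factor(f1)
cx                    # the missing value is now the string "<NA>"
is.na(cx)             # so nothing is flagged as missing any more
levels(factor(cx))    # and "<NA>" turns up as a genuine level
```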

This looks like something from before we had character NA values. I wonder if it is a mistake or there could actually be a reason to keep it.
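
If the special case were dropped, the function would reduce to the indexing step alone. The following is only a sketch under that assumption (as_character_factor_plain is a made-up name, and whether other code relies on the "<NA>" string is not checked here). It also shows one possible way to drop the unused "NA" level in the meantime, by rebuilding the factor from its integer codes instead of going through as.character().

```r
# Hypothetical replacement: as_character_factor_plain is an illustrative
# name, not a function in R.
as_character_factor_plain <- function(x, ...) {
    levels(x)[x]    # a missing code indexes to NA_character_
}

f <- as.factor(c("NA", "CD", NA))
plain <- as_character_factor_plain(f)
is.na(plain)        # only the third element is missing

# Rebuilding from the integer codes avoids as.character() altogether
f1 <- factor(c("a", NA), levels = c("a", "NA"))
f2 <- factor(levels(f1)[as.integer(f1)])
levels(f2)          # the unused "NA" level is gone
```

With this variant the string "NA" and a missing value stay distinguishable, which is the consistency asked for in the quoted R-help message.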

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed Jul 12 08:23:13 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 19 Jul 2006 - 12:28:16 GMT.