Re: [Rd] Dropping unused levels of a factor that has "NA" as a level

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Wed 19 Jul 2006 - 10:17:13 GMT

It is history:

r16144 | ripley | 2001-09-28 19:40:28 +0100 (Fri, 28 Sep 2001) | 2 lines

add is.na<-, distinguish NA level and NA codes in factors

so it predates having NA character strings distinct from "NA".
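The commit message distinguishes a level whose label is the string "NA" from a code that is missing. As an illustration only, the following R sketch rebuilds J. Hosking's example factor from later in the thread and drops unused levels by working on the integer codes instead of the labels. It assumes a current R in which NA_character_ and "NA" are distinct. `drop_unused_levels` is a hypothetical helper, not a base R function, and is not claimed to match what [.factor does.

```r
# J. Hosking's example: "NA" is a level, and the second element has a missing code.
f1 <- factor(c("a", NA), levels = c("a", "NA"))

print(unclass(f1))  # integer codes 1 and NA, plus the levels attribute
print(is.na(f1))    # FALSE TRUE: is.na() looks at the codes, not the labels

# Hypothetical helper (not part of base R): drop levels that no code refers to.
# It works on the integer codes, so a level labelled "NA" is treated like any
# other level and a missing code stays missing.
drop_unused_levels <- function(x) {
  stopifnot(is.factor(x))
  codes <- as.integer(x)                       # missing codes are NA_integer_
  used  <- sort(unique(codes[!is.na(codes)]))  # codes that actually occur
  structure(match(codes, used),                # renumber 1..length(used); NA stays NA
            levels = levels(x)[used],
            class  = class(x))
}

f2 <- drop_unused_levels(f1)
print(f2)           # one level, "a"; the second element is still missing
```

Because the helper never compares labels with "NA", a level labelled "NA" is dropped only when no code refers to it; with factor(c("NA", "CD", NA)) both levels are in use and are kept.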

On Tue, 11 Jul 2006, Brahm, David wrote:

> I mentioned this in R-help on April 28:
> <https://stat.ethz.ch/pipermail/r-help/2006-April/104595.html>
>
> | as.character.factor contains this line (where cx=levels(x)[x]):
> | if ("NA" %in% levels(x)) cx[is.na(x)] <- "<NA>"
> |
> | Is it possible that this is no longer the desired behavior? These
> | two results don't seem very consistent:
> |
> | > as.character(as.factor(c("AB", "CD", NA)))
> | [1] "AB" "CD" NA
> | > is.na(.Last.value)[3]
> | [1] TRUE
> |
> | > as.character(as.factor(c("NA", "CD", NA)))
> | [1] "NA" "CD" "<NA>"
> | > is.na(.Last.value)[3]
> | [1] FALSE
> |
> | I'm using R-2.3.0 on Redhat Linux, but I don't think the behavior
> | is new (maybe since character NAs were introduced?).
> |
> | -- David Brahm (brahm@alum.mit.edu)
>
>
> -----Original Message-----
> From: r-devel-bounces@r-project.org [mailto:r-devel-bounces@r-project.org] On Behalf Of Peter Dalgaard
> Sent: Tuesday, July 11, 2006 5:59 PM
> To: J. Hosking
> Cc: r-devel@stat.math.ethz.ch
> Subject: Re: [Rd] Dropping unused levels of a factor that has "NA" as a level
>
> "J. Hosking" <jh910@juno.com> writes:
>
> > Is this a bug?
> >
> > > f1 <- factor(c("a", NA), levels = c("a", "NA") )
> > > f2 <- f1[, drop = TRUE]
> > > f2
> > [1] a <NA>
> > Levels: a <NA>
> >
> > I would have expected f2 to have only one level, "a". It seems
> > to me that the code in [.factor does not follow the advice in
> > help("factor") on how to set factor codes to be missing when
> > "NA" is a level of the factor.
>
>
> Something odd is going on, that's for sure...
>
> The problem is also there with factor(f1). And the logic in
> as.character.factor seems to be at the root of it:
>
> > as.character.factor
> function (x, ...)
> {
>     cx <- levels(x)[x]
>     if ("NA" %in% levels(x))
>         cx[is.na(x)] <- "<NA>"
>     cx
> }
>
> This looks like something from before we had character NA values. I
> wonder if it is a mistake or there could actually be a reason to
> keep it.
>
>
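To make the inconsistency in David Brahm's example concrete, the following R sketch copies the as.character.factor body that Peter Dalgaard printed into a hypothetical function, `legacy_as_character`, and compares it with plain code-based indexing, `levels(x)[x]` (wrapped in the equally hypothetical `code_as_character`). It assumes an R version in which "NA" and NA_character_ are distinct strings.

```r
# Copy of the legacy body quoted in the thread, under a hypothetical name.
legacy_as_character <- function(x, ...) {
  cx <- levels(x)[x]
  if ("NA" %in% levels(x))
    cx[is.na(x)] <- "<NA>"  # missing codes become the string "<NA>"
  cx
}

# Code-based conversion: indexing by a factor uses its integer codes,
# so a missing code yields NA_character_.
code_as_character <- function(x) levels(x)[x]

f <- factor(c("NA", "CD", NA))

legacy <- legacy_as_character(f)
plain  <- code_as_character(f)

print(legacy)         # "NA" "CD" "<NA>"
print(is.na(legacy))  # FALSE FALSE FALSE: the missing value is lost
print(plain)          # "NA" "CD" NA
print(is.na(plain))   # FALSE FALSE TRUE
```

For a factor without an "NA" level, such as factor(c("AB", "CD", NA)), the two functions agree; they differ only when "NA" is a level, which is the case the thread reports.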

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed Jul 19 20:31:51 2006
