Re: [R] Removing and restoring factor levels (TYPO CORRECTED)

From: Marc Schwartz (via MN) <mschwartz_at_mn.rr.com>
Date: Fri 14 Oct 2005 - 04:45:21 EST

On Thu, 2005-10-13 at 14:31 -0400, Duncan Murdoch wrote:
> On 10/13/2005 1:07 PM, Marc Schwartz (via MN) wrote:
> > On Thu, 2005-10-13 at 10:02 -0400, Duncan Murdoch wrote:
> >> Sorry, a typo in my previous message (parens in the wrong place in the
> >> conversion).
> >>
> >> Here it is corrected:
> >>
> >> I'm doing a big slow computation, and profiling shows that it is
> >> spending a lot of time in match(), apparently because I have code like
> >>
> >> x %in% listofxvals
> >>
> >> Both x and listofxvals are factors with the same levels, so I could
> >> probably speed this up by stripping off the levels and just treating
> >> them as integer vectors, then restoring the levels at the end.
> >>
> >> What is the safest way to do this? I am worried that at some point x
> >> and listofxvals will *not* have the same levels, and the optimization
> >> will give the wrong answer. So I need code that guarantees they have
> >> the same coding.
> >>
> >> I think this works, where "master" is a factor with the master list of
> >> levels (guaranteed to be a superset of the levels of x and listofxvals),
> >> but can anyone spot anything that might go wrong?
> >>
> >> # Strip the levels
> >> x <- as.integer( factor(x, levels = levels(master) ) )
> >>
> >> # Restore the levels
> >> x <- structure( x, levels = levels(master), class = "factor" )
> >>
> >> Thanks for any advice...
> >>
> >> Duncan Murdoch
> >
> > Duncan,
> >
> > Given the premise that 'master' has the full superset of all possible
> > factor levels defined, this seems like a reasonable way to go.
> >
> > This approach would also seem to eliminate whatever overhead is
> > encountered as a result of the coercion of 'x' as a factor to a
> > character vector, which is done by match().
> >
> > One question I have is, what is the advantage of using structure()
> > versus:
> >
> > x <- factor(x, levels = levels(master))
> >
> > ?
>
> That one doesn't work. What "factor(x, levels=levels(master))" says is
> to convert x to a factor, coding the values in it according to the levels
> in master. But at this point x has values which are integers, so they
> won't match the levels of master, which are probably character strings.
>
> For example:
>
> > master <- factor(letters)
> > print(x <- factor(letters[1:3]))
> [1] a b c
> Levels: a b c
> > print(x <- as.integer( factor(x, levels = levels(master) ) ) )
> [1] 1 2 3
> > print(x <- factor(x, levels = levels(master)))
> [1] <NA> <NA> <NA>
> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
>
> I get NAs at the end because the values 1, 2, 3 aren't in the vector of
> factor levels (which are the lowercase letters).

As opposed to:

> print(x <- structure(x, levels = levels(master), class = "factor" ))
[1] a b c
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

OK. Makes sense. Thanks for the clarification.
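
For the archive, here is a minimal sketch of the whole round trip wrapped in
two helpers. The names strip_levels() and restore_levels() are my own
invention, not anything in base R, and the explicit stop() on a level
mismatch is only an assumption about how one might want to guard against
x and 'master' disagreeing:

```r
# Hypothetical helpers (names are illustrative, not part of base R).

# Recode a factor against the master levels and return plain integer codes.
# Stops if 'x' holds a value that 'master' does not know about, since
# factor() would otherwise silently turn it into NA.
strip_levels <- function(x, master) {
  stopifnot(is.factor(x), is.factor(master))
  codes <- as.integer(factor(x, levels = levels(master)))
  if (any(is.na(codes) & !is.na(x)))
    stop("'x' contains values that are not levels of 'master'")
  codes
}

# Rebuild a factor from integer codes. factor() would not work here
# because the codes are not themselves level labels.
restore_levels <- function(codes, master) {
  structure(as.integer(codes), levels = levels(master), class = "factor")
}

master <- factor(letters)
x <- factor(letters[1:3])
# Same labels as 'master' but a different internal coding
listofxvals <- factor(c("b", "z"), levels = c("z", "b"))

xi <- strip_levels(x, master)
li <- strip_levels(listofxvals, master)

hits <- xi %in% li                 # integer match, no character coercion
x2 <- restore_levels(xi, master)   # back to a factor with master's levels
```

Recoding both vectors against 'master' before comparing is what makes the
integer comparison safe even when the two factors started out with
different level orderings.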

Marc



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Oct 14 04:50:08 2005