Re: [Rd] c.factor

From: Matthew Dowle <mdowle_at_concordiafunds.com>
Date: Wed 22 Nov 2006 - 16:29:00 GMT

I just noticed that a new feature in R 2.4 is that unlist() applied to a list
of factors already does the operation that I proposed:

> x = factor(letters[1:5])
> y = factor(letters[4:8])
> unlist(list(x,y))
[1] a b c d e d e f g h
Levels: a b c d e f g h
>

Therefore, does it not make sense that c(x,y) should return the same as unlist(list(x,y)) ?

Also, the specific "if" for factors inside the definition of unlist, not
surprisingly, uses a very similar method to those previously posted. However,
it first coerces the factors with as.character, before matching to the new
level set. This is inefficient. Here is the c.factor method again that I
proposed, which avoids the as.character and is therefore more efficient.
Leaving aside the discussion about c.factor, or concat, or whatever, could
'unlist' be changed to use this method instead? After all, one of the key
advantages of factors is to save main memory, so anything which coerces back
to character is going to defeat the benefit.

> c.factor = function(...) {
    args <- list(...)
    if (!all(sapply(args, is.factor))) stop("all arguments must be factor")
    # union of all level sets, in order of first appearance
    newlevels = unique(unlist(lapply(args,levels)))
    ans = unlist(lapply(args, function(x) {
        # map old level positions to new ones, then index by integer codes
        m = match(levels(x), newlevels)
        m[as.integer(x)]
    }))
    levels(ans) = newlevels
    class(ans) = "factor"
    ans
}
> identical(c(x,y), unlist(list(x,y)))
[1] TRUE
> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          4.0
year           2006
month          10
day            03
svn rev        39566
language       R
version.string R version 2.4.0 (2006-10-03)
>
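
For anyone who wants to check that the integer-code route gives the same codes
as going through as.character, here is a minimal sketch. The names remap,
codes_int and codes_chr are made up for illustration; they are not part of
base R, and the character route below only approximates what I described
unlist as doing, not its actual source.

```r
x <- factor(letters[1:5])
y <- factor(letters[4:8])

# Union of the level sets, in order of first appearance.
newlevels <- unique(c(levels(x), levels(y)))

# Integer-code route (hypothetical helper): map each old level to its
# position in newlevels, then index that map by the factor's integer codes.
# Only the levels (a short vector) are ever matched as strings.
remap <- function(f, newlevels) match(levels(f), newlevels)[as.integer(f)]
codes_int <- c(remap(x, newlevels), remap(y, newlevels))

# Character route: expand every element to a string first, then match.
codes_chr <- match(c(as.character(x), as.character(y)), newlevels)

# Both routes should yield the same integer codes.
same_codes <- identical(codes_int, codes_chr)

# Rebuild a factor from the codes and the combined level set.
combined <- structure(codes_int, levels = newlevels, class = "factor")
```

The point of the first route is that match() only ever sees length(levels(f))
strings per factor rather than length(f), which matters when the factors are
long and the level sets are small.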

"Brian Ripley" <ripley@stats.ox.ac.uk> wrote in message news:Pine.LNX.4.64.0611150926070.19618@auk.stats...
> On Tue, 14 Nov 2006, Bill Dunlap wrote:
>
>> On Tue, 14 Nov 2006, Prof Brian Ripley wrote:
>>
>>> Well, R has managed without a factor method for c() for most of its
>>> decade of existence (not that it originally had factors as we know
>>> them).
>>>
>>> I would argue that factors are best viewed as an enumeration type, and
>>> anything which silently changes their level set is a bad idea. I can
>>> see a case for a c() method for factors that combines factors with the
>>> same level sets, but I can also see this is best done by users who know
>>> the level sets are same (c.factor would have to expend a considerable
>>> effort to check).
>>>
>>> You also need to consider the dispatch rules. c.factor will be called
>>> whenever the first argument is a factor, whatever the others are. S4 (I
>>> think, definitely S4-based versions of S-PLUS) has an alternative
>>> concat() that works differently (recursively) and seems a more natural
>>> model.
>>
>> In addition, c() has always had a double meaning of
>> (a) turning an object into a simple "vector" (an object
>> without "attributes"), as in
>> > c(factor(c("Cat","Dog","Cat")))
>> [1] 1 2 1
>> > c(data.frame(x=1:2,y=c("Dog","Cat")))
>> $x
>> [1] 1 2
>>
>> $y
>> [1] Dog Cat
>> Levels: Cat Dog
>
> To my surprise that was not documented at all on the R help page, and
> I've clarified it. (BTW, at least in R it does not remove names, just
> all other attributes.)
>
>> (b) concatenating several such vectors into one.
>>
>> The proposed c.factor does only (b).
>
> (Strictly not, as a factor is not a vector.)
>
> But the help page explicitly only describes the default method, and some
> of the other methods do preserve some attributes, AFAIR.
>
>> Should we just
>> throw c() into the ash heap and use as.vector() or
>> concat() instead?
>>
>> The whole concept of concatenating objects of disparate
>> types is suspect.
>
> I think working on a concat() for R would be helpful. I vaguely recalled
> something like it in the Green Book, but the index does not help (but
> then it is not very complete).
>
> Brian
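
On the dispatch rule quoted above, here is a small sketch of what it means in
practice, assuming the c.factor from this message is the method in effect. The
names f, res_factor_first and res_int_first are made up for illustration.

```r
# The c.factor proposed in this message, repeated so the sketch is
# self-contained.
c.factor <- function(...) {
    args <- list(...)
    if (!all(sapply(args, is.factor))) stop("all arguments must be factor")
    newlevels <- unique(unlist(lapply(args, levels)))
    ans <- unlist(lapply(args, function(x) {
        m <- match(levels(x), newlevels)
        m[as.integer(x)]
    }))
    levels(ans) <- newlevels
    class(ans) <- "factor"
    ans
}

f <- factor(c("a", "b"))

# Factor first: c() dispatches to c.factor even though the second
# argument is not a factor, so the method's own check raises an error.
res_factor_first <- tryCatch(c(f, 1L), error = function(e) conditionMessage(e))

# Integer first: no dispatch to c.factor; the default method runs and
# the factor contributes its bare integer codes.
res_int_first <- c(1L, f)
```

So the result depends on argument order, which is one reason a recursive
concat() as mentioned above may be the more natural model.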



R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu Nov 23 04:07:39 2006

