[Rd] unlist on nested lists of factors (PR#12572)

From: <davison_at_stats.ox.ac.uk>
Date: Wed, 20 Aug 2008 15:25:10 +0200 (CEST)


Here is a description and a proposed solution for a bug in unlist().

I looked at this with R version 2.7.2 RC (2008-08-18 r46382) on Linux.

unlist(recursive=TRUE) incorrectly returns a factor with zero levels when passed either a nested list of factors, or a list containing a data frame that has only factor columns. The result cannot be print()ed.

x <- list(list(v=factor("a")))
str(unlist(x))
## Factor w/ 0 levels: NA
## - attr(*, "names")= chr "v"
## Warning message:
## In str.default(unlist(x)) : 'object' does not have valid levels()
y <- list(data.frame(v=factor("a")))
str(unlist(y))
## Factor w/ 0 levels: NA
## - attr(*, "names")= chr "v"
## Warning message:
## In str.default(unlist(y)) : 'object' does not have valid levels()

unlist is defined as

unlist <- function(x, recursive=TRUE, use.names=TRUE) {
    if(.Internal(islistfactor(x, recursive))) {
        lv <- unique(.Internal(unlist(lapply(x, levels), recursive, FALSE)))
        nm <- if(use.names) names(.Internal(unlist(x, recursive, use.names)))
        res <- .Internal(unlist(lapply(x, as.character), recursive, FALSE))
        res <- match(res, lv)
        ## we cannot make this ordered as level set may have been changed
        structure(res, levels=lv, names=nm, class="factor")
    } else .Internal(unlist(x, recursive, use.names))
}

The error occurs because, in both cases, islistfactor recurses at the C level, finds that all elements are factors, and so the if condition is TRUE. However, the two lapply calls do not recurse: they apply levels and as.character to the top-level elements only, which here are lists rather than factors. levels() of a list is NULL, so lv is empty and match(res, lv) yields NA. A possible solution is to replace both instances of lapply with rapply. This gives appropriate factor results in both cases:

str(unlist(x))
## Factor w/ 1 level "a": 1
## - attr(*, "names")= chr "v"

str(unlist(y))
## Factor w/ 1 level "a": 1
## - attr(*, "names")= chr "v"
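
For illustration, here is a minimal sketch of the rapply idea written in plain R rather than as a patch to base. The names all_factor_leaves and unlist_factors are hypothetical, and all_factor_leaves is only a pure-R stand-in for the internal islistfactor test, not the C implementation. It assumes a version of R that provides vapply.

```r
## Hypothetical stand-in for the internal islistfactor() test:
## TRUE if every leaf of a (possibly nested) list is a factor.
all_factor_leaves <- function(x) {
    if (is.factor(x)) return(TRUE)
    if (!is.list(x) || length(x) == 0L) return(FALSE)
    all(vapply(x, all_factor_leaves, logical(1)))
}

## Hypothetical wrapper showing the rapply-based approach.
unlist_factors <- function(x, use.names = TRUE) {
    if (!all_factor_leaves(x)) return(unlist(x, use.names = use.names))
    ## rapply() visits every leaf, so levels and values are
    ## collected at all depths, not just the top level
    chr <- rapply(x, as.character, how = "list")
    lv  <- unique(unlist(rapply(x, levels, how = "list"), use.names = FALSE))
    res <- unlist(chr, use.names = FALSE)
    nm  <- if (use.names) names(unlist(chr))
    ## as in unlist(), the result is not made an ordered factor
    structure(match(res, lv), levels = lv, names = nm, class = "factor")
}

x <- list(list(v = factor("a")))
y <- list(data.frame(v = factor("a")))

## lapply() sees only the inner list, whose levels() is NULL
lapply(x, levels)

fx <- unlist_factors(x)
fy <- unlist_factors(y)
```

Here fx and fy are both factors with the single level "a", and both can be printed.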

An alternative is to not return a factor in these cases, by altering the if condition so that it is FALSE for nested lists of factors and for lists of factor-only data frames.
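
As a rough sketch of what such a stricter condition might look like, the predicate below is TRUE only when every top-level element is itself a factor. The name flat_list_of_factors is hypothetical, and the real change would more likely belong in the C code for islistfactor. This only illustrates the intended behaviour, and its treatment of empty lists is an assumption.

```r
## Hypothetical stricter test: TRUE only for a non-empty list whose
## top-level elements are all factors (no recursion into sublists).
flat_list_of_factors <- function(x) {
    is.list(x) && length(x) > 0L &&
        all(vapply(x, is.factor, logical(1)))
}

x <- list(list(v = factor("a")))        # nested list of factors
y <- list(data.frame(v = factor("a")))  # list of a factor-only data frame

flat_list_of_factors(x)                                # nested: fails the test
flat_list_of_factors(y)                                # list of data frame: fails
flat_list_of_factors(list(factor("a"), factor("b")))   # flat list: passes
flat_list_of_factors(data.frame(v = factor("a")))      # bare data frame: passes
```

With a condition like this, x and y would fall through to the else branch and not be returned as factors.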

Dan

-- 
www.stats.ox.ac.uk/~davison

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 20 Aug 2008 - 13:26:59 GMT
