Date: Wed 18 Oct 2006 - 02:50:54 GMT

Just a quick update on this thread.

The version of expand.dft() that I posted earlier has a bug in it.

It stems from the use of lapply() and from the way the additional arguments passed to type.convert() were evaluated.

I noticed this when testing the function on the UCBAdmissions data set, a multi-way table used in some help file examples such as ?as.data.frame.table.

Here is a corrected version:

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Repeat each row of x as many times as its Freq count
  DF <- sapply(1:nrow(x), function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)

  # Stack the pieces and drop the Freq column
  DF <- subset(do.call("rbind", DF), select = -Freq)

  # Re-derive each column's type from its character representation
  for (i in 1:ncol(DF))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]), na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}
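As an aside, the row replication step can also be written as a single indexing operation, because rep() accepts a vector of counts in its times argument. The following is only a sketch of that alternative, not part of the function above; the table tab and the names idx and raw are hypothetical and exist purely for illustration.

```r
# Hypothetical three-row frequency table, used only for illustration
tab <- data.frame(Group = c("a", "b", "c"),
                  Freq  = c(2, 0, 3),
                  stringsAsFactors = FALSE)

# Repeat each row index Freq times; row 2 has Freq 0 and drops out
idx <- rep(seq_len(nrow(tab)), times = tab$Freq)

# Index once, drop the count column, and reset the row names
raw <- tab[idx, names(tab) != "Freq", drop = FALSE]
rownames(raw) <- NULL

print(idx)
print(raw)
```

This avoids building one small data frame per row and then calling rbind() on all of them, which may matter for tables with many cells.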

Thus if we now take the UCBAdmissions multi-way table data and convert it to a flat contingency table:

> FCT <- as.data.frame(UCBAdmissions)

> FCT

      Admit Gender Dept Freq
1  Admitted   Male    A  512
2  Rejected   Male    A  313
3  Admitted Female    A   89
4  Rejected Female    A   19
5  Admitted   Male    B  353
6  Rejected   Male    B  207
7  Admitted Female    B   17
8  Rejected Female    B    8
9  Admitted   Male    C  120
10 Rejected   Male    C  205
11 Admitted Female    C  202
12 Rejected Female    C  391
13 Admitted   Male    D  138
14 Rejected   Male    D  279
15 Admitted Female    D  131
16 Rejected Female    D  244
17 Admitted   Male    E   53
18 Rejected   Male    E  138
19 Admitted Female    E   94
20 Rejected Female    E  299
21 Admitted   Male    F   22
22 Rejected   Male    F  351
23 Admitted Female    F   24
24 Rejected Female    F  317

Thus, there should be:

> sum(FCT$Freq)

[1] 4526

rows in the final 'raw' data frame.

> DF <- expand.dft(FCT)

> str(DF)

'data.frame': 4526 obs. of 3 variables:
 $ Admit : Factor w/ 2 levels "Admitted","Rejected": 1 1 1 1 1 1 1 1 1 1 ...
 $ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Dept  : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...

Note that the three columns are coerced back to factors: with the default as.is = FALSE, type.convert() returns character input as factors.
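One way to check an expansion like this is to tabulate the expanded rows again and compare the counts with the original frequencies. The sketch below uses a compact stand-in for expand.dft() (plain row indexing with rep()) so that it runs on its own; the names idx and back are illustrative only.

```r
FCT <- as.data.frame(UCBAdmissions)

# Stand-in for expand.dft(): repeat each row Freq times, drop Freq
idx <- rep(seq_len(nrow(FCT)), times = FCT$Freq)
DF <- FCT[idx, c("Admit", "Gender", "Dept")]
rownames(DF) <- NULL

# The expanded data frame has one row per counted observation
stopifnot(nrow(DF) == sum(FCT$Freq))

# Tabulating the raw rows recovers the original cell counts
back <- as.data.frame(table(DF))
stopifnot(isTRUE(all.equal(back$Freq, FCT$Freq)))
```

The comparison relies on the factor levels surviving the row indexing, so that table() lays out the cells in the same order as the original table.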

If we now use:

> DF <- expand.dft(FCT, as.is = TRUE)

> str(DF)

'data.frame': 4526 obs. of 3 variables:
 $ Admit : chr "Admitted" "Admitted" "Admitted" "Admitted" ...
 $ Gender: chr "Male" "Male" "Male" "Male" ...
 $ Dept  : chr "A" "A" "A" "A" ...

The three columns stay as character vectors. It was this behavior that did not work properly in the first version.
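The effect of as.is can be seen on type.convert() directly, outside of any data frame. This is a small sketch on made-up character vectors (chr and num are hypothetical names), showing which type comes back in each case.

```r
chr <- c("Admitted", "Rejected", "Admitted")
num <- c("1", "2", "3")

f <- type.convert(chr, as.is = FALSE)  # non-numeric strings become a factor
s <- type.convert(chr, as.is = TRUE)   # non-numeric strings stay character
n <- type.convert(num, as.is = TRUE)   # numeric-looking strings become integer

print(class(f))
print(class(s))
print(class(n))
```

Note that as.is only affects strings that cannot be read as logical or numeric values; numeric-looking input is converted either way.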

Marc Schwartz

