Just a quick update on this thread.
The version of expand.dft() that I posted earlier has a bug in it.
The bug comes from using lapply() and from how the additional arguments passed to type.convert() were evaluated.
I noticed it while testing the function on the UCBAdmissions data set, a multi-way table used in some help file examples such as ?as.data.frame.table.
Here is a corrected version:
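expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Build a list holding each row of 'x' repeated Freq times
  DF <- sapply(seq_len(nrow(x)), function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)
  # Stack the pieces into one data frame and drop the Freq column
  DF <- subset(do.call("rbind", DF), select = -Freq)

  # Re-convert each column from character, passing the extra arguments
  # explicitly so that na.strings, as.is and dec are honored
  for (i in seq_len(ncol(DF)))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]), na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }
  DF
}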
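For comparison, here is a sketch of a vectorized variant that builds a single row index with rep() instead of calling sapply() and do.call("rbind", ...). The name expand_dft_vec and the toy data frame are hypothetical, this is not the version posted above, and it assumes the counts are in a column named Freq. It also shows one way to do the conversion step with lapply(): the extra arguments are passed to type.convert() explicitly inside an anonymous function.

```r
# Hypothetical vectorized variant; not the expand.dft() posted above
expand_dft_vec <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
  # Repeat each row index Freq times in one indexing step
  idx <- rep(seq_len(nrow(x)), times = x$Freq)
  DF <- x[idx, setdiff(names(x), "Freq"), drop = FALSE]
  rownames(DF) <- NULL
  # Convert every column, passing the extra arguments explicitly
  DF[] <- lapply(DF, function(col)
    type.convert(as.character(col), na.strings = na.strings,
                 as.is = as.is, dec = dec))
  DF
}

# Toy flat table (made up for illustration)
toy <- data.frame(g = c("a", "b"), n = c("1.5", "2"), Freq = c(2, 3))
res <- expand_dft_vec(toy, as.is = TRUE)
str(res)
```

With the default as.is = FALSE, column g would come back as a factor instead of a character vector, while n is converted to numeric either way.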
If we now take the UCBAdmissions multi-way table and convert it to a flat contingency table (a data frame with one row per cell and a Freq column of counts):
> FCT <- as.data.frame(UCBAdmissions)
> FCT
Admit Gender Dept Freq
1 Admitted Male A 512
2 Rejected Male A 313
3 Admitted Female A 89
4 Rejected Female A 19
5 Admitted Male B 353
6 Rejected Male B 207
7 Admitted Female B 17
8 Rejected Female B 8
9 Admitted Male C 120
10 Rejected Male C 205
11 Admitted Female C 202
12 Rejected Female C 391
13 Admitted Male D 138
14 Rejected Male D 279
15 Admitted Female D 131
16 Rejected Female D 244
17 Admitted Male E 53
18 Rejected Male E 138
19 Admitted Female E 94
20 Rejected Female E 299
21 Admitted Male F 22
22 Rejected Male F 351
23 Admitted Female F 24
24 Rejected Female F 317
Each row of FCT stands for Freq observations, so the final 'raw' data frame should contain sum(FCT$Freq) rows:
> sum(FCT$Freq)
[1] 4526
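As a cross-check of that count, here is a minimal base R sketch that expands the table by direct row indexing (a swapped-in technique, not expand.dft() itself) and cross-tabulates the result back with xtabs(). Because plain indexing keeps the original factors and their level order, the rebuilt table should match UCBAdmissions cell for cell. The object names raw and back are illustrative.

```r
# Flat table: one row per cell, counts in Freq (factor columns)
FCT <- as.data.frame(UCBAdmissions)
# One record per observation, dropping the Freq column
raw <- FCT[rep(seq_len(nrow(FCT)), FCT$Freq), c("Admit", "Gender", "Dept")]
# Rebuild the 2 x 2 x 6 table from the raw records
back <- xtabs(~ Admit + Gender + Dept, data = raw)
# Same number of records, same count in every cell
stopifnot(nrow(raw) == sum(UCBAdmissions),
          all(as.vector(back) == as.vector(UCBAdmissions)))
```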
Calling the function with its default arguments:
> DF <- expand.dft(FCT)
> str(DF)
'data.frame': 4526 obs. of 3 variables:
$ Admit : Factor w/ 2 levels "Admitted","Rejected": 1 1 1 1 1 1 1 1 1 1 ...
$ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
$ Dept : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
Note that the three columns are converted back to factors, since as.is = FALSE by default, which mirrors the default behavior for data frames.
If we now use:
> DF <- expand.dft(FCT, as.is = TRUE)
> str(DF)
'data.frame': 4526 obs. of 3 variables:
$ Admit : chr "Admitted" "Admitted" "Admitted" "Admitted" ...
$ Gender: chr "Male" "Male" "Male" "Male" ...
$ Dept : chr "A" "A" "A" "A" ...
The three columns now stay as character vectors; this is the behavior that did not work properly in the first version.
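The difference between the two calls comes down to type.convert()'s as.is argument. A minimal sketch on a short character vector (made up for illustration): with as.is = FALSE the strings become a factor whose levels are sorted, which is why Gender in the first str() output lists "Female" before "Male"; with as.is = TRUE the strings are returned unchanged.

```r
g <- c("Male", "Female", "Male")
f <- type.convert(g, as.is = FALSE)  # factor; levels sorted alphabetically
s <- type.convert(g, as.is = TRUE)   # left as a character vector
levels(f)
class(s)
```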
HTH, Marc Schwartz
Archive generated by hypermail 2.1.8, at Wed 18 Oct 2006 - 03:30:11 GMT.