From: Marc Schwartz <MSchwartz_at_mn.rr.com>

Date: Wed 18 Oct 2006 - 02:50:54 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed Oct 18 12:54:32 2006

Just a quick update on this thread.

The version of expand.dft() that I posted earlier has a bug in it.

The bug comes from the use of lapply() and the way the additional arguments passed to type.convert() were evaluated.

I noted this when testing the function on the UCBAdmissions data set, which is a multi-way table used in some help file examples such as ?as.data.frame.table.

Here is a corrected version:

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  DF <- sapply(1:nrow(x), function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)

  DF <- subset(do.call("rbind", DF), select = -Freq)

  for (i in 1:ncol(DF))
  {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}
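For comparison, here is a minimal sketch of the same expansion done with a single row-index vector instead of building and rbind()-ing one piece per row. This is not the function posted above: the name expand.dft2 is hypothetical, and the sketch assumes the counts are in a column named Freq, as produced by as.data.frame() on a table.

```r
# Hypothetical alternative sketch; expand.dft2 is an illustrative name only.
# Assumes 'x' is a data frame with a count column called "Freq".
expand.dft2 <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Repeat each row index as many times as its frequency
  idx <- rep(seq_len(nrow(x)), times = x$Freq)

  # Pick the repeated rows and drop the Freq column
  DF <- x[idx, names(x) != "Freq", drop = FALSE]
  rownames(DF) <- NULL

  # Convert each column explicitly, as in the corrected function
  for (j in seq_along(DF))
  {
    DF[[j]] <- type.convert(as.character(DF[[j]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}
```

It is called in the same way, for example expand.dft2(as.data.frame(UCBAdmissions), as.is = TRUE). Rows with a frequency of zero simply do not appear in the result.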

Thus if we now take the UCBAdmissions multi-way table data and convert it to a flat contingency table:

FCT <- as.data.frame(UCBAdmissions)

> FCT
      Admit Gender Dept Freq
1  Admitted   Male    A  512
2  Rejected   Male    A  313
3  Admitted Female    A   89
4  Rejected Female    A   19
5  Admitted   Male    B  353
6  Rejected   Male    B  207
7  Admitted Female    B   17
8  Rejected Female    B    8
9  Admitted   Male    C  120
10 Rejected   Male    C  205
11 Admitted Female    C  202
12 Rejected Female    C  391
13 Admitted   Male    D  138
14 Rejected   Male    D  279
15 Admitted Female    D  131
16 Rejected Female    D  244
17 Admitted   Male    E   53
18 Rejected   Male    E  138
19 Admitted Female    E   94
20 Rejected Female    E  299
21 Admitted   Male    F   22
22 Rejected   Male    F  351
23 Admitted Female    F   24
24 Rejected Female    F  317

Thus, there should be:

> sum(FCT$Freq)

[1] 4526

rows in the final 'raw' data frame.
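One way to check an expansion like this is a round trip: expand the counts to raw rows, cross-tabulate the raw rows again, and compare with the original frequencies. The sketch below does the expansion with plain row indexing rather than with expand.dft(), so that it runs on its own; the object names raw and back are illustrative only.

```r
# Round-trip check on UCBAdmissions; 'raw' and 'back' are illustrative names.
FCT <- as.data.frame(UCBAdmissions)

# One row per observation: repeat each row index Freq times, drop Freq
raw <- FCT[rep(seq_len(nrow(FCT)), times = FCT$Freq),
           c("Admit", "Gender", "Dept")]

# The raw data should have as many rows as the total count
stopifnot(nrow(raw) == sum(FCT$Freq))

# Cross-tabulating the raw rows should give back the original counts
back <- as.data.frame(table(raw))
stopifnot(all(back$Freq == FCT$Freq))
```

The second check relies on the factor levels being left untouched by the row indexing, so that the cells of the rebuilt table come out in the same order as in FCT.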

> DF <- expand.dft(FCT)

> str(DF)
'data.frame': 4526 obs. of 3 variables:
 $ Admit : Factor w/ 2 levels "Admitted","Rejected": 1 1 1 1 1 1 1 1 1 1 ...
 $ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Dept  : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...

Note that the three columns are converted back to factors, because as.is = FALSE is the default in expand.dft(); this is of course also the default behavior for data frames.

If we now use:

> DF <- expand.dft(FCT, as.is = TRUE)

> str(DF)
'data.frame': 4526 obs. of 3 variables:
 $ Admit : chr "Admitted" "Admitted" "Admitted" "Admitted" ...
 $ Gender: chr "Male" "Male" "Male" "Male" ...
 $ Dept  : chr "A" "A" "A" "A" ...

The three columns stay as character vectors. It was this behavior that did not work properly in the first version.
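The effect of as.is can also be seen on type.convert() directly, outside of any data frame. The following is a small sketch on made-up vectors (the names labels and digits are illustrative only):

```r
# Made-up example vectors; 'labels' and 'digits' are illustrative names.
labels <- c("A", "B", "A")
digits <- c("1", "2", "3")

# With as.is = FALSE, non-numeric strings become a factor
f <- type.convert(labels, as.is = FALSE)

# With as.is = TRUE, non-numeric strings stay character
s <- type.convert(labels, as.is = TRUE)

# Strings that look like whole numbers become integer either way
n <- type.convert(digits, as.is = TRUE)

class(f)  # "factor"
class(s)  # "character"
class(n)  # "integer"
```

So as.is only matters for columns that cannot be converted to logical, integer, numeric or complex values.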

HTH,

Marc Schwartz
