Re: [R] Convert Contingency Table to Flat File

From: Marc Schwartz <MSchwartz_at_mn.rr.com>
Date: Wed 18 Oct 2006 - 02:50:54 GMT

Just a quick update on this thread.

The version of expand.dft() that I posted earlier has a bug in it.

This is the result of the use of lapply() and the evaluation of the additional arguments passed to type.convert().

I noted this when testing the function on the UCBAdmissions data set, which is a multi-way table used in some help file examples such as ?as.data.frame.table.

Here is a corrected version:

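expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
  # Repeat each row of 'x' as many times as its 'Freq' count
  DF <- sapply(seq_len(nrow(x)),
               function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)

  # Stack the pieces and drop the 'Freq' column
  DF <- subset(do.call("rbind", DF), select = -Freq)

  # Re-convert each column, passing the arguments on explicitly
  for (i in seq_len(ncol(DF))) {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  DF
}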
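
As an aside, the same expansion can be sketched more compactly by building one row index with rep() instead of binding per-row pieces. This is only an illustration under stated assumptions, not the function from this thread: the name expand_rows() and its 'freq' argument are hypothetical, and only 'as.is' is passed on to type.convert().

```r
# Hypothetical helper (not from the original post): expand a frequency
# table into one row per observation using a single row index.
expand_rows <- function(x, freq = "Freq", as.is = FALSE) {
  # Row i appears x[[freq]][i] times in the index
  idx <- rep(seq_len(nrow(x)), times = x[[freq]])
  out <- x[idx, setdiff(names(x), freq), drop = FALSE]

  # Wrap type.convert() in an anonymous function so that 'as.is'
  # is passed explicitly rather than through lapply()'s '...'
  out[] <- lapply(out, function(col)
    type.convert(as.character(col), as.is = as.is))

  rownames(out) <- NULL
  out
}

FCT <- as.data.frame(UCBAdmissions)
DF2 <- expand_rows(FCT)
nrow(DF2) == sum(FCT$Freq)
```

Rows with a zero count simply drop out, since rep() repeats them zero times.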

Thus if we now take the UCBAdmissions multi-way table data and convert it to a flat contingency table:

FCT <- as.data.frame(UCBAdmissions)

> FCT
      Admit Gender Dept Freq
1  Admitted   Male    A  512
2  Rejected   Male    A  313
3  Admitted Female    A   89
4  Rejected Female    A   19
5  Admitted   Male    B  353
6  Rejected   Male    B  207
7  Admitted Female    B   17
8  Rejected Female    B    8
9  Admitted   Male    C  120
10 Rejected   Male    C  205
11 Admitted Female    C  202
12 Rejected Female    C  391
13 Admitted   Male    D  138
14 Rejected   Male    D  279
15 Admitted Female    D  131
16 Rejected Female    D  244
17 Admitted   Male    E   53
18 Rejected   Male    E  138
19 Admitted Female    E   94
20 Rejected Female    E  299
21 Admitted   Male    F   22
22 Rejected   Male    F  351
23 Admitted Female    F   24
24 Rejected Female    F  317

The final 'raw' data frame should have one row per observation, so its row count should equal the sum of the frequencies:

> sum(FCT$Freq)
[1] 4526

> DF <- expand.dft(FCT)

> str(DF)

'data.frame': 4526 obs. of 3 variables:
 $ Admit : Factor w/ 2 levels "Admitted","Rejected": 1 1 1 1 1 1 1 1 1 1 ...
 $ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Dept  : Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...

Note that the three columns are coerced back to factors, because as.is defaults to FALSE in expand.dft() and is passed on to type.convert().
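
A quick sanity check on any such expansion is to tabulate the raw rows again and compare the counts with the original frequencies. The sketch below is only an illustration: so that it runs on its own, it uses plain row indexing as a stand-in for expand.dft(), and the names 'raw', 'back' and 'chk' are made up for the example.

```r
FCT <- as.data.frame(UCBAdmissions)

# Stand-in for expand.dft(): repeat each row 'Freq' times, drop 'Freq'
raw <- FCT[rep(seq_len(nrow(FCT)), FCT$Freq), c("Admit", "Gender", "Dept")]

# Tabulate the raw rows back into a frequency table
back <- as.data.frame(table(raw))

# Match cells by their labels rather than by position, so that a
# different ordering of factor levels does not matter
chk <- merge(FCT, back, by = c("Admit", "Gender", "Dept"),
             suffixes = c(".orig", ".back"))

nrow(raw) == sum(FCT$Freq)
all(chk$Freq.orig == chk$Freq.back)
```

Merging on the cell labels matters here because type.convert() orders factor levels alphabetically (Gender becomes "Female", "Male"), whereas UCBAdmissions lists "Male" first.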

If we now use:

> DF <- expand.dft(FCT, as.is = TRUE)

> str(DF)

'data.frame': 4526 obs. of 3 variables:
 $ Admit : chr  "Admitted" "Admitted" "Admitted" "Admitted" ...
 $ Gender: chr  "Male" "Male" "Male" "Male" ...
 $ Dept  : chr  "A" "A" "A" "A" ...


The three columns stay as character vectors. It was this behavior that did not work properly in the first version.
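
The effect of 'as.is' can be seen on a small vector, independently of expand.dft(). This is a minimal illustration with made-up inputs; the variable names are arbitrary.

```r
x <- c("A", "B", "A")

f <- type.convert(x, as.is = FALSE)  # character input becomes a factor
s <- type.convert(x, as.is = TRUE)   # character input stays character

# Strings that look like numbers are converted either way
n <- type.convert(c("1", "2", "3"), as.is = TRUE)

class(f)
class(s)
class(n)
```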

HTH,

Marc Schwartz



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed Oct 18 12:54:32 2006