Re: [R] Convert Contingency Table to Flat File

From: Marc Schwartz <MSchwartz_at_mn.rr.com>
Date: Tue 17 Oct 2006 - 19:04:11 GMT

On Tue, 2006-10-17 at 13:09 +0200, Philipp Pagel wrote:
> On Tue, Oct 17, 2006 at 03:08:49AM -0700, Marco LO wrote:
> > Is there any R function out there to turn a multi-way contingency
> > table back to a flat file table of individual rows and attribute
> > columns.?
>
> Are you looking for something like this?
>
> # generate some data
> x = sample(c(0,1), 100, replace=T)
> y = sample(c(0,1), 100, replace=T)
> z = sample(c(0,1), 100, replace=T)
> # contingency table
> mytab = table(x,y,z)
> # flat contingency table
> as.data.frame( mytab )

This thread reminds me of a discussion from a while back, which I cannot seem to find in the archives at the moment.

The steps outlined by Philipp result in a flattened contingency table, which contains each unique combination of the cross-classifying factors as a row, plus a frequency column giving the number of occurrences of that combination.

It does not, however, result in what might be considered the original 'raw data frame' containing a single row per observation, if that is what one desires.

In other words, we get the following:

set.seed(1)

x <- sample(c(0, 1), 100, replace = TRUE)
y <- sample(c(0, 1), 100, replace = TRUE)
z <- sample(c(0, 1), 100, replace = TRUE)
 

# contingency table
mytab <- table(x, y, z)  

> mytab

, , z = 0

   y
x    0  1
  0 17 19
  1 11 15

, , z = 1

   y
x    0  1
  0  6 10
  1 12 10

# flattened contingency table
FCT <- as.data.frame(mytab)  

> FCT

  x y z Freq
1 0 0 0   17
2 1 0 0   11
3 0 1 0   19
4 1 1 0   15
5 0 0 1    6
6 1 0 1   12
7 0 1 1   10
8 1 1 1   10

In order to take 'FCT' and convert it to 'raw data rows', we can do the following:

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
  # Take each row in the source data frame table and replicate it
  # using the Freq value
  DF <- sapply(1:nrow(x), function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)

  # Take the above list and rbind it to create a single DF
  # Also subset the result to eliminate the Freq column
  DF <- subset(do.call("rbind", DF), select = -Freq)

  # Now apply type.convert to the character coerced factor columns
  # to facilitate data type selection for each column
  DF <- as.data.frame(lapply(DF,
                             function(x)
                             type.convert(as.character(x),
                                          na.strings = na.strings,
                                          as.is = as.is,
                                          dec = dec)))

  # Return data frame
  DF
}

# Now use expand.dft() on the table from above
new.DF <- expand.dft(FCT)

> str(new.DF)

'data.frame': 100 obs. of 3 variables:

 $ x: int  0 0 0 0 0 0 0 0 0 0 ...
 $ y: int  0 0 0 0 0 0 0 0 0 0 ...
 $ z: int  0 0 0 0 0 0 0 0 0 0 ...


# Re-create the multi-way table
new.tab <- table(new.DF)

> new.tab

, , z = 0

   y
x    0  1
  0 17 19
  1 11 15

, , z = 1

   y
x    0  1
  0  6 10
  1 12 10

# Compare to initial mytab
> identical(new.tab, mytab)

[1] TRUE

So, if one needs it, expand.dft() can be used to take a multi-way contingency table that has been coerced to a data frame and convert it back to the raw data frame.

I'm not sure if this functionality is available elsewhere, but thought that it might be helpful.
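As a possible alternative, the row replication can also be sketched without a per-row loop by indexing the data frame with repeated row numbers. This is only a minimal sketch under assumptions: expand_rows() and its 'freq' argument are hypothetical names used for illustration, the count column is assumed to hold non-negative integers, and no type conversion is attempted, so the columns stay factors.

```r
# Hypothetical helper (not part of base R): replicate each row of a
# flattened table according to its count column, then drop that column.
expand_rows <- function(x, freq = "Freq") {
  # Row i is repeated x[[freq]][i] times; rows with a count of 0 vanish
  idx <- rep(seq_len(nrow(x)), times = x[[freq]])
  out <- x[idx, setdiff(names(x), freq), drop = FALSE]
  rownames(out) <- NULL
  out
}

# Small self-contained example
tab  <- table(g = c("a", "a", "b"), h = c("u", "v", "v"))
flat <- as.data.frame(tab)
raw  <- expand_rows(flat)

nrow(raw)               # one row per original observation: 3
all(table(raw) == tab)  # the counts are recovered: TRUE
```

Because the factor levels are kept, cells with a zero count still appear when the table is rebuilt from the expanded rows.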

I included the use of type.convert() in order to make a reasonable attempt at restoring the original data types, since without this step all columns remain factors.
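To illustrate the effect of that step in isolation, here is a small sketch on a single column. The vector 'f' is made-up example data, standing in for a factor column as returned by as.data.frame() on a table.

```r
# A factor column such as as.data.frame() produces from a table
f <- factor(c("0", "1", "1", "0"))
class(f)  # "factor"

# type.convert() inspects the character representation and picks the
# simplest type that fits every value, which is integer here
v <- type.convert(as.character(f), as.is = TRUE)
class(v)  # "integer"
```

Note that type.convert() only guesses from the printed values, so a column that was originally character or logical may come back as a different type than it started with.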

I wonder if it might make sense to add an 'expand' argument to as.data.frame.table(), which would default to FALSE. It could be then set to TRUE and utilize expand.dft() to take the additional step and return the raw data frame as above.

Anyway, I hope that this might be helpful.

Regards,

Marc Schwartz



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed Oct 18 05:09:11 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Wed 18 Oct 2006 - 03:30:11 GMT.