From: Marc Schwartz <MSchwartz_at_mn.rr.com>

Date: Tue 17 Oct 2006 - 19:04:11 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed Oct 18 05:09:11 2006


On Tue, 2006-10-17 at 13:09 +0200, Philipp Pagel wrote:

> On Tue, Oct 17, 2006 at 03:08:49AM -0700, Marco LO wrote:
> > Is there any R function out there to turn a multi-way contingency
> > table back to a flat file table of individual rows and attribute
> > columns?
>
> Are you looking for something like this?
>
> # generate some data
> x = sample(c(0,1), 100, replace=T)
> y = sample(c(0,1), 100, replace=T)
> z = sample(c(0,1), 100, replace=T)
> # contingency table
> mytab = table(x,y,z)
> # flat contingency table
> as.data.frame( mytab )

This thread reminds me of a discussion from a while back, which I cannot seem to find in the archives at the moment.

The steps outlined by Philipp result in a flattened contingency table: one row for each unique combination of the cross-classifying factors, plus a 'Freq' column giving the number of occurrences of that combination.

It does not, however, recover what might be considered the original 'raw data frame', containing a single row per observation, if that is what one desires.

In other words, we get the following:

set.seed(1)

x <- sample(c(0, 1), 100, replace = TRUE)
y <- sample(c(0, 1), 100, replace = TRUE)
z <- sample(c(0, 1), 100, replace = TRUE)

# contingency table
mytab <- table(x, y, z)

> mytab
, , z = 0

   y
x    0  1
  0 17 19
  1 11 15

, , z = 1

   y
x    0  1
  0  6 10
  1 12 10

# flattened contingency table
FCT <- as.data.frame(mytab)

> FCT
  x y z Freq
1 0 0 0   17
2 1 0 0   11
3 0 1 0   19
4 1 1 0   15
5 0 0 1    6
6 1 0 1   12
7 0 1 1   10
8 1 1 1   10
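A quick check of the distinction drawn above, as a small sketch that re-creates the simulated data: the flattened table has one row per cell of the 2 x 2 x 2 table, while its Freq column sums to the number of raw observations. The individual counts may differ from those printed here under R >= 3.6.0, whose default sample() algorithm changed, but the two quantities below do not depend on the particular draw.

```r
# Re-create the simulated data used in the post
set.seed(1)
x <- sample(c(0, 1), 100, replace = TRUE)
y <- sample(c(0, 1), 100, replace = TRUE)
z <- sample(c(0, 1), 100, replace = TRUE)

FCT <- as.data.frame(table(x, y, z))

# One row per cell of the 2 x 2 x 2 table (zero-count cells are kept)
nrow(FCT)        # 8
# ...while the Freq column still accounts for every raw observation
sum(FCT$Freq)    # 100
```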

In order to take 'FCT' and convert it to 'raw data rows', we can do the following:

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".")
{
  # Take each row in the source data frame table and replicate it
  # using the Freq value
  DF <- sapply(1:nrow(x), function(i) x[rep(i, each = x$Freq[i]), ],
               simplify = FALSE)

  # Take the above list and rbind it to create a single DF
  # Also subset the result to eliminate the Freq column
  DF <- subset(do.call("rbind", DF), select = -Freq)

  # Now apply type.convert to the character coerced factor columns
  # to facilitate data type selection for each column
  DF <- as.data.frame(lapply(DF,
                             function(x) type.convert(as.character(x),
                                                      na.strings = na.strings,
                                                      as.is = as.is,
                                                      dec = dec)))

  # Return data frame
  DF
}

# Now use expand.dft() on the table from above
new.DF <- expand.dft(FCT)

> str(new.DF)
'data.frame':   100 obs. of  3 variables:
 $ x: int  0 0 0 0 0 0 0 0 0 0 ...
 $ y: int  0 0 0 0 0 0 0 0 0 0 ...
 $ z: int  0 0 0 0 0 0 0 0 0 0 ...

# Re-create the multi-way table
new.tab <- table(new.DF)

> new.tab
, , z = 0

   y
x    0  1
  0 17 19
  1 11 15

, , z = 1

   y
x    0  1
  0  6 10
  1 12 10

# Compare to initial mytab
> identical(new.tab, mytab)
[1] TRUE

So, if one needs it, expand.dft() can be used to take a multi-way
contingency table that has been coerced to a data frame and convert it
back to the raw data frame.
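The sapply()/do.call("rbind", ...) step inside expand.dft() replicates row i of the flattened table Freq[i] times. As a sketch, the same replication can be written as a single index vector built with rep(); the object names slow and fast are only illustrative, and the comparison resets row names on both results so that only their contents are compared.

```r
set.seed(1)
x <- sample(c(0, 1), 100, replace = TRUE)
y <- sample(c(0, 1), 100, replace = TRUE)
z <- sample(c(0, 1), 100, replace = TRUE)
FCT <- as.data.frame(table(x, y, z))

# Row-by-row replication, as done inside expand.dft()
slow <- do.call("rbind",
                sapply(1:nrow(FCT),
                       function(i) FCT[rep(i, each = FCT$Freq[i]), ],
                       simplify = FALSE))

# Same replication with one index vector: row i repeated FCT$Freq[i] times
fast <- FCT[rep(seq_len(nrow(FCT)), times = FCT$Freq), ]

# Reset row names so only the contents are compared
rownames(slow) <- NULL
rownames(fast) <- NULL
isTRUE(all.equal(slow, fast))    # TRUE
```

With only eight cells the difference is negligible, but the single-index form avoids building and binding one data frame per cell, which may matter for tables with many cells.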

I'm not sure if this functionality is available elsewhere, but thought that it might be helpful.

I included type.convert() in order to make a reasonable attempt at restoring the original data types; without this step, all of the columns would be returned as factors.
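To illustrate why that step is there, here is a small sketch: as.data.frame() on a table returns the cross-classifying columns as factors, and passing their labels through type.convert() yields integers for labels such as "0" and "1".

```r
set.seed(1)
x <- sample(c(0, 1), 100, replace = TRUE)
y <- sample(c(0, 1), 100, replace = TRUE)
z <- sample(c(0, 1), 100, replace = TRUE)
FCT <- as.data.frame(table(x, y, z))

# The classifying columns come back as factors; Freq is an integer count
sapply(FCT, class)

# type.convert() on the factor labels picks a suitable atomic type
x.conv <- type.convert(as.character(FCT$x), as.is = TRUE)
class(x.conv)    # "integer"
```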

I wonder if it might make sense to add an 'expand' argument to as.data.frame.table(), defaulting to FALSE. When set to TRUE, it could use expand.dft() to take the additional step and return the raw data frame as above.
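As a rough sketch of how such an option might behave, the wrapper below takes a table and an expand flag. The name table_to_df() is hypothetical and is not part of base R or any package; it replicates rows with a single rep() index instead of sapply()/rbind() and then applies type.convert() as expand.dft() does. responseName is the existing as.data.frame.table() argument that names the count column.

```r
# HYPOTHETICAL helper: table_to_df() is not part of base R or any package.
# expand = FALSE: the usual flattened table (one row per cell).
# expand = TRUE:  one row per observation, as expand.dft() returns.
table_to_df <- function(x, expand = FALSE, responseName = "Freq",
                        na.strings = "NA", as.is = FALSE, dec = ".")
{
  # For a "table" object this dispatches to as.data.frame.table()
  DF <- as.data.frame(x, responseName = responseName)
  if (!expand) return(DF)

  # Repeat row i as many times as its count, then drop the count column
  idx <- rep(seq_len(nrow(DF)), times = DF[[responseName]])
  DF <- DF[idx, setdiff(names(DF), responseName), drop = FALSE]
  rownames(DF) <- NULL

  # Convert the factor labels back to atomic types, as expand.dft() does
  DF[] <- lapply(DF, function(col)
    type.convert(as.character(col), na.strings = na.strings,
                 as.is = as.is, dec = dec))
  DF
}

# Usage with data simulated as in the post
set.seed(1)
x <- sample(c(0, 1), 100, replace = TRUE)
y <- sample(c(0, 1), 100, replace = TRUE)
z <- sample(c(0, 1), 100, replace = TRUE)
mytab <- table(x, y, z)

nrow(table_to_df(mytab))          # 8 (one row per cell)
raw <- table_to_df(mytab, expand = TRUE)
nrow(raw)                         # 100 (one row per observation)
identical(table(raw), mytab)
```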

Anyway, I hope that this might be helpful.

Regards,

Marc Schwartz


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Wed 18 Oct 2006 - 03:30:11 GMT.
