Re: [R] the "union" of several data frame rows

From: Scot W. McNary <smcnary_at_charm.net>
Date: Fri, 08 Feb 2008 17:23:06 -0500

Hi,

Thanks to Henrique Dallazuanna, Erik Iverson, Mark Leeds, and J. Scott Olson for pointing me down the path of joy. I finally figured out a solution to the problem:

Given the following list of partially overlapping test keys, a data frame called keys1:

   ID   X1   X2   X3   X4   X5   X6   X7   X8   X9  X10  X11  X12  X13  X14  X15
A KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
B KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
C KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
D KEY    D    C    D    A    B    D    D    D    A    D    D    D    A    C    C
E KEY    D    C    D    A    B    D    D    D    A    D    D    D    A    C    C
F KEY    D    C    D <NA>    B    D <NA> <NA> <NA>    D <NA> <NA> <NA> <NA> <NA>
G KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
H KEY    D    C    D    A    B    D    D    D    A    D    D    D    A    C    C
I KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
J KEY    D    C    D    A    B <NA> <NA> <NA> <NA> <NA>    D    D    A    C    C
K KEY    D    C <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
L KEY    D    C    D <NA>    B    D <NA> <NA> <NA>    D <NA> <NA> <NA> <NA> <NA>
M KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>
N KEY    D <NA>    D    A <NA>    D    D    D    A <NA> <NA> <NA> <NA> <NA> <NA>

The goal was to wind up with a common test key:

Common Key D C D A B D D D A D D D A C C

What worked was the following:

for (i in 2:nrow(keys1)) {keys1[1, is.na(keys1[1,])] <- keys1[i, is.na(keys1[1,])]}
ck <- keys1[1, ]
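
For anyone who wants to try the idea without the original data, here is a minimal sketch on a made-up data frame. The name toy and its contents are hypothetical, not the real keys1, and first_non_na is a helper invented for illustration. It takes the first non-missing answer in each column, which should give the same key as filling the gaps in row 1 from the rows below it.

```r
# Hypothetical toy data standing in for keys1 (not the real test keys).
toy <- data.frame(
  X1 = c("D", "D", "D", "D"),
  X2 = c(NA,  "C", "C", NA),
  X3 = c("D", NA,  "D", "D"),
  X4 = c(NA,  NA,  "A", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value in a vector,
# or NA if every entry is missing.
first_non_na <- function(col) {
  present <- col[!is.na(col)]
  if (length(present) == 0) NA_character_ else present[1]
}

# Apply column by column to build the common key as a named character vector.
common_key <- vapply(toy, first_non_na, character(1))
print(common_key)
```

This version assumes character columns (hence stringsAsFactors = FALSE) and does not modify the data frame, so the original rows stay available for checking.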

I neglected to mention in my first example that there were <NA> observations, which may have affected the kinds of solutions that were suggested. Chalk up another testimonial in favor of providing a small, workable example when asking for help.
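
To connect the two versions of the problem, here is a sketch that recodes the blank strings from the first example as NA and then builds the key. The helper name union_key is made up for illustration, and the check for conflicting answers is an added assumption about what a safe "union" should do, not something the loop above performs.

```r
# The small example from the original question: blanks mark items not on a test.
key <- matrix(c("1", "A", "C", " ", " ", " ", " ",
                "2", " ", " ", " ", " ", "B", "D",
                "3", "A", " ", "D", "B", " ", " ",
                "4", " ", "C", "D", " ", "B", "D"),
              byrow = TRUE, ncol = 7)

answers <- key[, 2:7]                      # drop the id column
colnames(answers) <- paste("q", 1:6, sep = "")
answers[answers == " "] <- NA              # recode blanks as missing

# Hypothetical helper: the single answer seen in a column, NA if none,
# and an error if two tests give different answers for the same item.
union_key <- function(col) {
  seen <- unique(col[!is.na(col)])
  if (length(seen) > 1) {
    stop("conflicting answers: ", paste(seen, collapse = ", "))
  }
  if (length(seen) == 0) NA_character_ else seen
}

common <- apply(answers, 2, union_key)
print(common)
```

On this data the result is A C D B B D for q1 through q6, matching the key asked for in the first post.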

Thanks very much,

Scot

Henrique Dallazuanna wrote:
> Perhaps:
>
> data <- data.frame(key, row.names=1)
> names(data) <- paste("q", 1:6, sep="")
> apply(data, 2, function(x)unique(x)[unique(x) != " "])
>
>
> On 01/02/2008, Scot W. McNary <smcnary_at_charm.net> wrote:
>
>> Hi,
>>
>> I have a question about how to obtain the union of several data frame
>> rows. I'm trying to create a common key for several tests composed of
>> different items. Here is a small scale version of the problem. These
>> are keys for 4 different tests, not all mutually exclusive:
>>
>> id  q1 q2 q3 q4 q5 q6
>> 1   A  C
>> 2               B  D
>> 3   A     D  B
>> 4      C  D     B  D
>>
>> I would like to create a single key for all test versions, the "union" of
>> the above:
>>
>> id  q1 q2 q3 q4 q5 q6
>> key A  C  D  B  B  D
>>
>>
>> Here is what I have (unsuccessfully) tried so far:
>>
>> > key <-
>> + matrix(c("1", "A", "C", " ", " ", " ", " ",
>> + "2", " ", " ", " ", " ", "B", "D",
>> + "3", "A", " ", "D", "B", " ", " ",
>> + "4", " ", "C", "D", " ", "B", "D"),
>> + byrow=TRUE, ncol = 7)
>> >
>> > k1 <- key[1, 2:7]
>> > k2 <- key[2, 2:7]
>> > k3 <- key[3, 2:7]
>> > k4 <- key[4, 2:7]
>> >
>> > itemid <- c("q1", "q2", "q3", "q4", "q5", "q6")
>> >
>> > k1 <- cbind(itemid, k1)
>> > k2 <- cbind(itemid, k2)
>> > k3 <- cbind(itemid, k3)
>> > k4 <- cbind(itemid, k4)
>> >
>> > tmp <- merge(k1, k2, by = "itemid")
>> > tmp <- merge(tmp, k3, by = "itemid")
>> > tmp <- merge(tmp, k4, by = "itemid")
>> >
>> > t(tmp)
>>        [,1] [,2] [,3] [,4] [,5] [,6]
>> itemid "q1" "q2" "q3" "q4" "q5" "q6"
>> k1     "A"  "C"  " "  " "  " "  " "
>> k2     " "  " "  " "  " "  "B"  "D"
>> k3     "A"  " "  "D"  "B"  " "  " "
>> k4     " "  "C"  "D"  " "  "B"  "D"
>>
>> The actual problem involves 300 or so items instead of 6 and 10
>> different keys instead of four. Any suggestions welcome.
>>
>> Thanks in advance,
>>
>> Scot McNary
>>
>> > version
>>                _
>> platform       i386-pc-mingw32
>> arch           i386
>> os             mingw32
>> system         i386, mingw32
>> status
>> major          2
>> minor          6.1
>> year           2007
>> month          11
>> day            26
>> svn rev        43537
>> language       R
>> version.string R version 2.6.1 (2007-11-26)
>>
>>
>> --
>> Scot McNary
>> smcnary at charm dot net
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>

-- 
Scot McNary
smcnary at charm dot net

Received on Fri 08 Feb 2008 - 22:28:42 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 08 Feb 2008 - 22:30:13 GMT.
