From: David Winsemius <dwinsemius_at_comcast.net>

Date: Sun, 27 Apr 2008 15:10:19 +0000 (UTC)

"Nelson Castillo" <nelsoneci_at_gmail.com> wrote in news:2accc2ff0804251655o32686b99j73cf7df37243d08f_at_mail.gmail.com:

> Hi :-)

> I'm new to R and I started using it for a project (I'm the CS guy in
> a group of statisticians helping them find out how to solve issues
> as they come out). This is my first post to the list and I am
> starting to learn R.
> Well, they were used to doing MCA analysis in other programs where
> the data seems to be preprocessed automatically before running MCA.
> So, they need to process a data set that comes with N=1000000 of
> elements, but there are really about N/100 distinct elements over
> all the variables, so the MCA can be run in reasonable time
> summarizing data.
> So, the question is:
> How can I turn x from:
> x <-
> structure(list(weight = c(1, 1, 2, 1, 2), var1 = structure(c(1L,
> 1L, 1L, 1L, 2L), .Label = c("A", "C"), class = "factor"), var2 =
> structure(c(1L,
> 1L, 1L, 1L, 2L), .Label = c("B", "D"), class = "factor")), .Names =
> c("weight", "var1", "var2"), row.names = c(NA, 5L), class =
> "data.frame")
> to:
>
> y <-
> structure(list(weihgt = c(5L, 2L), var1 = structure(1:2, .Label =
> c("A", "C"), class = "factor"), var2 = structure(1:2, .Label =
> c("B", "D"), class = "factor")), .Names = c("weihgt", "var1", "var2"
> ), class = "data.frame", row.names = c(NA, -2L))
> using R?
> That is, from:
>
> > x
>   weight var1 var2
> 1      1    A    B
> 2      1    A    B
> 3      2    A    B
> 4      1    A    B
> 5      2    C    D
>
> > y
>   weihgt var1 var2
> 1      5    A    B
> 2      2    C    D

s.wt <- with(x, aggregate(weight, by=list(var1=var1,var2=var2), sum) )
#> s.wt
#  var1 var2 x
#1    A    B 5
#2    C    D 2

#then fix names

names(s.wt)[3] <- "weight"
#> s.wt
#  var1 var2 weight
#1    A    B      5
#2    C    D      2

I believe that the reshape or reShape packages could do this in one step.
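
For what it is worth, a one-step version also seems possible in base R without either package. The following is only a sketch, assuming an R version whose aggregate() has a formula method; the data frame is rebuilt with data.frame() purely so the snippet is self-contained:

```r
# Rebuild the example data from the question (same values as x).
x <- data.frame(
  weight = c(1, 1, 2, 1, 2),
  var1   = factor(c("A", "A", "A", "A", "C")),
  var2   = factor(c("B", "B", "B", "B", "D"))
)

# Sum weight within each distinct (var1, var2) combination.
# The formula interface keeps the column name "weight",
# so the separate renaming step is not needed.
s.wt <- aggregate(weight ~ var1 + var2, data = x, FUN = sum)
s.wt
```

With the example data this should give the same two-row summary as above (weight 5 for A B, weight 2 for C D), with the grouping columns first and the summed column last.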

-- David Winsemius

> The idea is that there is one occurrence of "A B" repeated 4 times
> in the original table, and it is summarized in the second table,
> computing the sum of the weights.
>
> I solved the problem using Perl, but I'd like to know what I have
> to read in order to do it in R.
>
> Regards,
> Nelson.

