Re: [R] Summarize data for MCA (FactoMineR)

From: David Winsemius <dwinsemius_at_comcast.net>
Date: Sun, 27 Apr 2008 15:10:19 +0000 (UTC)

"Nelson Castillo" <nelsoneci_at_gmail.com> wrote in news:2accc2ff0804251655o32686b99j73cf7df37243d08f_at_mail.gmail.com:

> Hi :-)
>
> I'm new to R and I started using it for a project (I'm the CS guy in
> a group of statisticians helping them find out how to solve issues
> as they come out). This is my first post to the list and I am
> starting to learn R.
>
> Well, they were used to doing MCA analysis in other programs where
> the data seems to be preprocessed automatically before running MCA.
>
> So, they need to process a data set that comes with N=1000000 of
> elements, but there are really about N/100 distinct elements over
> all the variables, so the MCA can be run in reasonable time
> summarizing data.
>
> So, the question is:
>
> How can I turn x from:
>
> x <-
> structure(list(weight = c(1, 1, 2, 1, 2), var1 = structure(c(1L,
> 1L, 1L, 1L, 2L), .Label = c("A", "C"), class = "factor"), var2 =
> structure(c(1L,
> 1L, 1L, 1L, 2L), .Label = c("B", "D"), class = "factor")), .Names =
> c("weight", "var1", "var2"), row.names = c(NA, 5L), class =
> "data.frame")
>
> to:
>
> y <-
> structure(list(weihgt = c(5L, 2L), var1 = structure(1:2, .Label =
> c("A", "C"), class = "factor"), var2 = structure(1:2, .Label =
> c("B", "D"), class = "factor")), .Names = c("weihgt", "var1", "var2"
> ), class = "data.frame", row.names = c(NA, -2L))
>
> using R?
>
> That is, from:
>

> > x
>   weight var1 var2
> 1      1    A    B
> 2      1    A    B
> 3      2    A    B
> 4      1    A    B
> 5      2    C    D
>
> to:
>
> > y
>   weihgt var1 var2
> 1      5    A    B
> 2      2    C    D
>

Does this suffice?

s.wt <- with(x,
             aggregate(weight, by = list(var1 = var1, var2 = var2), sum))
#> s.wt
#  var1 var2 x
#1    A    B 5
#2    C    D 2

#then fix names
names(s.wt)[3] <- "weight"

#> s.wt
#  var1 var2 weight
#1    A    B      5
#2    C    D      2

I believe that the reshape package, or reShape() in the Hmisc package, could do this in one step.
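
As a possible one-step variant, here is a minimal sketch using the formula method of aggregate(). This is an assumption about your setup: it needs an R version that provides that method, which may be newer than the one in use when this thread was written. The data frame below is rebuilt by hand from the example in the question, and the result already carries the "weight" column name, so no renaming step is needed.

```r
# Rebuild the example data from the original question.
x <- data.frame(
  weight = c(1, 1, 2, 1, 2),
  var1   = factor(c("A", "A", "A", "A", "C")),
  var2   = factor(c("B", "B", "B", "B", "D"))
)

# Formula method: sum 'weight' within each observed var1/var2 combination.
s.wt <- aggregate(weight ~ var1 + var2, data = x, FUN = sum)
s.wt
#   var1 var2 weight
# 1    A    B      5
# 2    C    D      2
```

Note that the grouping columns come first and the summed weight last, so reorder the columns if the downstream code expects the weight in column 1.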

-- 
David Winsemius



>
> The idea is that there is one occurrence of "A B" repeated 4 times
> in the original table,
> and it is summarized in the second table, computing the sum of the
> weights.
>
> I solved the problem using Perl, but I'd like to know what I have to
> read in order to
> do it in R.
>
> Regards,
> Nelson.-
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Received on Sun 27 Apr 2008 - 16:10:37 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 03 May 2008 - 01:30:34 GMT.
