[R] Summarize data for MCA (FactoMineR)

From: Nelson Castillo <nelsoneci_at_gmail.com>
Date: Fri, 25 Apr 2008 18:55:27 -0500


Hi :-)

I'm new to R: I started using it for a project (I'm the CS guy in a group of statisticians, helping them solve problems as they come up). This is my first post to the list.

Well, they were used to doing MCA (multiple correspondence analysis) in other programs, where the data seem to be preprocessed automatically before the MCA is run.

So, they need to process a data set with N = 1000000 rows, but there are only about N/100 distinct combinations of values across all the variables, so the MCA can be run in reasonable time if the data are summarized first.
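To check how much a data set would shrink, one can count its distinct combinations of category values. A minimal sketch in base R, assuming every column except `weight` is a categorical variable; it rebuilds the small example `x` from this message with `data.frame()`:

```r
# Small example data set from this post, rebuilt with data.frame().
x <- data.frame(weight = c(1, 1, 2, 1, 2),
                var1   = factor(c("A", "A", "A", "A", "C")),
                var2   = factor(c("B", "B", "B", "B", "D")))

# Assumption: every column except 'weight' is a categorical variable.
vars <- setdiff(names(x), "weight")

# duplicated() flags rows whose combination of values has already been seen,
# so counting the non-duplicated rows gives the number of distinct profiles.
n_distinct <- sum(!duplicated(x[vars]))

n_distinct            # 2 distinct profiles: "A B" and "C D"
nrow(x) / n_distinct  # average number of original rows per summarized row
```

With N = 1000000 rows and about N/100 distinct profiles, the same ratio would be about 100.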

So, the question is:

How can I turn x from:

x <-
structure(list(weight = c(1, 1, 2, 1, 2),
               var1 = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("A", "C"), class = "factor"),
               var2 = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("B", "D"), class = "factor")),
          .Names = c("weight", "var1", "var2"), row.names = c(NA, 5L), class = "data.frame")

to:

y <-
structure(list(weight = c(5L, 2L),
               var1 = structure(1:2, .Label = c("A", "C"), class = "factor"),
               var2 = structure(1:2, .Label = c("B", "D"), class = "factor")),
          .Names = c("weight", "var1", "var2"), class = "data.frame", row.names = c(NA, -2L))

using R?

That is, from:

> x
  weight var1 var2
1      1    A    B
2      1    A    B
3      2    A    B
4      1    A    B
5      2    C    D

to:

> y
  weight var1 var2
1      5    A    B
2      2    C    D


The idea is that the combination "A B" occurs 4 times in the original table, and the second
table collapses those rows into one, summing their weights (1 + 1 + 2 + 1 = 5).
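In base R this is a grouped sum, which `aggregate()` can compute directly. A minimal sketch using the data-frame method of `aggregate()` on the example `x` (rebuilt here with `data.frame()`); other approaches, such as `tapply()`, would also work:

```r
# The example data frame 'x' from this post, rebuilt with data.frame().
x <- data.frame(weight = c(1, 1, 2, 1, 2),
                var1   = factor(c("A", "A", "A", "A", "C")),
                var2   = factor(c("B", "B", "B", "B", "D")))

# Sum 'weight' within each combination of var1 and var2 that occurs in x.
# Combinations that never occur (e.g. "A D") do not appear in the result.
y <- aggregate(x["weight"], by = x[c("var1", "var2")], FUN = sum)

# aggregate() puts the grouping columns first; reorder to match the layout above.
y <- y[c("weight", "var1", "var2")]
y
```

R also accepts a formula form, `aggregate(weight ~ var1 + var2, data = x, FUN = sum)`; more categorical variables are added to `by` (or to the right-hand side of the formula). The summed weights would then serve as row weights in the MCA; FactoMineR's `MCA()` has a `row.w` argument that appears intended for this.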

I solved the problem in Perl, but I'd like to know what I should read in order to
do it in R.
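Besides `?aggregate`, the help page `?xtabs` covers weighted cross-tabulation. A minimal sketch of that alternative: `xtabs()` builds the full table of weight sums, which `as.data.frame()` turns back into a data frame with the sums in a column named `Freq` by default:

```r
# The example data frame 'x' from this post, rebuilt with data.frame().
x <- data.frame(weight = c(1, 1, 2, 1, 2),
                var1   = factor(c("A", "A", "A", "A", "C")),
                var2   = factor(c("B", "B", "B", "B", "D")))

# With a left-hand side, xtabs() sums 'weight' in every cell of the
# var1 x var2 table, including combinations that never occur (sum 0).
tab <- xtabs(weight ~ var1 + var2, data = x)

# One row per cell; the summed weights are in the 'Freq' column.
y <- as.data.frame(tab)

# Keep only the combinations that were observed.
# (This would also drop observed combinations whose weights sum to 0.)
y <- y[y$Freq > 0, ]
y
```

Because the full table has one cell for every combination of levels, its size grows multiplicatively with the number of variables, so `aggregate()` is likely the safer choice for a data set with many variables.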

Regards,
Nelson.-

-- 
http://arhuaco.org

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Sat 26 Apr 2008 - 00:01:09 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 27 Apr 2008 - 16:30:32 GMT.

