[R] Agreggating data using external aggregation rules

From: <ANGELO.LINARDI_at_bancaditalia.it>
Date: Fri, 06 Jun 2008 12:12:36 +0200


Dear R experts,

I am currently facing a tricky problem which I have read a lot about in the various R mailing lists without finding exactly what I need. I have a big data frame DF (about 2,000,000 rows) with 7 columns being variables and 1 being a measure (using reshape package nomeclature). There are no "duplicates" in it.
Fot each of the variables I have some "rules" to apply, being COD_IN the value of the variable in the DF, COD_OUT the one to be transformed to; once obtained the "new codes" in the DF I have to aggregate the "new DF" (for example summing the measure).
Usually the total transformation (merge+aggregate) really decreases the number of lines in the data frame, but sometimes it can grows depending on the rule. Just to give an idea, the first "rule" in v1 maps 820 different values into 7 ones.
Using SQL and a database this can be done in a very straightforward way (for example on the variable v1):

Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure)
>From DF, RULE_v1

Where v1=COD_IN
Group by v2, v3,v4, v5, v6, v7

So the first choice would be using a database; the second one would be splitting the data frame and then joining the results. Is there any other possibility to merge+aggregate caused by the merge ?

Thank you in advance

Angelo Linardi


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 06 Jun 2008 - 12:57:50 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 06 Jun 2008 - 22:30:47 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive