Re: [R] Agreggating data using external aggregation rules

From: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Date: Fri, 06 Jun 2008 09:10:22 -0400

Use aggregate() for aggregation and use indexing or subset() for selection. Alternately try the sqldf package: http://sqldf.googlecode.com which allows one to perform SQL operations on data frames.

On Fri, Jun 6, 2008 at 6:12 AM, <ANGELO.LINARDI_at_bancaditalia.it> wrote:
> Dear R experts,
>
> I am currently facing a tricky problem which I have read a lot about in
> the various R mailing lists without finding exactly what I need.
> I have a big data frame DF (about 2,000,000 rows) with 7 columns being
> variables and 1 being a measure (using reshape package nomeclature).
> There are no "duplicates" in it.
> Fot each of the variables I have some "rules" to apply, being COD_IN the
> value of the variable in the DF, COD_OUT the one to be transformed to;
> once obtained the "new codes" in the DF I have to aggregate the "new DF"
> (for example summing the measure).
> Usually the total transformation (merge+aggregate) really decreases the
> number of lines in the data frame, but sometimes it can grows depending
> on the rule. Just to give an idea, the first "rule" in v1 maps 820
> different values into 7 ones.
> Using SQL and a database this can be done in a very straightforward way
> (for example on the variable v1):
>
> Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure)
> >From DF, RULE_v1
> Where v1=COD_IN
> Group by v2, v3,v4, v5, v6, v7
>
> So the first choice would be using a database; the second one would be
> splitting the data frame and then joining the results.
> Is there any other possibility to merge+aggregate caused by the merge ?
>
> Thank you in advance
>
> Angelo Linardi
>
>
>
> ** Le e-mail provenienti dalla Banca d'Italia sono trasmesse in buona fede e non
> comportano alcun vincolo ne' creano obblighi per la Banca stessa, salvo che cio' non
> sia espressamente previsto da un accordo scritto.
> Questa e-mail e' confidenziale. Qualora l'avesse ricevuta per errore, La preghiamo di
> comunicarne via e-mail la ricezione al mittente e di distruggerne il contenuto. La
> informiamo inoltre che l'utilizzo non autorizzato del messaggio o dei suoi allegati
> potrebbe costituire reato. Grazie per la collaborazione.
> -- E-mails from the Bank of Italy are sent in good faith but they are neither binding on
> the Bank nor to be understood as creating any obligation on its part except where
> provided for in a written agreement. This e-mail is confidential. If you have received it
> by mistake, please inform the sender by reply e-mail and delete it from your system.
> Please also note that the unauthorized disclosure or use of the message or any
> attachments could be an offence. Thank you for your cooperation. **
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 06 Jun 2008 - 22:19:28 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 06 Jun 2008 - 22:30:47 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive