Re: [R] Using tapply to create a new table

From: Kalish, Josh <josh.kalish_at_credit-suisse.com>
Date: Fri 26 Jan 2007 - 18:21:18 GMT


Marc,

Thanks for pointing out the merge function. That gets me part of the way there. The only thing is that I can't get the tapply() results into a format that merge() will take. For example:

merge( set1 , tapply( set2$f1 , set2$commonField, mean ) , by="commonField" )

Gives me "Error in names... Unused arguments..."

I'm not sure what the result of a tapply() exactly is, but it doesn't seem to be a table.
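
For reference: tapply() returns a named vector (a one-dimensional array indexed by group label), not a data frame, so it has no "commonField" column for merge() to match on. That is the likely cause of the error above. A minimal sketch of one workaround follows, using toy stand-ins for set1 and set2 with invented values; the column name meanF1 is also just an illustrative choice.

```r
# Toy stand-ins for set1 and set2; the values are invented for illustration.
set1 <- data.frame(commonField = c("a", "b"), x = c(1, 2))
set2 <- data.frame(commonField = c("a", "a", "b"), f1 = c(10, 20, 30))

# tapply() returns a 1-d array whose names are the group labels.
m <- tapply(set2$f1, set2$commonField, mean)

# Rebuild it as a data frame with an explicit key column so merge() can use it.
means <- data.frame(commonField = names(m), meanF1 = as.vector(m))

result <- merge(set1, means, by = "commonField")
print(result)
```

aggregate() is an alternative that returns a data frame directly, which avoids the conversion step.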

Yeah, rank amateur questions...

Thanks,

Josh

-----Original Message-----
From: Marc Schwartz [mailto:marc_schwartz@comcast.net]
Sent: Friday, January 26, 2007 1:08 PM
To: Kalish, Josh
Cc: 'r-help@stat.math.ethz.ch'
Subject: Re: [R] Using tapply to create a new table

On Fri, 2007-01-26 at 12:39 -0500, Kalish, Josh wrote:
> All,
>
> I'm sure that this is covered somewhere, but I can't seem to find a
> good explanation. I have an existing table that contains information
> grouped by date. This is as so:
>
> Day NumberOfCustomers NumberOfComplaints
> 20060512 10040 40
> 20060513 32420 11
> ...
>
>
> I also have a table at the detail level as so:
>
> Day Meal PricePaid UsedCupon
> 20060512 Fish 14 Y
> 20060512 Chicken 20 N
> ...
>
> Is there a simple way to create summaries on the detail table and then
> join them into the first table above so that it looks like this:
>
> Day NumberOfCustomers NumberOfComplaints AveragePricePaid
> NumberUsingCupon
>
>
> I can do a tapply to get what I want from the detail table, but I
> can't figure out how to turn that into a table and join it back in.
>
>
>
> Thanks,
>
> Josh

Skipping the steps of using tapply() or aggregate() to get the summarized data from the second data frame, you would then use merge() to perform a SQL-like 'join' operation:

> DF.1

       Day NumberOfCustomers NumberOfComplaints
1 20060512             10040                 40
2 20060513             32420                 11


> DF.2

       Day    Meal PricePaid UsedCupon
1 20060512    Fish        14         Y
2 20060512 Chicken        20         N


> merge(DF.1, DF.2, by = "Day")

       Day NumberOfCustomers NumberOfComplaints    Meal PricePaid
1 20060512             10040                 40    Fish        14
2 20060512             10040                 40 Chicken        20
  UsedCupon
1         Y
2         N

By default, only rows matching on the 'by' argument in both data frames will be in the result. See the 'all.x' and 'all.y' arguments to handle other scenarios of including non-matching rows.
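
The summarizing step that was skipped, followed by the join, might look like the sketch below. It rebuilds the two data frames shown above, uses aggregate() to get one row per Day, and passes all.x = TRUE so that days with no detail rows are kept with NA values. The names avg, cup, and DF.3 are illustrative only; AveragePricePaid and NumberUsingCupon are the column names from the original question.

```r
# The two data frames from the example above.
DF.1 <- data.frame(Day = c(20060512, 20060513),
                   NumberOfCustomers = c(10040, 32420),
                   NumberOfComplaints = c(40, 11))
DF.2 <- data.frame(Day = c(20060512, 20060512),
                   Meal = c("Fish", "Chicken"),
                   PricePaid = c(14, 20),
                   UsedCupon = c("Y", "N"))

# One row per Day: mean price paid.
avg <- aggregate(DF.2$PricePaid, by = list(Day = DF.2$Day), FUN = mean)
names(avg)[2] <- "AveragePricePaid"

# One row per Day: number of detail rows where the coupon flag is "Y".
cup <- aggregate(DF.2$UsedCupon == "Y", by = list(Day = DF.2$Day), FUN = sum)
names(cup)[2] <- "NumberUsingCupon"

# all.x = TRUE keeps days in DF.1 that have no detail rows (filled with NA).
DF.3 <- merge(DF.1, merge(avg, cup, by = "Day"), by = "Day", all.x = TRUE)
print(DF.3)
```

Without all.x = TRUE, the row for 20060513 would be dropped, because DF.2 has no detail rows for that day.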

See ?merge, which BTW:

  help.search("join")

would point you to, if you are familiar with the term from relational database operations.

HTH,

Marc Schwartz



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sat Jan 27 05:24:40 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 26 Jan 2007 - 19:31:50 GMT.