Re: [R] Using tapply to create a new table

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Fri 26 Jan 2007 - 18:47:04 GMT

Josh,

As per the "Value" section of ?tapply, it "returns a single atomic value for each cell". This is easily viewed by using:

  tapply(set2$f1, set2$commonField, mean)

in a stand alone fashion or:

  str(tapply(set2$f1, set2$commonField, mean))

which will display the internal structure of the result. See ?str

To use merge() in the fashion you seem to want, you would want to use aggregate() and not tapply(). The former returns a data frame, where the "key" or "by" values will be part of the output.

See ?aggregate for more information.

I would also recommend that both for the readability of the code you write and to help clarify for yourself, the objects that are returned from each step, that you not nest the function calls as you have below.

There are times when it makes sense, but there are times when the code would end up being a good candidate for an Obfuscated R contest. :-)

HTH, Marc

On Fri, 2007-01-26 at 13:21 -0500, Kalish, Josh wrote:
> Marc,
>
> Thanks for pointing out the merge function. That gets me part of the
> way there. The only thing is that I can't get the tapply() results
> into a format that merge() will take. For example:
>
> merge( set1 , tapply( set2$f1 , set2$commonField, mean ) ,
> by="commonField" )
>
> Gives me "Error in names... Unused arguments..."
>
> I'm not sure what the result of a tapply() exactly is, but it doesn't
> seem to be a table.
>
> Yeah, rank amateur questions...
>
> Thanks,
>
> Josh
>
> -----Original Message-----
> From: Marc Schwartz [mailto:marc_schwartz@comcast.net]
> Sent: Friday, January 26, 2007 1:08 PM
> To: Kalish, Josh
> Cc: 'r-help@stat.math.ethz.ch'
> Subject: Re: [R] Using tapply to create a new table
>
> On Fri, 2007-01-26 at 12:39 -0500, Kalish, Josh wrote:
> > All,
> >
> > I'm sure that this is covered somewhere, but I can't seem to find a
> > good explanation. I have an existing table that contains
> information
> > grouped by date. This is as so:
> >
> > Day NumberOfCustomers NumberOfComplaints
> > 20060512 10040 40
> > 20060513 32420 11
> > ...
> >
> >
> > I also have a table at the detail level as so:
> >
> > Day Meal PricePaid UsedCupon
> > 20060512 Fish 14 Y
> > 20060512 Chicken 20 N
> > ...
> >
> > Is there a simple way to create summaries on the detail table and
> then
> > join them into the first table above so that it looks like this:
> >
> > Day NumberOfCustomers NumberOfComplaints AveragePricePaid
> > NumberUsingCupon
> >
> >
> > I can do a tapply to get what I want from the detail table, but I
> > can't figure out how to turn that into a table and join it back in.
> >
> >
> >
> > Thanks,
> >
> > Josh
>
> Skipping the steps of using tapply() or aggregate() to get the
> summarized data from the second data frame, you would then use merge()
> to perform a SQL-like 'join' operation:
>
> > DF.1
> Day NumberOfCustomers NumberOfComplaints
> 1 20060512 10040 40
> 2 20060513 32420 11
>
> > DF.2
> Day Meal PricePaid UsedCupon
> 1 20060512 Fish 14 Y
> 2 20060512 Chicken 20 N
>
> > merge(DF.1, DF.2, by = "Day")
> Day NumberOfCustomers NumberOfComplaints Meal PricePaid
> 1 20060512 10040 40 Fish 14
> 2 20060512 10040 40 Chicken 20
> UsedCupon
> 1 Y
> 2 N
>
>
> By default, only rows matching on the 'by' argument in both data
> frames will be in the result. See the 'all.x' and 'all.y' arguments to
> handle other scenarios of including non-matching rows.
>
> See ?merge, which BTW:
>
> help.search("join")
>
> would point you to, if you are familiar with the term from relational
> data base operations.
>
> HTH,
>
> Marc Schwartz



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat Jan 27 05:52:48 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Fri 26 Jan 2007 - 19:31:50 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.