Re: [R] Summarize by two or more attributes

From: Marc Schwartz <marc_schwartz_at_me.com>
Date: Tue, 17 May 2011 14:09:28 -0500

On May 17, 2011, at 11:48 AM, LCOG1 wrote:

> Okay everyone, here's a likely softball for someone.
>
> Consider the following data frame:
>
> #Create data
> x<-rep(c(1,15),10)
> y<-rnorm(20)
> z<-c(rep("auto",10),rep("bus",10))
> a<-rep(c(1,1,2,2,3,3,4,4,5,5),2)
> #Create Data frame
> Df<-data.frame(Source=x,Rate=y,Bin=a,Type=z)
>
>
> I want to create a new column that equals the sum of the Rates for each type
> (1,15) by Bin.
>
> A related question: I have been using R for a while now and usually
> manipulate my data in data frames but I know lists are better for R so
> perhaps the above should be done using lists. Feel free to offer
> suggestions coming from that angle.
>
> Thanks guys
>
> JR-

See ?ave and consider:

# Presuming you want 'Bin' nested within 'Source'
Df$Sum <- ave(Df$Rate, list(Df$Source, Df$Bin), FUN = sum)

# Or 'Source' nested within 'Bin'
Df$Sum <- ave(Df$Rate, list(Df$Bin, Df$Source), FUN = sum)
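To make the shape of the result concrete, here is a minimal sketch using the data from the original post. The set.seed() call is my addition, purely so the numbers are reproducible, and the aggregate() call is only there as a cross-check. ave() returns one value per row, so every row in the same Source/Bin group carries that group's sum.

```r
# Rebuild the example data; set.seed() is an addition for reproducibility
set.seed(1)
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

# One sum per Source/Bin combination, repeated on every row of that group
Df$Sum <- ave(Df$Rate, list(Df$Source, Df$Bin), FUN = sum)

# The same grouping with the factors in the other order
Sum2 <- ave(Df$Rate, list(Df$Bin, Df$Source), FUN = sum)

# Cross-check: a one-row-per-group summary of the same sums
agg <- aggregate(Rate ~ Source + Bin, data = Df, FUN = sum)

head(Df)
agg
```

With this particular data the two orderings should give the same column, since both define the same set of Source/Bin combinations. If you want one row per group rather than a new column, something like the aggregate() call is probably closer to what you need.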

On your follow-up question: a data frame is a type of list with a 'data.frame' class attribute, a 'row.names' attribute and a 'names' attribute for the column names, much as a matrix is a vector with a 'dim' attribute.

Try this:

  unclass(Df)

and see the output. It looks just like a list, because it is...
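A few checks along the same lines, as a small sketch. The two-row Df here is a cut-down stand-in for the data frame in the original post, just to keep the output short.

```r
# Small stand-in data frame (not the full example data)
Df <- data.frame(Source = c(1, 15), Type = c("auto", "bus"))

is.list(Df)              # a data frame passes the list test
names(attributes(Df))    # includes 'names', 'class' and 'row.names'
class(unclass(Df))       # with the class attribute removed: "list"

# The matrix analogy: a plain vector plus a 'dim' attribute
m <- 1:6
dim(m) <- c(2, 3)
is.matrix(m)
```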

If you are dealing with 'rectangular' datasets (e.g. a database table), where each column may need to be of a different data type, a data frame in R is specifically designed to handle that. It can do this precisely because it is a list: each element of a list can be of a different type.
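As a quick illustration of the rectangular case, a sketch with a made-up three-column data frame (the values are invented; stringsAsFactors = FALSE is set explicitly so the character column stays character regardless of R version):

```r
# Hypothetical rectangular data: columns differ in type, not in length
Df <- data.frame(Source = c(1, 15),
                 Rate   = c(0.5, -1.2),
                 Type   = c("auto", "bus"),
                 stringsAsFactors = FALSE)

sapply(Df, class)    # each column keeps its own type
sapply(Df, length)   # but every column has the same number of rows
```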

If you need to deal with a data structure that may not be entirely based upon a rectangular data set and may need to contain various numbers of items per element, then a list is the way to go. Lists are commonly used in R functions to return complex objects that may contain vectors of various types, matrices, data frames and even lists of lists.
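By contrast, a sketch of a non-rectangular object. The 'trip' list and all of its element names are hypothetical, invented only to show elements of different types and lengths sitting side by side:

```r
# Hypothetical ragged object: elements differ in type AND in length
trip <- list(mode  = "bus",
             stops = c("A", "B", "C"),
             fares = matrix(1:4, nrow = 2),
             meta  = list(year = 2011, notes = character(0)))

sapply(trip, length)   # 1, 3, 4 and 2 items respectively
str(trip)              # compact view of the nested structure
```

Something like this could not be stored as a data frame without padding or reshaping, which is roughly where I would switch to a list.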

A quick example would be objects returned by R's model functions. Run example(lm) and after the graphs finish, use str(lm.D9) to give an example of the structure of a somewhat complex list object.
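If you would rather not sit through the plots, a non-interactive sketch of the same idea follows. Note that it swaps in the built-in 'cars' dataset rather than the lm.D9 object that example(lm) creates, so the object name 'fit' is mine.

```r
# Fit a simple model on a built-in dataset (a stand-in for lm.D9)
fit <- lm(dist ~ speed, data = cars)

is.list(fit)               # the fitted model object is a list underneath
class(fit)                 # with an "lm" class attribute
names(fit)                 # components: coefficients, residuals, model, ...
class(fit$model)           # one component is itself a data frame
str(fit, max.level = 1)    # top-level structure only
```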

HTH, Marc Schwartz



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 17 May 2011 - 19:11:57 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 17 May 2011 - 19:40:07 GMT.
