RE: [R] Aggregating data (with more than one function)

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Wed 30 Mar 2005 - 11:22:27 EST


> agg.dat <- do.call("rbind", by(dat$Salary, dat$Department,
+                function(x) c(mean=mean(x), total=sum(x))))
> agg.dat <- data.frame(dept=rownames(agg.dat), agg.dat)
> agg.dat
           dept     mean  total
Finance Finance 83925.67 251777
HR           HR 63333.33 190000
IT           IT 59928.67 179786
Sales     Sales 62481.67 187445

Andy
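
For readers without the original clipboard data, the approach above can be
sketched in a self-contained form. The data frame is rebuilt from the values
quoted further down in this thread; the intermediate name `stats` and the
final step that drops the row names are assumptions added for illustration,
not part of the original reply.

```r
# Rebuild the example data from the original post (values copied from the thread).
dat <- data.frame(
  LastName   = c("Johnson", "James", "Howe", "Jones", "Norwood", "Benson",
                 "Smith", "Baker", "Dempsey", "Nolan", "Garth", "Jameson"),
  Department = c("IT", "HR", "Finance", "Finance", "IT", "Sales",
                 "Sales", "HR", "HR", "Sales", "Finance", "IT"),
  Salary     = c(56000, 54223, 80000, 82000, 67000, 76000,
                 65778, 56778, 78999, 45667, 89777, 56786),
  stringsAsFactors = FALSE
)

# One pass over the groups: by() yields one c(mean, total) vector per
# department, and rbind stacks those vectors into a matrix whose row
# names are the department names.
stats <- do.call("rbind", by(dat$Salary, dat$Department,
                             function(x) c(mean = mean(x), total = sum(x))))

# Copy the row names into an ordinary column, then clear the row names
# so the department appears only once (illustrative extra step).
agg.dat <- data.frame(dept = rownames(stats), stats)
rownames(agg.dat) <- NULL
agg.dat
```

Because the summary function is called once per department and returns both
values together, the grouping is done a single time rather than once per
statistic.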

> From: Robin Schroeder
>
> Dear list & Andy,
>
> I am hopelessly stumped: how would one add the department
> names as a variable?
>
> Robin
>
> > Robin Tori Schroeder
> > International Institute for Sustainability
> > P.O. Box 873211
> > Arizona State University
> > Tempe, Arizona 85287-3211
> > Phone: (480) 727-7290
> >
> >
>
>
> -----Original Message-----
> From: r-help-bounces@stat.math.ethz.ch
> [mailto:r-help-bounces@stat.math.ethz.ch]On Behalf Of Liaw, Andy
> Sent: Monday, March 28, 2005 6:45 PM
> To: 'Sivakumaran Raman'; r-help@stat.math.ethz.ch
> Subject: RE: [R] Aggregating data (with more than one function)
>
>
> Here's one possible way, using the data you supplied:
>
> > dat <- read.table("clipboard", header=T, row=1)
> > do.call("rbind", by(dat$Salary, dat$Department,
> +         function(x) c(mean=mean(x), total=sum(x))))
>             mean  total
> Finance 83925.67 251777
> HR      63333.33 190000
> IT      59928.67 179786
> Sales   62481.67 187445
>
> If you need the department names as a variable, you can add
> that easily.
>
> HTH,
> Andy
>
> > From: Sivakumaran Raman
> >
> > I have the data similar to the following in a data frame:
> >    LastName Department Salary
> > 1   Johnson         IT  56000
> > 2     James         HR  54223
> > 3      Howe    Finance  80000
> > 4     Jones    Finance  82000
> > 5   Norwood         IT  67000
> > 6    Benson      Sales  76000
> > 7     Smith      Sales  65778
> > 8     Baker         HR  56778
> > 9   Dempsey         HR  78999
> > 10    Nolan      Sales  45667
> > 11    Garth    Finance  89777
> > 12  Jameson         IT  56786
> >
> > I want to calculate both the mean salary broken down by Department and
> > also the total amount paid out per department, i.e. I want both
> > sum(Salary) and mean(Salary) for each Department. Right now, I am using
> > aggregate.data.frame twice, creating two data frames, and then combining
> > them using data.frame. However, this seems to be very memory and
> > processor intensive and is taking a very long time on my data set. Is
> > there a quicker way to do this?
> >
> > Thanks in advance,
> > Siv Raman
> >

> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html



Received on Wed Mar 30 11:31:06 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:30:57 EST