Re: [R] Summarising by group

From: Ista Zahn
Date: Mon, 09 May 2011 10:23:59 -0400

Trolling? (But see inline below.)

On Mon, May 9, 2011 at 5:12 AM, Martyn Byng wrote:
> I wonder if someone with more experience than me on using R to summarise
> by group wants to post a reply to this
> opics/why-still-use-sas-with-a-lot
> To save everyone having to follow the link, the text is copied below
> "SAS has some nice features, such as the SQL procedure or simple "group
> by" features. Try to compute correlations "by group" in R: say you have
> 2,000 groups, 2 variables e.g. salary and education level, and 2 million
> observations - you want to compute correlation between salary and
> education within each group.
> It is not obvious, your best bet is to use some R package

The wealth of R packages is a core strength of the platform. It is not a disadvantage to have a wealth of well-developed code for almost any statistical application.

> (see sample
> code on Analyticbridge to do it), and the solution is painful,

Well, the Analyticbridge code is painful, but that example says far more about the person who wrote it than it does about R. All they really needed to do was

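library(plyr)  # provides ddply() and summarize()

# one row per country, one column per summary statistic
v <- ddply(xx, .(country), summarize,
           COR = cor(income, age),
           MEAN_age = mean(age),
           MEAN_income = mean(income),
           MAX_income = max(income),
           STDEV_income = sd(income))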
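
For anyone who wants to try this without the original data, here is a minimal reproducible sketch. The data frame is simulated: the group labels, the sample size (scaled down from the 2 million rows mentioned in the quoted post) and the relationship between age and income are all made up for illustration, and it assumes the plyr package is installed.

```r
library(plyr)  # assumed to be installed; provides ddply() and summarize()

set.seed(1)  # make the simulated data reproducible

# Hypothetical data: 2,000 groups, 200,000 rows (not the original data set)
n <- 200000
xx <- data.frame(
  country = sample(sprintf("c%04d", 1:2000), n, replace = TRUE),
  age     = rnorm(n, mean = 40, sd = 10)
)
xx$income <- 1000 * xx$age + rnorm(n, sd = 5000)  # invented relationship

# Correlation and standard deviation within each group, in one call
v <- ddply(xx, .(country), summarize,
           COR = cor(income, age),
           STDEV_income = sd(income))

head(v)
```

The result is an ordinary data frame with one row per group, so further columns can be added by listing more expressions in the same call.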

I'm not interested in signing up just so I can post a reply to the original post, but feel free to copy my answer there if you want.


> you can
> not return both correlation and stdev "by group", as the function can
> return only one argument, not a vector. So if you want to return not
> just two, but say 100 metrics, it becomes a nightmare."

Wrong, see example above.
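
The same point can be made without any add-on package. The sketch below uses only base R: a function can return a named vector, so returning many metrics per group is not a problem. The data and the helper name metrics() are hypothetical, chosen only for illustration.

```r
set.seed(42)  # make the simulated data reproducible

# Hypothetical data with three groups (not the original data set)
xx <- data.frame(
  country = rep(c("A", "B", "C"), each = 50),
  age     = rnorm(150, mean = 40, sd = 10)
)
xx$income <- 1000 * xx$age + rnorm(150, sd = 5000)  # invented relationship

# One function, several metrics: the return value is a named numeric vector
metrics <- function(d) {
  c(COR          = cor(d$income, d$age),
    MEAN_age     = mean(d$age),
    MEAN_income  = mean(d$income),
    MAX_income   = max(d$income),
    STDEV_income = sd(d$income))
}

# split() yields one data frame per group; sapply() collects the returned
# vectors as matrix columns, and t() puts the groups in rows
by_group <- t(sapply(split(xx, xx$country), metrics))
print(by_group)
```

Going from five metrics to a hundred only means adding entries to the vector returned by metrics(); the split/sapply step stays the same.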

> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology

Received on Mon 09 May 2011 - 14:27:37 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 09 May 2011 - 15:00:06 GMT.