From: Andrew Robinson <A.Robinson_at_ms.unimelb.edu.au>

Date: Mon, 02 May 2011 09:14:40 +1000

This is a nice demonstration of the formula interface to aggregate. A less elegant alternative is to pass lists as arguments.

with(dd,
     aggregate(Correct,
               by = list(Subject = Subject, Group = Group),
               FUN = function(x) sum(x == 'C')))
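
As a quick check that the list interface and the formula interface agree, here is a minimal sketch. The toy data follow Dennis's example quoted further down; the set.seed() call and the names by_list and by_formula are assumptions added only for illustration.

```r
set.seed(1)  # assumed seed, only so that the run is repeatable

# Toy data in the same shape as Dennis's example
dd <- data.frame(Subject = rep(1:5, each = 100),
                 Group   = rep(rep(c('C', 'T'), each = 50), 5),
                 Correct = factor(rbinom(500, 1, 0.8), labels = c('I', 'C')))

# List interface: the target vector plus a named list of grouping variables
by_list <- with(dd,
                aggregate(Correct,
                          by = list(Subject = Subject, Group = Group),
                          FUN = function(x) sum(x == 'C')))

# Formula interface on the same data
by_formula <- aggregate(Correct ~ Subject + Group, data = dd,
                        FUN = function(x) sum(x == 'C'))

# The list interface calls the summary column "x";
# the formula interface keeps the name "Correct".
names(by_list)
names(by_formula)
all(by_list$x == by_formula$Correct)
```

Passing x = list(Correct = Correct) instead of the bare vector keeps the column name, which is one reason to prefer the named-list form.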

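Using a list is advantageous if you want to summarise more than one variable (which does not seem to be the case here). The formula interface can also take several variables, via cbind() on the left-hand side, but cbind() turns factors into their integer codes, so the list form is simpler for a factor such as Correct. That would be set up like this: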

with(dd,
     aggregate(x = list(Correct = Correct
                        # , other target variables listed here, ...
                        ),
               by = list(Subject = Subject, Group = Group),
               FUN = function(x) sum(x == 'C')))
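
A minimal sketch of the multi-variable case, under stated assumptions: the second target variable, Retest, is hypothetical and does not appear in the original data, and dd2 and both are names made up for this example.

```r
set.seed(2)  # assumed seed, only so that the run is repeatable

# Toy data with two target variables; Retest is a hypothetical second score
dd2 <- data.frame(Subject = rep(1:5, each = 100),
                  Group   = rep(rep(c('C', 'T'), each = 50), 5),
                  Correct = factor(rbinom(500, 1, 0.8), levels = 0:1,
                                   labels = c('I', 'C')),
                  Retest  = factor(rbinom(500, 1, 0.7), levels = 0:1,
                                   labels = c('I', 'C')))

# Each element of x is summarised separately with the same FUN
both <- with(dd2,
             aggregate(x = list(Correct = Correct, Retest = Retest),
                       by = list(Subject = Subject, Group = Group),
                       FUN = function(x) sum(x == 'C')))
both
```

The result has one row per Subject/Group combination and one count column per element of x.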

Cheers

Andrew

On Sat, Apr 30, 2011 at 10:03:24PM -0700, Dennis Murphy wrote:

> Hi:
>
> If you have R 2.11.x or later, you can use the formula version of aggregate():
>
> aggregate(Correct ~ Subject + Group, data = ALLDATA,
>           FUN = function(x) sum(x == 'C'))
>
> A variety of contributed packages (plyr, data.table, doBy, sqldf and
> remix, among others) have similar capabilities.
>
> If you want some additional summaries (e.g., percent correct), here is
> an example function for a single subject/group that aggregate() can
> use to propagate to all subgroups and subjects (I encourage you to
> play with it):
>
> f <- function(x) {
>     Correct <- sum(x == 'C')
>     Percent <- round(100 * Correct/length(x), 3)
>     c(Number = Correct, Percent = Percent)
> }
> aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = f)
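
One detail worth knowing when FUN returns more than one value, as f does: aggregate() stores the result as a single matrix-valued column rather than as separate columns. The sketch below shows this and one common way to flatten it; the seed and the names res and flat are assumptions for illustration.

```r
set.seed(3)  # assumed seed, only so that the run is repeatable

dd <- data.frame(Subject = rep(1:5, each = 100),
                 Group   = rep(rep(c('C', 'T'), each = 50), 5),
                 Correct = factor(rbinom(500, 1, 0.8), labels = c('I', 'C')))

# Same summary function as in the quoted message
f <- function(x) {
    Correct <- sum(x == 'C')
    Percent <- round(100 * Correct / length(x), 3)
    c(Number = Correct, Percent = Percent)
}

res <- aggregate(Correct ~ Subject + Group, data = dd, FUN = f)

ncol(res)               # Subject, Group, and one matrix column named Correct
is.matrix(res$Correct)  # the two summaries live inside that matrix

# Rebuilding the data frame splits the matrix into ordinary columns
flat <- do.call(data.frame, res)
names(flat)
```

The printed output looks the same either way, so the difference usually shows up only when a later step tries to use res$Correct.Number directly.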
>
> The particular function isn't as important as knowing you can do this
> sort of thing. Several of the contributed packages indicated above
> have similar, if not superior, capabilities, depending on the
> situation.
>
> Toy example to test the above:
>
> dd <- data.frame(Subject = rep(1:5, each = 100),
>                  Group = rep(rep(c('C', 'T'), each = 50), 5),
>                  Correct = factor(rbinom(500, 1, 0.8), labels = c('I', 'C')))
> aggregate(Correct ~ Subject + Group, data = dd, FUN = function(x) sum(x == 'C'))
>    Subject Group Correct
> 1        1     C      40
> 2        2     C      36
> 3        3     C      39
> 4        4     C      37
> 5        5     C      41
> 6        1     T      43
> 7        2     T      45
> 8        3     T      37
> 9        4     T      45
> 10       5     T      36
> aggregate(Correct ~ Subject + Group, data = dd, FUN = f)
>    Subject Group Correct.Number Correct.Percent
> 1        1     C             40              80
> 2        2     C             36              72
> 3        3     C             39              78
> 4        4     C             37              74
> 5        5     C             41              82
> 6        1     T             43              86
> 7        2     T             45              90
> 8        3     T             37              74
> 9        4     T             45              90
> 10       5     T             36              72
>
> HTH,
> Dennis
>
> On Sat, Apr 30, 2011 at 12:28 PM, Kevin Burnham <kburnham_at_gmail.com> wrote:
> > Hi All,
> >
> > I have a long data file generated from a minimal pair test that I gave to
> > learners of Arabic before and after a phonetic training regime. For each of
> > thirty some subjects there are 800 rows of data, from each of 400 items at
> > pre and posttest. For each item the subject got correct, there is a 'C' in
> > the column 'Correct'. The line:
> >
> > tapply(ALLDATA$Correct, ALLDATA$Subject, function(x) sum(x == "C"))
> >
> > gives me the sum of correct answers for each subject.
> >
> > However, I would like to have that sum separated by Time (pre or post). Is
> > there a simple way to do that?
> >
> > What if I further wish to separate by Group (T or C)?
> >
> > Thanks,
> > Kevin
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
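
For completeness, Kevin's original tapply() call can also be split by Time directly, by passing a list of grouping factors as the second argument. This is a minimal sketch on made-up data: the data frame toy, its size, and the seed are assumptions, and only the column names Subject, Time and Correct come from the question.

```r
set.seed(4)  # assumed seed, only so that the run is repeatable

# Small made-up data set: 3 subjects, 4 items each at pre and post
toy <- data.frame(Subject = rep(1:3, each = 8),
                  Time    = rep(rep(c('pre', 'post'), each = 4), 3),
                  Correct = sample(c('C', 'I'), 24, replace = TRUE),
                  stringsAsFactors = FALSE)

# A list of factors gives one cell per Subject x Time combination
counts <- with(toy,
               tapply(Correct,
                      list(Subject = Subject, Time = Time),
                      function(x) sum(x == 'C')))
counts
```

The result is a Subject by Time matrix rather than a long data frame. Adding Group as a third element of the list would give a three-way array, with NA in cells for combinations that do not occur, which is one reason the aggregate() forms above may be more convenient here.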

-- 
Andrew Robinson
Program Manager, ACERA
Department of Mathematics and Statistics
University of Melbourne, VIC 3010 Australia
Tel: +61-3-8344-6410 (prefer email)
Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011)
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009):
http://www.ms.unimelb.edu.au/spuRs/

Received on Thu 05 May 2011 - 06:25:08 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Thu 05 May 2011 - 07:00:06 GMT.
