Re: [R] using tapply with multiple variables

From: Dennis Murphy <djmuser_at_gmail.com>
Date: Sat, 30 Apr 2011 22:03:24 -0700

Hi:

If you have R 2.11.x or later, you can use the formula version of aggregate():

aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = function(x) sum(x == 'C'))

A variety of contributed packages (plyr, data.table, doBy, sqldf and remix, among others) have similar capabilities.
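Since the question also asked about splitting by Time, it may help to note that tapply() itself accepts a list of grouping factors. The following is only a minimal sketch: the data frame 'toy', its values and the seed are made up for illustration, and the Time column is assumed to look like the one described in the question.

```r
# Hypothetical toy data: three subjects, four items each at pre and post.
# The values are random and purely illustrative.
set.seed(1)
toy <- data.frame(
    Subject = rep(1:3, each = 8),
    Time    = rep(rep(c('pre', 'post'), each = 4), 3),
    Correct = sample(c('C', 'I'), 24, replace = TRUE)
)

# Pass one grouping factor per list element; the result is a
# Subject x Time matrix of counts of 'C'.
counts <- tapply(toy$Correct, list(toy$Subject, toy$Time),
                 function(x) sum(x == 'C'))
counts

# The formula interface of aggregate() returns the same counts in long format.
long <- aggregate(Correct ~ Subject + Time, data = toy,
                  FUN = function(x) sum(x == 'C'))
long
```

Adding a third factor (e.g., Group) to the list gives a three-way array from tapply(), which is one reason the long format from aggregate() is often easier to work with.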

If you want some additional summaries (e.g., percent correct), here is an example function that computes them for a single subject/group combination; aggregate() then applies it to every combination (I encourage you to play with it):

f <- function(x) {
    Correct <- sum(x == 'C')
    Percent <- round(100 * Correct/length(x), 3)
    c(Number = Correct, Percent = Percent)
}
aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = f)

The particular function isn't as important as knowing that you can do this sort of thing. Several of the contributed packages mentioned above have similar, if not superior, capabilities, depending on the situation.

Toy example to test the above:

dd <- data.frame(Subject = rep(1:5, each = 100),
                 Group = rep(rep(c('C', 'T'), each = 50), 5),
                 Correct = factor(rbinom(500, 1, 0.8), labels = c('I', 'C')))
aggregate(Correct ~ Subject + Group, data = dd, FUN = function(x) sum(x == 'C'))

   Subject Group Correct
1        1     C      40
2        2     C      36
3        3     C      39
4        4     C      37
5        5     C      41
6        1     T      43
7        2     T      45
8        3     T      37
9        4     T      45
10       5     T      36

aggregate(Correct ~ Subject + Group, data = dd, FUN = f)

   Subject Group Correct.Number Correct.Percent
1        1     C             40              80
2        2     C             36              72
3        3     C             39              78
4        4     C             37              74
5        5     C             41              82
6        1     T             43              86
7        2     T             45              90
8        3     T             37              74
9        4     T             45              90
10       5     T             36              72
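One thing to be aware of: when FUN returns a vector, aggregate() stores the results as a single matrix column, so Correct.Number and Correct.Percent above are not separate columns of the data frame. A minimal sketch of one common way to flatten it, using do.call(data.frame, ...); the seed is arbitrary, so the counts will differ from those printed above:

```r
# Same helper as above: returns a named vector of two summaries.
f <- function(x) {
    Correct <- sum(x == 'C')
    Percent <- round(100 * Correct/length(x), 3)
    c(Number = Correct, Percent = Percent)
}

set.seed(42)   # arbitrary seed, not the one behind the printed output
dd <- data.frame(Subject = rep(1:5, each = 100),
                 Group = rep(rep(c('C', 'T'), each = 50), 5),
                 Correct = factor(rbinom(500, 1, 0.8), labels = c('I', 'C')))

res <- aggregate(Correct ~ Subject + Group, data = dd, FUN = f)
dim(res)              # three columns; 'Correct' is a two-column matrix
is.matrix(res$Correct)

# Rebuild the data frame so each summary becomes an ordinary column.
flat <- do.call(data.frame, res)
names(flat)
```

After flattening, the summaries can be referred to directly, e.g. flat$Correct.Percent.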

HTH,
Dennis

On Sat, Apr 30, 2011 at 12:28 PM, Kevin Burnham <kburnham_at_gmail.com> wrote:
> Hi All,
>
> I have a long data file generated from a minimal pair test that I gave to
> learners of Arabic before and after a phonetic training regime.  For each of
> thirty some subjects there are 800 rows of data, from each of 400 items at
> pre and posttest.  For each item the subject got correct, there is a 'C' in
> the column 'Correct'.  The line:
>
> tapply(ALLDATA$Correct, ALLDATA$Subject, function(x)sum(x=="C"))
>
> gives me the sum of correct answers for each subject.
>
> However, I would like to have that sum separated by Time (pre or post).  Is
> there a simple way to do that?
>
>
> What if I further wish to separate by Group (T or C)?
>
> Thanks,
> Kevin
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Received on Sun 01 May 2011 - 06:07:37 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 05 May 2011 - 07:00:04 GMT.
