# Re: [R] using tapply with multiple variables

From: Dennis Murphy <djmuser_at_gmail.com>
Date: Sat, 30 Apr 2011 22:03:24 -0700

Hi:

```r
aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = function(x) sum(x == 'C'))
```
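If you would rather stay with tapply(), its INDEX argument also accepts a list of grouping factors, giving one dimension of the result per factor. Here is a minimal sketch. It assumes a small, made-up data frame `toy` whose `Subject`, `Time` and `Group` columns mimic the layout of ALLDATA described in the question; the column values are illustrative only.

```r
# Hypothetical toy data standing in for ALLDATA:
# 2 subjects x 2 times x 3 items, each subject in a single group
toy <- data.frame(
  Subject = rep(c("s1", "s2"), each = 6),
  Time    = rep(rep(c("pre", "post"), each = 3), 2),
  Group   = rep(c("T", "C"), each = 6),
  Correct = c("C", "I", "C",  "C", "C", "C",
              "I", "I", "C",  "C", "C", "I")
)

# INDEX can be a list of factors; the result is a Subject x Time matrix
by_time <- tapply(toy$Correct, list(toy$Subject, toy$Time),
                  function(x) sum(x == "C"))
by_time  # s1: pre 2, post 3; s2: pre 1, post 2

# Adding Group gives a three-way array; combinations that never
# occur in the data (e.g. s1 in group C) are filled with NA
by_time_group <- with(toy, tapply(Correct, list(Subject, Time, Group),
                                  function(x) sum(x == "C")))
```

tapply() returns an array, which is convenient for cross-tabulated displays, whereas aggregate() returns a long data frame with one row per observed combination.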

A variety of contributed packages (plyr, data.table, doBy, sqldf and remix, among others) have similar capabilities.

If you want some additional summaries (e.g., percent correct), here is an example function that computes them for a single subject/group combination; aggregate() then applies it to every combination of subject and group (I encourage you to play with it):

```r
f <- function(x) {
    Correct <- sum(x == 'C')
    Percent <- round(100 * Correct/length(x), 3)
    c(Number = Correct, Percent = Percent)
}
aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = f)
```
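To see what aggregate() hands to f, it can help to call f directly on the responses for one subject/group combination. The vector below is made up for illustration (four answers, three correct):

```r
f <- function(x) {
    Correct <- sum(x == 'C')                      # number of 'C' responses
    Percent <- round(100 * Correct/length(x), 3)  # share correct, in percent
    c(Number = Correct, Percent = Percent)        # named vector of length 2
}

# One subject/group combination: a made-up set of four responses
answers <- c('C', 'I', 'C', 'C')
f(answers)  # Number = 3, Percent = 75
```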

The particular function isn't as important as knowing you can do this sort of thing. Several of the contributed packages indicated above have similar, if not superior, capabilities, depending on the situation.

Toy example to test the above:

```r
dd <- data.frame(Subject = rep(1:5, each = 100),
                 Group = rep(rep(c('C', 'T'), each = 50), 5),
                 Correct = factor(rbinom(500, 1, 0.8), labels = c('I', 'C')))
```
```r
aggregate(Correct ~ Subject + Group, data = dd, FUN = function(x) sum(x == 'C'))
```

```
   Subject Group Correct
1        1     C      40
2        2     C      36
3        3     C      39
4        4     C      37
5        5     C      41
6        1     T      43
7        2     T      45
8        3     T      37
9        4     T      45
10       5     T      36
```
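The toy data above calls rbinom() without a seed, so your counts will differ from run to run. The sketch below fixes the seed and adds a hypothetical Time factor (pre/post) so the toy data mirrors the structure in the original question. It then extends the formula to three grouping variables. The Time layout (25 items per time within each subject/group block) is an assumption chosen for illustration.

```r
set.seed(1)  # make the simulated responses reproducible

# Hypothetical layout: 5 subjects x 2 groups x 2 times x 25 items
dd2 <- data.frame(Subject = rep(1:5, each = 100),
                  Group   = rep(rep(c('C', 'T'), each = 50), 5),
                  Time    = rep(rep(c('pre', 'post'), each = 25), 10),
                  # levels = 0:1 keeps both labels even if one outcome never occurs
                  Correct = factor(rbinom(500, 1, 0.8), levels = 0:1,
                                   labels = c('I', 'C')))

# Each extra term on the right-hand side adds a grouping variable
res <- aggregate(Correct ~ Subject + Time + Group, data = dd2,
                 FUN = function(x) sum(x == 'C'))
head(res)
```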

```r
aggregate(Correct ~ Subject + Group, data = dd, FUN = f)
```

```
   Subject Group Correct.Number Correct.Percent
1        1     C             40              80
2        2     C             36              72
3        3     C             39              78
4        4     C             37              74
5        5     C             41              82
6        1     T             43              86
7        2     T             45              90
8        3     T             37              74
9        4     T             45              90
10       5     T             36              72
```
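One detail worth knowing: when FUN returns a vector, as f does, aggregate() stores both summaries in a single matrix column named Correct, which prints as Correct.Number and Correct.Percent. If you need separate ordinary columns (e.g., for merging or plotting), one common idiom is do.call(data.frame, ...). Here is a sketch on a small made-up data set:

```r
f <- function(x) {
    Correct <- sum(x == 'C')
    Percent <- round(100 * Correct/length(x), 3)
    c(Number = Correct, Percent = Percent)
}

# Made-up data: two subjects in one group, four responses each
small <- data.frame(Subject = rep(1:2, each = 4),
                    Group   = 'T',
                    Correct = c('C', 'C', 'I', 'C',  'I', 'I', 'C', 'C'))

res <- aggregate(Correct ~ Subject + Group, data = small, FUN = f)
is.matrix(res$Correct)  # TRUE: one matrix column holding both summaries

# data.frame() splits a matrix argument into one column per matrix column
flat <- do.call(data.frame, res)
names(flat)  # "Subject" "Group" "Correct.Number" "Correct.Percent"
```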

HTH,
Dennis

On Sat, Apr 30, 2011 at 12:28 PM, Kevin Burnham <kburnham_at_gmail.com> wrote:
> Hi all,
>
> I have a long data file generated from a minimal pair test that I gave to
> learners of Arabic before and after a phonetic training regime.  For each of
> thirty-some subjects there are 800 rows of data: 400 items at pretest and
> 400 at posttest.  For each item the subject got correct, there is a 'C' in
> the column 'Correct'.  The line:
>
> tapply(ALLDATA$Correct, ALLDATA$Subject, function(x) sum(x == "C"))
>
> gives me the sum of correct answers for each subject.
>
> However, I would like to have that sum separated by Time (pre or post).  Is
> there a simple way to do that?
>
>
> What if I further wish to separate by Group (T or C)?
>
> Thanks,
> Kevin
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help