Re: [R] Proportion test in three-choice experiment

From: Jonathan Baron <baron_at_psych.upenn.edu>
Date: Mon 18 Jul 2005 - 05:05:52 EST

> Thanks for your reply, Jonathan. Thanks also to Spencer, who suggested
> using the BTm function. I realize that my description of both the
> experiment and the involved issue was not clear. Let me try again:
>
> My subjects do a recognition task where I present stimuli belonging to
> three different classes (let us say A, B, and C). There are many of
> them. Subjects are asked to recognize each stimulus as belonging to one
> of the three classes (forced-choice design). This is done under two
> different conditions (say conditions 1 and 2). I end up with matrices of
> counts like this (in R notation):
>
> # under condition 1
> c1 <- t (matrix (c (c1AA, c1AB, c1AC,
>                     c1BA, c1BB, c1BC,
>                     c1CA, c1CB, c1CC), nc = 3))
> # under condition 2
> c2 <- t (matrix (c (c2AA, c2AB, c2AC,
>                     c2BA, c2BB, c2BC,
>                     c2CA, c2CB, c2CC), nc = 3))
>
> where "cijk" is the number of times the subject gave answer k when
> presented with a stimulus of class j, under condition i.
>
> The issue is to test whether subjects perform better (in the sense of a
> higher recognition score) in condition 1 compared with condition 2. My
> first idea was to test the global recognition rate, which could be
> computed as:
>
> # under condition 1
> r1 <- sum (diag (c1)) / sum (c1)
> # under condition 2
> r2 <- sum (diag (c2)) / sum (c2)
>
> The null hypothesis is that r1 is not different from r2. I guess that I
> could test it with the chisq.test function, like this:
>
> p1 <- sum (diag (c1))
> q1 <- sum (c1) - p1
> p2 <- sum (diag (c2))
> q2 <- sum (c2) - p2
> chisq.test (matrix (c(p1, q1, p2, q2), nc = 2))
>
> What do you think?
>
> I also thought about testing the triples like [c1AA, c1AB, c1AC] against
> [c2AA, c2AB, c2AC], hence my original question.

The method you suggest requires several assumptions, and I don't know whether they are reasonable. The problem lies in using the sum of the diagonal entries (p1) and the sum of the off-diagonal entries (q1) of the table. This may work if you have no reason to think that c2 is ever better. In that case, all you need is a measure that varies monotonically with the true measure, whatever it is. You also need to assume that c1 and c2 do not differ in response biases, and that it cannot happen that one diagonal cell is better in c1 while another is better in c2.
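A small constructed example of the last point, sketched in R. The counts are invented for illustration, and `pooled_rate`, `class_hit_rates` and `response_bias` are hypothetical helper names. Both tables have the same pooled rate, so the 2 x 2 chi-squared test from the question finds no difference, yet class A is recognized better in c1, class B better in c2, and the response marginals (a rough view of response bias) differ as well.

```r
# Hypothetical counts for illustration only:
# rows = stimulus class, columns = response, as in the c1/c2 matrices above.
classes <- c("A", "B", "C")
c1 <- matrix(c(40,  5,  5,
               10, 30, 10,
                5,  5, 40),
             nrow = 3, byrow = TRUE,
             dimnames = list(stimulus = classes, response = classes))
c2 <- matrix(c(30, 10, 10,
                5, 40,  5,
                5,  5, 40),
             nrow = 3, byrow = TRUE,
             dimnames = list(stimulus = classes, response = classes))

# Pooled recognition rate (r1 / r2 in the question)
pooled_rate <- function(m) sum(diag(m)) / sum(m)
# Hit rate for each stimulus class separately
class_hit_rates <- function(m) diag(m) / rowSums(m)
# Share of all responses given to each category
response_bias <- function(m) colSums(m) / sum(m)

pooled_rate(c1)       # 110/150, about 0.733
pooled_rate(c2)       # 110/150, about 0.733
class_hit_rates(c1)   # A = 0.8, B = 0.6, C = 0.8
class_hit_rates(c2)   # A = 0.6, B = 0.8, C = 0.8
response_bias(c1)     # 55, 40, 55 out of 150
response_bias(c2)     # 40, 55, 55 out of 150

# The 2 x 2 test proposed in the question sees no difference at all
hits <- c(sum(diag(c1)), sum(diag(c2)))
misses <- c(sum(c1), sum(c2)) - hits
chisq.test(rbind(hits, misses))$p.value   # 1
```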

I have not studied these issues much since my PhD thesis (1970!), but back then the usual approach was to develop a sensible model of the task and then use some parameter of the model as the measure. Perhaps this is overkill for what you are doing, but I don't know. For example, one model says that the subject either knows the answer or guesses; the guesses are distributed across the three categories according to biases that are specific to the condition, but the probability of knowing the answer is the same for every category. (You can test the assumptions of this model.) Another model (popular in 1970) is Luce's choice theory, which is similar to the first but uses multiplication. If I remember correctly (which I probably don't), you would do exactly what you propose but after taking the logs of the frequencies.
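A minimal sketch of how the "knows or guesses" model could be fit by maximum likelihood, assuming one probability d of knowing the answer (the same for every class) and guessing probabilities g that sum to 1. The function names and the counts are hypothetical; the counts were generated exactly from the model with d = 0.5 and g = (0.5, 0.25, 0.25). The G2 statistic compares the model with a saturated table (each row free), which is one way to test the model's assumptions.

```r
# Hypothetical counts (rows = stimulus, columns = response), 80 trials per class,
# built exactly from the guessing model with d = 0.5, g = (0.5, 0.25, 0.25).
classes <- c("A", "B", "C")
counts <- matrix(c(60, 10, 10,
                   20, 50, 10,
                   20, 10, 50),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(stimulus = classes, response = classes))

# P[j, k] = d * (j == k) + (1 - d) * g[k]: with probability d the subject knows
# the class; otherwise they guess response k with probability g[k].
guess_model_probs <- function(d, g) {
  K <- length(g)
  d * diag(K) + (1 - d) * matrix(g, nrow = K, ncol = K, byrow = TRUE)
}

fit_guess_model <- function(m) {
  K <- nrow(m)
  # Unconstrained parameters: logit(d), then K - 1 log-ratios for g
  unpack <- function(par) {
    w <- exp(c(0, par[-1]))
    list(d = plogis(par[1]), g = w / sum(w))
  }
  # Multinomial negative log-likelihood (row totals treated as fixed)
  nll <- function(par) {
    p <- unpack(par)
    -sum(m * log(guess_model_probs(p$d, p$g)))
  }
  opt <- optim(rep(0, K), nll, method = "BFGS")
  # Saturated model: observed row proportions; empty cells contribute 0
  q <- m / rowSums(m)
  nll_sat <- -sum(m[m > 0] * log(q[m > 0]))
  g2 <- 2 * (opt$value - nll_sat)
  df <- K * (K - 1) - K   # free cells minus parameters (d plus K - 1 for g)
  c(unpack(opt$par),
    list(G2 = g2, df = df, p.value = pchisq(g2, df, lower.tail = FALSE)))
}

fit <- fit_guess_model(counts)
fit$d    # close to 0.5
fit$g    # close to c(0.5, 0.25, 0.25)
fit$G2   # close to 0, since these counts fit the model exactly
```

To compare conditions under this model, one would fit c1 and c2 separately and compare the two estimates of d, which by construction are not affected by the guessing biases g.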

Using logs can give different, even opposite, results from those of your proposal. Likewise, you can get opposite results if you ignore response bias and the conditions differ in response bias.
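One way to make the point about logs and response bias concrete, sketched in R under the assumption of Luce's biased-choice model, P(k | j) = eta[j, k] b[k] / sum_m eta[j, m] b[m], with symmetric similarities eta (eta[j, j] = 1) and response biases b. Under that model the product ratio f[j, k] f[k, j] / (f[j, j] f[k, k]) equals eta[j, k]^2, so row totals and biases cancel, and -log(eta) serves as a discriminability measure. This is an illustration, not necessarily the exact procedure recalled above; the helper names are hypothetical and the two conditions are constructed, not real data. Condition 2 is more discriminable but strongly biased toward "A", and the pooled rate ranks the conditions the other way.

```r
# Luce's biased-choice model (assumed): P[j, k] = eta[j, k] * b[k] / sum_m eta[j, m] * b[m]
luce_probs <- function(eta, b) {
  w <- eta * matrix(b, nrow = length(b), ncol = length(b), byrow = TRUE)
  w / rowSums(w)
}

# Symmetric similarity matrix with one common off-diagonal similarity s
make_eta <- function(s, K = 3) {
  eta <- matrix(s, nrow = K, ncol = K)
  diag(eta) <- 1
  eta
}

# Bias-free estimate of eta from counts or proportions f: under the model,
# f[j, k] * f[k, j] / (f[j, j] * f[k, k]) = eta[j, k]^2.
# With real counts, empty cells give log(0); adding 0.5 to every cell is a
# common ad hoc fix.
luce_similarity <- function(f) {
  sqrt((f * t(f)) / outer(diag(f), diag(f)))
}

# Summary on the log scale: mean of -log(eta) over pairs (larger = easier to tell apart)
luce_discriminability <- function(f) {
  eta <- luce_similarity(f)
  mean(-log(eta[upper.tri(eta)]))
}

pooled_rate <- function(f) sum(diag(f)) / sum(f)

# Constructed conditions: response probabilities with equal trials per class
prob1 <- luce_probs(make_eta(0.20), b = c(1, 1, 1))   # no response bias
prob2 <- luce_probs(make_eta(0.15), b = c(10, 1, 1))  # strong bias toward "A"

pooled_rate(prob1)             # about 0.714
pooled_rate(prob2)             # about 0.575: condition 1 looks better
luce_discriminability(prob1)   # -log(0.20), about 1.61
luce_discriminability(prob2)   # -log(0.15), about 1.90: condition 2 is better
```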

The suggestion I made based on the idea of inter-rater agreement implies a rough-and-ready model similar to the first. It does take response bias into account.

Jon

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania