# Re: [R] Computing P-Value

From: Ben Bolker <bolker_at_ufl.edu>
Date: Wed, 28 May 2008 11:20:54 -0400

Gundala Viswanath wrote:

```
| Dear Ben,
|
| Given a set of words
| ('foo', 'bar', 'bar', 'bar', "quux" ..... "foo") this can be in 10.000 items.
| I would like to compute the significance of the word occurrence with P-Value.
|
| Is there a simple way to do it?
|
| - GV
```

Closer, but still not enough information. What is your null
hypothesis? Equidistribution? If so, ...

dat <- sample(c("foo","bar","quux","pridznyskie"),
              replace=TRUE,size=10000)

tab <- table(dat)
chisq.test(tab)

from ?chisq.test:

    If 'x' is a matrix with one row or column, or if 'x' is a vector
    and 'y' is not given, then a _goodness-of-fit test_ is performed
    ('x' is treated as a one-dimensional contingency table). The
    entries of 'x' must be non-negative integers. In this case, the
    hypothesis tested is whether the population probabilities equal
    those in 'p', or are all equal if 'p' is not given.
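
If the null hypothesis is something other than equidistribution, the 'p' argument described in the help text can carry it. A minimal sketch, assuming made-up null probabilities (the values in `null_p` are purely illustrative and not from the original question), with the factor levels fixed so that the counts line up with 'p':

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
words <- c("foo", "bar", "quux", "pridznyskie")
dat <- sample(words, replace = TRUE, size = 10000)

## fix the level order so the counts align with null_p below
tab <- table(factor(dat, levels = words))

## hypothetical null probabilities (an assumption for illustration);
## they must sum to 1
null_p <- c(0.4, 0.3, 0.2, 0.1)

res <- chisq.test(tab, p = null_p)
res$p.value
```

Because the simulated words are drawn with equal probability, this null should be rejected decisively; with real data the counts would come from the observed word list instead.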

Note that this won't test the significance of *individual* deviations
from equiprobability, just the overall pattern. If you wanted to test
individual words you could use binom.test -- but if you tested more than
one word, or tested words on the basis of those that appeared to have
extreme frequencies, you'd start running into multiple comparisons/post hoc
testing issues.
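
A rough sketch of the per-word approach, assuming an equiprobable null and using p.adjust with the Holm method as one possible multiple-comparisons correction (the choice of method is an assumption here, not a recommendation). The adjustment only accounts for the number of words tested; it does not fix the problem of picking words because they already look extreme.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dat <- sample(c("foo", "bar", "quux", "pridznyskie"),
              replace = TRUE, size = 10000)
tab <- table(dat)

n  <- sum(tab)          # total number of words
p0 <- 1 / length(tab)   # null: every word equally likely

## plain named vector of counts, one entry per word
counts <- setNames(as.vector(tab), names(tab))

## exact binomial test of each word's count against p0
raw_p <- vapply(counts,
                function(k) binom.test(k, n, p = p0)$p.value,
                numeric(1))

## Holm adjustment for having tested several words
adj_p <- p.adjust(raw_p, method = "holm")
adj_p
```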

Do you know something about the methods that people usually use
in this area?

Ben Bolker

