# Re: [R] Computing P-Value

From: Ben Bolker <bolker_at_ufl.edu>
Date: Wed, 28 May 2008 11:20:54 -0400

Gundala Viswanath wrote:

| Dear Ben,
|
| Given a set of words
| ('foo', 'bar', 'bar', 'bar', "quux" ..... "foo") this can be in 10.000 items.
| I would like to compute the significance of the word occurrence with P-Value.
|
| Is there a simple way to do it?
|
| - GV

Closer, but still not enough information. What is your null
hypothesis? Equidistribution? If so, ...

dat <- sample(c("foo","bar","quux","pridznyskie"),
              replace=TRUE,size=10000)

tab <- table(dat)
chisq.test(tab)

from ?chisq.test:

     If 'x' is a matrix with one row or column, or if 'x' is a vector
     and 'y' is not given, then a _goodness-of-fit test_ is performed
     ('x' is treated as a one-dimensional contingency table). The
     entries of 'x' must be non-negative integers. In this case, the
     hypothesis tested is whether the population probabilities equal
     those in 'p', or are all equal if 'p' is not given.
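
If equidistribution is not the null you have in mind, the 'p' argument
mentioned in the help text lets you supply your own probabilities. A
minimal sketch, assuming made-up counts and made-up null probabilities
(the names counts, null_p, fit_equal and fit_custom are illustrative
only and do not come from your data):

```r
# Hypothetical word counts, invented purely for illustration.
counts <- c(foo = 520, bar = 290, quux = 190)

# Null 1: all words equally likely (the default when 'p' is omitted).
fit_equal <- chisq.test(counts)

# Null 2: assumed probabilities, matched to the counts by name.
null_p <- c(foo = 0.5, bar = 0.3, quux = 0.2)
fit_custom <- chisq.test(counts, p = null_p[names(counts)])

# The p-value is stored in the returned "htest" object.
fit_equal$p.value
fit_custom$p.value
```

The probabilities in 'p' must sum to 1 (or set rescale.p=TRUE) and must
be in the same order as the counts, which is why they are indexed by
name here.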

Note that this won't test the significance of *individual* deviations
from equiprobability, just the overall pattern. If you wanted to test
individual words you could use binom.test -- but if you tested more
than one word, or tested words on the basis of those that appeared to
have extreme frequencies, you'd start running into multiple
comparisons/post hoc testing issues.
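
A rough sketch of the per-word approach, again with invented counts.
It assumes an equiprobable null and uses Holm's method via p.adjust as
one possible multiple-comparisons correction; that choice is mine for
illustration, not a recommendation specific to word-frequency data:

```r
# Hypothetical word counts, invented purely for illustration.
counts <- c(foo = 520, bar = 290, quux = 190)
n  <- sum(counts)
p0 <- 1 / length(counts)   # assumed null: each word has probability 1/3

# One exact binomial test per word against the assumed null probability.
raw_p <- sapply(counts, function(k) binom.test(k, n, p = p0)$p.value)

# Holm adjustment for having run one test per word.
adj_p <- p.adjust(raw_p, method = "holm")

data.frame(count = counts, raw_p = raw_p, holm_p = adj_p)
```

The adjustment only accounts for the number of tests run. It does not
fix the post hoc problem of choosing which words to test after looking
at the frequencies.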

Do you know something about the methods that people usually use
in this area?

  Ben Bolker

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Wed 28 May 2008 - 17:22:49 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 29 May 2008 - 00:30:43 GMT.