From: Weiwei Shi <helprhelp_at_gmail.com>

Date: Thu 23 Jun 2005 - 01:30:06 EST

[,1] [,2]

[1,] 11266 2151526

[2,] 125 31734

[,1] [,2]

[1,] 43571 2119221

[2,] 52 31807

[,1] [,2]

[1,] 427 2162365

[2,] 5 31854

[,1] [,2]

[1,] 427 2162365

[2,] 5 31854

> chisq.test(tab[,,3])

Pearson's Chi-squared test with Yates' continuity correction

data: tab[, , 3]

X-squared = 0.0963, df = 1, p-value = 0.7564

1: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i])
2: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i])
3: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i])
4: Chi-squared approximation may be incorrect in: chisq.test(tab[, , i])

2. So my second question: is this warning raised because my data violate the assumptions of the chi-squared test? If so, why is word 3 fine? And how can I trace the warning to see which word caused it?
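One way to trace the warning could be sketched as follows. chisq.test() raises "Chi-squared approximation may be incorrect" when any expected cell count is below 5, so the loop below catches the warning per slice and also reports the smallest expected count. The array here is a stand-in, not my real data: slice 1 reuses the word-3 table shown above, and slice 2 is made up to force the warning. The name `warned` is only illustrative.

```r
# Stand-in 2 x 2 x n array; slice 1 is the word-3 table from the post,
# slice 2 is invented (very small counts) purely to trigger the warning.
tab <- array(c(427, 5, 2162365, 31854,
               3,   1, 2162789, 31858), dim = c(2, 2, 2))

n_words <- dim(tab)[3]
warned  <- logical(n_words)

for (i in seq_len(n_words)) {
  withCallingHandlers(
    chisq.test(tab[, , i]),
    warning = function(w) {
      warned[i] <<- TRUE            # remember which slice warned
      invokeRestart("muffleWarning")
    }
  )
}

# Indices of the words whose test raised a warning
which(warned)

# Smallest expected count per word; values below 5 explain the warning
min_expected <- sapply(seq_len(n_words), function(i) {
  min(suppressWarnings(chisq.test(tab[, , i]))$expected)
})
min_expected
```

Setting options(warn = 1) before the loop and printing i inside it is a cruder alternative, since each warning is then printed as soon as it occurs.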

3. My result looks like this (after mapping the numeric ids back to words; some words are stemmed, e.g. ACCID is accident):

> of[1:50,]

      map...2.      p.fisher
21       ACCID  0.000000e+00
30          CD  0.000000e+00
67        ROCK  0.000000e+00
104      CRACK  0.000000e+00
111       CHIP  0.000000e+00
179      GLASS  0.000000e+00
84        BACK 4.199878e-291
395   DRIVEABL 5.335989e-287
60         CAP 9.405235e-285
262 WINDSHIELD 2.691641e-254
13          IV 3.905186e-245
110         HZ 2.819713e-210
11        CAMP 9.086768e-207
2      SHATTER 5.273994e-202
297        ALP 1.678521e-177
162        BED 1.822031e-173
249        BCD 1.398391e-160
493       RACK 4.178617e-156
59        CAUS 7.539031e-147

3.1 Question: should I use a two-sided test instead of a one-sided one for the Fisher test? I have read some material that suggests using the two-sided test.
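To make the choice concrete, here is a small sketch that runs fisher.test() on the word-3 table with each setting of its `alternative` argument. The variable names are illustrative only, and I am not claiming which alternative is the right one for this problem.

```r
# Word-3 table from above: rows and columns as printed in the post
x <- matrix(c(427, 5, 2162365, 31854), nrow = 2)

# Two-sided: departure from odds ratio = 1 in either direction
p_two     <- fisher.test(x, alternative = "two.sided")$p.value
# One-sided: odds ratio greater than 1, or less than 1
p_greater <- fisher.test(x, alternative = "greater")$p.value
p_less    <- fisher.test(x, alternative = "less")$p.value

c(two.sided = p_two, greater = p_greater, less = p_less)
```

A one-sided test only makes sense if the direction of the association was fixed before looking at the data; when screening many words in both directions, the two-sided p-value seems the safer default.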

3.2 A bigger question: the result looks promising, since this is a case of classifying fraud cases and the words selected by this approach make sense. However, I think the p-values here only indicate the strength of evidence against the null hypothesis, not the strength of association between a word and the class of a document. So what statistic should I use to evaluate the strength of association? The odds ratio?
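For reference, a minimal sketch of how an odds ratio could be obtained for one word, again using the word-3 table. The sample odds ratio is the cross-product ratio ad/bc; fisher.test() additionally returns a conditional maximum likelihood estimate and a confidence interval. Variable names are illustrative.

```r
# Word-3 table from above
x <- matrix(c(427, 5, 2162365, 31854), nrow = 2)

# Sample (cross-product) odds ratio: (a * d) / (b * c)
or_sample <- (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])

# Conditional MLE of the odds ratio and its 95% confidence interval
ft     <- fisher.test(x)
or_mle <- unname(ft$estimate)
or_ci  <- ft$conf.int

or_sample
or_mle
or_ci
```

With counts this large, even an odds ratio close to 1 can come with a tiny p-value, so ranking words by the estimate (or by the bound of its confidence interval nearest 1) rather than by p-value might be one way to separate effect size from sample size.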

Thanks!

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Thu Jun 23 01:46:23 2005

This archive was generated by hypermail 2.1.8: Fri 03 Mar 2006 - 03:32:57 EST