[R] Statistical significance of a classifier

From: Martin C. Martin <martin_at_metahuman.org>
Date: Sat 06 Aug 2005 - 05:58:39 EST


I have a bunch of data points x from two classes A & B, and I'm creating a classifier. So I have a function f(x) which estimates the probability that x is in class A. (I have an equal number of examples of each, so p(class) = 0.5.)

One way to see how well this does is to compute the error rate on the test set, i.e. call x class A if f(x) > 0.5, and count how many items I misclassify. That's what MASS does. But we should be able to do better: a misclassification should count for more when the regression is confident than when it isn't.
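To make the contrast concrete, here is a minimal Python sketch (not R, and not what MASS itself does) of the 0.5-threshold error rate next to the log loss, one standard confidence-sensitive score: the mean of -log f(x) over class-A items and -log(1 - f(x)) over class-B items. The function names, the 1/0 label coding, the clipping constant `eps` and the four-item data are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical helpers for illustration. Labels are coded 1 = class A, 0 = class B,
# and probs[i] is f(x_i), the estimated probability that item i is in class A.

def error_rate(probs, labels):
    """Fraction of items misclassified when f(x) > 0.5 is called class A."""
    wrong = sum((p > 0.5) != (y == 1) for p, y in zip(probs, labels))
    return wrong / len(labels)

def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-likelihood of the true labels under f.

    A wrong answer given with probability close to 1 costs far more than
    a wrong answer given with probability close to 0.5.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) cannot occur
        total -= math.log(p) if y == 1 else math.log(1 - p)
    return total / len(labels)

# Made-up example: two classifiers that get the second item wrong,
# one hedging at 0.4, the other confidently wrong at 0.01.
labels    = [1, 1, 0, 0]
hedged    = [0.6, 0.4, 0.4, 0.4]
confident = [0.9, 0.01, 0.1, 0.1]

print(error_rate(hedged, labels), error_rate(confident, labels))  # 0.25 0.25
print(log_loss(hedged, labels) < log_loss(confident, labels))     # True
```

Both classifiers have a 25% error rate, but their log losses are about 0.61 (hedged) and 1.23 (confidently wrong). A constant f(x) = 0.5 scores ln 2 ≈ 0.693 per item, so under this score the second classifier is worse than guessing even though it gets three of four items right. The Brier score (mean squared difference between f(x) and the 0/1 label) is a common alternative that penalizes confident mistakes less harshly.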

How can I show that my f(x) = P(x is in class A) does better than chance?
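One possible answer, sketched here under stated assumptions, is a permutation test on the test-set log loss: if f(x) carries no information about the class, shuffling the test labels should not systematically change the log loss, so the fraction of shuffles that score at least as well as the real labels serves as a p-value. This is a swapped-in technique, not something prescribed by the post or by MASS. The Python function names, `n_perm`, `seed` and the 40-item data are hypothetical, and the probabilities must come from an f fitted without the test items, otherwise the p-value means nothing.

```python
import math
import random

def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-likelihood of labels (1 = class A, 0 = class B) under f."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) cannot occur
        total -= math.log(p) if y == 1 else math.log(1 - p)
    return total / len(labels)

def permutation_p_value(probs, labels, n_perm=9999, seed=0):
    """One-sided permutation test of H0: f(x) carries no information about the class.

    Shuffling the test labels breaks any link between f(x) and the true class
    while keeping the class balance, so the shuffled log losses form the null
    distribution. Returns (observed log loss, p-value).
    """
    rng = random.Random(seed)
    observed = log_loss(probs, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if log_loss(probs, shuffled) <= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations,
    # so the p-value can never be exactly 0.
    return observed, (at_least_as_good + 1) / (n_perm + 1)

# Made-up test set: 20 items of each class, and an f that leans the right way.
labels = [1] * 20 + [0] * 20
probs = [0.8] * 20 + [0.2] * 20
observed, p = permutation_p_value(probs, labels, n_perm=999)
print(observed, p)
```

Here the observed log loss is -log 0.8 ≈ 0.22, well below the ln 2 chance level, and a shuffle matching it is vanishingly unlikely, so the p-value will almost certainly sit at its floor of 1/(n_perm + 1). A constant f(x) = 0.5 gives a p-value of exactly 1, since every shuffle scores the same.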


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Aug 06 06:04:12 2005
