Re: [R] Statistical significance of a classifier

From: Martin C. Martin <martin_at_metahuman.org>
Date: Sat 06 Aug 2005 - 07:15:21 EST

Liaw, Andy wrote:

>>From: Martin C. Martin
>>
>>Hi,
>>
>>I have a bunch of data points x from two classes A & B, and I'm
>>creating a classifier. So I have a function f(x) which estimates the
>>probability that x is in class A. (I have an equal number of examples
>>of each, so p(class) = 0.5.)
>>
>>One way of seeing how well this does is to compute the error rate on
>>the test set, i.e. if f(x)>0.5 call it A, and see how many times I
>>misclassify an item. That's what MASS does. But we should
>>
>
>Surely you mean `99% of dataminers/machine learners' rather than `MASS'?
>
>

That was my impression, but I didn't want to presume to speak for most dataminers/machine learners.
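
As a concrete illustration of the thresholded error-rate check described
above, here is a minimal R sketch (not from the original post; `prob` and
`truth` are hypothetical stand-ins for f(x) on a test set and the true
class labels):

## Simulated stand-ins for f(x) = estimated P(class A) and the true labels
set.seed(1)
prob  <- runif(20)
truth <- factor(sample(c("A", "B"), 20, replace = TRUE))

## Threshold at 0.5 and count how often an item is misclassified
pred <- factor(ifelse(prob > 0.5, "A", "B"), levels = levels(truth))
table(predicted = pred, actual = truth)   # confusion matrix
mean(pred != truth)                       # test-set misclassification rate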

>>be able to do better: misclassifying should be more of a problem if
>>the regression is confident than if it isn't.
>>
>>How can I show that my f(x) = P(x is in class A) does better than
>>chance?
>>
>
>It depends on what you mean by `better'. For some problems, people are
>perfectly happy with the misclassification rate. For others, the
>estimated probabilities count a lot more. One possibility is to look at
>the ROC curve. Another possibility is to look at the calibration curve
>(see MASS the book).
>
>

Thanks, those are getting closer to what I want. I think the bottom line is that I can't really assign a p-value the way I want to, since the problem I'm thinking of is ill-posed.
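
For what it's worth, here is a base-R sketch of the ideas above: a binomial
test of the error rate against chance (given p(class) = 0.5), the AUC of the
ROC curve via the Wilcoxon/Mann-Whitney statistic, and a crude calibration
table. The data are simulated stand-ins, not from the thread:

## Simulated stand-ins: `truth` = true classes, `prob` = f(x) = est. P(A)
set.seed(1)
n     <- 200
truth <- factor(rep(c("A", "B"), each = n / 2))
prob  <- ifelse(truth == "A", rbeta(n, 3, 2), rbeta(n, 2, 3))

## 1. Error rate vs. chance: one-sided binomial test of the number of
##    correct predictions against p = 0.5
pred    <- ifelse(prob > 0.5, "A", "B")
correct <- sum(pred == truth)
binom.test(correct, n, p = 0.5, alternative = "greater")

## 2. ROC/AUC without extra packages: AUC is the probability that a random
##    A scores higher than a random B (Wilcoxon/Mann-Whitney statistic)
r   <- rank(prob)
nA  <- sum(truth == "A"); nB <- sum(truth == "B")
(auc <- (sum(r[truth == "A"]) - nA * (nA + 1) / 2) / (nA * nB))
wilcox.test(prob[truth == "A"], prob[truth == "B"])  # W / (nA*nB) = AUC

## 3. Crude calibration check: bin the predicted probabilities and compare
##    the mean prediction with the observed fraction of A's in each bin
bins <- cut(prob, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)
data.frame(mean.pred  = tapply(prob, bins, mean),
           obs.frac.A = tapply(truth == "A", bins, mean))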

Thanks,
Martin



