From: Liaw, Andy <andy_liaw_at_merck.com>

Date: Sat 06 Aug 2005 - 06:06:05 EST


> From: Martin C. Martin
>
> Hi,
>
> I have a bunch of data points x from two classes A & B, and I'm
> creating a classifier. So I have a function f(x) which estimates the
> probability that x is in class A. (I have an equal number of examples
> of each, so p(class) = 0.5.)
>
> One way of seeing how well this does is to compute the error rate on
> the test set, i.e. if f(x) > 0.5 call it A, and see how many times I
> misclassify an item. That's what MASS does. But we should

Surely you mean `99% of dataminers/machine learners' rather than `MASS'?

> be able to do better: misclassifying should be more of a problem if
> the regression is confident than if it isn't.
>
> How can I show that my f(x) = P(x is in class A) does better than
> chance?
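For concreteness, that error-rate computation takes only a couple of lines of base R. A minimal sketch, where `prob' and `truth' are hypothetical names for the test-set scores f(x) and the true labels (a factor with levels "A" and "B"):

  ## 0.5-threshold classification and its error rate; `prob' and
  ## `truth' are assumed names, not from the original post.
  pred <- factor(ifelse(prob > 0.5, "A", "B"), levels = levels(truth))
  mean(pred != truth)   # misclassification (error) rate
  table(pred, truth)    # confusion matrix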

It depends on what you mean by `better'. For some problems, people are perfectly happy with the misclassification rate. For others, the estimated probabilities count for a lot more. One possibility is to look at the ROC curve. Another possibility is to look at the calibration curve (see the MASS book).
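To make those two concrete, here is a rough base-R sketch (an illustration, not from the original exchange) of an ROC curve with its area (AUC) and a crude calibration plot, using the same hypothetical `prob' and `truth' as above:

  isA <- truth == "A"

  ## AUC via the rank (Mann-Whitney) statistic: the probability that a
  ## randomly chosen class-A case scores higher than a class-B case.
  r   <- rank(prob)
  auc <- (sum(r[isA]) - sum(isA) * (sum(isA) + 1) / 2) /
         (sum(isA) * sum(!isA))

  ## ROC curve: sweep the decision threshold over the observed scores.
  cuts <- sort(unique(prob), decreasing = TRUE)
  tpr  <- sapply(cuts, function(t) mean(prob[isA]  >= t))
  fpr  <- sapply(cuts, function(t) mean(prob[!isA] >= t))
  plot(fpr, tpr, type = "l",
       xlab = "false positive rate", ylab = "true positive rate")
  abline(0, 1, lty = 2)   # chance performance corresponds to AUC = 0.5

  ## Crude calibration curve: within bins of predicted probability,
  ## compare the mean prediction to the observed fraction of class A.
  bins <- cut(prob, seq(0, 1, by = 0.1), include.lowest = TRUE)
  plot(tapply(prob, bins, mean), tapply(isA, bins, mean),
       xlab = "mean predicted P(A)", ylab = "observed fraction of A")
  abline(0, 1, lty = 2)   # perfect calibration lies on the diagonal

An AUC near 0.5 is chance-level performance; wilcox.test(prob[isA], prob[!isA]) gives the corresponding test of whether the scores separate the two classes better than chance, which addresses the `better than chance' question directly.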

Andy

> Thanks,
>
> Martin


