[R] p-values for classification

From: <Arne.Muller_at_sanofi-aventis.com>
Date: Fri 01 Jul 2005 - 20:14:20 EST

Dear All,

I'm classifying some data with various methods (binary classification). I interpret the results via a confusion matrix, from which I calculate the sensitivity and the false discovery rate (FDR). The classifiers are trained on 575 data points, and my test set has 50 data points.

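I'd like to calculate, for each classifier, p-values for obtaining by chance an FDR at least as low as the observed one (<= FDR) and a sensitivity at least as high as the observed one (>= sensitivity). I was thinking of shuffling (or bootstrapping) the labels of the test set, classifying them, and calculating the p-values from the resulting random FDR and sensitivity values, assuming these are normally distributed.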
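
A minimal sketch of the label-shuffling idea described above, written in Python only to keep it self-contained. The counts (20 positives, 30 negatives, 15 true positives, 5 false positives) and all function names are hypothetical, not taken from the actual data. It assumes the trained classifier is kept fixed, so its predictions on the 50 test points do not depend on the test labels and can be computed once and reused in every round. It also uses empirical p-values (with an add-one correction) rather than a normal approximation.

```python
import random

def sensitivity_and_fdr(labels, predictions):
    """Return (sensitivity, FDR) from 0/1 labels and 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return sens, fdr

def permutation_p_values(labels, predictions, n_rounds=2000, seed=0):
    """Empirical p-values for sensitivity >= observed and FDR <= observed."""
    rng = random.Random(seed)
    obs_sens, obs_fdr = sensitivity_and_fdr(labels, predictions)
    shuffled = list(labels)
    hits_sens = 0
    hits_fdr = 0
    for _ in range(n_rounds):
        rng.shuffle(shuffled)  # break the label/prediction association
        sens, fdr = sensitivity_and_fdr(shuffled, predictions)
        hits_sens += sens >= obs_sens
        hits_fdr += fdr <= obs_fdr
    # Add-one correction so that a p-value of exactly zero is never reported.
    return (hits_sens + 1) / (n_rounds + 1), (hits_fdr + 1) / (n_rounds + 1)

# Hypothetical test set of 50 points: 20 positives followed by 30 negatives.
labels = [1] * 20 + [0] * 30
# Hypothetical predictions: 15 true positives, 5 false negatives,
# 5 false positives, 25 true negatives.
predictions = [1] * 15 + [0] * 5 + [1] * 5 + [0] * 25

observed = sensitivity_and_fdr(labels, predictions)
p_sens, p_fdr = permutation_p_values(labels, predictions)
print(observed, p_sens, p_fdr)
```

Because the predictions are reused, each round costs only a shuffle and a count, not a classification run.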

The problem is that this is rather slow when running many rounds of shuffling and classification (I'd like to do this for many classifiers and parameter combinations). In addition, classifying the 50 test data points with shuffled labels realistically produces only a very limited number of possible FDRs and sensitivities, and I'm wondering whether I can really believe these values to be normally distributed.
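
To make the discreteness concern concrete, here is a hedged Python sketch with hypothetical counts and function names. It assumes the predictions are held fixed and only the test labels are shuffled, with the number of positive labels unchanged. Under that assumption the number of true positives follows a hypergeometric distribution. Sensitivity and FDR are both determined by that single count, so the null distribution can be enumerated exactly instead of being approximated by a normal.

```python
from math import comb

def null_tp_distribution(n_total, n_pos, n_pred_pos):
    """Exact distribution of the true-positive count under label shuffling.

    n_total:    size of the test set
    n_pos:      number of truly positive test points
    n_pred_pos: number of test points the classifier calls positive
    """
    lowest = max(0, n_pred_pos + n_pos - n_total)
    highest = min(n_pos, n_pred_pos)
    denom = comb(n_total, n_pred_pos)
    return {
        tp: comb(n_pos, tp) * comb(n_total - n_pos, n_pred_pos - tp) / denom
        for tp in range(lowest, highest + 1)
    }

def exact_p_value(n_total, n_pos, n_pred_pos, tp_observed):
    """P(TP >= tp_observed) under the shuffled-label null."""
    dist = null_tp_distribution(n_total, n_pos, n_pred_pos)
    return sum(prob for tp, prob in dist.items() if tp >= tp_observed)

# Hypothetical counts: 50 test points, 20 positives,
# 20 predicted positive, 15 of them correct.
dist = null_tp_distribution(50, 20, 20)
# Each possible TP count maps to exactly one (sensitivity, FDR) pair.
achievable = [(tp / 20, (20 - tp) / 20) for tp in sorted(dist)]
p_exact = exact_p_value(50, 20, 20, 15)
print(len(achievable), p_exact)
```

With the margins fixed, a higher true-positive count means both a higher sensitivity and a lower FDR, so the two one-sided p-values coincide under this null. This tail probability is the same quantity that a one-sided Fisher exact test on the confusion matrix computes.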

Basically, I'm looking for a way to calculate the p-values analytically. I'd be grateful for any suggestions, web addresses or references.

        Kind regards,


R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Fri Jul 01 20:21:51 2005
