Re: [R] p-values for classification

From: Prof Brian Ripley <>
Date: Fri 01 Jul 2005 - 22:01:05 EST

Not really an R question.

Most classifiers will produce predicted probabilities, and you can check their accuracy. There are lots of details in my PRNN book (Pattern Recognition and Neural Networks), and some examples in MASS4 (Modern Applied Statistics with S, 4th edition).
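One way to check the accuracy of predicted class probabilities, for illustration only, is to score them against the observed test labels. The Brier score and the log loss (average negative log-likelihood) below are two common choices for two classes. They are my picks, not necessarily the measures used in PRNN or MASS4, and the probabilities in the usage lines are made up. Lower is better for both.

```python
import math

def brier_score(probs, labels):
    """Mean squared difference between predicted P(class 1) and the observed 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def log_loss(probs, labels, eps=1e-15):
    """Average negative log-likelihood of the observed labels; eps guards against log(0)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

# Hypothetical predicted P(class 1) for four test cases, and their true classes.
probs = [0.9, 0.2, 0.7, 0.4]
labels = [1, 0, 1, 1]
print(round(brier_score(probs, labels), 4))  # 0.125
print(round(log_loss(probs, labels), 4))
```

Unlike a confusion matrix, these scores use the probabilities directly. A confident wrong prediction is therefore penalised more than a hesitant one.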

I suggest you adjust your training and test sets (currently 575 and 50 points) to be more nearly equal in size, or use cross-validation.
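Here is a minimal k-fold cross-validation sketch. It pools all the labelled points (here that would be the 575 + 50), splits them at random into k folds, refits on k-1 folds, and counts errors on the held-out fold. Every point is then used for testing exactly once. The single explanatory variable and the nearest-mean classifier are hypothetical stand-ins for the actual data and methods.

```python
import random

def k_fold_indices(n, k, rng):
    """Randomly partition range(n) into k folds of (nearly) equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_nearest_mean(xs, ys):
    """Toy classifier (hypothetical): assign x to the class whose mean is nearer."""
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return lambda x: 1 if abs(x - m1) < abs(x - m0) else 0

def cv_error(xs, ys, train, k=10, seed=0):
    """k-fold cross-validated misclassification rate of the classifier built by `train`."""
    folds = k_fold_indices(len(xs), k, random.Random(seed))
    errors = 0
    for fold in folds:
        held_out = set(fold)
        xtr = [x for i, x in enumerate(xs) if i not in held_out]
        ytr = [y for i, y in enumerate(ys) if i not in held_out]
        clf = train(xtr, ytr)  # refit without the held-out fold
        errors += sum(1 for i in fold if clf(xs[i]) != ys[i])
    return errors / len(xs)

# Toy, clearly separable data: class 0 at -10..-1, class 1 at 1..10.
xs = list(range(-10, 0)) + list(range(1, 11))
ys = [0] * 10 + [1] * 10
print(cv_error(xs, ys, train_nearest_mean))  # 0.0
```

The same fold loop can collect sensitivity and FDR instead of the error rate. Either way, all the data contribute to the estimate rather than a single 50-point test set.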

I don't see how shuffling the labels will help: you want to know how well a classifier does when there is a real relationship between the explanatory variables and the class. To take a simple example, suppose the classes are clearly linearly separable. Then a logistic discriminant will have nigh-perfect performance on the actual data, but very poor performance on permuted labels. You would do a lot better to simulate from a good fitted model, the so-called parametric bootstrap.
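Below is a minimal sketch of a parametric bootstrap along these lines, built on assumptions of my own. There is one explanatory variable, and the "good fitted model" is taken to be two normal classes with a common SD. The classifier is a toy that cuts halfway between the class means. All names are hypothetical, and the starting data are simulated for the demo. Each replicate draws a fresh training set (575) and test set (50) from the fitted model, retrains, and records test sensitivity. The spread of those values shows how much performance varies when the relationship is real, which is the question at issue, rather than under a no-relationship null.

```python
import random
import statistics

def fit_two_gaussians(xs, ys):
    """Fitted model: P(class 1) plus normal class densities N(m0, sd), N(m1, sd), pooled SD."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    m0, m1 = statistics.fmean(x0), statistics.fmean(x1)
    ss = sum((x - m0) ** 2 for x in x0) + sum((x - m1) ** 2 for x in x1)
    return {"p1": len(x1) / len(xs), "m0": m0, "m1": m1, "sd": (ss / (len(xs) - 2)) ** 0.5}

def simulate(model, n, rng):
    """Draw n labelled points from the fitted model."""
    ys = [1 if rng.random() < model["p1"] else 0 for _ in range(n)]
    xs = [rng.gauss(model["m1"] if y == 1 else model["m0"], model["sd"]) for y in ys]
    return xs, ys

def train_midpoint(xs, ys):
    """Toy classifier (hypothetical); assumes both classes occur in the training set."""
    m0 = statistics.fmean([x for x, y in zip(xs, ys) if y == 0])
    m1 = statistics.fmean([x for x, y in zip(xs, ys) if y == 1])
    cut, sign = (m0 + m1) / 2, (1 if m1 > m0 else -1)
    return lambda x: 1 if sign * (x - cut) > 0 else 0

def sensitivity(preds, ys):
    pos = sum(ys)
    tp = sum(1 for p, y in zip(preds, ys) if p == 1 and y == 1)
    return tp / pos if pos else float("nan")

def parametric_bootstrap(xs, ys, n_train, n_test, B=200, seed=1):
    """Simulate new data from the fitted model, retrain, and collect test sensitivities."""
    rng = random.Random(seed)
    model = fit_two_gaussians(xs, ys)
    out = []
    for _ in range(B):
        xtr, ytr = simulate(model, n_train, rng)
        xte, yte = simulate(model, n_test, rng)
        clf = train_midpoint(xtr, ytr)  # retrain on each simulated training set
        out.append(sensitivity([clf(x) for x in xte], yte))
    return [s for s in out if s == s]  # drop NaN (test set with no positives)

# Simulated stand-in for the real training data.
rng = random.Random(0)
ys = [rng.randint(0, 1) for _ in range(575)]
xs = [rng.gauss(3.0 if y else 0.0, 1.0) for y in ys]
sens = parametric_bootstrap(xs, ys, n_train=575, n_test=50)
print(statistics.fmean(sens), statistics.quantiles(sens, n=20)[0])
```

The printed 5% quantile indicates how low test sensitivity on 50 cases can plausibly fall even when the model is right. The same loop can record FDR alongside sensitivity.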

On Fri, 1 Jul 2005 wrote:

> Dear All,
> I'm classifying some data with various methods (binary classification).
> I'm interpreting the results via a confusion matrix from which I
> calculate the sensitivity and the FDR (false discovery rate). The classifiers are trained on
> 575 data points and my test set has 50 data points.
> I'd like to calculate p-values for obtaining an FDR <= and a sensitivity >= the observed values for
> each classifier. I was thinking about shuffling/bootstrapping the labels of
> the test set, classifying them and calculating the p-value from the
> resulting random FDRs and sensitivities, assuming these are normally distributed.
> The problem is that it's rather slow when running many rounds of
> shuffling/classification (I'd like to do this for many classifiers and
> parameter combinations). In addition, classifying the 50 test data
> points with shuffled labels realistically produces only a very limited
> number of possible FDRs and sensitivities, and I'm wondering whether I can
> really assume these values are normally distributed.
> Basically, I'm looking for a way to calculate the p-values analytically.
> I'd be grateful for any suggestions, web addresses or references.
> Kind regards,
> Arne
> ______________________________________________
> mailing list
> PLEASE do read the posting guide!
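On the question's narrower point: if the trained classifier is not refitted, shuffling only the test labels leaves its predictions unchanged, and so the number of predicted positives stays fixed. The number of actual positives is also fixed. The shuffled true-positive count therefore follows a hypergeometric distribution, so the permutation p-value can be computed exactly, with no simulation and no normal approximation. This also explains the small number of attainable values. With both margins fixed, sensitivity = TP/K and FDR = (n - TP)/n are both monotone in TP, so "sensitivity >= observed" and "FDR <= observed" are the same event. This matches a one-sided Fisher exact test on the 2 x 2 confusion matrix (in R, `fisher.test` with `alternative = "greater"` and TP in the top-left cell). The Python sketch below uses my own function names. As the reply notes, it tests against a no-relationship null, which may not be the interesting question.

```python
from math import comb

def confusion_counts(preds, labels):
    """Return (TP, FP, FN, TN) for 0/1 predictions and labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return tp, fp, fn, len(labels) - tp - fp - fn

def upper_tail(tp, N, K, n):
    """P(X >= tp), X ~ Hypergeometric(N cases, K actual positives, n predicted positives)."""
    return sum(comb(K, j) * comb(N - K, n - j) for j in range(tp, min(K, n) + 1)) / comb(N, n)

def label_permutation_test(preds, labels):
    """Sensitivity, FDR and exact p-value under random permutation of the test labels."""
    tp, fp, fn, _ = confusion_counts(preds, labels)
    K, n = tp + fn, tp + fp  # actual and predicted positives: both fixed under shuffling
    if K == 0 or n == 0:
        raise ValueError("need at least one actual and one predicted positive")
    return tp / K, fp / n, upper_tail(tp, len(labels), K, n)

# Hypothetical 10-case test set.
preds = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sens, fdr, p = label_permutation_test(preds, labels)
print(sens, fdr, round(p, 4))  # 0.75 0.25 0.119
```

Here p = (4*6 + 1)/210 = 25/210, the chance of 3 or more true positives among the 4 predicted positives when the labels are shuffled at random.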

Brian D. Ripley,        
Professor of Applied Statistics,
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Fri Jul 01 22:04:19 2005
