Re: [R] Neural Nets (nnet) - evaluating success rate of predictions

From: Bert Gunter <gunter.berton_at_gene.com>
Date: Mon, 07 May 2007 09:38:23 -0700


Folks:

If I understand correctly, the following may be pertinent.

Note that the procedure:

min.nnet = nnet[k] such that training error rate of nnet[k] = min over i of { training error rate of nnet from the ith random start }

does not guarantee a classifier with a lower error rate on **new** data than any single one of the random starts. That is because you are using the same training set both to choose the model (= the nnet parameters) and to determine the error rate. I know it's tempting to think that choosing the best among many random starts always gets you a better classifier, but it need not. The error rate on the training set for any classifier -- be it a single one or one derived in some way from many -- is a biased estimate of the true error rate, so choosing a classifier on this basis does not assure better performance on future data. In particular, I would guess that choosing the best among many (hundreds or thousands of) random starts is very likely to produce a poor predictor (ergo the importance of parsimony/penalization). I would appreciate comments from anyone, pro or con, with knowledge and experience of these things, however, as I'm rather limited on both.
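
To make that concrete, here is a minimal R sketch of the "best of many
random starts" procedure, selected by training error -- the biased
quantity at issue. The data frame train and its binary factor column y
are hypothetical names, not anything from the original post:

    library(nnet)

    ## Fit the same network from 100 random starts and keep the fit
    ## with the lowest *training* error rate.
    fits <- lapply(1:100, function(i)
        nnet(y ~ ., data = train, size = 3, maxit = 200, trace = FALSE))
    train.err <- sapply(fits, function(f)
        mean(predict(f, train, type = "class") != train$y))
    min.nnet <- fits[[which.min(train.err)]]  # chosen on training data only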

The simple answer to the question of obtaining the error rate using validation data is: do whatever you like to choose/fit a classifier on the training set. **Once you are done,** the estimate of your error rate is the error rate you get when applying that classifier to the validation set. But you can do this only once! If you don't like that error rate and go back to finding a better predictor in some way, then your validation data have been used to derive the classifier and have thus become part of the training data, so any further assessment of a new classifier's error rate on them is also a biased estimate. You need fresh validation data for that.
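
In R terms, that one-shot estimate amounts to something like the
following (valid is a hypothetical validation data frame with the same
columns as train above):

    ## Apply the chosen classifier to the held-out data exactly once.
    pred <- predict(min.nnet, valid, type = "class")
    valid.err <- mean(pred != valid$y)  # your error rate estimate
    ## If you now go back and refit, valid has effectively joined the
    ## training data, and valid.err is no longer an honest estimate.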

Of course, there are all sorts of cross-validation schemes one can use to avoid -- or at least mitigate -- these issues; most books on statistical classification/machine learning discuss them in detail.
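
For instance, a rough 5-fold sketch (same hypothetical data; a real
analysis might want stratified folds and repeated splits):

    k <- 5
    fold <- sample(rep(1:k, length.out = nrow(train)))
    cv.err <- sapply(1:k, function(j) {
        fit <- nnet(y ~ ., data = train[fold != j, ], size = 3,
                    maxit = 200, trace = FALSE)
        mean(predict(fit, train[fold == j, ], type = "class") !=
             train$y[fold == j])
    })
    mean(cv.err)  # cross-validated estimate of the error rate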

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces_at_stat.math.ethz.ch
[mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of hadley wickham
Sent: Monday, May 07, 2007 5:26 AM
To: Wensui Liu
Cc: r-help_at_stat.math.ethz.ch
Subject: Re: [R] Neural Nets (nnet) - evaluating success rate of predictions

Pick the one with the lowest error rate on your training data?

Hadley

On 5/7/07, Wensui Liu <liuwensui_at_gmail.com> wrote:
> well, how do you know which ones are the best out of several hundred?
> I would average all the results from the several hundred instead.
>
> On 5/7/07, hadley wickham <h.wickham_at_gmail.com> wrote:
> > On 5/6/07, nathaniel Grey <nathaniel.grey_at_yahoo.co.uk> wrote:
> > > Hello R-Users,
> > >
> > > I have been using (nnet) by Ripley to train a neural net on a test
> > > dataset, and I have obtained predictions for a validation dataset using:
> > >
> > > PP <- predict(nnetobject, validationdata)
> > >
> > > Using PP I can find the -2 log-likelihood for the validation dataset.
> > >
> > > However, what I really want to know is how well my neural net is doing
> > > at classifying my binary output variable. I am new to R and I can't
> > > figure out how to assess the success rate of predictions.
> > >
> >
> > table(PP, binaryvariable)
> > should get you started.
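> >
> > For example, assuming PP holds the raw predicted probabilities and
> > the response is coded 0/1 (both assumptions about your data), a 0.5
> > cutoff gives:
> >
> >     pred.class <- ifelse(PP > 0.5, 1, 0)
> >     table(pred.class, binaryvariable)   # confusion matrix
> >     mean(pred.class == binaryvariable)  # overall success rate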
> >
> > Also if you're using nnet with random starts, I strongly suggest
> > taking the best out of several hundred (or maybe thousand) trials - it
> > makes a big difference!
> >
> > Hadley
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> WenSui Liu
> A lousy statistician who happens to know a little programming
> (http://spaces.msn.com/statcompute/blog)
>



