From: Bert Gunter <gunter.berton_at_gene.com>

Date: Mon, 07 May 2007 09:38:23 -0700


Folks:

If I understand correctly, the following may be pertinent.

The simple answer to the question of obtaining the error rate using validation data is: do whatever you like to choose/fit a classifier on the training set. Once you are done, the estimate of your error rate is the error rate you get when applying that classifier to the validation set. But you can do this only once! If you don't like that error rate and go back to finding a better predictor in some way, then your validation data have now been used to derive the classifier and have thus become part of the training data, so any further assessment of the error rate of a new classifier on them is also a biased estimate. You need yet more validation data for that.

Of course, there are all sorts of cross-validation schemes one can use to avoid -- or at least mitigate -- these issues: most books on statistical classification/machine learning discuss this in detail.
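A minimal sketch of the one-time holdout recipe described above, using hypothetical data and the `nnet` package discussed in this thread (all variable names here are illustrative, not from the original posts):

```r
library(nnet)

set.seed(1)
# Hypothetical data: two noisy predictors and a binary outcome
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(ifelse(dat$x1 + dat$x2 + rnorm(n) > 0, "yes", "no"))

# One-time split: fit on the training half only
train <- sample(n, n / 2)
fit <- nnet(y ~ x1 + x2, data = dat[train, ], size = 2, trace = FALSE)

# The honest error estimate comes from the held-out validation half,
# and it is only honest if you use it once
pred <- predict(fit, dat[-train, ], type = "class")
err <- mean(pred != dat$y[-train])
err
```

If you dislike `err` and go back to tune the model, the held-out half has effectively joined the training data and a fresh validation set is needed.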

Bert Gunter

Genentech Nonclinical Statistics

-----Original Message-----

From: r-help-bounces_at_stat.math.ethz.ch [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of hadley wickham

Sent: Monday, May 07, 2007 5:26 AM

To: Wensui Liu

Cc: r-help_at_stat.math.ethz.ch

Subject: Re: [R] Neural Nets (nnet) - evaluating success rate of predictions

Pick the one with the lowest error rate on your training data? Hadley
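Hadley's suggestion of refitting with many random weight initialisations and keeping the best fit might be sketched like this (hypothetical data; `nnet` stores the final value of its fitting criterion on the training data in the `value` component of the returned object):

```r
library(nnet)

set.seed(1)
# Hypothetical training data with a binary outcome
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(ifelse(dat$x1 - dat$x2 + rnorm(n) > 0, "yes", "no"))

# Each call to nnet() starts from random weights, so repeated fits can
# land in different local optima; keep the fit with the lowest
# training criterion
fits <- lapply(1:100, function(i)
  nnet(y ~ x1 + x2, data = dat, size = 2, trace = FALSE))
best <- fits[[which.min(sapply(fits, function(f) f$value))]]
best$value
```

One hundred restarts is used here to keep the example quick; the advice in the thread is several hundred to a thousand.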

On 5/7/07, Wensui Liu <liuwensui_at_gmail.com> wrote:

> well, how to do you know which ones are the best out of several hundreds?

*> I will average all results out of several hundreds.
**>
**> On 5/7/07, hadley wickham <h.wickham_at_gmail.com> wrote:
**> > On 5/6/07, nathaniel Grey <nathaniel.grey_at_yahoo.co.uk> wrote:
**> > > Hello R-Users,
**> > >
**> > > I have been using (nnet) by Ripley to train a neural net on a test
*

dataset, I have obtained predictions for a validtion dataset using:

*> > >
*

> > > PP<-predict(nnetobject,validationdata)

*> > >
**> > > Using PP I can find the -2 log likelihood for the validation datset.
**> > >
**> > > However what I really want to know is how well my nueral net is doing
*

at classifying my binary output variable. I am new to R and I can't figure
out how you can assess the success rates of predictions.

*> > >
**> >
*

> > table(PP, binaryvariable)

*> > should get you started.
**> >
**> > Also if you're using nnet with random starts, I strongly suggest
**> > taking the best out of several hundred (or maybe thousand) trials - it
**> > makes a big difference!
**> >
**> > Hadley
**> >
**> > ______________________________________________
**> > R-help_at_stat.math.ethz.ch mailing list
**> > https://stat.ethz.ch/mailman/listinfo/r-help
**> > PLEASE do read the posting guide
*

http://www.R-project.org/posting-guide.html

*> > and provide commented, minimal, self-contained, reproducible code.
**> >
**>
**>
**> --
*

> WenSui Liu

*> A lousy statistician who happens to know a little programming
**> (http://spaces.msn.com/statcompute/blog)
**>
*
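Since `predict()` on an `nnet` fit for a binary outcome returns fitted probabilities by default, the suggested `table()` call is usually combined with a cutoff before tabulating. A self-contained sketch, with hypothetical stand-ins for the poster's `nnetobject`, `validationdata`, and `binaryvariable`:

```r
library(nnet)

set.seed(1)
# Hypothetical stand-ins for the poster's training and validation data
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- factor(rbinom(n, 1, plogis(2 * dat$x)))
traindata <- dat[1:100, ]
validationdata <- dat[101:200, ]

nnetobject <- nnet(y ~ x, data = traindata, size = 2, trace = FALSE)

# For a two-class factor outcome, predict() returns fitted
# probabilities; threshold at 0.5, then cross-tabulate against the
# observed classes to get a confusion matrix
PP <- predict(nnetobject, validationdata)
confusion <- table(predicted = PP > 0.5, actual = validationdata$y)
success_rate <- sum(diag(confusion)) / sum(confusion)
```

The proportion of correct classifications (`success_rate`) is the diagonal of the confusion matrix divided by its total; the off-diagonal cells split the errors into false positives and false negatives.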


R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Mon 07 May 2007 - 16:50:01 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 08 May 2007 - 08:31:41 GMT.
