Re: [R] error in random forest

From: Nagu <thogiti_at_gmail.com>
Date: Fri, 07 Mar 2008 17:27:19 -0800

Thank you very much. I'll dig into the data and verify that the factor variables and their levels are consistent between the training and test sets.
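
For anyone tracing the same error, here is a minimal base-R sketch of that check. The helper name find_new_levels and the toy data frames train and test are made up for illustration; they are not part of randomForest. It compares the declared levels of each factor, on the assumption that this is what the predict method objects to, and then shows one possible way to force the test factors onto the training levels.

```r
# Hypothetical helper: for each factor column in `train`, list the levels
# declared in the matching `test` column that `train` does not have.
find_new_levels <- function(train, test) {
  factor_cols <- names(train)[vapply(train, is.factor, logical(1))]
  factor_cols <- intersect(factor_cols, names(test))
  unseen <- lapply(factor_cols, function(col) {
    # as.factor() leaves an existing factor (and its unused levels) untouched
    setdiff(levels(as.factor(test[[col]])), levels(train[[col]]))
  })
  names(unseen) <- factor_cols
  # Keep only the columns that actually have a mismatch
  unseen[lengths(unseen) > 0]
}

# Toy data: level "d" occurs only in the test set
train <- data.frame(f = factor(c("a", "b", "c")), x = c(1, 2, 3))
test  <- data.frame(f = factor(c("a", "d")),      x = c(4, 5))

bad <- find_new_levels(train, test)
print(bad)

# One possible remedy: re-declare each test factor with the training levels.
# Values that the training data never saw become NA and still need a decision
# (drop the rows, recode them, or refit on data that covers those levels).
test_aligned <- test
for (col in names(bad)) {
  test_aligned[[col]] <- factor(test[[col]], levels = levels(train[[col]]))
}
```

Re-levelling only silences the mismatch; it does not give the model any information about levels it was never trained on.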

On Fri, Mar 7, 2008 at 5:14 PM, <Bill.Venables_at_csiro.au> wrote:
> The error message is pretty clear, really. To spell it out a bit more,
> what you have done is as follows.
>
> Your training set has factor variables in it. Suppose one of them is
> "f". In the training set it has 5 levels, say.
>
> Your test set also has a factor "f", as it must, but it appears that in
> the test set it has 6 levels, or more, or levels that do not agree with
> those for "f" in the training set.
>
> This mismatch means that the predict method for randomForest cannot use
> this test set.
>
> What you have to do is make sure that the factor levels agree for every
> factor in both test and training set. One way to do this is to put the
> test and training set together with rbind(...) say, and then separate
> them again. But even this will still leave you with a problem, because
> your training set will have some factor levels empty that are not empty
> in the test set. The error will most likely be more subtle, though.
>
> You really need to sort this out yourself. It is not particularly an R
> problem, but a confusion over data. To be useful, your training set
> needs to cover the field for all levels of every factor. Think about it.
>
>
>
> -----Original Message-----
> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org]
> On Behalf Of Nagu
> Sent: Saturday, 8 March 2008 5:37 AM
> To: r-help_at_r-project.org; r-help_at_stat.math.ethz.ch
> Subject: [R] error in random forest
>
> Hi,
>
> I get the following error when I try to predict the probabilities of a
> test sample:
>
> Error in predict.randomForest(fit.EBA.OM.rf.50, x.OM, type = "prob") :
> New factor levels not present in the training data
>
> I have about 630 predictor variables in the dataset x.OM (25 factor
> variables and the remaining are continuous variables). Any ideas on
> how to trace it?
>
> Thank you,
> Nagu
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>



Received on Sat 08 Mar 2008 - 01:31:15 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 08 Mar 2008 - 02:30:20 GMT.