Re: [R] error in random forest

From: <Bill.Venables_at_csiro.au>
Date: Sat, 08 Mar 2008 11:14:13 +1000


The error message is pretty clear, really. To spell it out a bit more, what you have done is as follows.

Your training set has factor variables in it. Suppose one of them is "f". In the training set it has 5 levels, say.

Your test set also has a factor "f", as it must, but it appears that in the test set it has 6 levels, or more, or levels that do not agree with those for "f" in the training set.

This mismatch means that the predict method for randomForest cannot use this test set.
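To trace which factors are responsible, which is what the original question asks, one can compare the levels of each factor in the test set with those in the training set. The following is a minimal base-R sketch. `find_new_levels`, `train` and `test` are hypothetical names standing in for your own training and test data frames (in the original post, `x.OM` would play the role of `test`).

```r
# Hypothetical helper: for every factor column present in both data frames,
# report the levels that occur in 'test' but not in 'train'.
find_new_levels <- function(train, test) {
  shared <- intersect(names(train), names(test))
  is_fac <- vapply(shared,
                   function(v) is.factor(train[[v]]) && is.factor(test[[v]]),
                   logical(1))
  out <- lapply(shared[is_fac],
                function(v) setdiff(levels(test[[v]]), levels(train[[v]])))
  names(out) <- shared[is_fac]
  # keep only the factors that actually have unseen levels
  out[vapply(out, length, integer(1)) > 0]
}

# Toy example: "f" has 5 levels in training but 6 across the test set
train <- data.frame(f = factor(c("a", "b", "c", "d", "e")), x = 1:5)
test  <- data.frame(f = factor(c("a", "b", "f")), x = 1:3)
find_new_levels(train, test)
```

Each name in the returned list is a factor to fix, and its value lists the offending levels; an empty list means the level sets agree.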

What you have to do is make sure that the factor levels agree for every factor in both the test and training sets. One way to do this is to put the test and training sets together, with rbind(...) say, and then separate them again. But even this will still leave you with a problem, because your training set will then have some factor levels that are empty but are not empty in the test set. The resulting error will most likely be more subtle, though.
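The rbind(...) approach can be sketched as follows, again with hypothetical toy data frames `train` and `test`. When data frames are stacked, rbind takes the union of the factor levels, so after splitting both halves carry the same level set. The last line shows the catch described above: a level with no observations in the training half.

```r
train <- data.frame(f = factor(c("a", "b", "c", "d", "e")), x = 1:5)
test  <- data.frame(f = factor(c("a", "b", "f")), x = 1:3)

# Stack the two sets; rbind() expands the factor levels to their union
both <- rbind(train, test)
n_train <- nrow(train)

# Split them again; row subsetting keeps the full level set
train2 <- both[seq_len(n_train), ]
test2  <- both[-seq_len(n_train), ]

identical(levels(train2$f), levels(test2$f))   # TRUE
# ...but level "f" has a count of 0 in the training half
table(train2$f)
```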

You really need to sort this out yourself. It is not particularly an R problem, but a confusion over the data. To be useful, your training set needs to cover the field for all levels of every factor. Think about it.
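One way to check this coverage before fitting is to list, for every factor in the training set, the levels that have no training observations; any level reported is one the model has never seen. A minimal sketch, where `empty_levels` is a hypothetical helper and the data are toy values:

```r
# Hypothetical helper: for each factor column, return the levels with zero rows
empty_levels <- function(df) {
  facs <- names(df)[vapply(df, is.factor, logical(1))]
  out <- lapply(facs, function(v) {
    counts <- table(df[[v]])      # table() also counts unused levels (as 0)
    names(counts)[counts == 0]
  })
  names(out) <- facs
  # keep only the factors that actually have empty levels
  out[vapply(out, length, integer(1)) > 0]
}

# Toy training set whose factor "f" declares a level ("f") it never uses
train <- data.frame(f = factor(c("a", "b", "c"), levels = c("a", "b", "c", "f")),
                    x = 1:3)
empty_levels(train)
```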

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Nagu
Sent: Saturday, 8 March 2008 5:37 AM
To: r-help_at_r-project.org; r-help_at_stat.math.ethz.ch
Subject: [R] error in random forest

Hi,

I get the following error when I try to predict the probabilities of a test sample:

Error in predict.randomForest(fit.EBA.OM.rf.50, x.OM, type = "prob") :
  New factor levels not present in the training data

I have about 630 predictor variables in the dataset x.OM (25 factor variables and the remaining are continuous variables). Any ideas on how to trace it?

Thank you,
Nagu



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Sat 08 Mar 2008 - 01:18:24 GMT

