Re: [R] RandomForest question

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Fri 22 Jul 2005 - 00:16:43 EST


> From: Arne.Muller@sanofi-aventis.com
>
> Hello,
>
> I'm trying to find out the optimal number of variables tried at
> each split (mtry parameter) for a randomForest classification. The
> classification is binary and there are 32 explanatory
> variables (mostly factors with up to 4 levels each, but also
> some numeric variables) and 575 cases.
>
> I've seen that although there are only 32 explanatory
> variables the best classification performance is reached when
> choosing mtry=80. How is it possible that more variables can be
> used than there are columns in the data frame?

It's not. The code for randomForest.default() has:

    ## Make sure mtry is in reasonable range.
    mtry <- max(1, min(p, round(mtry)))

so it silently sets mtry to the number of predictors if it's too large. As an example, using iris, which has only four predictors:

    > library(randomForest)
    randomForest 4.5-12
    Type rfNews() to see new features/changes/bug fixes.
    > iris.rf = randomForest(Species ~ ., iris, mtry=10)
    > iris.rf$mtry
    [1] 4

I should probably add a warning in such cases...
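
A minimal sketch of what such a warning might look like, in base R only. Note that clamp_mtry() is a hypothetical helper made up for illustration, not a function in the randomForest package, and the warning text is an assumption; only the max/min/round expression comes from the code quoted above.

```r
## Hypothetical helper (not part of randomForest): applies the same
## clamp as the line quoted above, and warns whenever the value
## actually used differs from the value that was requested.
clamp_mtry <- function(mtry, p) {
    clamped <- max(1, min(p, round(mtry)))
    if (clamped != mtry) {
        warning("mtry = ", mtry, " adjusted to ", clamped,
                " (valid range is 1 to ", p, ")")
    }
    clamped
}

## The case from the question: 32 predictors, mtry = 80 requested.
## The warning is suppressed here only to keep the output short.
suppressWarnings(clamp_mtry(80, 32))   # 32

## A value already in range is returned unchanged, with no warning.
clamp_mtry(5, 32)                      # 5
```

So with 32 predictors, mtry=80 and mtry=32 would fit the same kind of forest; any difference in performance between the two runs would come from the randomness in the algorithm, not from extra variables being tried.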

Andy  

> thanks for your help
> + kind regards,
>
> Arne
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html



Received on Fri Jul 22 01:14:17 2005
