Re: [R] randomForest question [Broadcast]

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Thu 27 Jul 2006 - 00:23:18 EST


When mtry is equal to the total number of features, you just get regular bagging (in the R package -- Breiman & Cutler's Fortran code samples variables with replacement, so you can't do bagging with that). There are cases where bagging will do better than random feature selection (i.e., RF), even in simulated data, but I'd say not very often.
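
To make the comparison concrete, here is a minimal sketch rather than a definitive recipe. It assumes the randomForest package is installed. The simulated data frame (477 cases, 16 numeric features, so not the nominal features from the question) and the object names rf.default and rf.bagging are made up for illustration. Note that mtry is the number of variables tried at each split, not per tree.

```r
# Sketch only: simulated data stand in for the poster's data set.
library(randomForest)

set.seed(1)
n <- 477   # 159 + 318 cases, as in the question
p <- 16    # number of features, as in the question

# Hypothetical features V1..V16; only V1 and V2 carry signal here.
x <- as.data.frame(matrix(rnorm(n * p), n, p))
y <- factor(ifelse(x$V1 + x$V2 + rnorm(n) > 0.6, "pos", "neg"))

# Default for classification: mtry = floor(sqrt(p)) = 4
rf.default <- randomForest(x, y, ntree = 500)

# mtry = p: every variable is a candidate at every split, i.e. bagging
rf.bagging <- randomForest(x, y, ntree = 500, mtry = p)

# Out-of-bag error estimate after the last tree
rf.default$err.rate[500, "OOB"]
rf.bagging$err.rate[500, "OOB"]
```

Which of the two OOB errors is smaller depends on the data, so the numbers from this sketch say nothing about the data set in the question.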

HTH,
Andy

From: Arne.Muller@sanofi-aventis.com
>
> Hello,
>
> I've a question regarding randomForest (from the package of the
> same name). I've 16 features (nominal), 159 positive and
> 318 negative cases that I'd like to classify (binary classification).
>
> Using the tuning from the e1071 package it turns out that the
> best performance is reached when using all 16 features per
> tree (mtry=16). However, the documentation of randomForest
> suggests taking sqrt(#features), i.e. 4. How can I
> explain this difference? When using all features this is the
> same as a classical decision tree, with the difference that
> the tree is built and tested with different data sets, right?
>
> example (I've tried different configurations, incl. changing ntree):
> > param <- try(tune(randomForest, class ~ ., data=d.all318,
> > ranges=list(mtry=c(4, 8, 16), ntree=c(1000))));
> >
> > summary(param)
>
> Parameter tuning of `randomForest':
>
> - sampling method: 10-fold cross validation
>
> - best parameters:
>  mtry ntree
>    16  1000
>
> - best performance: 0.1571809
>
> - Detailed performance results:
>   mtry ntree     error
> 1    4  1000 0.1928635
> 2    8  1000 0.1634752
> 3   16  1000 0.1571809
>
> thanks a lot for your help,
>
> kind regards,
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



Received on Thu Jul 27 01:55:55 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 27 Jul 2006 - 02:16:50 EST.
