Re: [R] sampsize in Random Forests

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Mon, 10 Mar 2008 11:00:57 -0400

Are you sure there are 100 sites in your data? Here's an example:

R> library(randomForest)
randomForest 4.5-23
Type rfNews() to see new features/changes/bug fixes.
R> f <- factor(sample(1:4, nrow(iris), replace=TRUE))
R> rf1 <- randomForest(iris[1:4], iris[[5]], strata=f, sampsize=rep(5, nlevels(f)))
R> rf1

Call:
 randomForest(x = iris[1:4], y = iris[[5]], strata = f, sampsize = rep(5, nlevels(f)))
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08
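
If the error persists, one thing worth checking is whether sampsize really matches the strata: it needs one entry per level of the strata factor, and no entry may exceed the number of points in that stratum. The sketch below is only an illustration under stated assumptions. It uses iris with a made-up studySites vector of 12 sites as a stand-in for the real data, and the cap of 10 per site is taken from the original question. It may not be the cause in your case.

```r
# Hypothetical stand-in data: iris rows assigned to 12 made-up study sites
set.seed(1)
studySites <- sample(1:12, nrow(iris), replace = TRUE)
strata <- factor(studySites)

# How many sites are actually present, and how many points each one has
siteCounts <- table(strata)
print(nlevels(strata))
print(min(siteCounts))

# Ask for at most 10 points per site, but never more than the smallest site holds
perSite <- min(10, min(siteCounts))
mySampSize <- rep(perSite, nlevels(strata))

# Fit only if the randomForest package is installed
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(iris[1:4], iris[[5]],
                                   strata = strata, sampsize = mySampSize)
  print(rf)
}
```

With real data, table(factor(studySites)) shows directly whether there are fewer than 100 sites or whether some site has fewer than 10 points; either would make rep(10, 100) inconsistent with the strata.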
 

> -----Original Message-----
> From: r-help-bounces_at_r-project.org
> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Naiara Pinto
> Sent: Sunday, March 09, 2008 5:19 PM
> To: r-help_at_r-project.org
> Subject: [R] sampsize in Random Forests
>
> Hi all,
>
> I have a dataset where each point is assigned to a class A, B, C, or
> D. Each point is also assigned to a study site. Each study site is
> coded with a number ranging between 1-100. This information is stored
> in the vector studySites.
>
> I want to run randomForests using stratified sampling, so I
> chose the option
> strata = factor(studySites)
>
> But I am not sure how to control the number of samples taken from each
> study site. I tried to use 10 points from each study site:
> mySampSize = rep(10, 100)
>
> So my function call looks like:
> RF = randomForest(myClass~., data=myData, mtry=5, importance=TRUE,
> strata = factor(studySites), sampsize=mySampSize)
>
> But randomForest gives me the following error:
> Error in randomForest.default(m, y, ...) :
> sampsize can not be larger than class frequency
>
> Does anybody have any idea why this happens?
>
> Thank you very much,
>
> Naiara.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>




Received on Mon 10 Mar 2008 - 15:05:29 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 15 Apr 2008 - 12:30:28 GMT.
