[R] sampsize in Random Forests

From: Naiara Pinto <naiara_at_mail.utexas.edu>
Date: Sun, 09 Mar 2008 16:18:48 -0500

Hi all,

I have a dataset where each point is assigned to a class A, B, C, or D. Each point is also assigned to a study site, coded as a number from 1 to 100. This information is stored in the vector studySites.

I want to run randomForest with stratified sampling, so I set strata = factor(studySites).

But I am not sure how to control the number of samples taken from each study site. I tried drawing 10 points from each study site: mySampSize = rep(10, 100)

So my function call looks like:
RF = randomForest(myClass~., data=myData, mtry=5, importance=TRUE, strata = factor(studySites), sampsize=mySampSize)

But randomForest gives me the following error: Error in randomForest.default(m, y, ...) : sampsize can not be larger than class frequency

Does anybody have any idea why this happens?
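One possibility I have not ruled out is that some study sites contain fewer than 10 points. This assumes that randomForest compares each element of sampsize against the number of rows in the matching stratum, which the wording of the error suggests. The sketch below uses made-up data in place of my real studySites vector (site 7 has only 4 points) and lists the sites that cannot supply the requested number of points:

```r
# Hypothetical toy data standing in for the real studySites vector:
# 100 sites with 12 points each, except site 7, which has only 4.
studySites <- rep(1:100, times = c(rep(12, 6), 4, rep(12, 93)))

strata <- factor(studySites)
mySampSize <- rep(10, nlevels(strata))  # 10 draws requested per site

# Number of points actually available in each site (stratum)
siteCounts <- table(strata)

# Sites with fewer points than requested
shortSites <- names(siteCounts)[siteCounts < mySampSize]
print(shortSites)  # [1] "7"
```

If shortSites turns out to be non-empty for the real data, lowering the per-site value (for example to min(siteCounts)) would be one thing to try.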

Thank you very much,


R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun 09 Mar 2008 - 21:21:15 GMT

