Re: [R] problem with certain data sets when using randomForest

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Wed 31 Aug 2005 - 23:47:29 EST


I've been trying to play catch-up on R-help since DSC2005. This one must have slipped through...

This is what I'd do:

iris.sub <- subset(iris, Species %in% c("setosa", "virginica"))
iris.sub$Species <- factor(iris.sub$Species)

That last line drops the empty level in the factor. You can then run randomForest with that data.
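
To make the effect of that step visible, here is a minimal sketch using only base R and the built-in iris data. The droplevels() call is an addition that is not part of the original reply; it is available in R 2.12.0 and later and is shown only as an alternative. The randomForest call is guarded so the snippet still runs where the package is not installed.

```r
# Subset as above: the unused "versicolor" level is still attached to the factor.
iris.sub <- subset(iris, Species %in% c("setosa", "virginica"))
levels.before <- levels(iris.sub$Species)   # still lists all three levels
counts.before <- table(iris.sub$Species)    # "versicolor" has a count of 0

# Re-creating the factor keeps only the levels that actually occur.
iris.sub$Species <- factor(iris.sub$Species)
levels.after <- levels(iris.sub$Species)

# Alternative (assumes R >= 2.12.0): drop unused levels of every factor column.
iris.sub2 <- droplevels(subset(iris, Species %in% c("setosa", "virginica")))

# Fit the forest only if the randomForest package is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(Species ~ ., data = iris.sub, ntree = 5)
  print(fit)
}
```

Comparing levels.before with levels.after shows the empty class that randomForest complains about and confirms that it is gone afterwards.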

HTH,
Andy
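
P.S. On the syntax error in the quoted attempt below: list() only accepts literal names on the left of "=", so an expression such as uniqueItems[1] cannot be used there. One possible workaround, sketched here under the assumption that merging the first two levels is really what is wanted, is to build the list first and attach the computed names with setNames() from the stats package. The variable names follow the quoted post.

```r
# Rebuild the subset from the question with base R indexing.
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]

# All three original levels are still present at this point.
uniqueItems <- levels(finaldataset$Species)

# Build the list of old levels first, then attach names computed at run time.
# Names are the new levels; values are the old levels mapped onto them.
newlevels <- setNames(
  list(uniqueItems[1:2], uniqueItems[3]),
  c(uniqueItems[1], uniqueItems[3])
)

levels(finaldataset$Species) <- newlevels
new.levels <- levels(finaldataset$Species)
new.counts <- table(finaldataset$Species)
```

This keeps the code free of hard-coded level names, although re-creating the factor with factor() as shown above is shorter when the goal is only to drop empty levels.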

> From: Martin Lam
>
> Hi,
>
> Since I've had no replies to my previous post about this
> problem, I am posting it again in the hope that someone
> notices it. The problem is that the randomForest
> function doesn't accept data sets whose instances
> cover only a subset of all the classes. So a
> data set with instances that belong to either class "a"
> or "b", from the levels "a", "b" and "c", doesn't work
> because there is no instance of class "c". Is
> there any way to solve this problem?
>
> library("randomForest")
>
> # load the iris plant data set
> dataset <- iris
>
> numberarray <- array(1:nrow(dataset), nrow(dataset), 1)
>
> # include only instances with Species = setosa or virginica
> indices <- t(numberarray[(dataset$Species == "setosa" |
>     dataset$Species == "virginica") == TRUE])
>
> finaldataset <- dataset[indices,]
>
> # just to let you see the 3 classes
> levels(finaldataset$Species)
>
> # create the random forest
> randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
>
> # The error message I get
> Error in randomForest.default(m, y, ...) :
> Can't have empty classes in y.
>
> # The problem is that the finaldataset doesn't contain
> # any instances of "versicolor", so I think the only
> # way to solve this problem is by changing the levels
> # that "Species" has to only "setosa" and "virginica".
> # Correct me if I'm wrong.
>
> # So I tried to change the levels but I got stuck:
>
> # get the possible unique classes
> uniqueItems <- unique(levels(finaldataset$Species))
>
> # the problem!
> newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
> uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
>
> # Error message
> Error: syntax error
>
> # In the help they use constant names to rename the
> #levels, so this works (but that's not what I want
> #because I don't want to change the code every time I
> #use another data set):
> newlevels <- list("setosa" = c(uniqueItems[1],
> uniqueItems[2]), "virginica" = uniqueItems[3])
>
> levels(finaldataset$Species) <- newlevels
>
> levels(finaldataset$Species)
>
> finaldataset$Species
>
> ---------------------------
>
> Thanks in advance,
>
> Martin
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>
>



Received on Wed Aug 31 23:58:46 2005

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 16:09:44 EST