From: Martin Lam <tmlammail_at_yahoo.com>
Date: Sat 27 Aug 2005 - 01:52:21 EST


Since I've had no replies to my previous post about this problem, I am posting it again in the hope that someone notices it. The problem is that the randomForest function doesn't accept a data set whose instances cover only a subset of the classes in the response factor. For example, a data set whose instances all belong to class "a" or "b", while the levels are "a", "b" and "c", doesn't work because no instance has class "c". Is there any way to solve this problem?


library(randomForest)

# load the iris plant data set
dataset <- iris

# include only instances with Species = setosa or virginica
indices <- which(dataset$Species == "setosa" |
                 dataset$Species == "virginica")

finaldataset <- dataset[indices,]

# just to let you see the 3 classes
levels(finaldataset$Species)

# create the random forest

randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)

# The error message I get

Error in randomForest.default(m, y, ...) :

        Can't have empty classes in y.

# The problem is that finaldataset doesn't contain
# any instances of "versicolor", so I think the only
# way to solve this problem is to change the levels of
# "Species" to only "setosa" and "virginica";
# correct me if I'm wrong.

# So I tried to change the levels but I got stuck:

# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))

# the problem!

newlevels <- list(uniqueItems[1] = c(uniqueItems[1], uniqueItems[2]), uniqueItems[3] = uniqueItems[3])

# Error message

Error: syntax error
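
As far as I can tell, the syntax error comes from the left-hand side of `=` inside list(): an argument name there has to be a literal name or string, not an expression such as uniqueItems[1]. A minimal sketch of one possible way around it, assuming it is acceptable to build the list unnamed and attach the names afterwards with names(); the variable `species` is only introduced here for illustration and stands in for finaldataset$Species:

```r
# Sketch only: build the list first, then attach names computed from the data.
# 'species' is a hypothetical stand-in for finaldataset$Species.
species <- iris$Species[iris$Species != "versicolor"]
uniqueItems <- levels(species)  # still contains all three level names

# Unnamed list: the first element merges levels 1 and 2, the second keeps level 3
newlevels <- list(c(uniqueItems[1], uniqueItems[2]), uniqueItems[3])

# The names are ordinary character data here, so nothing is hard-coded
names(newlevels) <- c(uniqueItems[1], uniqueItems[3])

levels(species) <- newlevels
levels(species)
```

Note that this merges the second level into the first, exactly as the constant-name version does, so it only makes sense when the merged level has no instances of its own.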

# In the help they use constant names to rename the
# levels, so this works (but that's not what I want
# because I don't want to change the code every time I
# use another data set):
newlevels <- list("setosa" = c(uniqueItems[1], uniqueItems[2]), "virginica" = uniqueItems[3])

levels(finaldataset$Species) <- newlevels
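
If the goal is only to get rid of the empty class, renaming may not be needed at all. A minimal sketch, assuming that calling factor() again on the column is an acceptable way to drop the levels that no longer occur; the randomForest call is guarded so that the rest runs even where the package is not installed:

```r
# Sketch only: drop the unused "versicolor" level by re-creating the factor.
dataset <- iris
finaldataset <- dataset[dataset$Species %in% c("setosa", "virginica"), ]

# factor() applied to an existing factor keeps only the levels that occur
finaldataset$Species <- factor(finaldataset$Species)
levels(finaldataset$Species)

# Assumption: the randomForest package is installed; skipped otherwise
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(Species ~ ., data = finaldataset, ntree = 5)
  print(fit)
}
```

This does not depend on which classes happen to be present, so the same lines should carry over to another data set once the response column name is changed.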



Thanks in advance,


