[R] Truncate levels to use randomForest

From: Martin Lam <tmlammail_at_yahoo.com>
Date: Fri 26 Aug 2005 - 18:15:35 EST


I will explain my problem with this example:


# load the iris plant data set

dataset <- iris

numberarray <- 1:nrow(dataset)

# include only instances with Species = setosa or virginica
indices <- numberarray[dataset$Species == "setosa" |
                       dataset$Species == "virginica"]

finaldataset <- dataset[indices,]
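A shorter way to select the same rows, sketched with base R only: the `%in%` operator tests membership directly, so no index array is needed. This is an alternative, not the code above. Note that the subset still carries all three levels of `Species`, even though only two of them have rows.

```r
# Select the rows whose Species is setosa or virginica using %in%
dataset <- iris
finaldataset <- dataset[dataset$Species %in% c("setosa", "virginica"), ]

nrow(finaldataset)            # 100 rows: 50 setosa + 50 virginica
levels(finaldataset$Species)  # still "setosa" "versicolor" "virginica"
```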

# just to let you see the 3 classes
levels(finaldataset$Species)

# create the random forest (requires the randomForest package)

library(randomForest)
randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)

# The error message I get

Error in randomForest.default(m, y, ...) :
  Can't have empty classes in y.
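The message suggests that `randomForest` treats every level of the response `y` as a class and refuses to fit when some level has no rows. A quick base-R check (a sketch, not from the original post) lists the offending levels before fitting:

```r
dataset <- iris
finaldataset <- dataset[dataset$Species == "setosa" |
                        dataset$Species == "virginica", ]

# Count the rows per level of the response; unused levels count 0
counts <- table(finaldataset$Species)
emptyClasses <- names(counts)[counts == 0]
emptyClasses  # "versicolor"
```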

# The problem is that finaldataset doesn't contain
# any instances of "versicolor", so I think the only
# way to solve this problem is to restrict the levels of
# "Species" to just "setosa" and "virginica".
# Correct me if I'm wrong.
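Restricting the levels is indeed the usual fix. A sketch in base R: calling `factor()` on an existing factor rebuilds it with only the levels that actually occur, keeping their original order, so no level names have to be written into the code.

```r
dataset <- iris
finaldataset <- dataset[dataset$Species %in% c("setosa", "virginica"), ]

# Re-create the factor: levels with no remaining rows are dropped
finaldataset$Species <- factor(finaldataset$Species)

levels(finaldataset$Species)  # "setosa" "virginica"
table(finaldataset$Species)   # 50 rows of each class
```

With `Species` rebuilt this way, the `randomForest()` call above should no longer see an empty class.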

# So I tried to change the levels but I got stuck:

# get the possible unique classes

uniqueItems <- unique(levels(finaldataset$Species))

# the problem!

newlevels <- list(uniqueItems[1] = c(uniqueItems[1], uniqueItems[2]), uniqueItems[3] = uniqueItems[3])

# Error message

Error: syntax error
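The syntax error arises because argument names in `list()` (the part before `=`) must be literal names or quoted strings; an expression such as `uniqueItems[1]` is not allowed there. One way around it, sketched in base R: build the list without names, then assign the computed names with `names<-` (base R's `setNames()` does the same in one call).

```r
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]
uniqueItems <- levels(finaldataset$Species)  # "setosa" "versicolor" "virginica"

# Build the list first, then attach names computed at run time
newlevels <- list(c(uniqueItems[1], uniqueItems[2]), uniqueItems[3])
names(newlevels) <- c(uniqueItems[1], uniqueItems[3])

# Old levels listed in each element are merged into that element's name
levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)  # "setosa" "virginica"
```

This mapping folds `versicolor` into `setosa`, which is harmless here only because no `versicolor` rows remain; re-creating the factor with `factor()` drops unused levels without relying on that.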

# The help examples rename the levels using constant
# names, so this works (but it's not what I want,
# because I don't want to change the code every time I
# use another data set):

newlevels <- list("setosa" = c(uniqueItems[1], uniqueItems[2]), "virginica" = uniqueItems[3])

levels(finaldataset$Species) <- newlevels
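Since the goal is code that works unchanged on other data sets, one option is a small helper, here given the hypothetical name `dropUnusedLevels`, that re-creates every factor column of a data frame with `factor()` and leaves other columns alone. This is a sketch, not an existing function; newer versions of R include `droplevels()`, which does the same job for a data frame.

```r
# Hypothetical helper: drop unused levels from every factor column
dropUnusedLevels <- function(df) {
  df[] <- lapply(df, function(col) if (is.factor(col)) factor(col) else col)
  df
}

subsetIris <- iris[iris$Species %in% c("setosa", "virginica"), ]
finaldataset <- dropUnusedLevels(subsetIris)

levels(finaldataset$Species)  # "setosa" "virginica"
```

Assigning through `df[]` keeps the result a data frame with the original column names and row names.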



Thanks in advance,


