[R] Truncate levels to use randomForest

From: Martin Lam <tmlammail_at_yahoo.com>
Date: Fri 26 Aug 2005 - 18:15:35 EST


Hi,

I will explain my problem with this example:

library("randomForest")

# load the iris plant data set

dataset <- iris

numberarray <- array(1:nrow(dataset), nrow(dataset), 1)

# include only instances with Species = setosa or virginica
indices <- t(numberarray[(dataset$Species == "setosa" |
dataset$Species == "virginica") == TRUE])

finaldataset <- dataset[indices,]
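
The row selection above can also be written with a logical index, which avoids building an array of row numbers. This is only a minimal sketch in base R; the names `keep` and `subset_rows` are illustrative and not part of the original post.

```r
# Sketch: select rows by class with a logical index (base R only).
dataset <- iris

# Classes to retain (illustrative choice matching the example above).
keep <- c("setosa", "virginica")

# %in% returns TRUE for rows whose Species is one of the retained classes.
subset_rows <- dataset[dataset$Species %in% keep, ]

nrow(subset_rows)             # number of rows kept
levels(subset_rows$Species)   # the factor still carries all original levels
```

Subsetting rows does not change the factor's level set, so the unused level is still present afterwards.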

# just to let you see the 3 classes

levels(finaldataset$Species)

# create the random forest

randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)

# The error message I get

Error in randomForest.default(m, y, ...) :

        Can't have empty classes in y.

# The problem is that finaldataset doesn't contain any instances of
# "versicolor", so I think the only way to solve this problem is to
# change the levels of "Species" to only "setosa" and "virginica".
# Correct me if I'm wrong.
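
One possible way to remove the empty level without naming any class explicitly is to re-create the factor, since `factor()` applied to an existing factor keeps only the levels that actually occur. The sketch below uses base R only and assumes the same iris subset as above; it does not call randomForest itself.

```r
# Sketch: drop unused factor levels without hard-coding class names.
dataset <- iris
finaldataset <- dataset[dataset$Species %in% c("setosa", "virginica"), ]

levels(finaldataset$Species)   # all three levels are still listed

# Re-creating the factor keeps only the levels present in the data.
finaldataset$Species <- factor(finaldataset$Species)

levels(finaldataset$Species)   # "setosa" "virginica"
table(finaldataset$Species)    # no empty class remains
```

Newer versions of R also provide `droplevels()`, which does the same for every factor column of a data frame in one call.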

# So I tried to change the levels but I got stuck:

# get the possible unique classes

uniqueItems <- unique(levels(finaldataset$Species))

# the problem!

newlevels <- list(uniqueItems[1] = c(uniqueItems[1], uniqueItems[2]), uniqueItems[3] = uniqueItems[3])

# Error message

Error: syntax error
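
The syntax error arises because, in a call like `list(name = value)`, the left side of `=` has to be a plain name or a string literal, not an expression such as `uniqueItems[1]`. A minimal sketch of one workaround is to build the list unnamed and assign the names afterwards. The variable `species` is an illustrative copy and not part of the original post.

```r
# Sketch: build a named list whose names are computed at run time.
uniqueItems <- levels(iris$Species)

# Create the values first, without names ...
newlevels <- list(c(uniqueItems[1], uniqueItems[2]), uniqueItems[3])

# ... then attach the names from the data.
names(newlevels) <- c(uniqueItems[1], uniqueItems[3])

# Apply the mapping to a copy of the factor.
species <- iris$Species
levels(species) <- newlevels

levels(species)   # "setosa" "virginica"
table(species)
```

Note that this mapping relabels "versicolor" rows as "setosa" rather than dropping a level, so it is only appropriate when merging classes is what is intended.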

# In the help they use constant names to rename the levels, so this
# works (but that's not what I want because I don't want to change the
# code every time I use another data set):

newlevels <- list("setosa" = c(uniqueItems[1], uniqueItems[2]), "virginica" = uniqueItems[3])

levels(finaldataset$Species) <- newlevels

levels(finaldataset$Species)

finaldataset$Species


Thanks in advance,

Martin



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Aug 26 18:29:39 2005
