[R] Dummy variables using rfe in caret for variable selection

From: Ren <rajreni.kaul_at_gmail.com>
Date: Sun, 01 May 2011 07:52:05 -0700 (PDT)


I'm trying to run "rfe" for variable selection in the caret package, and am getting an error. My data frame includes a categorical variable (State) with 3 levels, which I convert to dummy variables.

x <- chlDescr
y <- chl
# create dummy variables
levels(x$State) <- c("AL","GA","FL")
dummy <- model.matrix(~State, x)
z <- cbind(dummy, x)
# remove State category variable
w <- z[, -4]
subsets <- c(2:8)
ctrl <- rfeControl(functions = lmFuncs, method = "cv", verbose = FALSE, returnResamp = "final")
lmProfile <- rfe(w, y, sizes = subsets, rfeControl = ctrl)

Returns:
Error in `[.data.frame`(x, , retained, drop = FALSE) :   undefined columns selected
In addition: Warning message:
In predict.lm(object, x) :
  prediction from a rank-deficient fit may be misleading

When I remove the dummy variables the function runs fine.

# remove State variable
Desc <- chlDescr[, -1]
lmProfile <- rfe(Desc, y, sizes = subsets, rfeControl = ctrl)

Returns:
Recursive feature selection

Outer resamping method was 10 iterations of cross-validation.

Resampling performance over subset size:

 Variables   RMSE Rsquared  RMSESD RsquaredSD Selected
         1 0.2462   0.7454 0.09529    0.17362         
         2 0.2408   0.7680 0.07860    0.12543         
         3 0.2134   0.8285 0.06649    0.09043         
         4 0.2011   0.8609 0.03463    0.05928        *
         5 0.2019   0.8622 0.03421    0.05675         
         6 0.2019   0.8622 0.03421    0.05675         


Can lmFuncs handle dummy variables? How does it need to be modified so it can?
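
One thing that may be worth checking here: model.matrix() adds a constant "(Intercept)" column by default, and lm() then fits its own intercept on top of it, which would be consistent with the rank-deficient warning above. The sketch below uses base R only and a small made-up data frame (the Temp and Depth columns are hypothetical stand-ins, not the real chlDescr columns) to show the intercept column and one way to drop it before building the predictor set. Whether this alone resolves the rfe error is an assumption, not something confirmed against the actual data.

```r
# Toy stand-in for chlDescr; Temp and Depth are hypothetical columns
x <- data.frame(
  State = factor(c("AL", "GA", "FL", "AL", "GA", "FL"),
                 levels = c("AL", "GA", "FL")),
  Temp  = c(20.1, 22.3, 25.0, 19.8, 23.1, 26.2),
  Depth = c(1.2, 0.8, 1.5, 1.1, 0.9, 1.4)
)

# model.matrix() returns a constant "(Intercept)" column plus one
# indicator column per non-reference level of the factor
dummy <- model.matrix(~ State, data = x)
print(colnames(dummy))

# Drop the intercept column by name; keep only the indicator columns
dummy <- dummy[, colnames(dummy) != "(Intercept)", drop = FALSE]

# Combine the indicators with the other predictors, leaving out the
# original State factor by name rather than by column position
w <- cbind(as.data.frame(dummy),
           x[, setdiff(names(x), "State"), drop = FALSE])
print(names(w))

# w could then be passed to rfe() as in the post, e.g.
# lmProfile <- rfe(w, y, sizes = subsets, rfeControl = ctrl)
```

Dropping columns by name instead of by index (as in z[, -4]) avoids removing the wrong column if the column order of the data frame changes.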

I'm new at this, so any help would be appreciated. Thanks,
Reni
chl.csv: http://r.789695.n4.nabble.com/file/n3487861/chl.csv
chlDescr.csv: http://r.789695.n4.nabble.com/file/n3487861/chlDescr.csv

--
View this message in context: http://r.789695.n4.nabble.com/Dummy-variables-using-rfe-in-caret-for-variable-selection-tp3487861p3487861.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 05 May 2011 - 06:25:08 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 05 May 2011 - 07:00:06 GMT.
