[R] Half Million features Selection (Random Forest)

From: daisy <daisy.wang_at_rtc.bosch.com>
Date: Sat 03 Jul 2004 - 05:31:32 EST


I have about half a million binary features and would like to find a model to estimate a continuous response. From what I can infer, the predictors and response can be related through a linear model (i.e., design matrix: a large sparse 0/1 matrix; response: a continuous number).

Since this is not a classification problem, someone suggested that I try random forest in R. However, the randomForest help page points out: "For large data sets, especially those with large number of variables, calling 'randomForest' via the formula interface is not advised: There may be too much overhead in handling the formula." I also tried it on 300 variables, and R either gave me an error message or did not respond. (OS: Windows XP; R: 1.9.0; RAM: 512 MB)

Is there any way to run random forest on a dataset this big? Any suggestion is welcome! Many thanks!
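For reference, the alternative to the formula interface that the help page warns about is to pass the predictor matrix and the response directly. The following is only a minimal sketch on simulated data, not the original dataset: the sizes `n` and `p`, the sparsity level, and the coefficients are all hypothetical, and it assumes the randomForest package is installed. It does not show that this approach scales to half a million features on 512 MB of RAM.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(1)
n <- 200   # hypothetical number of observations
p <- 300   # hypothetical number of binary features

# Simulated 0/1 design matrix (illustrative only, not the original data)
x <- matrix(rbinom(n * p, size = 1, prob = 0.2), nrow = n, ncol = p)
colnames(x) <- paste0("f", seq_len(p))

# Continuous response that depends linearly on the first five features
y <- drop(x[, 1:5] %*% c(2, -1, 1.5, 3, -2)) + rnorm(n)

# Matrix interface: pass x and y directly instead of a formula.
# A numeric y makes randomForest fit a regression forest.
fit <- randomForest(x = x, y = y, ntree = 100, importance = TRUE)

# Rank features by permutation importance (%IncMSE) and show the top ten
imp <- importance(fit, type = 1)
head(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE], 10)
```

Ranking by importance is one possible way to shortlist features; whether it is appropriate for a very wide, sparse design matrix would still need to be checked on the real data.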



R-help@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Received on Sat Jul 03 05:37:56 2004

This archive was generated by hypermail 2.1.8 : Fri 18 Mar 2005 - 09:25:21 EST