Re: [R] RandomForest vs. bayes & svm classification performance

From: roger bos <roger.bos_at_gmail.com>
Date: Tue 25 Jul 2006 - 06:14:33 EST

I can't add much to your question, being a complete novice at classification, but I have tried both randomForest and SVM and I get better results from randomForest than from SVM (even after tuning). randomForest is also much, much faster. I just assumed randomForest was the better algorithm, though in the back of my mind I wondered whether I had made a mistake somewhere. I'm not sure that posting the call alone lets anyone say whether a mistake is being made, since there are many other places in the code where something could go wrong.

I hear SVM is used for very complicated tasks like facial recognition, so I wonder why it can't do better on my data set, but I have only a limited amount of time for testing. It was interesting to hear your results.
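For what it's worth, here is a minimal sketch of the kind of head-to-head comparison I mean, using the built-in iris data rather than my own (so the actual error rates will differ; the split and seed are arbitrary):

```r
# Compare randomForest and svm on the same train/test split of iris.
library(randomForest)
library(e1071)

set.seed(42)
idx   <- sample(nrow(iris), 100)   # 100 rows for training, rest for testing
train <- iris[idx, ]
test  <- iris[-idx, ]

rf <- randomForest(Species ~ ., data = train)
sv <- svm(Species ~ ., data = train)

# Misclassification rate on the held-out rows for each model
rf.err <- mean(predict(rf, test) != test$Species)
sv.err <- mean(predict(sv, test) != test$Species)
c(randomForest = rf.err, svm = sv.err)
```

On a real data set you would also want to tune the svm (e.g. via tune.svm in e1071) before comparing, and repeat over several random splits so one lucky split doesn't decide the ranking.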

Thanks,

Roger

On 7/24/06, Eleni Rapsomaniki <e.rapsomaniki@mail.cryst.bbk.ac.uk> wrote:
>
> Hi
>
> This is a question regarding classification performance using different
> methods. So far I've tried NaiveBayes (klaR package), svm (e1071 package)
> and randomForest (randomForest package). What has puzzled me is that
> randomForest seems to perform far better (32% classification error) than
> svm and NaiveBayes, which have similar classification errors (45% and 48%
> respectively). A similar difference in performance is observed with
> different combinations of parameters, priors and sizes of training data.
>
> Because I was expecting to see little difference in the performance of
> these methods, I am worried that I may have made a mistake in my
> randomForest call:
>
> my.rf <- randomForest(x = train.df[, -response_index],
>                       y = train.df[, response_index],
>                       xtest = test.df[, -response_index],
>                       ytest = test.df[, response_index],
>                       importance = TRUE, proximity = FALSE,
>                       keep.forest = FALSE)
>
> (where train.df and test.df are my training and test data frames and
> response_index is the column number specifying the class)
>
> My main question is: could there be a legitimate reason why random forest
> would outperform the other two models (e.g. maybe one method is more
> reliable with Gaussian data, handles categorical data better, etc.)?
> Also, is there a way of evaluating the predictive ability of each
> parameter in the Bayesian model, as can be done for random forests
> (through the importance table)?
>
> I would appreciate any of your comments and suggestions on these.
>
> Many thanks
> Eleni Rapsomaniki
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




Received on Tue Jul 25 06:21:03 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 25 Jul 2006 - 08:20:02 EST.
