From: Liaw, Andy <andy_liaw_at_merck.com>

Date: Thu 02 Feb 2006 - 16:28:40 GMT

R-devel@r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri Feb 03 03:35:55 2006

- This is _not_ a bug in R itself. Please don't use R's bug reporting system for contributed packages.
- This is _not_ a bug in svm() in `e1071'. I believe you forgot to take the square root: the `MSE' component holds mean squared errors, not RMSEs.
- You really should use the `tot.MSE' component rather than the mean of the `MSE' component, although the difference here is very small.
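
To see why the two can differ at all: the data here have 99 rows, so 10 folds cannot all be the same size, and an unweighted mean of per-fold MSEs is then not quite the same as the MSE pooled over every held-out observation. The base-R sketch below illustrates this on made-up squared errors. It does not call svm(), the fold assignment and variable names are hypothetical, and it assumes that `tot.MSE' is the pooled quantity.

```r
# Illustration only: made-up squared errors, not output from svm()
n <- 99
sq_err <- ((1:n) / n)^2

# Hypothetical fold assignment: folds 1-9 get 10 rows, fold 10 gets 9
fold <- rep(1:10, length.out = n)

# Unweighted mean of the per-fold MSEs (analogous to mean(mysvm$MSE))
per_fold_mse <- tapply(sq_err, fold, mean)
mean_of_folds <- mean(per_fold_mse)

# MSE pooled over all held-out observations (assumed analogue of tot.MSE)
pooled <- mean(sq_err)

print(c(mean_of_folds = mean_of_folds, pooled = pooled))
```

The two numbers agree closely but not exactly, because the smaller fold gets the same weight as the larger ones in the unweighted mean.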

So, instead of spread[i] <- mean(mysvm$MSE), you should have spread[i] <- sqrt(mysvm$tot.MSE). I get:

> spread <- rep(0,20)
> for (i in 1:20) {
+     spread[i] <- svm(y ~ x,data, cross=10)$tot.MSE
+ }
> summary(sqrt(spread[i]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.2679  0.2679  0.2679  0.2679  0.2679  0.2679
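
The figures in the original report already bear this out. A quick base-R check, using the summary values copied from the report quoted below (the variable names are only for illustration), puts them on the RMSE scale:

```r
# Summary of mean(mysvm$MSE) over 20 runs, as reported in the original post
# (Min, 1st Qu., Median, Mean, 3rd Qu., Max)
mse <- c(0.06789, 0.07089, 0.07236, 0.07310, 0.07411, 0.08434)

# RMSE is the square root of MSE
rmse <- sqrt(mse)
print(round(rmse, 4))
```

These land in roughly the same 0.26 to 0.29 range as the errorest results. Note also that after the loop `spread[i]' is just `spread[20]', so summary(sqrt(spread[i])) above summarises a single value; summary(sqrt(spread)) would summarise all 20 runs.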

Andy

From: no228@cam.ac.uk

> Full_Name: Noel O'Boyle
> Version: 2.1.0
> OS: Debian GNU/Linux Sarge
> Submission from: (NULL) (131.111.8.96)
>
>
> (1) Description of error
>
> The 10-fold CV option for the svm function in e1071 appears to give
> incorrect results for the rmse.
>
> The example code in (3) uses the example regression data in the svm
> documentation. The rmse for internal prediction is 0.24. It is expected
> the 10-fold CV rmse should be bigger, but the result obtained using the
> "cross=10" option is 0.07. When the 10-fold CV is conducted either 'by
> hand' (not shown below) or using the errorest function in ipred (shown
> below) the answer is closer to 0.27, a more reasonable value.
>
> (2) Description of system
>
> I'm using the Debian Sarge version of R:
> R : Copyright 2005, The R Foundation for Statistical Computing
> Version 2.1.0 (2005-04-18), ISBN 3-900051-07-0
>
> svm is in the e1071 package from CRAN:
> Version: 1.5-11
> Date: 2005-09-19
>
> (3) Example code illustrating the problem
>
> library(e1071)
>
> set.seed(42)
> # create data
> x <- seq(0.1, 5, by = 0.05)
> y <- log(x) + rnorm(x, sd = 0.2)
> data <- as.data.frame(cbind(y,x))
>
> # estimate model and predict input values
> mysvm <- svm(y ~ x,data)
> result <- predict(mysvm, data)
> (rmse <- sqrt(mean((result-data[,1])**2)))
> # 0.2390489
>
> # built-in 10-fold CV estimate of prediction error
> spread <- rep(0,20)
> for (i in 1:20) {
>     mysvm <- svm(y ~ x,data,cross=10)
>     spread[i] <- mean(mysvm$MSE)
> }
> summary(spread)
> #    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> # 0.06789 0.07089 0.07236 0.07310 0.07411 0.08434  (or something similar)
>
> # 10-fold CV using errorest
> library(ipred)
> mysvm <- function(formula,data) {
>     model <- svm(formula,data)
>     function(newdata) predict(model,newdata)
> }
> spread <- rep(0,20)
> for (i in 1:20) {
>     spread[i] <- errorest(y ~ x, data, model=mysvm)$error
> }
> summary(spread)
> #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> # 0.2601  0.2649  0.2673  0.2696  0.2741  0.2927
>
>
> Regards,
> Noel O'Boyle.
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel