Date: Thu 02 Feb 2006 - 16:28:40 GMT

- This is _not_ a bug in R itself. Please don't use R's bug reporting system for contributed packages.
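- This is _not_ a bug in svm() in `e1071'. I believe you forgot to take the square root: the `MSE' and `tot.MSE' components are mean squared errors, so they need sqrt() before they can be compared with an rmse.
- You really should use the `tot.MSE' component rather than the mean of the `MSE' component, although the difference is only very small.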

So, instead of spread[i] <- mean(mysvm$MSE), you should have spread[i] <- sqrt(mysvm$tot.MSE). I get:

> spread <- rep(0,20)
> for (i in 1:20) {
+ spread[i] <- svm(y ~ x,data, cross=10)$tot.MSE
+ }
> summary(sqrt(spread[i]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.2679  0.2679  0.2679  0.2679  0.2679  0.2679
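
As a quick sanity check on the arithmetic, here is a minimal sketch that takes the square root of the summary values quoted in the report below. The numbers are copied from that report; the variable names are mine and purely illustrative. The square root of a mean is not the mean of the square roots, so this is only a rough comparison with the errorest figures.

```r
# Summary of mean(mysvm$MSE) over 20 runs, as quoted in the original report
mse_summary <- c(Min = 0.06789, Q1 = 0.07089, Median = 0.07236,
                 Mean = 0.07310, Q3 = 0.07411, Max = 0.08434)

# MSE is on the squared scale; the square root puts it on the rmse scale
rmse_summary <- sqrt(mse_summary)

print(round(rmse_summary, 4))
```

The resulting values fall roughly between 0.26 and 0.29, in line with the errorest results. Note also that summary(sqrt(spread[i])) in the transcript above summarises only the last element (i is still 20 after the loop), which is why all six statistics coincide; summary(sqrt(spread)) would show the variation across the 20 runs.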

Andy

Full_Name: Noel O'Boyle

(1) Description of error
**> The 10-fold CV option for the svm function in e1071 appears to give
**> incorrect results for the rmse.
**>
**> The example code in (3) uses the example regression data in the svm
**> documentation. The rmse for internal prediction is 0.24. It is expected
**> that the 10-fold CV rmse should be bigger, but the result obtained using
**> the "cross=10" option is 0.07. When the 10-fold CV is conducted either
**> 'by hand' (not shown below) or using the errorest function in ipred
**> (shown below) the answer is closer to 0.27, a more reasonable value.
(2) Description of system
(3) Example code illustrating the problem
**>
**> library(e1071)
**>
**> set.seed(42)
**> # create data
**> x <- seq(0.1, 5, by = 0.05)
**> y <- log(x) + rnorm(x, sd = 0.2)
**> data <- as.data.frame(cbind(y,x))
**> # estimate model and predict input values
**> mysvm <- svm(y ~ x,data)
**> result <- predict(mysvm, data)
**> (rmse <- sqrt(mean((result-data[,1])**2)))
**> # 0.2390489
**> # built-in 10-fold CV estimate of prediction error
**> spread <- rep(0,20)
**> for (i in 1:20) {
**> mysvm <- svm(y ~ x,data,cross=10)
**> spread[i] <- mean(mysvm$MSE)
**> }
**> summary(spread)
**> # Min. 1st Qu. Median Mean 3rd Qu. Max.
**> # 0.06789 0.07089 0.07236 0.07310 0.07411 0.08434 (or something similar)
**> # 10-fold CV using errorest
**> library(ipred)
**> mysvm <- function(formula,data) {
**> model <- svm(formula,data)
**> function(newdata) predict(model,newdata)
**> }
**> spread <- rep(0,20)
**> for (i in 1:20) {
**> spread[i] <- errorest(y ~ x, data, model=mysvm)$error
**> }
**> summary(spread)
**> # Min. 1st Qu. Median Mean 3rd Qu. Max.
**> # 0.2601 0.2649 0.2673 0.2696 0.2741 0.2927
**> Regards,
**> Noel O'Boyle.
