# [Rd] crossvalidation in svm regression in e1071 gives incorrect results (PR#8554)

From: <no228_at_cam.ac.uk>
Date: Thu 02 Feb 2006 - 15:28:25 GMT

Full_Name: Noel O'Boyle
Version: 2.1.0
OS: Debian GNU/Linux Sarge
Submission from: (NULL) (131.111.8.96)

(1) Description of error

The 10-fold CV option for the svm function in e1071 appears to give incorrect results for the rmse.

The example code in (3) uses the example regression data from the svm documentation. The rmse for internal prediction (predicting the training data itself) is 0.24. The 10-fold CV rmse would be expected to be larger than this, but the value obtained with the "cross=10" option is 0.07. When the 10-fold CV is carried out either 'by hand' (not shown below) or with the errorest function in ipred (shown below), the result is closer to 0.27, a more reasonable value.

(2) Description of system

I'm using the Debian Sarge version of R:

R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.1.0 (2005-04-18), ISBN 3-900051-07-0

svm is in the e1071 package from CRAN:

Version: 1.5-11
Date: 2005-09-19

(3) Example code illustrating the problem

library(e1071)

set.seed(42)
# create data

x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(x, sd = 0.2)
data <- as.data.frame(cbind(y,x))

# estimate model and predict input values
mysvm <- svm(y ~ x,data)
result <- predict(mysvm, data)
(rmse <- sqrt(mean((result-data[,1])**2)))
# 0.2390489

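# built-in 10-fold CV estimate of prediction error
spread <- numeric(20)
for (i in 1:20) {
  mysvm <- svm(y ~ x, data, cross = 10)
  # assumption: the line recording the result is missing from this copy of
  # the report; tot.MSE is the overall CV error stored on the fitted object
  spread[i] <- mysvm$tot.MSE
}
summary(spread)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.06789 0.07089 0.07236 0.07310 0.07411 0.08434 (or something similar)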

# 10-fold CV using errorest

library(ipred)
mysvm <- function(formula, data) {
  model <- svm(formula, data)
  function(newdata) predict(model, newdata)
}
spread <- numeric(20)
for (i in 1:20) {
  spread[i] <- errorest(y ~ x, data, model = mysvm)$error
}
summary(spread)
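
The description in (1) mentions a 10-fold CV done 'by hand' without showing it. The following is only a minimal sketch of how such a check could look, not the code used for the numbers in this report. The names k, folds, pred, cv_mse and cv_rmse are made up for illustration, and the fold assignment is a plain random split, which may differ from what svm or errorest do internally. It reports both the mean squared error and its square root so that the figures can be compared on the same scale.

```r
library(e1071)

set.seed(42)
# same example data as in (3)
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(length(x), sd = 0.2)
data <- data.frame(y = y, x = x)

k <- 10
n <- nrow(data)
# randomly assign each row to one of k folds of near-equal size
folds <- sample(rep(seq_len(k), length.out = n))

# held-out predictions, filled in fold by fold
pred <- numeric(n)
for (f in seq_len(k)) {
  test <- folds == f
  fit <- svm(y ~ x, data = data[!test, ])   # train on the other k-1 folds
  pred[test] <- predict(fit, data[test, ])  # predict the held-out fold
}

# pooled CV error over all held-out predictions
cv_mse <- mean((pred - data$y)^2)
cv_rmse <- sqrt(cv_mse)
c(mse = cv_mse, rmse = cv_rmse)
```

Repeating this with different seeds gives a spread comparable to the 20 repetitions in the loops above.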