Re: [Rd] crossvalidation in svm regression in e1071 gives incorrect results (PR#8554)

From: Liaw, Andy <andy_liaw_at_merck.com>
Date: Thu 02 Feb 2006 - 16:28:40 GMT

  1. This is _not_ a bug in R itself. Please don't use R's bug reporting system for contributed packages.
  2. This is _not_ a bug in svm() in `e1071'. I believe you forgot to take sqrt.
  3. You really should use the `tot.MSE' component rather than the mean of the `MSE' component, but this is only a very small difference.
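[Editorial sketch of points 2 and 3, not part of the original message. It assumes that `MSE' holds one mean squared error per fold and that `tot.MSE' pools the squared errors over all held-out observations; the symbols K, n_k, F_k and n are notation introduced here.]

```latex
\mathrm{MSE}_k = \frac{1}{n_k} \sum_{i \in F_k} (y_i - \hat{y}_i)^2,
\qquad
\overline{\mathrm{MSE}} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{MSE}_k,
\qquad
\mathrm{MSE}_{\mathrm{tot}} = \frac{1}{n} \sum_{k=1}^{K} n_k \, \mathrm{MSE}_k,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}_{\mathrm{tot}}}
```

Under that assumption the two averages coincide only when every fold has the same size n_k = n/K. The example data has 99 points, so 10 folds cannot all be equal, which would account for the very small difference. The missing square root is the large effect: 0.27^2 is about 0.073, the value reported for "cross=10".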

So, instead of spread[i] <- mean(mysvm$MSE), you should have spread[i] <- sqrt(mysvm$tot.MSE). I get:

> spread <- rep(0,20)
> for (i in 1:20) {
+   spread[i] <- svm(y ~ x,data, cross=10)$tot.MSE
+ }
> summary(sqrt(spread[i]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.2679  0.2679  0.2679  0.2679  0.2679  0.2679
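[Editorial sketch, not part of the original message. It checks the square-root point using only the summary figures quoted in the report below; no model is refit, and the variable names are made up for illustration. Because the square root is monotone, it maps the minimum, quartiles and maximum directly; the square root of the mean is only an approximation to the mean RMSE.]

```r
# Summary of mean(mysvm$MSE) over 20 runs, as quoted in the report
mse_reported <- c(min = 0.06789, q1 = 0.07089, median = 0.07236,
                  mean = 0.07310, q3 = 0.07411, max = 0.08434)

# Summary of the errorest() error over 20 runs, as quoted in the report
rmse_errorest <- c(min = 0.2601, q1 = 0.2649, median = 0.2673,
                   mean = 0.2696, q3 = 0.2741, max = 0.2927)

# Put the built-in CV figures on the RMSE scale
rmse_from_mse <- sqrt(mse_reported)

# Gap between the two estimates once both are on the same scale
gap <- rmse_from_mse - rmse_errorest

print(round(rmse_from_mse, 4))
print(round(gap, 4))
```

On the RMSE scale the built-in figures fall between roughly 0.26 and 0.29, within a few thousandths of the errorest figures, so the two procedures agree once the square root is taken.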

Andy

From: no228@cam.ac.uk
>
> Full_Name: Noel O'Boyle
> Version: 2.1.0
> OS: Debian GNU/Linux Sarge
> Submission from: (NULL) (131.111.8.96)
>
>
> (1) Description of error
>
> The 10-fold CV option for the svm function in e1071 appears to give
> incorrect results for the rmse.
>
> The example code in (3) uses the example regression data in the svm
> documentation. The rmse for internal prediction is 0.24. It is expected the
> 10-fold CV rmse should be bigger, but the result obtained using the
> "cross=10" option is 0.07. When the 10-fold CV is conducted either 'by hand'
> (not shown below) or using the errorest function in ipred (shown below) the
> answer is closer to 0.27, a more reasonable value.
>
> (2) Description of system
>
> I'm using the Debian Sarge version of R:
> R : Copyright 2005, The R Foundation for Statistical Computing
> Version 2.1.0 (2005-04-18), ISBN 3-900051-07-0
>
> svm is in the e1071 package from CRAN:
> Version: 1.5-11
> Date: 2005-09-19
>
> (3) Example code illustrating the problem
>
> library(e1071)
>
> set.seed(42)
> # create data
> x <- seq(0.1, 5, by = 0.05)
> y <- log(x) + rnorm(x, sd = 0.2)
> data <- as.data.frame(cbind(y,x))
>
> # estimate model and predict input values
> mysvm <- svm(y ~ x,data)
> result <- predict(mysvm, data)
> (rmse <- sqrt(mean((result-data[,1])**2)))
> # 0.2390489
>
> # built-in 10-fold CV estimate of prediction error
> spread <- rep(0,20)
> for (i in 1:20) {
> mysvm <- svm(y ~ x,data,cross=10)
> spread[i] <- mean(mysvm$MSE)
> }
> summary(spread)
> # Min. 1st Qu. Median Mean 3rd Qu. Max.
> # 0.06789 0.07089 0.07236 0.07310 0.07411 0.08434 (or something similar)
>
> # 10-fold CV using errorest
> library(ipred)
> mysvm <- function(formula,data) {
> model <- svm(formula,data)
> function(newdata) predict(model,newdata)
> }
> spread <- rep(0,20)
> for (i in 1:20) {
> spread[i] <- errorest(y ~ x, data, model=mysvm)$error
> }
> summary(spread)
> # Min. 1st Qu. Median Mean 3rd Qu. Max.
> # 0.2601 0.2649 0.2673 0.2696 0.2741 0.2927
>
>
> Regards,
> Noel O'Boyle.
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>



Received on Fri Feb 03 03:35:55 2006