From: Trevor Wiens <twiens_at_interbaun.com>

Date: Fri 18 Mar 2005 - 15:59:44 EST

Date: Fri 18 Mar 2005 - 15:59:44 EST

This post is NOT a question, but an answer. For readers please disregard all earlier posts by myself about this question.

I'm posting for two reasons. First to say thanks, especially to Dimitris, for suggesting the use of errorest in the ipred library. Second, so that the solution to this problem is in the archives in case it gets asked again.

If one wants to run a k-fold cross-validation using specified folds, and get misclassification error and root mean squared error this is what you do.

Below is a script that will do this returning the results of two errorest results combined into a data frame. Now this script assumes that the variable you are going to fold with is an integer. If you want to have the predicted values or models returned from each fold, the calls to errorest, can be modified. Please see the help page for control.errorest on details.

T

-- Trevor Wiens twiens@interbaun.com ========================================================== myglm <- function(formula, family, data){ ret <- glm(formula, family=binomial, data=data) return(ret) } myfacpred <- function(object, newdata) { ret <- as.factor(ifelse(predict.glm(object, newdata, type='response') < 0.5, 0, 1)) return(ret) }Received on Fri Mar 18 16:04:04 2005

# logerrorest takes four arguments# mdata is a data frame holding the data to be modeled# form is the output of the is.formula function

# rvar is the response variable as a string, for example 'birdx'

# fvar is the fold variable, for example 'recordyear'

logerrorest <- function(mdata, form, rvar, fvar) { require(Hmisc) require(ipred)

# determine index of variables

rpos <- match(rvar, names(mdata)) fpos <- match(fvar, names(mdata))

# get fold values and count for each group

vardesc <- describe(mdata[[fpos]])$values fvarlist <- as.integer(dimnames(vardesc)[[2]]) k <- length(fvarlist) countlist <- vardesc[1,1] for (i in 2:k) { countlist[i] <- vardesc[1,i] } n <- length(mdata[[fpos]])

# get index list for each fold

cc <- list() for (i in 1:k) { cc[[i]] <- as.integer(rownames(mdata[mdata[[fpos]] == fvarlist[i], ])) }

# determine root mean squared error

ee <- errorest(form, mdata, estimator='cv', model=myglm, est.para=control.errorest(list.tindx = cc))

# determine misclassification error

# first convert to factor

width = length(mdata) response <- as.factor(as.integer(mdata[[rpos]])) newmatrix <- data.frame(cbind(mdata[1:(rpos-1)], response, mdata[(rpos+1):width])) newform <- as.formula(paste('response ~ ', as.character(form)[[3]])) me <- errorest(newform, newmatrix, estimator='cv', model=myglm, predict=myfacpred, est.para=control.errorest(list.tindx = cc)) ret <- data.frame(cbind(ee, me)) return(ret) } ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

*
This archive was generated by hypermail 2.1.8
: Fri 03 Mar 2006 - 03:30:51 EST
*