Re: [R] [caret package] [trainControl] supplying predefined partitions to train with cross validation

From: Fabon Dzogang <fabon.dzogang_at_lip6.fr>
Date: Tue, 10 May 2011 21:09:59 +0200

Here is an answer from Max Kuhn. Thank you!

Fabon,

If I understand the problem, there are two ways of doing it. First, if you are using caret's train(), rfe() or sbf() and you set the seed right before you call the models, they end up using the same resampled data sets. (By the way, if you use the resamples() function in caret, it checks for the same resampling indices.)
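
A minimal sketch of the seed approach, not taken from the original message. The iris data, the "knn" and "rpart" methods, and the seed value are assumptions chosen only so the example is self-contained; substitute your own predictors, classes and models.

```r
library(caret)

# Assumed example data: predictors and classes from the built-in iris set.
x <- iris[, 1:4]
y <- iris$Species

ctrl <- trainControl(method = "cv", number = 10)

# Set the same seed immediately before each call so that both calls
# draw the same cross-validation folds.
set.seed(2011)
fit_knn <- train(x, y, method = "knn", trControl = ctrl)

set.seed(2011)
fit_rpart <- train(x, y, method = "rpart", trControl = ctrl)

# The training indices stored in each fit can be compared directly.
identical(fit_knn$control$index, fit_rpart$control$index)

# resamples() collects the per-fold results of both models for comparison.
summary(resamples(list(knn = fit_knn, rpart = fit_rpart)))
```

This relies on nothing else consuming random numbers between set.seed() and the resampling step, so the seed should be set on the line directly before each call.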

If you want to manually fix the data sets, there is an example in section 5.2 of

 http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf

using LGOCV. For 10-fold CV, you can use createFolds() with an additional argument:

> createFolds(1:10, returnTrain = TRUE)
$Fold01
[1] 2 3 4 5 6 7 8 9 10

$Fold02
[1] 1 3 4 5 6 7 8 9 10

$Fold03
[1] 1 2 4 5 6 7 8 9 10

$Fold04
[1] 1 2 3 5 6 7 8 9 10

$Fold05
[1] 1 2 3 4 6 7 8 9 10

$Fold06
[1] 1 2 3 4 5 7 8 9 10

$Fold07
[1] 1 2 3 4 5 6 8 9 10

$Fold08
[1] 1 2 3 4 5 6 7 9 10

$Fold09
[1] 1 2 3 4 5 6 7 8 10

$Fold10
[1] 1 2 3 4 5 6 7 8 9
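
If the folds were already created with the default createFolds() call (which returns the held-out rows of each fold), they can probably be converted to training indices rather than regenerated. This is a sketch under that assumption; `held_out`, `train_idx` and the iris outcome are illustrative names and data, not part of the original thread.

```r
library(caret)

y <- iris$Species  # assumed example outcome

set.seed(1)
# Default behaviour: each list element holds the held-out row indices.
held_out <- createFolds(y, k = 10)

# The training rows of a fold are all rows that are not held out in it.
train_idx <- lapply(held_out, function(i) setdiff(seq_along(y), i))

# Check: within each fold, training and held-out rows cover every row.
all(mapply(function(tr, ho) setequal(c(tr, ho), seq_along(y)),
           train_idx, held_out))
```

The resulting `train_idx` has the same shape as the output of createFolds(..., returnTrain = TRUE) shown above.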

For the trainControl() function, the index argument should be a list of sample indices for each resample. So if I give it the above results of createFolds(), it will do 10-fold CV.
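
Put together, supplying predefined folds might look like the sketch below. The iris data and the "knn" and "rpart" methods are assumptions used to keep the example self-contained; the original question used method="svmLinear" on the poster's own data.

```r
library(caret)

# Assumed example data standing in for my_dataset_predictors / my_dataset_classes.
x <- iris[, 1:4]
y <- iris$Species

# Create the partition once; returnTrain = TRUE gives training-row indices.
set.seed(2011)
folds <- createFolds(y, k = 10, returnTrain = TRUE)

# Pass the same list of training indices to every model via index.
ctrl <- trainControl(method = "cv", number = 10, index = folds)

fit_knn   <- train(x, y, method = "knn",   trControl = ctrl)
fit_rpart <- train(x, y, method = "rpart", trControl = ctrl)

# Per-fold performance of the selected model, one row per supplied fold.
fit_knn$resample
```

Because the folds are fixed in `ctrl`, no seed needs to be reset between the train() calls for the partitions to match.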

Max

On Fri, May 6, 2011 at 12:32 PM, Fabon Dzogang <fabon.dzogang_at_lip6.fr> wrote:
> Hello,
>
> Thank you for your reply but I'm not sure your code answers my needs,
> from what I read it creates a 10-fold partition and then extracts the
> kth partition for future processing.
>
> My question was rather: once I have a 10-fold partition of my data,
> how to supply it to the "train" function of the caret package. Here's
> some sample code :
>
> folds <- createFolds(my_dataset_classes, 10)
>
> # I can't use index=folds on this one, it will train on the 1/k and test on k-1
> t_control <- trainControl(method="cv", number=10)
>
> # here I would like train to take account of my predefined folds
> model <- train(my_dataset_predictors, my_dataset_classes,
> method="svmLinear", trControl = t_control)
>
> Cheers,
> Fabon.
>
> On Fri, May 6, 2011 at 10:59 AM, neetika nath <nikkihathi_at_gmail.com> wrote:
>> Hi,
>> I did a similar experiment with my data. Maybe the following code will give
>> you some idea. It might not be the best solution, but it worked for me.
>> Please do share if you get another idea.
>> Thank you
>> #### CODE ####
>>
>> library(dismo)
>>
>> set.seed(111)
>>
>> dd <- read.delim("yourfile.csv", sep = ",", header = TRUE)
>>
>> # To keep a check on errors: enter the debugger when one occurs
>>
>> options(error = utils::recover)
>>
>> # dd: data to be split for 10-fold CV; kfold() assigns every row of the
>> # complete data to one of 10 folds
>>
>> number <- kfold(dd, k = 10)
>>
>> # Case 1: k == 1
>>
>> k <- 1
>>
>> # Retrieve all the row indices (from your data) for the 1st fold in x, so
>> # that you can use them as the test set and the remaining rows as the
>> # training set for the 1st iteration.
>>
>> x <- which(number == k)
>>
>> On Thu, May 5, 2011 at 11:43 PM, Fabon Dzogang <fabon.dzogang_at_lip6.fr>
>> wrote:
>>>
>>> Hi all,
>>>
>>> I run R 2.11.1 under ubuntu 10.10 and caret version 2.88.
>>>
>>> I use the caret package to compare different models on a dataset. In
>>> order to compare their different performances I would like to use the
>>> same data partitions for every model. I understand that using an LGOCV
>>> or a boot type re-sampling method along with the "index" argument of
>>> the trainControl function, one is able to supply a training partition
>>> to the train function.
>>>
>>> However, I would like to apply a 10-fold cross validation to validate
>>> the models and I did not find any way to supply some predefined
>>> partition (created with createFolds) in this setting. Any help ?
>>>
>>> Thank you and great package by the way !
>>>
>>> Fabon Dzogang.
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
> --
> Fabon Dzogang
>

-- 
Fabon Dzogang

Received on Tue 10 May 2011 - 19:13:30 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 10 May 2011 - 20:20:06 GMT.
