Re: [R] logistic regression model + Cross-Validation

From: Weiwei Shi <helprhelp_at_gmail.com>
Date: Tue 23 Jan 2007 - 01:30:31 GMT

Why not use lda {MASS}? It has a CV=TRUE option, though that does leave-one-out ("loo") cross-validation rather than k-fold. Or use randomForest.
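A minimal sketch of the lda route, assuming a two-class subset of the built-in iris data as a stand-in for the real data (the names d and fit are only for illustration). Note the argument is spelled CV; when it is TRUE, lda returns the leave-one-out predicted class and posterior probabilities for every record instead of a fitted model object:

```r
library(MASS)   # recommended package, shipped with standard R installations

# Stand-in data: keep two of the three iris species so the outcome is binary
d <- droplevels(subset(iris, Species != "setosa"))

# CV = TRUE requests leave-one-out cross-validation
fit <- lda(Species ~ ., data = d, CV = TRUE)

head(fit$posterior)                                  # per-record class probabilities
table(observed = d$Species, predicted = fit$class)   # leave-one-out confusion table
```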

if you have to use lrm, then the following code might help:

n.fold <- 5 # 5-fold cv
n.sample <- 50 # assumed 50 samples
s <- sample(1:n.fold, size=n.sample, replace=TRUE)
for (i in 1:n.fold){
  # create your training data and validation data for each fold
  trn <- YOURWHOLEDATAFRAME[s!=i,]
  val <- YOURWHOLEDATAFRAME[s==i,]
  # now do your own modeling using lrm
  # todo
}
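
To make the skeleton above concrete, here is one possible way to fill in the "todo" and keep the predicted probability for each record. It is only a sketch under assumptions: the data are simulated, the helper name cv_probs is made up, and base R's glm(family = binomial) is swapped in for lrm so that it runs without extra packages. With lrm the analogous prediction call should be predict(fit, val, type = "fitted"), but check the package documentation. The last part repeats the cross-validation with fresh fold assignments, as suggested in the quoted advice below, and averages each record's out-of-fold probability:

```r
set.seed(1)   # reproducible simulation and fold assignment

# Simulated stand-in for the real data: binary outcome cy, predictors x1 and x2
n.sample <- 200
dat <- data.frame(x1 = rnorm(n.sample), x2 = rnorm(n.sample))
dat$cy <- rbinom(n.sample, 1, plogis(0.5 * dat$x1 - dat$x2))

# One k-fold pass: returns the out-of-fold predicted P(cy = 1) for every record
cv_probs <- function(data, n.fold) {
  n <- nrow(data)
  s <- sample(rep(1:n.fold, length.out = n))   # balanced random fold labels
  pred <- rep(NA_real_, n)
  for (i in 1:n.fold) {
    # fit on all folds except i, then predict the held-out fold i
    fit <- glm(cy ~ x1 + x2, data = data[s != i, ], family = binomial)
    pred[s == i] <- predict(fit, newdata = data[s == i, ], type = "response")
  }
  pred
}

p.once <- cv_probs(dat, n.fold = 5)   # a single 5-fold pass

# 20 repetitions of 10-fold cv; one column of probabilities per repetition
p.rep <- replicate(20, cv_probs(dat, n.fold = 10))

# One row per record: observed class, single-pass and averaged probabilities
cv.result <- data.frame(cy = dat$cy, p.once = p.once, p.mean = rowMeans(p.rep))
head(cv.result)
```

Assigning folds with sample(rep(...)) gives equal-sized folds, whereas sample(..., replace=TRUE) as in the skeleton can leave folds unbalanced when the sample is small.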

HTH, weiwei

On 1/21/07, nitin jindal <nitin.jindal@gmail.com> wrote:
> If validate.lrm does not have this option, does any other function have it?
> I will certainly look into your advice on cross-validation. Thanks.
>
> nitin
>
> On 1/21/07, Frank E Harrell Jr <f.harrell@vanderbilt.edu> wrote:
> >
> > nitin jindal wrote:
> > > Hi,
> > >
> > > I am trying to cross-validate a logistic regression model.
> > > I am using logistic regression model (lrm) of package Design.
> > >
> > > f <- lrm( cy ~ x1 + x2, x=TRUE, y=TRUE)
> > > val <- validate.lrm(f, method="cross", B=5)
> >
> > val <- validate(f, ...) # .lrm not needed
> >
> > >
> > > My class cy has values 0 and 1.
> > >
> > > "val" variable will give me indicators like slope and AUC. But, I also need
> > > the vector of predicted values of class variable "cy" for each record while
> > > cross-validating, so that I can manually look at the results. So, is there
> > > any way to get the probabilities assigned to each class?
> > >
> > > regards,
> > > Nitin
> >
> > No, validate.lrm does not have that option. Manually looking at the
> > results will not be easy when you do enough cross-validations. A single
> > 5-fold cross-validation does not provide accurate estimates. Either use
> > the bootstrap or repeat k-fold cross-validation between 20 and 50 times.
> > k is often 10 but the optimum value may not be 10. Code for averaging
> > repeated cross-validations is in
> > http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf
> > along with simulations of bootstrap vs. a few cross-validation methods
> > for binary logistic models.
> >
> > Frank
> > --
> > Frank E Harrell Jr Professor and Chair School of Medicine
> > Department of Biostatistics Vanderbilt University
> >
>
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

Received on Tue Jan 23 12:37:48 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 23 Jan 2007 - 03:30:32 GMT.
