From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>

Date: Thu, 29 May 2008 07:39:27 -0500

>> Xiaohui Chen wrote:

>>> step or stepAIC functions do the job. You can opt to use BIC by changing
>>> the multiplier of the penalty.
>>>
>>> I think AIC and BIC are not limited to comparing two pre-defined
>>> models; they can also be used as model search criteria. You could enumerate
>>> the information criteria for all possible models if the full
>>> model is relatively small. But this does not generally scale to practical
>>> high-dimensional applications. Hence, it is often only possible to find
>>> a 'best' model that is a local optimum, e.g. as measured by AIC/BIC.
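A minimal sketch of the suggestion above (not code from the thread): forward selection for a logistic regression with `step`, first with the default AIC penalty `k = 2`, then with the BIC penalty `k = log(n)`. The data are simulated, and the variable names `x1` to `x4`, the sample size, and the coefficients are illustrative assumptions.

```r
set.seed(1)
n <- 200  # illustrative sample size

# Hypothetical simulated data: only x1 and x2 truly affect the outcome
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * d$x1 - 0.8 * d$x2))

# Start from the intercept-only logistic model
null_fit <- glm(y ~ 1, family = binomial, data = d)
search_scope <- list(lower = ~ 1, upper = ~ x1 + x2 + x3 + x4)

# Forward selection with the AIC penalty (k = 2 is the default)
fwd_aic <- step(null_fit, scope = search_scope, direction = "forward",
                trace = 0)

# Same search with the BIC penalty: k = log(number of observations)
fwd_bic <- step(null_fit, scope = search_scope, direction = "forward",
                k = log(n), trace = 0)

formula(fwd_aic)
formula(fwd_bic)
```

`MASS::stepAIC` accepts the same `k` argument. With single-parameter candidate terms, the BIC search follows the same path as the AIC search but stops no later, so it never returns a larger model here.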

>>
>> Sure you can use them that way, and they may perform better than other
>> measures, but the resulting model will be highly biased (regression
>> coefficients biased away from zero). AIC and BIC were not designed to
>> be used in this fashion originally. Optimizing AIC or BIC will not
>> produce well-calibrated models as does penalizing a large model.
>>
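The claim about coefficients being biased away from zero can be checked with a small simulation. This is a sketch under stated assumptions, not anything posted in the thread: a linear model with one weak true effect (`true_beta = 0.2`, an illustrative value), four noise predictors, and backward selection by AIC via `step`. The estimate for `x1` is recorded only in runs where `x1` survives selection.

```r
set.seed(2008)
n_sim <- 300       # number of simulated data sets (illustrative)
n <- 50            # observations per data set (illustrative)
true_beta <- 0.2   # weak but real effect of x1 (illustrative)

est <- rep(NA_real_, n_sim)
for (i in seq_len(n_sim)) {
  # Five independent standard normal predictors; only x1 affects y
  d <- data.frame(matrix(rnorm(n * 5), n, 5))
  names(d) <- paste0("x", 1:5)
  d$y <- true_beta * d$x1 + rnorm(n)

  # Backward elimination by AIC, starting from the full model
  full <- lm(y ~ ., data = d)
  sel <- step(full, direction = "backward", trace = 0)

  # Keep the x1 estimate only if x1 was retained
  if ("x1" %in% names(coef(sel))) est[i] <- coef(sel)[["x1"]]
}

kept <- !is.na(est)
mean(kept)            # proportion of runs in which x1 survived selection
mean(abs(est[kept]))  # average magnitude of retained estimates, vs. 0.2
```

In this setup `x1` is retained only when its estimate happens to be large relative to its standard error, so the retained estimates average well above the true value of 0.2.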

>>> Conversely, I would not say that BIC over-penalizes. Instead, I think
>>> AIC usually under-penalizes larger models, in the sense that it has a
>>> positive probability of incorporating irrelevant variables
>>> in linear models.

>> If you put some constraints on the process (e.g., if using AIC to find
>> the optimum penalty in penalized maximum likelihood estimation), AIC
>> works very well and BIC results in far too much shrinkage
>> (underfitting). If using a dangerous process such as stepwise variable
>> selection, the more conservative BIC may be better in some sense, worse
>> in others. The main problem with stepwise variable selection is the use
>> of significance levels for entry below 1.0 and especially below 0.1.
>>
>> Frank
>>

>>> X
>>>
>>> Frank E Harrell Jr wrote:

>>>> Smita Pakhale wrote:
>>>>> Hi Maria,
>>>>>
>>>>> But why do you want to use forwards or backwards
>>>>> methods? These are all 'backward' methods of modeling.
>>>>> Try using AIC or BIC. BIC is much better than AIC.
>>>>> And, you do not have to believe me or anyone else on
>>>>> this.

>>>> How does that help? BIC gives too much penalization in certain
>>>> contexts; both AIC and BIC were designed to compare two pre-specified
>>>> models. They were not designed to fix problems of stepwise variable
>>>> selection.
>>>>
>>>> Frank
>>>>

>>>>> Just make a small data set with a few variables with
>>>>> known relationships amongst them. With this simulated
>>>>> data set, use all your modeling methods: backwards,
>>>>> forwards, AIC, BIC etc. and then see which one gives
>>>>> you an answer closest to the truth. The beauty of using
>>>>> a simulated dataset is that you 'know' the truth, as
>>>>> you are the 'creator' of it!
>>>>>
>>>>> smita
>>>>>

>>>>> --- Charilaos Skiadas <cskiadas_at_gmail.com> wrote:
>>>>>> A google search for "logistic regression with
>>>>>> stepwise forward in r" returns the following post:
>>>>>>
>>>>>>> about a few things.
>>>>>>> I have only worked in Matlab because I wanted to do a logistic
>>>>>>> regression. However Matlab does not do logistic regression with
>>>>>>> the stepwise forward method. Therefore I thought about testing R. So my
>>>>>>> question is
>>>>>>> can I do logistic regression with stepwise forward in R?
>>>>>>> Thanks /M

>>>>>> ______________________________________________
>>>
>>
>> --
>> Frank E Harrell Jr Professor and Chair
>> School of Medicine
>> Department of Biostatistics
>> Vanderbilt University
>>

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 29 May 2008 - 14:00:29 GMT

Smita Pakhale wrote:

> Using any 'significance level', I think, is the main
> problem in the stepwise variable selection method. Even
> in 'normal' circumstances the interpretation of a
> p-value is topsy-turvy. Then you can only imagine
> what happens to this p-value interpretation in the
> process of variable selection... you no longer know what
> the significance level means, if anything at all.
> smita

True, and AIC/BIC are just translations of P-values.

Frank
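One way to read the remark above, sketched in R under stated assumptions rather than taken from the thread: when two nested models differ by a single parameter, AIC prefers the larger model when the likelihood ratio chi-square exceeds 2, and BIC when it exceeds log(n). Each cutoff corresponds to a significance level on a chi-square distribution with 1 degree of freedom. The sample size `n = 200` is an illustrative assumption.

```r
n <- 200  # illustrative sample size

# Likelihood ratio chi-square (1 df) that one extra parameter must exceed
# for the criterion to prefer the larger model
aic_cut <- 2       # AIC penalty per parameter
bic_cut <- log(n)  # BIC penalty per parameter

# Implied significance levels for a 1-df likelihood ratio test
alpha_aic <- pchisq(aic_cut, df = 1, lower.tail = FALSE)
alpha_bic <- pchisq(bic_cut, df = 1, lower.tail = FALSE)

round(c(AIC = alpha_aic, BIC = alpha_bic), 4)
```

The AIC cutoff works out to a significance level of about 0.157 regardless of sample size, while the level implied by BIC shrinks as n grows (roughly 0.02 at n = 200).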

> --- Frank E Harrell Jr <f.harrell_at_vanderbilt.edu> wrote:
>> Xiaohui Chen wrote:
> https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html
>>>>>> Haris Skiadas
>>>>>> Department of Mathematics and Computer Science
>>>>>> Hanover College
>>>>>>
>>>>>> On May 28, 2008, at 7:01 AM, Maria wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> I am just about to install R and was wondering
> --
> Frank E Harrell Jr Professor and Chair
> School of Medicine
> Department of Biostatistics
> Vanderbilt University

______________________________________________
R-help_at_r-project.org mailing list

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Thu 29 May 2008 - 15:00:41 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting guide before posting to the list.