# Re: [R] can I do this with R?

From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>
Date: Thu, 29 May 2008 07:39:27 -0500

Smita Pakhale wrote:

> Using any 'significance level', I think, is the main
> problem in the stepwise variable selection method. Even
> in 'normal' circumstances the interpretation of a
> p-value is topsy-turvy. So you can only imagine
> what happens to that interpretation in the
> process of variable selection... you no longer know what
> the significance level means, if anything at all.
> smita

True, and AIC/BIC are just translations of P-values.

Frank
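
The remark that AIC and BIC translate into P-values can be made concrete for the simplest case, where one one-degree-of-freedom term is added to a nested model. AIC then prefers the larger model when the likelihood ratio chi-square exceeds 2, and BIC when it exceeds log(n). The following is a minimal sketch in base R; the sample sizes are arbitrary illustrations and do not come from this thread.

```r
# Implied significance level for adding one 1-d.f. term to a nested model.
# AIC = -2 logLik + 2 p, so the larger model wins when LR chi-square > 2.
# BIC = -2 logLik + log(n) p, so the larger model wins when LR chi-square > log(n).
alpha_aic <- pchisq(2, df = 1, lower.tail = FALSE)

n <- c(50, 100, 1000, 10000)   # illustrative sample sizes (assumed)
alpha_bic <- pchisq(log(n), df = 1, lower.tail = FALSE)

print(round(alpha_aic, 4))     # roughly 0.157
print(data.frame(n = n, alpha_bic = signif(alpha_bic, 3)))
```

Seen this way, AIC-based stepwise selection behaves like testing each single-parameter term at a level near 0.157, and BIC like a level that shrinks as n grows.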

> --- Frank E Harrell Jr <f.harrell_at_vanderbilt.edu> wrote:
>

>> Xiaohui Chen wrote:
>>> step or stepAIC functions do the job. You can opt to use BIC by changing
>>> the multiplier of the penalty.
>>>
>>> I think AIC and BIC are not limited to comparing two pre-defined
>>> models; they can also be used as model search criteria. You could enumerate
>>> the information criteria for all possible models if the size of the full
>>> model is relatively small. But this does not generally scale to practical
>>> high-dimensional applications. Hence, it is often only possible to find
>>> a 'best' model at a local optimum, e.g. measured by AIC/BIC.
>>
>> Sure you can use them that way, and they may perform
>> better than other
>> measures, but the resulting model will be highly
>> biased (regression
>> coefficients biased away from zero). AIC and BIC
>> were not designed to
>> be used in this fashion originally. Optimizing AIC
>> or BIC will not
>> produce well-calibrated models as does penalizing a
>> large model.
>>
>>> Conversely, I wouldn't say that BIC over-penalizes. Instead, I think
>>> AIC usually underpenalizes larger models, in terms of the positive
>>> probability of incorporating irrelevant variables in linear models.
>>
>> If you put some constraints on the process (e.g., if
>> using AIC to find
>> the optimum penalty in penalized maximum likelihood
>> estimation), AIC
>> works very well and BIC results in far too much
>> shrinkage
>> (underfitting). If using a dangerous process such
>> as stepwise variable
>> selection, the more conservative BIC may be better
>> in some sense, worse
>> in others. The main problem with stepwise variable
>> selection is the use
>> of significance levels for entry below 1.0 and
>> especially below 0.1.
>>
>> Frank
>>
>>> X
>>>
>>> Frank E Harrell Jr wrote:
>>>> Smita Pakhale wrote:
>>>>> Hi Maria,
>>>>>
>>>>> But why do you want to use forwards or backwards
>>>>> methods? These are all 'backward' methods of modeling.
>>>>> Try using AIC or BIC. BIC is much better than AIC.
>>>>> And you do not have to believe me or anyone else on
>>>>> this.
>>>>
>>>> How does that help? BIC gives too much penalization in certain
>>>> contexts; both AIC and BIC were designed to compare two pre-specified
>>>> models. They were not designed to fix problems of stepwise variable
>>>> selection.
>>>>
>>>> Frank
>>>>
>>>>> Just make a small data set with a few variables with
>>>>> a known relationship amongst them. With this simulated
>>>>> data set, use all your modeling methods: backwards,
>>>>> forwards, AIC, BIC etc., and then see which one gives
>>>>> you an answer closest to the truth. The beauty of using
>>>>> a simulated dataset is that you 'know' the truth, as
>>>>> you are the 'creator' of it!
>>>>>
>>>>> smita
>>>>>
>>>>> wrote:
>>>>>> A google search for "logistic regression with
>>>>>> stepwise forward in r" returns the following post:
>>>>>>
>>>>>> https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html
>>>>>>
>>>>>> Department of Mathematics and Computer Science
>>>>>> Hanover College
>>>>>>
>>>>>> On May 28, 2008, at 7:01 AM, Maria wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> I am just about to install R and was wondering
>>>>>>> I have only worked in Matlab because I wanted to do a logistic
>>>>>>> regression. However, Matlab does not do logistic regression with
>>>>>>> the stepwise forward method. Therefore I thought of testing R. So my
>>>>>>> question is:
>>>>>>> can I do logistic regression with stepwise forward in R?
>>>>>>> Thanks /M
>>>>>> ______________________________________________
>>>
>>
>> --
>> Frank E Harrell Jr Professor and Chair
>> School of Medicine
>> Department of Biostatistics
>> Vanderbilt University
>>
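
The point quoted above, that selection leaves "regression coefficients biased away from zero", can be checked with a small simulation. The sketch below is deliberately simplified and is not taken from the thread: it uses ordinary linear regression with one weak predictor and keeps the estimate only when its P-value is below 0.05, which mimics a single entry test in a stepwise procedure. The number of simulations, sample size, true slope, and seed are all assumed values.

```r
set.seed(2)                    # arbitrary seed for reproducibility
n_sim <- 2000                  # number of simulated datasets (assumed)
n     <- 50                    # observations per dataset (assumed)
beta  <- 0.2                   # true slope, deliberately weak (assumed)

est <- numeric(n_sim)
p   <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  x <- rnorm(n)
  y <- beta * x + rnorm(n)
  s <- summary(lm(y ~ x))$coefficients
  est[i] <- s["x", "Estimate"]
  p[i]   <- s["x", "Pr(>|t|)"]
}

kept <- p < 0.05               # the variable "enters" only when significant

print(mean(est))               # averaged over all fits
print(mean(est[kept]))         # averaged over fits that passed the screen
print(mean(kept))              # proportion of fits in which the variable was retained
```

Averaged over every fit the estimate is unbiased, but the average over the fits that survive the screen is pulled away from zero, because a weak effect only passes the test when it happens to be overestimated.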
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
Department of Biostatistics   Vanderbilt University
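
For readers who want to try what the thread discusses, here is a minimal sketch of forward stepwise logistic regression with `step()` from base R, run on simulated data with a known truth as suggested earlier in the thread. The BIC variant uses `k = log(n)` as the penalty multiplier. The data, the true coefficients, the seed, and all variable names are assumptions made for illustration; the sketch shows the mechanics only and does not address the calibration and bias concerns raised above.

```r
set.seed(1)                      # arbitrary seed for reproducibility
n <- 500
# Ten candidate predictors; only x1 and x2 truly affect the outcome (assumed truth).
X <- as.data.frame(matrix(rnorm(n * 10), n, 10))
names(X) <- paste0("x", 1:10)
lp <- -0.5 + 1.0 * X$x1 - 0.7 * X$x2     # true linear predictor
d  <- cbind(y = rbinom(n, 1, plogis(lp)), X)

null_fit <- glm(y ~ 1, family = binomial, data = d)
upper    <- reformulate(names(X))        # one-sided formula: ~ x1 + ... + x10

# Forward selection by AIC (k = 2 is the default penalty per parameter).
fwd_aic <- step(null_fit, scope = list(lower = ~ 1, upper = upper),
                direction = "forward", trace = 0)

# Same search with the BIC penalty, k = log(n).
fwd_bic <- step(null_fit, scope = list(lower = ~ 1, upper = upper),
                direction = "forward", k = log(n), trace = 0)

# Compare what each search kept with the known truth (x1 and x2).
print(attr(terms(fwd_aic), "term.labels"))
print(attr(terms(fwd_bic), "term.labels"))
print(round(coef(fwd_bic), 2))
```

Because every candidate term here has one degree of freedom, both searches add terms in the same order and the BIC search simply stops no later than the AIC search. Rerunning with different seeds shows how often noise variables slip in under each penalty.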
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Thu 29 May 2008 - 14:00:29 GMT
