Re: [R] can I do this with R?

From: Xiaohui Chen <>
Date: Wed, 28 May 2008 16:30:03 -0700

Andrew Robinson wrote:
> On Wed, May 28, 2008 at 03:47:49PM -0700, Xiaohui Chen wrote:

>> Frank E Harrell Jr wrote:
>>> Xiaohui Chen wrote:
>>>> The step or stepAIC functions do the job. You can opt to use BIC by
>>>> changing the penalty multiplier.
>>>> I think AIC and BIC are not limited to comparing two pre-defined
>>>> models; they can also be used as model search criteria. You could
>>>> enumerate the information criteria for all possible models if the
>>>> full model is relatively small, but this does not generally
>>>> scale to practical high-dimensional applications. Hence, it is often
>>>> only possible to find a 'best' model that is a local optimum, e.g.
>>>> as measured by AIC/BIC.
>>> Sure you can use them that way, and they may perform better than other 
>>> measures, but the resulting model will be highly biased (regression 
>>> coefficients biased away from zero).  AIC and BIC were not designed to 
>>> be used in this fashion originally.  Optimizing AIC or BIC will not 
>>> produce well-calibrated models as does penalizing a large model.
>> Sure, I agree with this point. AIC corrects the bias of the
>> estimates that minimize the KL distance to the true model, provided the
>> assumed model family contains the true model. BIC is designed to
>> approximate the marginal likelihood of the model. Those are all
>> post-selection estimation methods. For simultaneous variable selection
>> and estimation, there are better penalties, such as the L1 penalty, which
>> is much better than AIC/BIC in terms of consistency.
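
On the step/stepAIC point quoted above, here is a minimal sketch of switching the search criterion from AIC to BIC through the penalty multiplier. It assumes base R only; the built-in mtcars data and the object names are illustrative choices, not anything from this thread.

```r
# Illustrative only: mtcars and these object names are assumptions.
full <- lm(mpg ~ ., data = mtcars)
n <- nrow(mtcars)

# k = 2 (the default) penalizes each parameter by 2, i.e. AIC.
fit_aic <- step(full, direction = "backward", k = 2, trace = 0)

# k = log(n) penalizes each parameter by log(n), i.e. BIC.
fit_bic <- step(full, direction = "backward", k = log(n), trace = 0)

formula(fit_aic)
formula(fit_bic)
```

Both fits are greedy backward searches, so each is only a local optimum of its criterion, and the caveat about biased coefficients after selection still applies.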

> Xiaohui,

> Tibshirani (1996) suggests that the quality of the L1 penalty depends
> on the structure of the dataset. As I recall, subset selection was
> preferred for finding a small number of large effects, lasso (L1) for
> finding a small to moderate number of moderate-sized effects, and
> ridge (L2) for many small effects.

I agree with you. Higher correlation between covariates makes it harder for the LASSO to choose the correct model asymptotically; see Zhao and Yu (2006). Subset selection based on prediction error tends to inflate the estimated variance of the coefficients in linear models. As is well known, the L2 penalty does not perform variable selection. A (convex) mixture of the L1 and L2 penalties, however, is the elastic net proposed by Zou and Hastie (2005), which encourages a grouping effect. More recently, many other priors/penalties have been proposed in the literature.

Zhao P. and Yu B. (2006) On Model Selection Consistency of Lasso. JMLR.
Zou H. and Hastie T. (2005) Regularization and variable selection via the elastic net. JRSSB.
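
To make the mixing of the L1 and L2 penalties concrete, here is a small sketch of the penalty term alone, written as lambda1 * sum(|beta|) + lambda2 * sum(beta^2). The function name, argument names and the example coefficient vector are hypothetical; this computes only the penalty value, not a fitted elastic net.

```r
# Hypothetical helper: the elastic net penalty for a coefficient vector.
# lambda1 weights the L1 (lasso) part, lambda2 the L2 (ridge) part.
enet_penalty <- function(beta, lambda1, lambda2) {
  lambda1 * sum(abs(beta)) + lambda2 * sum(beta^2)
}

beta <- c(1.5, 0, -0.5)  # made-up coefficients for illustration

l1_only <- enet_penalty(beta, lambda1 = 1, lambda2 = 0)      # pure lasso penalty
l2_only <- enet_penalty(beta, lambda1 = 0, lambda2 = 1)      # pure ridge penalty
mixed   <- enet_penalty(beta, lambda1 = 0.5, lambda2 = 0.5)  # equal-weight mix
```

Setting lambda2 = 0 recovers the lasso penalty and lambda1 = 0 the ridge penalty; with equal weights the result is the average of the two.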
> Can you provide any references to more up-to-date simulations that you
> would recommend?


> Cheers,

> Andrew

Received on Wed 28 May 2008 - 23:37:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 29 May 2008 - 01:30:49 GMT.
