Re: [R] BMA, logistic regression, odds ratio, model reduction etc

From: Frank Harrell <f.harrell_at_vanderbilt.edu>
Date: Wed, 20 Apr 2011 16:00:53 -0700 (PDT)

I think it's OK. You can also use the Hmisc package's varclus function. Frank
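
As a rough illustration of the varclus suggestion, the sketch below clusters candidate predictors without the outcome ever entering the call. The data frame `d` and its columns are simulated stand-ins, not the poster's data; only the `varclus` function from Hmisc comes from the message, and the choice of Spearman similarity is an assumption.

```r
# Simulated stand-in data (hypothetical): 104 patients, four continuous predictors.
set.seed(1)
n  <- 104
x2 <- rnorm(n)
d  <- data.frame(
  x1 = x2 + rnorm(n, sd = 0.5),  # built to correlate with x2
  x2 = x2,
  x3 = rnorm(n),                 # independent of the others
  x4 = x2 + rnorm(n, sd = 0.7)   # built to correlate with x2
)

# The outcome is not in the formula, so the clustering is blinded to it.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  vc <- Hmisc::varclus(~ x1 + x2 + x3 + x4, data = d, similarity = "spearman")
  print(round(vc$sim, 2))  # pairwise similarity matrix between predictors
}
```

Calling `plot(vc)` afterwards draws the cluster dendrogram, where predictors that join at high similarity are candidates for being represented by a single variable.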

細田弘吉 wrote:

> 
> Dear Prof. Harrell,
> 
> Thank you very much for your quick advice.
> I will try the rms package.
> 
> Regarding model reduction, is my model 2 method (clustering and recoding 
> that are blinded to the outcome) permissible?
> 
> Sincerely,
> 
> --
> KH
> 
> (11/04/20 22:01), Frank Harrell wrote:

>> Deleting variables is a bad idea unless you make that a formal part of the
>> BMA, so that the attempt to delete variables is penalized. Instead of BMA
>> I recommend simple penalized maximum likelihood estimation (see the lrm
>> function in the rms package) or pre-modeling data reduction that is
>> blinded to the outcome variable.
>> Frank
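
The penalized maximum likelihood route mentioned above could look roughly like the following sketch. The data are simulated, the variable names are borrowed from the thread only for readability, and the penalty grid is an arbitrary assumption; `lrm` and `pentrace` are functions in the rms package.

```r
# Simulated stand-in data (hypothetical), sized like the poster's sample.
set.seed(2)
n <- 104
d <- data.frame(x2 = rnorm(n), x3 = rnorm(n), procedure = rbinom(n, 1, 0.5))
# Hypothetical true model, used only to generate a binary outcome.
d$outcome <- rbinom(n, 1, plogis(-1.5 + 1.2 * d$x2))

if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)
  # x = TRUE, y = TRUE store the design matrix and response for pentrace().
  fit <- lrm(outcome ~ x2 + x3 + procedure, data = d, x = TRUE, y = TRUE)
  # Try an arbitrary grid of penalties and keep the one pentrace() prefers.
  pt <- pentrace(fit, penalty = c(0.5, 1, 2, 4, 8, 16))
  fit_pen <- update(fit, penalty = pt$penalty)
  print(fit_pen)
}
```

All candidate predictors stay in the model; the penalty shrinks their coefficients instead of deleting any of them.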
>>
>>
>> 細田弘吉 wrote:
>>>
>>> Hi everybody,
>>> I apologize in advance for the long mail.
>>>
>>> I have data on 104 patients, with 15 explanatory variables and one
>>> binary outcome (poor/good): 25 poor results and 79 good results. I tried
>>> to analyze the data with logistic regression. However, with 15 variables
>>> and 25 events, the events per variable (EPV) is about 1.7, far below the
>>> rule-of-thumb minimum of 10. Therefore, I used the R package "BMA" to
>>> perform logistic regression with Bayesian model averaging to avoid this
>>> problem.
>>>
>>> model 1 (full model):
>>> x1, x2, x3 and x4 are continuous variables; the others are binary.
>>>
>>>> x16.bic.glm <- bic.glm(outcome ~ ., data=x16.df,
>>>     glm.family="binomial", OR=20, strict=FALSE)
>>>> summary(x16.bic.glm)
>>> (The output below has been cut off at the right edge to save space)
>>>
>>> 62 models were selected
>>> Best 5 models (cumulative posterior probability = 0.3606 ):
>>>
>>>                  p!=0   EV          SD        model 1   model 2
>>> Intercept        100    -5.1348545  1.652424  -4.4688   -5.15     -5.1536
>>> age              3.3    0.0001634   0.007258  .
>>> sex              4.0
>>> .M                      -0.0243145  0.220314  .
>>> side             10.8
>>> .R                      0.0811227   0.301233  .
>>> procedure        46.9   -0.5356894  0.685148  .         -1.163
>>> symptom          3.8    -0.0099438  0.129690  .         .
>>> stenosis         3.4    -0.0003343  0.005254  .
>>> x1               3.7    -0.0061451  0.144084  .
>>> x2               100.0  3.1707661   0.892034  3.2221    3.11
>>> x3               51.3   -0.4577885  0.551466  -0.9154   .
>>> HT               4.6
>>> .positive               0.0199299   0.161769  .         .
>>> DM               3.3
>>> .positive               -0.0019986  0.105910  .         .
>>> IHD              3.5
>>> .positive               0.0077626   0.122593  .         .
>>> smoking          9.1
>>> .positive               0.0611779   0.258402  .         .
>>> hyperlipidemia   16.0
>>> .positive               0.1784293   0.512058  .         .
>>> x4               8.2    0.0607398   0.267501  .         .
>>>
>>> nVar                                          2         2         1         3         3
>>> BIC                                           -376.9082 -376.5588 -376.3094 -375.8468 -374.5582
>>> post prob                                     0.104     0.087     0.077     0.061     0.032
>>>
>>> [Question 1]
>>> Is it O.K. to calculate the odds ratio and its 95% confidence interval
>>> from "EV" (posterior distribution mean) and "SD" (posterior distribution
>>> standard deviation)?
>>> For example, the 95% CI for x2 can be calculated as:
>>>> exp(3.1707661)
>>> [1] 23.82573 -----> odds ratio
>>>> exp(3.1707661+1.96*0.892034)
>>> [1] 136.8866
>>>> exp(3.1707661-1.96*0.892034)
>>> [1] 4.146976
>>> ------------------> 95%CI (4.1 to 136.9)
>>> Is this O.K.?
>>>
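
The calculation in Question 1 can be written compactly as below. This is only a sketch of the arithmetic in the post: it treats the model-averaged posterior of the x2 coefficient as approximately normal, which is an assumption and is least plausible when "p!=0" is well below 100. The names `ev`, `sd_post`, `or` and `ci` are illustrative.

```r
# Posterior mean (EV) and SD for x2, copied from the summary output above.
ev      <- 3.1707661
sd_post <- 0.892034

# Normal-approximation interval on the log-odds scale, then back-transform.
z  <- qnorm(0.975)                       # about 1.96
or <- exp(ev)                            # odds ratio
ci <- exp(ev + c(-1, 1) * z * sd_post)   # lower and upper 95% limits

print(round(c(OR = or, lower = ci[1], upper = ci[2]), 1))
```

Rounded to one decimal place, this reproduces the odds ratio and interval quoted in the post.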
>>> [Question 2]
>>> Is it permissible to delete variables with small values of "p!=0" and
>>> "EV", such as age (3.3% and 0.0001634), to reduce the number of
>>> explanatory variables, and then build a new model without those
>>> variables for a new BMA session?
>>>
>>> model 2 (reduced model):
>>> I used the R package "pvclust" to reduce the model. The result suggested
>>> that x1, x2 and x4 belonged to the same cluster, so I kept only x2.
>>> Based on subject knowledge, I made a simple unweighted sum by counting
>>> the number of clinical features. For 9 features (sex, side, HT2,
>>> hyperlipidemia, DM, IHD, smoking, symptom, age), the sum ranges from 0
>>> to 9. This score was defined as ClinicalScore. Consequently, I made a
>>> new data set (x6.df), which consists of 5 variables (stenosis, x2, x3,
>>> procedure, and ClinicalScore) and one binary outcome (poor/good). Then,
>>> for an alternative BMA session...
>>>
>>>> BMAx6.glm<- bic.glm(postopDWI_HI ~ ., data=x6.df,
>>> glm.family="binomial", OR=20, strict=FALSE)
>>>> summary(BMAx6.glm)
>>> (The output below has been cut off at the right edge to save space)
>>> Call:
>>> bic.glm.formula(f = postopDWI_HI ~ ., data = x6.df, glm.family =
>>> "binomial", strict = FALSE, OR = 20)
>>>
>>>
>>> 13 models were selected
>>> Best 5 models (cumulative posterior probability = 0.7626 ):
>>>
>>>                  p!=0   EV          SD         model 1   model 2
>>> Intercept        100    -5.6918362  1.81220    -4.4688   -6.3166
>>> stenosis         8.1    -0.0008417  0.00815    .         .
>>> x2               100.0  3.0606165   0.87765    3.2221    3.1154
>>> x3               46.5   -0.3998864  0.52688    -0.9154   .
>>> procedure        49.3   0.5747013   0.70164    .         1.1631
>>> ClinicalScore    27.1   0.0966633   0.19645    .         .
>>>
>>> nVar                                           2         2         1         3         3
>>> BIC                                            -376.9082 -376.5588 -376.3094 -375.8468 -375.5025
>>> post prob                                      0.208     0.175     0.154     0.122     0.103
>>>
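
The unweighted ClinicalScore described for model 2 amounts to a row sum over 0/1 indicators. The sketch below uses simulated indicators; the column names and the assumption that every feature (including age) is already coded 0/1 are illustrative, since the post does not show the actual coding.

```r
# Hypothetical 0/1 indicators for nine clinical features, 104 patients.
set.seed(3)
n <- 104
features <- c("sex", "side", "HT", "hyperlipidemia", "DM", "IHD",
              "smoking", "symptom", "age")
ind <- as.data.frame(
  matrix(rbinom(n * length(features), 1, 0.4), nrow = n,
         dimnames = list(NULL, features))
)

# Unweighted sum: count how many features are present for each patient.
# The outcome is never used, so the recoding stays blinded to it.
ind$ClinicalScore <- rowSums(ind[features])
print(table(ind$ClinicalScore))
```

Because the score is built from predictors alone, it costs one degree of freedom in the outcome model instead of nine.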
>>> [Question 3]
>>> Am I doing this correctly?
>>> That is, is this kind of model reduction permissible for BMA?
>>>
>>> [Question 4]
>>> I still have 5 variables, which violates the rule of thumb "EPV > 10".
>>> Is it permissible to delete the "stenosis" variable because of its small
>>> "EV" value? Or is it O.K. to keep it because this is BMA?
>>>
>>> Sorry for the long post.
>>>
>>> Thank you very much in advance for your help.
>>>
>>> --
>>> KH
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> -----
>> Frank Harrell
>> Department of Biostatistics, Vanderbilt University
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/BMA-logistic-regression-odds-ratio-model-reduction-etc-tp3462416p3462919.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
> 
> 
> -- 
> *************************************************
>  Department of Neurosurgery, Kobe University Graduate School of Medicine
>  細田 弘吉
>  
>  7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
>      Phone: 078-382-5966
>      Fax  : 078-382-5979
>      E-mail address
>          Office: khosoda_at_med.kobe-u.ac.jp
> 	Home  : khosoda_at_venus.dti.ne.jp
> 
> 




Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/BMA-logistic-regression-odds-ratio-model-reduction-etc-tp3462416p3464392.html
Sent from the R help mailing list archive at Nabble.com.

Received on Wed 20 Apr 2011 - 23:03:05 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 21 Apr 2011 - 17:00:32 GMT.
