From: B <khosoda_at_med.kobe-u.ac.jp>

Date: Fri, 25 Mar 2011 22:04:22 +0900

Hi,

I am trying to fit a logistic regression to data from 104 patients, with
one outcome (yes or no) and 15 variables (9 categorical factors
[yes or no] and 6 continuous variables). The number of "yes" outcomes is 25.

With 25 events and 15 variables, the number of events per variable
(25/15, about 1.7) is much less than 10. Therefore, I tried to analyze the
data with a penalized regression method. I would be grateful if some of the
experts here could help me.

First of all, I standardized all 6 continuous variables with scale(), using center=TRUE and scale=TRUE. The nine categorical variables and the outcome variable were recoded as 0 or 1. Then I used glmnet with standardize=FALSE because of the presence of categorical variables.
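As a minimal sketch of that preprocessing, using simulated stand-in data because the real data are not shown (the names cont, categ and x.sketch, and the ordering of categorical before continuous columns, are assumptions for illustration only):

```r
set.seed(1)                      # simulated stand-in data, not the patient data
n <- 104
cont  <- matrix(rnorm(n * 6, mean = 50, sd = 10), n, 6)   # 6 continuous variables
categ <- matrix(rbinom(n * 9, 1, 0.3), n, 9)              # 9 binary (0/1) factors

# Center and scale only the continuous columns; leave the 0/1 columns alone
cont.std <- scale(cont, center = TRUE, scale = TRUE)

# Combine into one 104 x 15 numeric matrix, the form glmnet expects for x
x.sketch <- cbind(categ, cont.std)
colnames(x.sketch) <- paste0("x", 1:15)

round(colMeans(cont.std), 10)    # each continuous column now has mean 0
apply(cont.std, 2, sd)           # and standard deviation 1
```

With the continuous columns already on a common scale, passing standardize=FALSE keeps glmnet from rescaling the 0/1 columns as well.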

x15std <- matrix(c(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15),
104, 15)

y <- outcome

library(glmnet)

fit1 <- glmnet(x15std, y, family="binomial", standardize=FALSE)
fit1.cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE)

The default is alpha=1, so this should be the lasso penalty.

Coefficients.fit1 <- coef(fit1, s=fit1.cv$lambda.min)

Active.Index.fit1 <- which(Coefficients.fit1 != 0)
Active.Coefficients.fit1 <- Coefficients.fit1[Active.Index.fit1]
Active.Index.fit1

[1] 1 5 9 10 16

Active.Coefficients.fit1

[1] -1.28774827 0.01420395 0.70444865 -0.27726625 0.18455926

The optimal model has 5 nonzero coefficients, the first of which is the intercept.
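To read the indices as variables, note that coef() returns the intercept in row 1, so index k refers to column k-1 of the design matrix. A small sketch of that mapping follows; the labels x1..x15 are an assumption based on the column order used to build x15std, and coef.names is an illustrative helper, not part of glmnet:

```r
# Indices reported by which() on the coef() output (copied from the output above)
active.idx <- c(1, 5, 9, 10, 16)

# For 15 predictors, coef() has 16 rows: the intercept, then x1..x15
# (assumed labels; the matrix itself was built without column names)
coef.names <- c("(Intercept)", paste0("x", 1:15))

coef.names[active.idx]
```

Under that assumption the nonzero entries are the intercept plus x4, x8, x9 and x15.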

Second, I repeated the same steps with alpha=0.5 to fit an elastic net.

fit2 <- glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
fit2.cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE,
                     alpha=0.5)

Coefficients.fit2 <- coef(fit2, s=fit2.cv$lambda.min)

Active.Index.fit2 <- which(Coefficients.fit2 != 0)
Active.Coefficients.fit2 <- Coefficients.fit2[Active.Index.fit2]
Active.Index.fit2

[1] 1 5 9 10 16

Active.Coefficients.fit2

[1] -1.3286190 0.1410739 0.6315108 -0.2668022 0.2292459

This model selected the same 5 nonzero coefficients (intercept included) as the first model with the lasso penalty.

My questions are as follows:

1. Am I doing this correctly?

2. Which model, lasso or elastic net, should be selected, and why?
Both models chose the same variables but gave different coefficient values.

3. Is it OK to calculate odds ratios as exp(coefficients)? If so, how can
I calculate the 95% confidence interval of an odds ratio?
Or is a 95% CI meaningless in this kind of analysis?
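For concreteness, the calculation in question 3 would look like the sketch below, using the nonzero lasso coefficients printed above with the intercept dropped. It only shows the arithmetic; whether these shrunken, penalized estimates can be read as ordinary odds ratios is exactly what is being asked. The names beta and odds.ratio are illustrative.

```r
# Nonzero lasso coefficients from the output above, intercept excluded.
# For a standardized continuous variable, a coefficient is the change in
# log-odds per one standard deviation, not per original unit.
beta <- c(0.01420395, 0.70444865, -0.27726625, 0.18455926)

odds.ratio <- exp(beta)   # exponentiate the log-odds coefficients
odds.ratio
```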

I would appreciate your help. Thank you in advance.

KH

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 25 Mar 2011 - 13:11:07 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Sat 26 Mar 2011 - 15:20:25 GMT.
