[R] A question on glmnet analysis

From: B <khosoda_at_med.kobe-u.ac.jp>
Date: Fri, 25 Mar 2011 22:04:22 +0900

I am trying to fit a logistic regression to data from 104 patients, with one binary outcome (yes or no) and 15 predictors (9 categorical factors [yes or no] and 6 continuous variables). The number of "yes" outcomes is 25. With 25 events and 15 variables, the number of events per variable is well below 10, so I tried to analyze the data with a penalized regression method. I would be grateful if some of the experts here could help me.
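
For reference, the events-per-variable figure follows directly from the counts given above. A minimal sketch (the variable names are only illustrative):

```r
# Events per variable (EPV) from the counts stated in the post
n.events     <- 25   # number of "yes" outcomes
n.predictors <- 15   # 9 binary factors + 6 continuous variables
epv <- n.events / n.predictors
round(epv, 2)        # 1.67, far below the threshold of 10
```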

First, I standardized all 6 continuous variables with scale(), using the center=TRUE and scale=TRUE options. The nine categorical variables and the outcome variable were recoded as 0 or 1. I then used glmnet with standardize=FALSE because of the categorical variables.
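
The preprocessing described above can be sketched roughly as follows. The data here are simulated placeholders (cont, bin and x are made-up names, not the patient data); the point is only to show that scale() leaves the continuous columns with mean 0 and standard deviation 1 while the 0/1 columns stay untouched.

```r
set.seed(1)
# Simulated stand-ins for the real data (assumption): 104 rows,
# 6 continuous columns and 9 binary (0/1) columns
cont <- matrix(rnorm(104 * 6, mean = 50, sd = 10), nrow = 104, ncol = 6)
bin  <- matrix(rbinom(104 * 9, size = 1, prob = 0.3), nrow = 104, ncol = 9)

# Center and scale only the continuous columns
cont.std <- scale(cont, center = TRUE, scale = TRUE)

# Combine into one 104 x 15 design matrix; binary columns stay 0/1
x <- cbind(cont.std, bin)

round(colMeans(cont.std), 10)   # all 0 after centering
apply(cont.std, 2, sd)          # all 1 after scaling
dim(x)                          # 104 15
```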

library(glmnet)
x15std <- matrix(c(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15), 104, 15)
y <- outcome
fit1 <- glmnet(x15std, y, family="binomial", standardize=FALSE)
fit1.cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE)

The default is alpha=1, so this should be the lasso penalty.
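
For reference, the way alpha mixes the lasso and ridge penalties can be written out in a few lines. This is only an illustrative sketch: enet.penalty is a hypothetical helper (not a glmnet function), the coefficient vector is made up, and the intercept is assumed to be left out of the penalty.

```r
# Elastic-net penalty in the form given in the glmnet documentation:
#   lambda * ((1 - alpha)/2 * sum(beta^2) + alpha * sum(abs(beta)))
# enet.penalty is a hypothetical helper, not part of glmnet.
enet.penalty <- function(beta, lambda, alpha = 1) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(1, -2)                              # made-up coefficients (no intercept)
enet.penalty(beta, lambda = 1, alpha = 1)     # lasso only:  3
enet.penalty(beta, lambda = 1, alpha = 0.5)   # elastic net: 2.75
enet.penalty(beta, lambda = 1, alpha = 0)     # ridge only:  2.5
```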

Coefficients.fit1 <- coef(fit1, s=fit1.cv$lambda.min)

Active.Index.fit1 <- which(Coefficients.fit1 !=0)
Active.Coefficients.fit1 <- Coefficients.fit1[Active.Index.fit1]

[1] 1 5 9 10 16
[1] -1.28774827 0.01420395 0.70444865 -0.27726625 0.18455926

The optimal model selected 5 active coefficients, including the intercept as the first one.

Second, I did the same thing with alpha=0.5 for an elastic net analysis.

fit2 <- glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
fit2.cv <- cv.glmnet(x15std, y, family="binomial", standardize=FALSE, alpha=0.5)
Coefficients.fit2 <- coef(fit2, s=fit2.cv$lambda.min)

Active.Index.fit2 <- which(Coefficients.fit2 !=0)
Active.Coefficients.fit2 <- Coefficients.fit2[Active.Index.fit2]

[1] 1 5 9 10 16
[1] -1.3286190 0.1410739 0.6315108 -0.2668022 0.2292459

This model selected the same 5 active coefficients as the first one with the lasso penalty.

My questions are as follows:
1. Am I doing this correctly?
2. Which model, lasso or elastic net, should be selected, and why? Both models chose the same variables but gave different coefficient values.
3. Is it OK to calculate odds ratios as exp(coefficients)? And how can the 95% confidence interval of an odds ratio be calculated? Or is a 95% CI meaningless in this kind of analysis?
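
To make question 3 concrete, this is the calculation being asked about, applied to the nonzero lasso coefficients listed above. It is only a sketch: the labels x4, x8, x9 and x15 assume that coefficient index 1 is the intercept and that the remaining indices follow the column order of x15std, and whether exp() of a shrunken coefficient can be read as an ordinary odds ratio is exactly what is in doubt.

```r
# Nonzero lasso coefficients reported above, with the intercept dropped
active.index <- c(5, 9, 10, 16)
beta.lasso   <- c(0.01420395, 0.70444865, -0.27726625, 0.18455926)

# Assumption: index 1 is the intercept, so index k maps to column k - 1
names(beta.lasso) <- paste0("x", active.index - 1)

# Exponentiate to put the coefficients on the odds-ratio scale
odds.ratio <- exp(beta.lasso)
round(odds.ratio, 3)
```

For any of these that is a standardized continuous variable, the value would be per one standard deviation rather than per original unit.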

Thank you in advance for your help.

KH

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 25 Mar 2011 - 13:11:07 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 26 Mar 2011 - 15:20:25 GMT.
