[R] coxpath() in package glmpath

From: array chip <arrayprofile_at_yahoo.com>
Date: Sat, 01 Mar 2008 16:32:02 -0800 (PST)


Hi,

I am new to model selection by coefficient shrinkage methods such as the lasso, and I have become particularly interested in variable selection in Cox regression by lasso. I learned that coxpath() in the R package glmpath fits the lasso for the Cox model. I have tried the sample script on the help page of coxpath(), but I am having a difficult time understanding the output. I would therefore greatly appreciate it if anyone could help me understand how to use the function.

> library(glmpath)
> data(lung.data)
> attach(lung.data)
> fit.a <- coxpath(lung.data)
> print(fit.a)

Call:
coxpath(data = lung.data)

Step 1 :  karno
Step 2 :  celltype
Step 5 :  trt
Step 6 :  prior
Step 7 :  age
Step 8 :  diagtime

> summary(fit.a)

Call:
coxpath(data = lung.data)

       Df Log.p.lik       AIC       BIC
Step 1  0 -505.8840 1011.7679 1011.7679
Step 2  1 -486.0691  974.1382  977.0581
Step 5  2 -484.8520  973.7040  979.5440
Step 6  3 -483.4018  972.8036  981.5636
Step 7  4 -483.3801  974.7602  986.4401
Step 8  5 -483.2287  976.4573  991.0572
Step 9  6 -483.1112  978.2224  995.7423

First of all, why does the number of steps differ between the two outputs above? I confirmed with coxph() that the numbers (Log.p.lik, AIC, BIC) in the first row of summary(fit.a) are from a null Cox model, i.e. a model with no covariates. How, then, does Step 1 in the output of summary(fit.a) correspond to "Step 1" in the output of print(fit.a), where it seems to mean a model with the variable "karno"?
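The AIC and BIC columns of the summary table above can be checked by hand from the Df and Log.p.lik columns. The following is a minimal sketch in base R, assuming the usual definitions AIC = -2*loglik + 2*df and BIC = -2*loglik + log(n)*df. The sample size n = 137 is an assumption inferred from the printed BIC values; it is not stated in the post. The variable names are illustrative only.

```r
# Values copied from the summary(fit.a) output above.
steps  <- c(1, 2, 5, 6, 7, 8, 9)
df     <- c(0, 1, 2, 3, 4, 5, 6)
loglik <- c(-505.8840, -486.0691, -484.8520, -483.4018,
            -483.3801, -483.2287, -483.1112)

# Assumption: n = 137 observations (inferred from the printed BIC
# column, not stated in the post).
n <- 137

# Assumed definitions of the two criteria.
aic <- -2 * loglik + 2 * df
bic <- -2 * loglik + log(n) * df

recomputed <- data.frame(Step = steps, Df = df, Log.p.lik = loglik,
                         AIC = round(aic, 4), BIC = round(bic, 4))
print(recomputed)
```

Under these assumptions the recomputed columns agree with the printed ones up to rounding of the log partial likelihood, and the first row (Df = 0) is consistent with a model that has no covariates.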

> predict(fit.a)

     trt celltype   karno  diagtime        age      prior
1 0.0000   0.0000  0.0000 0.000e+00  0.000e+00  0.000e+00
2 0.0000   0.0076 -0.0256 0.000e+00  0.000e+00  0.000e+00
5 0.0000   0.0450 -0.0286 0.000e+00  0.000e+00  0.000e+00
6 0.1428   0.1033 -0.0330 0.000e+00  0.000e+00 -4.326e-05
7 0.1468   0.1048 -0.0332 0.000e+00 -1.043e-07 -3.506e-04
8 0.1755   0.1139 -0.0340 5.642e-06 -1.404e-03 -2.367e-03
attr(,"s")
[1] 1 2 5 6 7 8
attr(,"fraction")
[1] 0.000 0.125 0.500 0.625 0.750 0.875
attr(,"mode")
[1] "step"

Second, comparing the output of print(fit.a) with that of predict(fit.a), I see some discrepancies. For example, "Step 1" of print(fit.a) lists the variable "karno", yet predict(fit.a) shows that the coefficient of "karno" is still 0 at that step. The same goes for the variable "trt" at "Step 5". What do these discrepancies mean? I suspect I have misunderstood the whole idea of coefficient shrinkage in the first place, so I would appreciate it if anyone could shed some light.
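To make the comparison concrete, the sketch below re-enters the printed coefficient table in base R and lists, for each row, the variables whose coefficient is nonzero. It only restates the numbers shown above; it does not call glmpath, and the object names (coefs, active) are made up for illustration.

```r
# Coefficients copied from the predict(fit.a) output above;
# row names are the step labels reported in attr(,"s").
coefs <- rbind(
  "1" = c(0,      0,      0,       0,         0,          0),
  "2" = c(0,      0.0076, -0.0256, 0,         0,          0),
  "5" = c(0,      0.0450, -0.0286, 0,         0,          0),
  "6" = c(0.1428, 0.1033, -0.0330, 0,         0,         -4.326e-05),
  "7" = c(0.1468, 0.1048, -0.0332, 0,        -1.043e-07, -3.506e-04),
  "8" = c(0.1755, 0.1139, -0.0340, 5.642e-06, -1.404e-03, -2.367e-03)
)
colnames(coefs) <- c("trt", "celltype", "karno", "diagtime", "age", "prior")

# For each listed step, the names of the variables with a nonzero coefficient.
active <- lapply(rownames(coefs),
                 function(s) colnames(coefs)[coefs[s, ] != 0])
names(active) <- rownames(coefs)
print(active)
```

In this table a variable named at a given step by print(fit.a) has a nonzero coefficient only in the next listed row, which is the pattern the question is about.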

I would also welcome any opinions on how I should do variable selection from this output. Should I rely on the table (Log.p.lik, AIC, BIC) from summary(fit.a), or should I rely on the coefficient table from predict(fit.a) and eliminate the variables with 0 coefficients at a certain step?
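As an illustration of the first option only, the sketch below picks the row of the printed summary table that minimises AIC and the row that minimises BIC, using base R. Choosing a step by minimum AIC or BIC is one common convention and is assumed here; it is not presented as the procedure recommended by the glmpath authors. The data frame name tab is made up for illustration.

```r
# Criteria copied from the summary(fit.a) output above.
tab <- data.frame(
  Step = c(1, 2, 5, 6, 7, 8, 9),
  Df   = 0:6,
  AIC  = c(1011.7679, 974.1382, 973.7040, 972.8036,
           974.7602, 976.4573, 978.2224),
  BIC  = c(1011.7679, 977.0581, 979.5440, 981.5636,
           986.4401, 991.0572, 995.7423)
)

# Step label of the row with the smallest value of each criterion.
best_aic <- tab$Step[which.min(tab$AIC)]
best_bic <- tab$Step[which.min(tab$BIC)]

cat("Step with minimum AIC:", best_aic, "\n")
cat("Step with minimum BIC:", best_bic, "\n")
```

The two criteria need not agree, since BIC penalises each additional degree of freedom more heavily than AIC for a sample of this size; the variables to keep would then be those with nonzero coefficients at whichever step is chosen.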

Thank you very much for your time.



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sun 02 Mar 2008 - 00:35:22 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 02 Mar 2008 - 02:30:18 GMT.
