From: Trevor Hastie <hastie_at_stanford.edu>

Date: Sun, 04 Apr 2010 23:46:09 +0200

R-packages mailing list

R-packages_at_r-project.org

https://stat.ethz.ch/mailman/listinfo/r-packages Received on Mon 05 Apr 2010 - 07:49:41 EST

glmnet_1.2 has been uploaded to CRAN.

This is a major upgrade, with the following additional features:

- poisson family, with dense or sparse x
- Cox proportional hazards family, for dense x
- wide range of cross-validation features. All models have several criteria for cross-validation. These include deviance, mean absolute error, misclassification error and "auc" for logistic or multinomial models. Observation weights are incorporated.
- offset is allowed in fitting the model
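
As a rough illustration of how observation weights can enter a cross-validation criterion, here is a minimal Python sketch of a weighted mean absolute error and a weighted misclassification error. The function names and the 0.5 classification threshold are assumptions made for this example; this is not glmnet's code or API.

```python
def weighted_mean(values, weights):
    """Average of values, each counted in proportion to its observation weight."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def mean_absolute_error(y, pred, weights):
    """Weighted mean of |y_i - pred_i| over held-out observations."""
    return weighted_mean([abs(a - b) for a, b in zip(y, pred)], weights)


def misclassification_error(labels, probs, weights, threshold=0.5):
    """Weighted fraction of observations whose predicted class is wrong.

    labels are 0/1; probs are predicted probabilities of class 1.
    The threshold is a hypothetical choice for this sketch.
    """
    wrong = [float((p > threshold) != bool(l)) for l, p in zip(labels, probs)]
    return weighted_mean(wrong, weights)


# Toy held-out fold: the third observation carries twice the weight.
print(mean_absolute_error([1, 2, 4], [1, 3, 2], [1, 1, 2]))
print(misclassification_error([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.2], [1, 1, 1, 1]))
```

In a K-fold setting, a measure like this would be computed on each held-out fold for every value along the path and then averaged over folds.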

Here is the description of the package.

glmnet is a package that fits the regularization path for linear, two- and multi-class logistic regression models, poisson regression and the Cox model, with "elastic net" regularization (tunable mixture of L1 and L2 penalties). glmnet uses pathwise coordinate descent, and is very fast.
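
To give a feel for what "pathwise coordinate descent" means, here is a small pure-Python sketch for the linear (Gaussian) case. It assumes the objective (1/(2N))*RSS + lambda*[(1-a)*||beta||_2^2 + a*||beta||_1], no intercept and no standardization; those choices, and all function names, are assumptions of this example. It illustrates the idea only and is not the package's implementation, which is far more optimized.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(x, y, lam, a, beta=None, n_sweeps=200):
    """Cyclic coordinate descent for one value of lambda.

    x is a list of rows, y a list of responses. beta, if given,
    is used as a warm start. A fixed number of sweeps stands in
    for a proper convergence check.
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p if beta is None else list(beta)
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # (1/N) * sum_i x_ij * (partial residual without variable j)
            z = 0.0    # (1/N) * sum_i x_ij^2
            for i in range(n):
                fit_without_j = sum(x[i][k] * beta[k] for k in range(p) if k != j)
                rho += x[i][j] * (y[i] - fit_without_j)
                z += x[i][j] ** 2
            rho /= n
            z /= n
            # The L1 part thresholds the update; the L2 part shrinks it.
            beta[j] = soft_threshold(rho, lam * a) / (z + 2.0 * lam * (1.0 - a))
    return beta


def fit_path(x, y, lambdas, a):
    """Fit along a decreasing sequence of lambdas, warm-starting each fit."""
    path, beta = [], None
    for lam in lambdas:
        beta = elastic_net_cd(x, y, lam, a, beta)
        path.append(beta)
    return path


# Toy data generated as y = 2*x1 - x2 with no noise.
x = [[1, 0], [0, 1], [1, 1], [1, -1]]
y = [2, -1, 1, 3]
for lam, beta in zip([1.5, 0.75, 0.0], fit_path(x, y, [1.5, 0.75, 0.0], 1.0)):
    print(lam, beta)
```

The warm start is what makes the path cheap: each solution is a good starting point for the next, slightly smaller value of lambda.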

Some of the features of glmnet:

- by default it computes the path at 100 uniformly spaced (on the log scale) values of the regularization parameter
- glmnet appears to be faster than any of the packages that are freely available, in some cases by two orders of magnitude.
- recognizes and exploits sparse input matrices (à la the Matrix package). Coefficient matrices are output in sparse matrix representation.
- penalty is (1-a)*||beta||_2^2 + a*||beta||_1, where a is between 0 and 1; a=1 is the lasso penalty, a=0 is the ridge penalty. For many correlated predictors, a=.95 or thereabouts improves the performance of the lasso.
- convenient predict, plot, print, and coef methods
- variable-wise penalty modulation allows each variable's penalty to be scaled by its own factor; if the factor is zero, that variable is unpenalized and always enters the model
- glmnet uses a symmetric parametrization for multinomial, with constraints enforced by the penalization.
- a comprehensive set of cross-validation routines is provided for all models and several error measures
- offsets and weights can be provided for all models
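
Two of the items above are easy to make concrete: the grid of regularization values equally spaced on the log scale, and the penalty with its mixing parameter a and variable-wise scaling. The Python sketch below is illustrative only. The function names, the smallest-to-largest ratio of 1e-4, and the way the per-variable factors multiply each term are assumptions of this example rather than a statement of the package's internals.

```python
import math


def log_spaced_grid(lam_max, ratio=1e-4, n=100):
    """n values from lam_max down to ratio * lam_max, equally spaced in log(lambda)."""
    step = math.log(ratio) / (n - 1)
    return [lam_max * math.exp(k * step) for k in range(n)]


def elastic_net_penalty(beta, a, factors=None):
    """(1-a)*||beta||_2^2 + a*||beta||_1, with optional per-variable scaling.

    A factor of zero leaves that coefficient unpenalized.
    """
    if factors is None:
        factors = [1.0] * len(beta)
    return sum(f * ((1.0 - a) * b * b + a * abs(b)) for b, f in zip(beta, factors))


grid = log_spaced_grid(2.0)
print(len(grid), grid[0], grid[-1])

beta = [3.0, -4.0]
print(elastic_net_penalty(beta, 1.0))               # only the L1 term remains
print(elastic_net_penalty(beta, 0.0))               # only the squared L2 term remains
print(elastic_net_penalty(beta, 1.0, [0.0, 1.0]))   # first coefficient unpenalized
```

Successive grid values differ by a constant ratio, so the path is resolved as finely near the small values of the regularization parameter as near the large ones.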

Examples of glmnet speed trials:

- Newsgroup data: N=11,000, p=0.75 million, two-class logistic, 100 values along the lasso path. Time = 2 mins
- 14-class cancer data: N=144, p=16K, 14-class multinomial, 100 values along the lasso path. Time = 30 secs

Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani.

See our paper http://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf for implementation details, and comparisons with other related software.

Trevor Hastie, hastie_at_stanford.edu
Professor, Department of Statistics, Stanford University
Phone: (650) 725-2231 (Statistics), Fax: (650) 725-8977
Phone: (650) 498-5233 (Biostatistics), Fax: (650) 725-6951
URL: http://www-stat.stanford.edu/~hastie
Address: room 104, Department of Statistics, Sequoia Hall, 390 Serra Mall, Stanford University, CA 94305-4065
