[R] Constrained regression

From: Carlos Alzola <calzola_at_cox.net>
Date: Wed, 05 Mar 2008 08:36:46 -0500


I would like to acknowledge the answers I received from Tom Filloon, Mike Cheung and Berwyn Turlach.
Berwyn's response was exactly what I needed. Use solve.QP from the quadprog package in R. S-Plus has the equivalent function solveQP in the NuOpt module.

Berwyn's response is below

G'day Carlos,

On Mon, Mar 3, 2008 at 11:52 AM
Carlos Alzola <calzola_at_cox.net> wrote:

> I am trying to get information on how to fit a linear regression with
> constrained parameters. Specifically, I have 8 predictors , their
> coeffiecients should all be non-negative and add up to 1. I understand
> it is a quadratic programming problem but I have no experience in the
> subject. I searched the archives but the results were inconclusive.
>
> Could someone provide suggestions and references to the literature,
> please?

A suggestion:

> library(MASS) ## to access the Boston data
> designmat <- model.matrix(medv~., data=Boston) Dmat <-
> crossprod(designmat, designmat) dvec <- crossprod(designmat,
> Boston$medv) Amat <- cbind(1, diag(NROW(Dmat))) bvec <- c(1,
> rep(0,NROW(Dmat)) meq <- 1
> library(quadprog)
> res <- solve.QP(Dmat, dvec, Amat, bvec, meq)

The solution seems to contain values that are, for all practical purposes, actually zero:

> res$solution

 [1] 4.535581e-16 2.661931e-18 1.016929e-01 -1.850699e-17 [5] 1.458219e-16 -3.892418e-15 8.544939e-01 0.000000e+00 [9] 2.410742e-16 2.905722e-17 -5.700600e-20 -4.227261e-17 [13] 4.381328e-02 -3.723065e-18

So perhaps better:

> zapsmall(res$solution)

 [1] 0.0000000 0.0000000 0.1016929 0.0000000 0.0000000 0.0000000 [7] 0.8544939 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 [13] 0.0438133 0.0000000

So the estimates seem to follow the constraints.

And the unconstrained solution is:

> res$unconstrainted.solution

 [1] 3.645949e+01 -1.080114e-01 4.642046e-02 2.055863e-02 [5] 2.686734e+00 -1.776661e+01 3.809865e+00 6.922246e-04 [9] -1.475567e+00 3.060495e-01 -1.233459e-02 -9.527472e-01 [13] 9.311683e-03 -5.247584e-01

which seems to coincide with what lm() thinks it should be:

> coef(lm(medv~., Boston))
  (Intercept) crim zn indus chas  3.645949e+01 -1.080114e-01 4.642046e-02 2.055863e-02 2.686734e+00

          nox rm age dis rad -1.776661e+01 3.809865e+00 6.922246e-04 -1.475567e+00 3.060495e-01

          tax ptratio black lstat -1.233459e-02 -9.527472e-01 9.311683e-03 -5.247584e-01

So there seem to be no numeric problems. Otherwise we could have done something else (e.g calculate the QR factorization of the design matrix, say X, and give the R factor to solve.QP, instead of calculating X'X and giving that one to solve.QP).

If the intercept is not supposed to be included in the set of constrained estimates, then something like the following can be done:

> Amat[1,] <- 0
> res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
> zapsmall(res$solution)
 [1] 6.073972 0.000000 0.109124 0.000000 0.000000 0.000000 0.863421 [8] 0.000000 0.000000 0.000000 0.000000 0.000000 0.027455 0.000000

Of course, since after the first command in that last block the second column of Amat contains only zeros
> Amat[,2]

 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
we might as well have removed it (and the corresponding entry in bvec)
> Amat <- Amat[, -2]
> bvec <- bvec[-2]
before calling solve.QP().

Note, the Boston data set was only used to illustrate how to fit such models, I do not want to imply that these models are sensible for these data. :-)

Hope this helps.

Cheers,

        Berwin

Carlos Alzola
calzola_at_cox.net
(703) 242-6747

        [[alternative HTML version deleted]]



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 05 Mar 2008 - 14:48:34 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 05 Mar 2008 - 15:30:18 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive