Message-Id: <199906301215.NAA01410@toucan.stats.ox.ac.uk>
Date: Wed, 30 Jun 1999 13:15:08 +0100 (BST)
From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
Subject: Re: [R] qr and Moore-Penrose
To: r-help@stat.math.ethz.ch, hothorn@amadeus.statistik.uni-dortmund.de
> Date: Wed, 30 Jun 1999 11:12:24 +0200 (MET DST)
> From: Torsten Hothorn <hothorn@amadeus.statistik.uni-dortmund.de>
>
> yesterday I had a little shock using qr (or lm). having a matrix
>
> X <- cbind(1,diag(3))
> y <- 1:3
>
> the qr.coef returns one NA (because X is singular). So I computed the
> Moore-Penrose inverse of X (just from the definition) and I get a correct
> result. Whats wrong about qr in this situation?
What is a correct result, by the way? There are infinitely many solutions
for the regression of y on X, and the Moore-Penrose one is just one choice
(that assumes that the coefficients are somehow comparable).
> X <- cbind(1, diag(3)); # singular matrix
> y <- 1:3
> Xp <- qr(X);
> b1 <- qr.coef(Xp, y); # contains NA
> b2 <- mpinv(X)%*%y # least square fit using Moore-Penrose
> X%*%b2 # == y
Those `;' are unnecessary: either CR or ; separates expressions in
S-like languages.
I find
> drop(b2)
[1] 1.5 -0.5 0.5 1.5
> b1
[1] 3 -2 -1 NA
> lm(y ~ X + 0)
Call:
lm(formula = y ~ X + 0)
Coefficients:
X1 X2 X3 X4
3 -2 -1 NA
> lm(y ~ X)
Call:
lm(formula = y ~ X)
Coefficients:
(Intercept) X1 X2 X3 X4
2.5 NA -1.5 NA NA
[sic]
whereas S gives
> qr.coef(qr(X), y)
[1] 3 -2 -1 0
> lm(y ~ X, singular.ok=T)
Call:
lm(formula = y ~ X, singular.ok = T)
Coefficients: (2 not defined because of singularities)
(Intercept) X2 X3
3 -2 -1
Now, I can see the problem with lm under R, but what is wrong with qr.coef?
#--------
In R:
> qr(cbind(1,1,diag(3)))
$qr
[,1] [,2] [,3] [,4] [,5]
[1,] -1.7320508 -0.5773503 -1.732051 -0.5773503 -0.5773503
[2,] 0.5773503 0.8164966 0.000000 -0.4082483 -0.4082483
[3,] 0.5773503 0.7071068 0.000000 -0.7071068 0.7071068
$rank
[1] 2
$qraux
[1] 1.5773503 1.7071068 0.0000000 0.7071068 0.7071068
$pivot
[1] 1 3 2 4 5
It seems that there is a problem here (and S gets this right). I think
the changes to the Linpack pivoting strategy in dqrdc2 fail in this
example.
-- Brian D. Ripley, ripley@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._