Re: [R] qr and Moore-Penrose

Prof Brian Ripley (ripley@stats.ox.ac.uk)
Wed, 30 Jun 1999 13:15:08 +0100 (BST)

Message-Id: <199906301215.NAA01410@toucan.stats.ox.ac.uk>
Date: Wed, 30 Jun 1999 13:15:08 +0100 (BST)
From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
Subject: Re: [R] qr and Moore-Penrose
To: r-help@stat.math.ethz.ch, hothorn@amadeus.statistik.uni-dortmund.de

> Date: Wed, 30 Jun 1999 11:12:24 +0200 (MET DST)
> From: Torsten Hothorn <hothorn@amadeus.statistik.uni-dortmund.de>
>
> yesterday I had a little shock using qr (or lm). having a matrix
>
> X <- cbind(1,diag(3))
> y <- 1:3
>
> the qr.coef returns one NA (because X is singular). So I computed the
> Moore-Penrose inverse of X (just from the definition) and I get a correct
> result. Whats wrong about qr in this situation?

What is a correct result, by the way? There are infinitely many solutions
for the regression of y on X, and the Moore-Penrose one is just one choice
(that assumes that the coefficients are somehow comparable).

> X <- cbind(1, diag(3)); # singular matrix
> y <- 1:3
> Xp <- qr(X);
> b1 <- qr.coef(Xp, y); # contains NA
> b2 <- mpinv(X)%*%y # least square fit using Moore-Penrose
> X%*%b2 # == y

Those `;' are unnecessary: either CR or ; separates expressions in
S-like languages.

I find
> drop(b2)
[1] 1.5 -0.5 0.5 1.5
> b1
[1] 3 -2 -1 NA
> lm(y ~ X + 0)

Call:
lm(formula = y ~ X + 0)

Coefficients:
X1 X2 X3 X4
3 -2 -1 NA
> lm(y ~ X)

Call:
lm(formula = y ~ X)

Coefficients:
(Intercept) X1 X2 X3 X4
2.5 NA -1.5 NA NA
[sic]

whereas S gives

> qr.coef(qr(X), y)
[1] 3 -2 -1 0
> lm(y ~ X, singular.ok=T)
Call:
lm(formula = y ~ X, singular.ok = T)

Coefficients: (2 not defined because of singularities)
(Intercept) X2 X3
3 -2 -1

Now, I can see the problem with lm under R, but what is wrong with qr.coef?

#--------

In R:

> qr(cbind(1,1,diag(3)))
$qr
[,1] [,2] [,3] [,4] [,5]
[1,] -1.7320508 -0.5773503 -1.732051 -0.5773503 -0.5773503
[2,] 0.5773503 0.8164966 0.000000 -0.4082483 -0.4082483
[3,] 0.5773503 0.7071068 0.000000 -0.7071068 0.7071068

$rank
[1] 2

$qraux
[1] 1.5773503 1.7071068 0.0000000 0.7071068 0.7071068

$pivot
[1] 1 3 2 4 5

It seems that there is a problem here (and S gets this right). I think
the changes to the Linpack pivoting strategy in dqrdc2 fail in this
example.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


This archive was generated by hypermail 1.02.