Re: [Rd] (PR#9623) qr.coef: permutes dimnames; inserts NA; promises

From: <ripley_at_stats.ox.ac.uk>
Date: Tue, 01 May 2007 16:02:08 +0200 (CEST)

On Thu, 19 Apr 2007, brech_at_delphioutpost.com wrote:

> Full_Name: Christian Brechbuehler
> Version: 2.4.1 Patched (2007-03-25 r40917)
> OS: Linux 2.6.15-27-adm64-xeon; Ubuntu 6.06.1 LTS
> Submission from: (NULL) (24.61.47.236)
>
>
> Splus and R have different ideas about what qr.coef(qr()) should return,
> which is fine... but I believe that R has a bug in that it is not
> internally consistent, and another separate bug in the documentation.

I agree with the bug in the dimnames, and and have corrected that in 2.5.0 patched. But I think the rest stems from the following misunderstanding:

> in math, zero times anything is zero, but in R, NA times
> anything (even zero) is NA. This seems somewhat inconvenient.

That just is not true. In 'math' it would be true for the real line, but R is working with the computer version of the extended real line. In 'math' 0 * Inf is not zero, and an NA value could be infinite.

Stemming from this, what R is reporting is that certain columns are not used in the calculation. There is a difference between computing a coefficient as zero, and excluding that column from the calculation (for which R uses NA, because what happens if it were included is unknown).

You are assuming that you can find a linear predictor by multiplying a matrix by the coefficients. That is not true in R (especially for linear models), and it is not said to be so in ?qr.coef.

> In particular, on this problem, Splus generates a permuted solution
> vector:
>
> Splus 6.2.1:
> > A <- matrix(c(0,0,0, 1,1,1), nrow=3,dimnames=list(NULL, c("zero",
> "one")))
> > A
> zero one
> [1,] 0 1
> [2,] 0 1
> [3,] 0 1
> > y <- matrix(c(6,7,8), nrow=3, dimnames=list(NULL, "y"))
> > y
> y
> [1,] 6
> [2,] 7
> [3,] 8
> > x <- qr.coef(qr(A), y)
> > x
> y
> one 7
> zero 0
> >
>
> To my taste, the answer from Splus is unexpected; I was looking for
> this:
>
> > x[qr(A)$pivot,,drop=F]
> y
> zero 0
> one 7
>
> But at least the answer is internally consistent: the values and the
> row names were scrambled in corresponding ways.
>
> However, R seems to helpfully un-permute the data values, but forgets
> to un-permute the dimnames, which seems broken.
>
> R version 2.4.1 Patched (2007-03-25 r40917)
> > A <- matrix(c(0,0,0, 1,1,1), nrow=3,dimnames=list(NULL, c("zero",
> "one")))
> > y <- matrix(c(6,7,8), nrow=3, dimnames=list(NULL, "y"))
> > x <- qr.coef(qr(A), y)
> > x
> y
> one NA
> zero 7
>
> I think the answer from qr.coef() should look more like this:
>
> > x.wanted
> y
> zero NA
> one 7
> >
>
> In addition, R uses NA to fill in rather than zero. NA for this
> problem above might mean "undefined": any value will do. But given
> the application of this function, where the solution might be
> multiplied by the matrix, any NA will cause that to turn into an NA
> result... in math, zero times anything is zero, but in R, NA times
> anything (even zero) is NA. This seems somewhat inconvenient. So I'd really
> like
> R to return this:
>
> > x
> y
> zero 0
> one 7
>
> More on the scrambling; I see in dqrsl.f that:
>
> c if pivoting was requested in dqrdc, the j-th
> c component of b will be associated with column jpvt(j)
> c of the original matrix x that was input into dqrdc.)
>
> which I think is referring to the helpful un-permuting, but I think
> qr.coef() in R needs to correspondingly un-permute the dimnames as
> well?
>
> Code from qr.coef:
> if (!is.null(nam <- colnames(qr$qr)))
> rownames(coef) <- nam
> Proposed fix: That second line could be changed to
> rownames(coef)[qr$pivot] <- nam
>
>
> Another issue (either with the documentation or the code) is as follows.
> In R, help(qr) promises:
>
> 'solve.qr' is the method for 'solve' for 'qr' objects. 'qr.solve'
> solves systems of equations via the QR decomposition: if 'a' is a
> QR decomposition it is the same as 'solve.qr', but if 'a' is a
> rectangular matrix the QR decomposition is computed first. Either
> will handle over- and under-determined systems, providing a
> minimal-length solution or a least-squares fit if appropriate.

>
> So unlike Splus, R promises a minimal-length solution, but I don't
> think it delivers that. For example, let's get the minimal-length
> solution for the following under-determined system:
>
> x1 + 2*x2 == 5
>
> qr.solve would fail due to the singular matrix, so use qr.coef:
>
> > qr.coef(qr(t(1:2)),c(5))
>
> R:
> [1] 5 NA
>
> Splus:
> [1] 5 0
>
> 1*5 + 2*0 does equal 5, so modulo NA, this is a solution, but it is
> NOT minimal-length. OTOH, svd gives the right answer in both Splus and
> R:
>
> > d <- svd(t(1:2)); c(5) %*% d$u %*% diag(1/d$d, length(d$d)) %*% t(d$v)
> [,1] [,2]
> [1,] 1 2
>
> This *is* minimum length, and 1*1 + 2*2 == 5. So either qr.coef(qr())
> should deliver that, or the documentation should clarify when a
> minimal-length solution is delivered?
>

> And revisiting the NA issue... in this problem, for the qr.coef()
> result to be a solution, that 2nd value MUST be zero, no other values
> will work. So the NA seems wrong here, more clearly than in the other
> example.
>
> Finally, a simpler test case that shows the same issues with a 2x1
> example instead of 2x3; any fix to qr.coef() should handle this as
> well:
>
>> qr.coef(qr(matrix(0:1,1,dimnames=list(NULL, c("zero","one")))),5)
> one zero
> NA 5
>
> I think that should return:
>
> zero one
> 0 5
>
> or at least:
>
> zero one
> NA 5
>
> ______________________________________________
> R-devel_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Tue 01 May 2007 - 16:48:15 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 03 May 2007 - 22:34:31 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.