From: <ripley_at_stats.ox.ac.uk>

Date: Tue, 01 May 2007 16:02:08 +0200 (CEST)

Date: Tue, 01 May 2007 16:02:08 +0200 (CEST)

On Thu, 19 Apr 2007, brech_at_delphioutpost.com wrote:

> Full_Name: Christian Brechbuehler

*> Version: 2.4.1 Patched (2007-03-25 r40917)
**> OS: Linux 2.6.15-27-adm64-xeon; Ubuntu 6.06.1 LTS
**> Submission from: (NULL) (24.61.47.236)
**>
**>
**> Splus and R have different ideas about what qr.coef(qr()) should return,
**> which is fine... but I believe that R has a bug in that it is not
**> internally consistent, and another separate bug in the documentation.
*

I agree with the bug in the dimnames, and and have corrected that in 2.5.0 patched. But I think the rest stems from the following misunderstanding:

> in math, zero times anything is zero, but in R, NA times

*> anything (even zero) is NA. This seems somewhat inconvenient.
*

That just is not true. In 'math' it would be true for the real line, but R is working with the computer version of the extended real line. In 'math' 0 * Inf is not zero, and an NA value could be infinite.

Stemming from this, what R is reporting is that certain columns are not used in the calculation. There is a difference between computing a coefficient as zero, and excluding that column from the calculation (for which R uses NA, because what happens if it were included is unknown).

You are assuming that you can find a linear predictor by multiplying a matrix by the coefficients. That is not true in R (especially for linear models), and it is not said to be so in ?qr.coef.

> In particular, on this problem, Splus generates a permuted solution

*> vector:
**>
**> Splus 6.2.1:
**> > A <- matrix(c(0,0,0, 1,1,1), nrow=3,dimnames=list(NULL, c("zero",
**> "one")))
**> > A
**> zero one
**> [1,] 0 1
**> [2,] 0 1
**> [3,] 0 1
**> > y <- matrix(c(6,7,8), nrow=3, dimnames=list(NULL, "y"))
**> > y
**> y
**> [1,] 6
**> [2,] 7
**> [3,] 8
**> > x <- qr.coef(qr(A), y)
**> > x
**> y
**> one 7
**> zero 0
**> >
**>
**> To my taste, the answer from Splus is unexpected; I was looking for
**> this:
**>
**> > x[qr(A)$pivot,,drop=F]
**> y
**> zero 0
**> one 7
**>
**> But at least the answer is internally consistent: the values and the
**> row names were scrambled in corresponding ways.
**>
**> However, R seems to helpfully un-permute the data values, but forgets
**> to un-permute the dimnames, which seems broken.
**>
**> R version 2.4.1 Patched (2007-03-25 r40917)
**> > A <- matrix(c(0,0,0, 1,1,1), nrow=3,dimnames=list(NULL, c("zero",
**> "one")))
**> > y <- matrix(c(6,7,8), nrow=3, dimnames=list(NULL, "y"))
**> > x <- qr.coef(qr(A), y)
**> > x
**> y
**> one NA
**> zero 7
**>
**> I think the answer from qr.coef() should look more like this:
**>
**> > x.wanted
**> y
**> zero NA
**> one 7
**> >
**>
**> In addition, R uses NA to fill in rather than zero. NA for this
**> problem above might mean "undefined": any value will do. But given
**> the application of this function, where the solution might be
**> multiplied by the matrix, any NA will cause that to turn into an NA
**> result... in math, zero times anything is zero, but in R, NA times
**> anything (even zero) is NA. This seems somewhat inconvenient. So I'd really
**> like
**> R to return this:
**>
**> > x
**> y
**> zero 0
**> one 7
**>
**> More on the scrambling; I see in dqrsl.f that:
**>
**> c if pivoting was requested in dqrdc, the j-th
**> c component of b will be associated with column jpvt(j)
**> c of the original matrix x that was input into dqrdc.)
**>
**> which I think is referring to the helpful un-permuting, but I think
**> qr.coef() in R needs to correspondingly un-permute the dimnames as
**> well?
**>
**> Code from qr.coef:
**> if (!is.null(nam <- colnames(qr$qr)))
**> rownames(coef) <- nam
**> Proposed fix: That second line could be changed to
**> rownames(coef)[qr$pivot] <- nam
**>
**>
**> Another issue (either with the documentation or the code) is as follows.
**> In R, help(qr) promises:
**>
**> 'solve.qr' is the method for 'solve' for 'qr' objects. 'qr.solve'
**> solves systems of equations via the QR decomposition: if 'a' is a
**> QR decomposition it is the same as 'solve.qr', but if 'a' is a
**> rectangular matrix the QR decomposition is computed first. Either
**> will handle over- and under-determined systems, providing a
**> minimal-length solution or a least-squares fit if appropriate.
**>
**> So unlike Splus, R promises a minimal-length solution, but I don't
**> think it delivers that. For example, let's get the minimal-length
**> solution for the following under-determined system:
**>
**> x1 + 2*x2 == 5
**>
**> qr.solve would fail due to the singular matrix, so use qr.coef:
**>
**> > qr.coef(qr(t(1:2)),c(5))
**>
**> R:
**> [1] 5 NA
**>
**> Splus:
**> [1] 5 0
**>
**> 1*5 + 2*0 does equal 5, so modulo NA, this is a solution, but it is
**> NOT minimal-length. OTOH, svd gives the right answer in both Splus and
**> R:
**>
**> > d <- svd(t(1:2)); c(5) %*% d$u %*% diag(1/d$d, length(d$d)) %*% t(d$v)
**> [,1] [,2]
**> [1,] 1 2
**>
**> This *is* minimum length, and 1*1 + 2*2 == 5. So either qr.coef(qr())
**> should deliver that, or the documentation should clarify when a
**> minimal-length solution is delivered?
**>
**> And revisiting the NA issue... in this problem, for the qr.coef()
**> result to be a solution, that 2nd value MUST be zero, no other values
**> will work. So the NA seems wrong here, more clearly than in the other
**> example.
**>
**> Finally, a simpler test case that shows the same issues with a 2x1
**> example instead of 2x3; any fix to qr.coef() should handle this as
**> well:
**>
**>> qr.coef(qr(matrix(0:1,1,dimnames=list(NULL, c("zero","one")))),5)
**> one zero
**> NA 5
**>
**> I think that should return:
**>
**> zero one
**> 0 5
**>
**> or at least:
**>
**> zero one
**> NA 5
**>
**> ______________________________________________
**> R-devel_at_r-project.org mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-devel
**>
*

-- Brian D. Ripley, ripley_at_stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ R-devel_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-develReceived on Tue 01 May 2007 - 16:48:15 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Thu 03 May 2007 - 22:34:31 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel.
Please read the posting
guide before posting to the list.
*