Re: [R] Computing the rank of a matrix.

From: RAVI VARADHAN <rvaradhan_at_jhmi.edu>
Date: Sat, 07 Apr 2007 18:39:36 -0400

Hi Martin,

I played a bit with rankMat. I will first state the conclusions of my numerical experiments and then present the results to support them:

  1. I don't believe that rankMat, or equivalently Matlab's rank computation, is necessarily better than qr(.)$rank or (qr., LAPACK=TRUE)$rank. In fact, for the notorious Hilbert matrix, rankMat can give poor estimates of rank.
  2. There exists no universally optimal (i.e. optimal for all A) tol in qr(A, tol)$rank that would be optimally close to rankMat. The discrepancy in the ranks computed by qr(A)$rank and rankMat(A) obviously depends on the matrix A. This is evident from the tol used in rankMat: tol <- max(d) * .Machine$double.eps * abs(singValA[1]) So, this value of tol in qr will also minimize the discrepancy.
  3. The tol parameter is irrelevant in qr(A, tol, LAPACK=TRUE)$rank, i.e. LAPACK doesn't seem to utilize the tol parameter when computing the rank. However, qr(A, tol, LAPACK=FALSE)$rank does depend on tol.

Now, here are the results:
1.
> hilbert.rank <- matrix(NA,20,3)
> hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
> for (i in 1:20) {

+ himat <- hilbert(i)
+ hilbert.rank[i,1] <- rankMat(himat)
+ hilbert.rank[i,2] <- qr(himat,tol=1.0e-14)$rank
+ hilbert.rank[i,3] <- qr(himat, tol = .Machine$double.eps, LAPACK = TRUE)$rank
+ }

> hilbert.rank
[,1] [,2] [,3] [1,] 1 1 1 [2,] 2 2 2 [3,] 3 3 3 [4,] 4 4 4 [5,] 5 5 5 [6,] 6 6 6 [7,] 7 7 7 [8,] 8 8 8 [9,] 9 9 9 [10,] 10 10 10 [11,] 10 11 11 [12,] 11 12 12 [13,] 11 12 13 [14,] 11 13 14 [15,] 12 13 15 [16,] 12 15 16 [17,] 12 16 17 [18,] 12 16 18

[19,] 13 17 19
[20,] 13 17 20

We see that rankMat underestimates the rank for n > 10, but qr(himat, tol = .Machine$double.eps, LAPACK=TRUE)$rank gets it right.

2.
Here I first, created a random rectangular matrix, and then added a number of rows to it, where these new rows are the same as some of the original rows except for a tiny amount of noise, which I call eps. So, the degree of rank deficiency is controlled by eps. I let eps take on 3 values: 1.e-07, 1.e-10, and 1.e-14, and show that the optimal tol (in terms of being close to rankMat) depends on the level of precision (eps) in the matrix.
> set.seed(123)
> nrow <- 15
> ncol <- 20
> nsim <- 5000
> ndefic <- 4 # number of "nearly" dependent rows
> eps <- 1.e-07
> rnk <- matrix(NA, nsim, 5)
> for (j in 1:nsim) {

+ A <- matrix(rnorm(nrow*ncol),nrow, ncol)
+ newrows <- matrix(NA, ndefic, ncol)
+ for (i in 1:ndefic) {
+ newrows[i,] <- A[nrow-i,] + rnorm(ncol, sd=min(abs(A[nrow-i+1,]))* eps)
+ }
+ 
+ B <- rbind(A,newrows)
+ rnk[j,1] <- rankMat(B)
+ rnk[j,2] <- qr(B, tol = 1.e-07)$rank
+ rnk[j,3] <- qr(B, tol = 1.e-11)$rank
+ rnk[j,4] <- qr(B, tol = 1.0e-14)$rank
+ rnk[j,5] <- qr(B, tol = .Machine$double.eps, LAPACK = TRUE)$rank
+ }

> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
[1] 19351 53 0 0

We observe that both qr(B, tol=1.e-14)$rank and qr(B, tol = .Machine$double.eps, LAPACK = TRUE)$rank give exactly the same rank as rankMat.

Now, we change eps <- 1.e-10 and the results are:
> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
[1] 19778 14263 166 220
This means that a tolerance of 1.e-14 works best.

Now we lower eps further: eps <- 1.e-14
> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
[1] 0 3 665 20000
Clearly, the smaller tolerances yield rank estimates that are higher than that given by rankMat. That is, rankMat underestimates the rank as in the case of Hilbert matrix.

3.
Now we show that qr(., tol, LAPACK=TRUE)$rank is independent of tol:
> exp <- -(7:16)
> tol <- 10^exp
> sapply(tol, A=hilbert(20), function(x,A)qr(A, tol=x, LAPACK=FALSE)$rank)
 [1] 10 12 14 14 15 16 16 17 18 19
> sapply(tol, A=hilbert(20), function(x,A)qr(A, tol=x, LAPACK=TRUE)$rank)
 [1] 20 20 20 20 20 20 20 20 20 20

Looking forward to comments.

Best,
Ravi.

> >>>>> "Ravi" == Ravi Varadhan <rvaradhan_at_jhmi.edu>
> >>>>> on Fri, 6 Apr 2007 12:44:33 -0400 writes:
>
> Ravi> Hi, qr(A)$rank will work, but just be wary of the
> Ravi> tolerance parameter (default is 1.e-07), since the
> Ravi> rank computation could be sensitive to the tolerance
> Ravi> chosen.
>
> Yes, indeed.
>
> The point is that rank(<Matrix>)
> is well defined in pure math (linear algebra), as well as
> a "singular matrix" is.
>
> The same typically no longer makes sense as soon as you enter
> the real world: A matrix "close to singular" may have to be
> treated "as if singular" depending on its "singularity
> closeness" {{ learn about the condition number of a matrix }}
> and the same issues arise with rank(<matrix>).
>
> Of course, the matlab programmers know all this (and much more),
> and indeed, matlab's rank(A) really is
> rank(A, tol = tol.default(A))
>
> and is based on the SVD instead of QR decomposition since the
> former is said to be more reliable (even though slightly slower).
>
> R's equivalent (with quite a bit of fool-proofing) would be the
> following function (assuming correct online documentation of matlab):
>
>
> rankMat <- function(A, tol = NULL, singValA = svd(A, 0,0)$d)
> {
> ## Purpose: rank of a matrix ``as Matlab''
> ## ----------------------------------------------------------------------
> ## Arguments: A: a numerical matrix, maybe non-square
> ## tol: numerical tolerance (compared to singular values)
> ## singValA: vector of non-increasing singular values
> of A
> ## (pass as argument if already known)
> ## ----------------------------------------------------------------------
> ## Author: Martin Maechler, Date: 7 Apr 2007, 16:16
> d <- dim(A)
> stopifnot(length(d) == 2, length(singValA) == min(d),
> diff(singValA) < 0) # must be sorted decreasingly
> if(is.null(tol))
> tol <- max(d) * .Machine$double.eps * abs(singValA[1])
> else stopifnot(is.numeric(tol), tol >= 0)
> sum(singValA >= tol)
> }
>
>
> A small scale simulation with random matrices,
> i.e., things like
>
> ## ranks of random matrices; here will have 5 all the time:
> table(replicate(1000, rankMat(matrix(rnorm(5*12),5, 12) )))# < 1 sec.
>
> indicates that qr(.)$rank gives the same typically,
> where I assume one should really use
>
> qr(., tol = .Machine$double.eps, LAPACK = TRUE)$rank
>
> to be closer to Matlab's default tolerance.
>
> Ok, who has time to investigate further?
> Research exercise:
>
> 1>> Is there a fixed number, say t0 <- 1e-15
> 1>> for which qr(A, tol = t0, LAPACK=TRUE)$rank is
> 1>> ``optimally close'' to rankMat(A) ?
>
> 2>> how easily do you get cases showing svd(.) to more reliable
> 2>> than qr(., LAPACK=TRUE)?
>
> To solve this in an interesting way, you should probably
> investigate classes of "almost rank-deficient" matrices,
> and I'd also be interested if you "randomly" ever get matrices A
> with rank(A) < min(dim(A)) - 1
> (unless you construct some columns/rows exactly from earlier
> ones or use all-0 ones)
>
> Martin Maechler, ETH Zurich
>
>
>
>
> Ravi> Ravi.
>
> Ravi> -----------------------------------------------------------------
> Ravi> Ravi Varadhan, Ph.D.
> Ravi> Assistant Professor, The Center on Aging and Health
> .......

> Ravi>
>
>
> Ravi> --------------------------------------------------------------------
>
>
>
> >> How about
>
> >> qr(A)$rank
>
> >> or perhaps
>
> >> qr(A, LAPACK=TRUE)$rank
>
> >> Cheers,
>
> >> Andy
>
>
>
>
>
>
>
>
>
> >> Hi! Maybe this is a silly question, but I need the
> >> column rank (
> >> of a matrix and R function 'rank()' only gives me the
> >> ordering of the elements of my matrix. How can I
> >> compute the column rank of a matrix? Is there not an R
> >> equivalent to Matlab's 'rank()'? I've been browsing
> >> for a time now and I can't find anything, so any help
> >> will be greatly appreciated. Best regards!

>
>
> >> -- -- Jose Luis Aznarte M.
> >> Department of Computer
> >> Science and Artificial Intelligence Universidad de
> >> Granada Tel. +34 - 958 - 24 04 67 GRANADA (Spain) Fax:
> >> +34 - 958 - 24 00 79



R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Sat 07 Apr 2007 - 22:47:17 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 24 Jul 2007 - 18:33:15 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.