Re: [R] sparse matrix, rnorm, malloc

From: roger koenker <roger_at_ysidro.econ.uiuc.edu>
Date: Sun 11 Jun 2006 - 09:13:31 EST

Here is an example of how one might do this sort of thing in SparseM, ignoring the rounding aspect:

require(SparseM)
require(msm) #for rtnorm
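sm <- function(dim,rnd,q){
         # rnd is unused here: rounding is ignored in this example
         # number of nonzero entries; each cell is nonzero with prob P(|Z| < q)
         n <- rbinom(1, dim * dim, 2 * pnorm(q) - 1)
         # random row and column indices (duplicate pairs are possible)
         ia <- sample(dim,n,replace = TRUE)
         ja <- sample(dim,n,replace = TRUE)
         # nonzero values: standard normals truncated to (-q, q)
         ra <- rtnorm(n,lower = -q, upper = q)
         A <- new("matrix.coo", ia = as.integer(ia), ja = as.integer(ja),
                  ra = ra, dimension = as.integer(c(dim,dim)))
         A <- as.matrix.csr(A)
         }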
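
To get a feel for what a given q implies, the expected density and number of nonzero entries can be worked out in base R. This is only a sketch: the helper name expected_nnz is made up for illustration and is not part of SparseM or msm.

```r
# Hypothetical helper (not from SparseM or msm): expected fill for sm().
# Each cell is nonzero with probability P(|Z| < q) = 2 * pnorm(q) - 1.
expected_nnz <- function(dim, q) {
    density <- 2 * pnorm(q) - 1
    c(density = density, nnz = dim * dim * density)
}

small <- expected_nnz(5000, 0.03)    # the dim = 5000, q = .03 case
large <- expected_nnz(20000, 0.02)   # the dim = 20,000, q = .02 case
print(rbind(small, large))
```

With q = .03 the density comes out at roughly 2.4 percent, and with q = .02 at roughly 1.6 percent, so the second case still asks for several million truncated normals.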

For dim = 5000 and q = .03, which exceeds Gavin's suggested 1 percent density, this takes about 30 seconds on my iMac, and according to Rprof about 95 percent of that (total) time is spent generating the truncated normals.
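
Since nearly all of the time goes into the truncated normals, one option worth trying is to draw them by inverting the normal cdf in base R rather than calling rtnorm. This is a sketch of a swapped-in technique, not what msm does internally: rtnorm_inv is a made-up name, it handles only the standard normal, it has not been timed against rtnorm here, and it is numerically sensible only when the interval is not far out in the tails (which holds for (-q, q) around zero).

```r
# Hypothetical base-R stand-in for msm::rtnorm (standard normal only).
# Inverse-cdf method: draw U uniformly on (pnorm(lower), pnorm(upper))
# and map it back through qnorm, so the result lies in (lower, upper).
rtnorm_inv <- function(n, lower, upper) {
    qnorm(runif(n, min = pnorm(lower), max = pnorm(upper)))
}

set.seed(1)
q <- 0.03
ra <- rtnorm_inv(1e5, lower = -q, upper = q)
print(range(ra))   # both ends should fall inside (-q, q)
```

If this turns out to be fast enough, ra in sm() could be generated this way and the dependence on msm dropped.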
Word of warning: pushing this much further gets tedious, since the number of random numbers grows like dim^2. For example, dim = 20,000 and q = .02 takes 432 seconds, again with 93 percent of the total time spent in rnorm and rtnorm.
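
The same dim^2 growth is what hits the dense approach in the original post. A quick check of the arithmetic behind the three error messages quoted below, assuming 8 bytes per double (dense_kb is a made-up helper name):

```r
# Storage for a dense dims x dims vector of doubles, at 8 bytes per element.
dense_kb <- function(dims) dims^2 * 8 / 1024

print(dense_kb(20000))       # 3125000, the Kb figure in the 20,000 error
print(30000^2)               # 9e+08 elements, the "length" in the 30,000 error

# 50000^2 exceeds the largest R integer, so coercing it to integer gives NA
# (R also emits a coercion warning, suppressed here).
n <- suppressWarnings(as.integer(50000^2))
print(is.na(n))              # TRUE
print(.Machine$integer.max)  # 2147483647
```

So the 900000000 in the second message is a count of elements rather than bytes (about 7.2e9 bytes as doubles), and the NA warning at 50,000 is presumably dims * dims no longer fitting in an integer. Why a request of about 3.2e9 bytes fails on a 32G machine is not settled by this arithmetic; a per-process address space limit is one possible explanation, but that is an assumption.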

url:    www.econ.uiuc.edu/~roger                Roger Koenker
email   rkoenker@uiuc.edu                       Department of Economics
vox:    217-333-4558                            University of Illinois
fax:    217-244-6678                            Champaign, IL 61820


On Jun 10, 2006, at 12:53 PM, g l wrote:

> Hi,
>
> I'm sorry for any cross-posting. I've reviewed the archives and could
> not find an exact answer to my question below.
>
> I'm trying to generate very large sparse matrices (< 1% non-zero
> entries per row). I have a sparse matrix function below which works
> well until the row/col count exceeds 10,000. This is being run on a
> machine with 32G memory:
>
> sparse_matrix <- function(dims,rnd,p) {
> ptm <- proc.time()
> x <- round(rnorm(dims*dims),rnd)
> x[((abs(x) - p) < 0)] <- 0
> y <- matrix(x,nrow=dims,ncol=dims)
> proc.time() - ptm
> }
>
> When trying to generate the matrix around 20,000 rows/cols on a
> machine with 32G of memory, the error message I receive is:
>
> R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
> R(335) malloc: *** error: can't allocate region
> R(335) malloc: *** set a breakpoint in szone_error to debug
> R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
> R(335) malloc: *** error: can't allocate region
> R(335) malloc: *** set a breakpoint in szone_error to debug
> Error: cannot allocate vector of size 3125000 Kb
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question: on machine w/32G memory, why
> can't it allocate a vector of size 3125000 Kb?
>
> When trying to generate the matrix around 30,000 rows/cols, the error
> message I receive is:
>
> Error in rnorm(dims * dims) : cannot allocate vector of length
> 900000000
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question: is this 900000000 bytes?
> kilobytes? This error seems to be specific now to rnorm, but it
> doesn't indicate the length metric (b/Kb/Mb) as it did for 20,000
> rows/cols. Even if this is Mb, why can't this be allocated on a machine
> with 32G free memory?
>
> When trying to generate the matrix with over 50,000 rows/cols, the
> error message I receive is:
>
> Error in rnorm(n, mean, sd) : invalid arguments
> In addition: Warning message:
> NAs introduced by coercion
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Same.
>
> Why would it generate different errors in each case? Code fixes? Any
> simple ways to generate sparse matrices which would avoid above
> problems?
>
> Thanks in advance,
>
> Gavin
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



Received on Sun Jun 11 09:21:52 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Sun 11 Jun 2006 - 10:11:15 EST.
