From: roger koenker <roger_at_ysidro.econ.uiuc.edu>

Date: Sun 11 Jun 2006 - 09:13:31 EST

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Sun Jun 11 09:21:52 2006

As an example of how one might do this sort of thing in SparseM, ignoring the rounding aspect...

require(SparseM)

require(msm) #for rtnorm

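sm <- function(dim, rnd, q) {
    # rnd is unused here: the rounding aspect is ignored
    # number of entries to generate; 2 * pnorm(q) - 1 = P(|Z| < q)
    n <- rbinom(1, dim * dim, 2 * pnorm(q) - 1)
    # random row and column indices (repeats are possible)
    ia <- sample(dim, n, replace = TRUE)
    ja <- sample(dim, n, replace = TRUE)
    # values from a standard normal truncated to (-q, q)
    ra <- rtnorm(n, lower = -q, upper = q)
    A <- new("matrix.coo", ia = as.integer(ia), ja = as.integer(ja),
             ra = ra, dimension = as.integer(c(dim, dim)))
    # convert from coordinate format to compressed sparse row
    A <- as.matrix.csr(A)
}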

For dim = 5000 and q = .03, which gives a density of about 2.4 percent
(2 * pnorm(.03) - 1) and so exceeds Gavin's suggested 1 percent
density, this takes about 30 seconds on my iMac, and according to Rprof
about 95 percent of that (total) time is spent generating the
truncated normals.
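
Since the profile points at the truncated-normal draws, one thing that might be worth trying is inverse-CDF sampling in base R. This is a different technique from the rtnorm call used above, the helper name rtnorm_inv is made up for illustration, and no timing has been measured here, so treat it as a sketch rather than a tested replacement:

```r
# Hypothetical helper (not part of msm or SparseM): draw n standard normal
# variates truncated to (-q, q) by inverting the normal CDF.
rtnorm_inv <- function(n, q) {
  # uniform draws restricted to the CDF range of the interval (-q, q)
  u <- runif(n, min = pnorm(-q), max = pnorm(q))
  # map the uniforms back through the normal quantile function
  qnorm(u)
}

set.seed(1)
x <- rtnorm_inv(1e5, q = 0.03)
range(x)   # should lie within (-0.03, 0.03), up to floating-point error
```

If it helps, the line ra <- rtnorm(n, lower = -q, upper = q) in sm() could be swapped for ra <- rtnorm_inv(n, q) to compare timings under Rprof.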

Word of warning: pushing this too much further gets tedious since the
number of random numbers grows like dim^2. For example, dim = 20,000
and q = .02 takes 432 seconds with again 93% of the total time spent in
rnorm and rtnorm...
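
The dim^2 growth can be made concrete with a little base-R arithmetic. The sketch below (function names expected_nnz and dense_kb are made up for illustration) computes the expected number of entries sm() generates, and, for comparison, the storage a dense double matrix of the same dimension would need. The dense figure for dim = 20,000 matches the "3125000 Kb" in the error quoted below; the "900000000" in the 30,000 case is 30000^2, an element count rather than a byte count; and 50000^2 exceeds the largest R integer, which is consistent with the "NAs introduced by coercion" warning.

```r
# Expected number of entries generated by sm(): dim^2 * density,
# where density = P(|Z| < q) for a standard normal Z.
expected_nnz <- function(dim, q) dim^2 * (2 * pnorm(q) - 1)

expected_nnz(5000, 0.03)     # roughly 6.0e5
expected_nnz(20000, 0.02)    # roughly 6.4e6

# Dense storage for comparison: 8 bytes per double, reported in Kb
# (1 Kb = 1024 bytes), ignoring the small object header.
dense_kb <- function(dim) dim^2 * 8 / 1024

dense_kb(20000)              # 3125000
30000^2                      # 9e+08 elements
50000^2 > .Machine$integer.max   # TRUE
```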

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email   rkoenker@uiuc.edu               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820

On Jun 10, 2006, at 12:53 PM, g l wrote:

> Hi,
>
> I'm Sorry for any cross-posting. I've reviewed the archives and could
> not find an exact answer to my question below.
>
> I'm trying to generate very large sparse matrices (< 1% non-zero
> entries per row). I have a sparse matrix function below which works
> well until the row/col count exceeds 10,000. This is being run on a
> machine with 32G memory:
>
> sparse_matrix <- function(dims,rnd,p) {
>   ptm <- proc.time()
>   x <- round(rnorm(dims*dims),rnd)
>   x[((abs(x) - p) < 0)] <- 0
>   y <- matrix(x,nrow=dims,ncol=dims)
>   proc.time() - ptm
> }
>
> When trying to generate the matrix around 20,000 rows/cols on a
> machine with 32G of memory, the error message I receive is:
>
> R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
> R(335) malloc: *** error: can't allocate region
> R(335) malloc: *** set a breakpoint in szone_error to debug
> R(335) malloc: *** vm_allocate(size=3200004096) failed (error code=3)
> R(335) malloc: *** error: can't allocate region
> R(335) malloc: *** set a breakpoint in szone_error to debug
> Error: cannot allocate vector of size 3125000 Kb
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question: on machine w/32G memory, why
> can't it allocate a vector of size 3125000 Kb?
>
> When trying to generate the matrix around 30,000 rows/cols, the error
> message I receive is:
>
> Error in rnorm(dims * dims) : cannot allocate vector of length
> 900000000
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question: is this 900000000 bytes?
> kilobytes? This error seems to be specific now to rnorm, but it
> doesn't indicate the length metric (b/Kb/Mb) as it did for 20,000
> rows/cols. Even if this Mb, why can't this be allocated on a machine
> with 32G free memory?
>
> When trying to generate the matrix with over 50,000 rows/cols, the
> error message I receive is:
>
> Error in rnorm(n, mean, sd) : invalid arguments
> In addition: Warning message:
> NAs introduced by coercion
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Same.
>
> Why would it generate different errors in each case? Code fixes? Any
> simple ways to generate sparse matrices which would avoid above
> problems?
>
> Thanks in advance,
>
> Gavin
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

