Re: [R] memory problem --- use sparse matrices

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Mon 08 Jan 2007 - 18:18:26 GMT

>>>>> "UweL" == Uwe Ligges <ligges@statistik.uni-dortmund.de>
>>>>>     on Sun, 07 Jan 2007 09:42:08 +0100 writes:

    UweL> Zoltan Kmetty wrote:
>> Hi!
>>
>> I had some memory problem with R - hope somebody could
>> tell me a solution.
>>
>> I work with very large datasets, but R cannot allocate
>> enough memory to handle these datasets.
>>
>> I want to work with a matrix with 100 000 000 rows and 10 columns.
>>
>> I know this is 1 billion cells, but I thought R could
>> handle it (other commercial software like SPSS can),
>> but R reports every time: not enough memory.
>>
>> any good idea?

    UweL> Buy a machine that has at least 8Gb (better 16Gb) of
    UweL> RAM and proceed ...

Well, I doubt that Zoltan wants to *fill* his matrix with all non-zeros. If he does, Uwe and Roger are right.
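For a sense of scale, here is a rough back-of-envelope sketch of the memory involved. It assumes 8-byte doubles and 4-byte integers and ignores R's per-object overhead and any temporary copies; the variable names are made up for illustration.

```r
## Dimensions from the original question
nr <- 1e8
nc <- 10

## A dense numeric matrix stores every cell as an 8-byte double
dense_bytes <- nr * nc * 8
dense_gib   <- dense_bytes / 2^30   # roughly 7.45 GiB for the data alone

## A triplet (i, j, x) representation stores only the non-zeros:
## two 4-byte integers plus one 8-byte double per entry
nnz           <- 10000
triplet_bytes <- nnz * (4 + 4 + 8)  # 160000 bytes

c(dense_gib = dense_gib, triplet_kib = triplet_bytes / 2^10)
```

A single dense copy already fills most of 8Gb, which is why the amount of RAM matters so much in the dense case.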

Otherwise, working with a *sparse* matrix, using the 'Matrix' (my recommendation, but I am biased) or 'SparseM' package, might well be feasible:

install.packages("Matrix") # if needed; only once for your R

library(Matrix) # each time you need it

TsparseMatrix <- function(nrow, ncol, i,j,x) {

    ## Purpose: User friendly construction of sparse "Matrix" from triple
    ## ----------------------------------------------------------------------
    ## Arguments: (i,j,x): 2 integer and 1 numeric vector of the same length:
    ##
    ##	The matrix M will have
    ##       M[i[k], j[k]] == x[k] , for k = 1,2,..., length(i)
    ##    and M[ i', j' ]  ==  0  for all other pairs (i',j')
    ## ----------------------------------------------------------------------
    ## Author: Martin Maechler, Date:  8 Jan 2007, 18:46
    nnz <- length(i)
    stopifnot(length(j) == nnz, length(x) == nnz,
              is.numeric(x), is.numeric(i), is.numeric(j))
    dim <- c(as.integer(nrow), as.integer(ncol))
    ## The conformability of (i,j) with 'dim' will be checked automatically
    ## by an internal "validObject()" that is part of new(.):
    new("dgTMatrix", x = x, Dim = dim,
        ## our "Tsparse" Matrices use  0-based indices :
        i = as.integer(i - 1L),
        j = as.integer(j - 1L))

}

For example :

> TsparseMatrix(10,20, c(1,3:8), c(2,9,6:10), 7 * (1:7))
10 x 20 sparse Matrix of class "dgTMatrix"

 [1,] . 7 . . .  .  .  .  .  . . . . . . . . . . .
 [2,] . . . . .  .  .  .  .  . . . . . . . . . . .
 [3,] . . . . .  .  .  . 14  . . . . . . . . . . .
 [4,] . . . . . 21  .  .  .  . . . . . . . . . . .
 [5,] . . . . .  . 28  .  .  . . . . . . . . . . .
 [6,] . . . . .  .  . 35  .  . . . . . . . . . . .
 [7,] . . . . .  .  .  . 42  . . . . . . . . . . .
 [8,] . . . . .  .  .  .  . 49 . . . . . . . . . .
 [9,] . . . . .  .  .  .  .  . . . . . . . . . . .
[10,] . . . . .  .  .  .  .  . . . . . . . . . . .
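As a side note that was not part of the original message: current versions of the Matrix package provide sparseMatrix(), which also builds a sparse matrix from 1-based (i, j, x) triplets. The sketch below assumes a reasonably recent Matrix and rebuilds the small example that way; by default the result is in the column-compressed ("Csparse") form rather than the triplet form.

```r
library(Matrix)

## Same triplet as in the small example (1-based indices)
i <- c(1, 3:8)
j <- c(2, 9, 6:10)
x <- 7 * (1:7)

## sparseMatrix() takes the 1-based indices and the dimensions directly
S <- sparseMatrix(i = i, j = j, x = x, dims = c(10, 20))

S[3, 9]    # the entry set by the second triplet
S[2, 1]    # an entry that was never set, hence zero
nnzero(S)  # number of stored non-zero entries
```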

But a much larger example, with the dimensions from the original question,

nr <- 1e8
nc <- 10
set.seed(1)

i <- sample(nr, 10000)
j <- sample(nc, 10000, replace = TRUE) # nc < 10000, so sample with replacement
x <- round(rnorm(10000), 2)

M <- TsparseMatrix(nr, nc, i=i, j=j, x=x)

works,
e.g. you can

x <- 1:10
system.time(y <- M %*% x) # needs around 4 sec on one of our better machines
y <- as.vector(y)

## but you can become even more efficient, translating from the
## so-called "triplet" to the (recommended) "Csparse"
## representation:

M. <- as(M, "CsparseMatrix")

object.size(M) / object.size(M.)
## 1.328921; i.e. M is about 33% larger than M., so we saved about 25%

## and

system.time(y. <- M. %*% x) # much faster (1 sec)

identical(as.vector(y.), y)
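The size ratio can be roughly accounted for by counting slots: the triplet form keeps i, j and x for every non-zero, while the Csparse form keeps i and x plus one column pointer vector p of length ncol + 1. The sketch below assumes 4-byte integers and 8-byte doubles, ignores object overhead, and uses a small made-up matrix (Cm, Tm) only to show the slot lengths.

```r
library(Matrix)

## Back-of-envelope storage for 10000 non-zeros in 10 columns
nnz <- 10000
nc  <- 10
triplet_bytes <- nnz * (4 + 4 + 8)             # slots i, j, x
csparse_bytes <- nnz * (4 + 8) + (nc + 1) * 4  # slots i, x, plus p
ratio <- triplet_bytes / csparse_bytes         # about 1.33
saved <- 1 - csparse_bytes / triplet_bytes     # about 0.25

## Small-scale check of the slot lengths behind that arithmetic
set.seed(1)
Cm <- sparseMatrix(i = sample(1000, 100),
                   j = sample(10, 100, replace = TRUE),
                   x = rnorm(100), dims = c(1000, 10))
Tm <- as(Cm, "TsparseMatrix")
c(length(Tm@i), length(Tm@j), length(Tm@x))  # one entry per non-zero
length(Cm@p)                                  # ncol + 1 column pointers
```

With only 10 columns the pointer vector is negligible, so the saving is essentially the dropped j slot.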

I hope this is useful to you.

Martin Maechler,
ETH Zurich



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue Jan 09 18:56:55 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 09 Jan 2007 - 23:30:26 GMT.
