From: Henrik Bengtsson <hb_at_stat.berkeley.edu>

Date: Thu, 27 Sep 2007 11:30:34 -0700


> Thank you for your response. Unfortunately, you are right. For example,
> the allocated memory determined by top command on Linux may change during
> a session as follows:
>
> a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
> a[1,1] <- 0 # 3.0g
> gc() # 1.5g
>
> In the current application, I modify the matrix only using my own C code
> and only read it on R level. So, the above is not a big problem for me
> (at least not now).

> However, there is a related thing, which could be a bug. The following
> code determines the value of the NAMED field in the SEXP header of an object:
>
> #include <Rinternals.h>
>
> SEXP getnamed(SEXP a)
> {
>     SEXP out;
>     PROTECT(out = allocVector(INTSXP, 1));
>     INTEGER(out)[0] = NAMED(a);
>     UNPROTECT(1);
>     return(out);
> }

> Now, consider the following session:
>
> u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
> .Call("getnamed",u) # 1 (OK)
>
> length(u)
> .Call("getnamed",u) # 1 (OK)
>
> dim(u)
> .Call("getnamed",u) # 1 (OK)
>
> nrow(u)
> .Call("getnamed",u) # 2 (why?)
>
> u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
> .Call("getnamed",u) # 1 (OK)
> ncol(u)
> .Call("getnamed",u) # 2 (so, ncol does the same)
>
> Is this a bug?
>
> Petr Savicky.


R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Thu 27 Sep 2007 - 18:33:46 GMT


As others already mentioned, in your example you are first creating an
integer matrix and then coercing it to a double matrix by assigning
(double) 1 to element [1,1]. However, even when correcting for this
mistake, an extra copy is still made when the matrix comes from matrix().

Try this in a fresh vanilla R session:

> print(gc())
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 136684  3.7     350000  9.4   350000  9.4
Vcells  81026  0.7     786432  6.0   473127  3.7

> x <- matrix(1, nrow=5000, ncol=5000)
> print(gc())
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136793   3.7     350000   9.4   350000   9.4
Vcells 25081043 191.4   27989266 213.6 25081056 191.4

> x[1,1] <- 2
> print(gc())
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136797   3.7     350000   9.4   350000   9.4
Vcells 25081044 191.4   52830254 403.1 50081058 382.1

So, yes, an extra copy is created in the x[1,1] <- 2 assignment. It is related to the fact that a NAMED matrix object is created inside matrix(), cf. the last lines of matrix():

x <- .Internal(matrix(data, nrow, ncol, byrow))
dimnames(x) <- dimnames
x

Here is a patch for matrix() that avoids this problem *when dimnames is NULL* (which is often the case):

matrix <- function(data=NA, nrow=1, ncol=1, byrow=FALSE, dimnames=NULL) {
  data <- as.vector(data)
  if (missing(nrow)) {
    nrow <- ceiling(length(data)/ncol)
  } else if (missing(ncol)) {
    ncol <- ceiling(length(data)/nrow)
  }

  # Trick to avoid extra copy in the case when 'dimnames' is NULL:
  # return the .Internal() result directly instead of binding it to 'x'.
  if (is.null(dimnames)) {
    .Internal(matrix(data, nrow, ncol, byrow))
  } else {
    x <- .Internal(matrix(data, nrow, ncol, byrow))
    dimnames(x) <- dimnames
    x
  }
} # matrix()

Try the above again in a fresh R session with this patch applied and you'll get:

> print(gc())
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 136805  3.7     350000  9.4   350000  9.4
Vcells  81122  0.7     786432  6.0   473127  3.7

> x <- matrix(1, nrow=5000, ncol=5000)
> print(gc())
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136919   3.7     350000   9.4   350000   9.4
Vcells 25081139 191.4   27989372 213.6 25081152 191.4

> x[1,1] <- 2
> print(gc())
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   136923   3.7     350000   9.4   350000   9.4
Vcells 25081140 191.4   29468840 224.9 25081276 191.4

Voila!

I talked to Luke Tierney about this, and he thought the internal method should be updated to take the dimnames argument, i.e. .Internal(matrix(data, nrow, ncol, byrow, dimnames)). However, until that happens, may I suggest this simple patch/workaround for inclusion in R v2.6.0?

Cheers

Henrik

On 9/27/07, Petr Savicky <savicky_at_cs.cas.cz> wrote:

> On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
> > For the most part, doing anything to an R object results in its
> > duplication. You generally have to do a lot of work to NOT copy an R
> > object.

