[Rd] 4-int indexing limit of R {Re: [R] allocMatrix limits}

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Fri, 01 Aug 2008 17:33:38 +0200

[[Topic diverted from R-help]]

>>>>> "VK" == Vadim Kutsyy <vadim_at_kutsyy.com>
>>>>>     on Fri, 01 Aug 2008 07:35:01 -0700 writes:

    VK> Martin Maechler wrote:

      VK> The problem is in array.c, where allocMatrix checks for
      VK> "if ((double)nrow * (double)ncol > INT_MAX)".  But why
      VK> is int used and not long int for indexing? (max int is
      VK> 2147483647, max long int is 9223372036854775807)
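
A minimal sketch of the kind of check quoted above, assuming a 32-bit int. The helper name fits_int_index is hypothetical and this is not R's actual source; it only illustrates why the product is formed in double precision before the comparison.

```c
#include <limits.h>

/* Hypothetical helper, not R's actual source: returns 1 when an
 * nrow x ncol matrix still fits a 32-bit int element count. */
int fits_int_index(int nrow, int ncol)
{
    /* Multiply as doubles: the int product could overflow
     * (undefined behaviour in C) before it is ever compared. */
    return (double)nrow * (double)ncol <= (double)INT_MAX;
}
```

Under that assumption a 46340 x 46340 matrix still passes, while 46341 x 46341 (and hence 50000 x 50000) does not.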

>> Well, Brian gave you all info:
>> ( ?Memory-limits )

    VK> exactly, and given that most modern systems used for
    VK> computations (i.e.  64bit systems) have a long int which is
    VK> much larger than int, I am wondering why long int is not
    VK> used for indexing (I don't think that 4 byte vs 8 byte
    VK> storage is an issue).

Well, fortunately, reasonable compilers have indeed kept 'long' == 'long int' to mean 32-bit integers (less reasonable compiler writers have not, AFAIK, which of course leads to code that no longer compiles correctly when originally it did).
But of course you are right that 64-bit integers (typically == 'long long', and really == 'int64') are very natural on 64-bit architectures.
But see below.
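
For reference, the C standard only guarantees that 'long' has at least 32 bits, so its width depends on the platform: 64-bit Unix-like systems commonly use 64 bits, 64-bit Windows uses 32. The sketch below (function names are hypothetical) reports the width on the machine at hand and contrasts it with the fixed-width int64_t from <stdint.h>.

```c
#include <limits.h>
#include <stdint.h>

/* Width of 'long' in bits on the platform compiling this code. */
int bits_of_long(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Width of int64_t in bits: 64 wherever the type exists. */
int bits_of_int64(void)
{
    return (int)(sizeof(int64_t) * CHAR_BIT);
}
```

Code that needs exactly 64 bits everywhere would therefore name int64_t rather than rely on 'long'.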

>> Did you really carefully read ?Memory-limits ??

    VK> Yes, it specifies that a 4 byte int is used for indexing
    VK> in all versions of R, but why? I think 2147483647
    VK> elements for a single vector is OK, but not as the total
    VK> number of elements for a matrix.  I am running out of
    VK> indexing at a mere 10% memory consumption.

If you have too large a numeric matrix, it would be larger than 2^31 * 8 bytes = 2^34 bytes = 2^34 / 2^20 Megabytes ~= 16'000 Megabytes. If that is only 10% for you, you'd have around 160 GB of RAM. That's quite impressive.
I agree that it is at least in the "ball park" of what is available today.
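
The arithmetic above can be checked with a small sketch, assuming 8-byte doubles and rounding the element limit up to 2^31 as in the text. The function names are hypothetical.

```c
#include <stdint.h>

/* Bytes needed for n doubles, assuming 8 bytes per double. */
uint64_t bytes_for_doubles(uint64_t n)
{
    return n * 8u;
}

/* Convert a byte count to whole Megabytes (2^20 bytes each). */
uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes >> 20;
}
```

With n = 2^31 this gives 2^34 bytes, i.e. 16384 Megabytes, the "16'000 Megabytes" quoted above.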


    VK> PS: I have no problem to go and modify C code, but I am
    VK> just wondering what are the reasons for having such
    VK> limitation.

Compatibility for one:

Note that R objects are (pointers to) C structs that are "well-defined" platform independently, and I'd say that this should remain so.
Consequently 64-bit ints (or another "longer int") would have to be there "in R", also on 32-bit platforms. That may well be feasible, but it would double the size of quite a few objects.
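
To make the size argument concrete, here is a sketch with two hypothetical layouts. They are for illustration only and are not R's real headers: one keeps its length fields as 32-bit ints, the other as 64-bit ints, and the same doubling applies to any stored vector of index values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only, not R's real headers. */
struct len_fields32 { int32_t length; int32_t truelength; };
struct len_fields64 { int64_t length; int64_t truelength; };

/* Bytes needed to store n index values of the given width. */
size_t index_bytes32(size_t n) { return n * sizeof(int32_t); }
size_t index_bytes64(size_t n) { return n * sizeof(int64_t); }
```

Both the length fields and any integer index vector take twice the space in the 64-bit variant, on 32-bit and 64-bit platforms alike.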

I think what you are implicitly proposing is that we'd want a 64-bit integer as an R-level type, and that R would use it (and/or coerce to it from 'int32') for indexing everywhere.

But more importantly, all (or very much of) the currently existing C and Fortran code (called via .Call(), .C(), .Fortran()) would also have to be able to deal with the "longer ints".
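
As an illustration of what such existing code typically looks like, here is a sketch of a routine written for the .C() interface, where every argument arrives as a pointer and the length is a plain int. The name vec_sum is hypothetical.

```c
/* A .C()-style routine: every argument arrives as a pointer, and the
 * length comes in as a plain int. The name vec_sum is hypothetical. */
void vec_sum(double *x, int *n, double *result)
{
    double s = 0.0;
    for (int i = 0; i < *n; i++)  /* int counter: at most INT_MAX elements */
        s += x[i];
    *result = s;
}
```

From R this would be called along the lines of .C("vec_sum", as.double(x), as.integer(length(x)), result = double(1)). Both the 'int *n' argument and the loop counter would need changing before such a routine could handle a longer vector.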

One of the last times this topic came up (within R-core), we found that for all the matrix/vector operations, we really would need versions of BLAS / LAPACK that would also work with these "big" matrices, i.e. such a BLAS/LAPACK would also have to internally use "longer int" for indexing. At that point in time, we had decided we would at least wait to hear about the development of such BLAS/LAPACK libraries.
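
A sketch of why the indexing inside such libraries matters, assuming the usual column-major layout where element (i, j) sits at offset i + j * lda. The helper names are hypothetical and this is not code from any BLAS.

```c
#include <stdint.h>

/* Offset of element (i, j) in a column-major matrix with leading
 * dimension lda, computed in 64 bits. Hypothetical helper. */
int64_t colmajor_offset(int64_t i, int64_t j, int64_t lda)
{
    return i + j * lda;
}

/* 1 when the offset could still be held in a 32-bit int index. */
int offset_fits_int32(int64_t offset)
{
    return offset >= 0 && offset <= INT32_MAX;
}
```

For a 50000 x 50000 matrix the first element of the last column already lies at offset 2499950000, beyond what a 32-bit int can hold, so a library computing offsets in int would fail even if R itself could allocate the matrix.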

Interested to hear other opinions / get more info on this topic. I do agree that it would be nice to get over this limit within a few years.


R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 01 Aug 2008 - 15:39:12 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 01 Aug 2008 - 19:35:46 GMT.