Re: [Rd] 4-int indexing limit of R {Re: [R] allocMatrix limits}

From: Martin Maechler <maechler_at_stat.math.ethz.ch>
Date: Fri, 01 Aug 2008 21:05:16 +0200

>>>>> "VK" == Vadim Kutsyy <vadim_at_kutsyy.com>
>>>>>     on Fri, 01 Aug 2008 10:22:43 -0700 writes:

    VK> Martin Maechler wrote:
>> [[Topic diverted from R-help]]
>>
>> Well, fortunately, reasonable compilers have indeed kept
>> 'long' == 'long int' to mean 32-bit integers ((less
>> reasonable compiler writers have not, AFAIK: which leads
>> of course to code that no longer compiles correctly when
>> originally it did)) But of course you are right that
>> 64-bit integers (typically == 'long long', and really ==
>> 'int64') are very natural on 64-bit architectures. But
>> see below.

... I wrote complete rubbish,
and I am embarrassed ...

>>

    VK> well in 64bit Ubuntu, /usr/include/limits.h defines:

    VK> /* Minimum and maximum values a `signed long int' can hold.  */
    VK> #  if __WORDSIZE == 64
    VK> #   define LONG_MAX     9223372036854775807L
    VK> #  else
    VK> #   define LONG_MAX     2147483647L
    VK> #  endif
    VK> #  define LONG_MIN      (-LONG_MAX - 1L)

    VK> and using simple code to test
    VK> (http://home.att.net/~jackklein/c/inttypes.html#int) my desktop, which
    VK> is standard Intel computer, does show.

    VK> Signed long min: -9223372036854775808 max: 9223372036854775807

yes. I am really embarrassed.

What I was trying to say was that the definition of int / long / ...
should not change when going from a 32-bit to a 64-bit architecture,
and that the R internal structures should consequently also be the same
on 32-bit and 64-bit platforms.

>> If you have too large a numeric matrix, it would be larger than
>> 2^31 * 8 bytes ~= 2^34 / 2^20 ~= 16'000 Megabytes.
>> If that is only 10% for you, you'd have around 160 GB of
>> RAM. That's quite impressive.
>>
>> cat /proc/meminfo | grep MemTotal
    VK> MemTotal: 145169248 kB

    VK> We have "smaller" SGI NUMAflex to play with, where the memory can
    VK> be increased to 512Gb ("larger" version doesn't have this "limitation").
    VK> But with even commodity hardware you can easily get 128Gb for reasonable 
    VK> price (i.e. Dell PowerEdge R900)

>> Note that R objects are (pointers to) C structs that are
>> "well-defined" platform independently, and I'd say that this
>> should remain so.

>>

    VK> I forgot that R stores a two dimensional array in a single dimensional C
    VK> array. Now I understand why there is a limitation on total number of
    VK> elements.  But this is a big limitation.

Yes, maybe

>> One of the last times this topic came up (within R-core),
>> we found that for all the matrix/vector operations,
>> we really would need versions of BLAS / LAPACK that would also
>> work with these "big" matrices, ie. such a BLAS/Lapack would
>> also have to internally use "longer int" for indexing.
>> At that point in time, we had decided we would at least wait to
>> hear about the development of such BLAS/LAPACK libraries

    VK> BLAS supports two dimensional matrix definition, so if we would store
    VK> matrix as two dimensional object, we would be fine.  But then all R code
    VK> as well as all packages would have to be modified.

exactly. And that was what I meant when I said "Compatibility".

But rather than changing the
 "matrix = columnwise stored as long vector" paradigm, we should rather
change from 32-bit indexing to a longer one.

The hope is that we eventually come up with a scheme that would basically
allow us to just recompile all packages:

In src/include/Rinternals.h,
we have had the following three lines for several years now:



/* type for length of vectors etc */
typedef int R_len_t; /* will be long later, LONG64 or ssize_t on Win64 */
#define R_LEN_T_MAX INT_MAX

and you are right that it may be time to experiment a bit more with
replacing 'int' with 'long' there (and also the corresponding _MAX setting),
and indeed, the array.c code you cited should replace INT_MAX by R_LEN_T_MAX.

This still does not solve the problem that we'd have to get to a BLAS / Lapack version that correctly works with "long indices"... which may (or may not) be easier than I had thought.

Martin



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Received on Fri 01 Aug 2008 - 19:09:36 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 04 Aug 2008 - 03:36:18 GMT.
