From: Jon Clayden <jon.clayden_at_gmail.com>

Date: Wed, 30 Mar 2011 00:01:17 +0100

R-devel_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Tue 29 Mar 2011 - 23:03:35 GMT

On 29 March 2011 22:40, Simon Urbanek <simon.urbanek_at_r-project.org> wrote:

> Jon,
>
> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
>
>> Dear Simon,
>>
>> Thank you for the response.
>>
>> On 29 March 2011 15:06, Simon Urbanek <simon.urbanek_at_r-project.org> wrote:
>>>
>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
>>>
>>>> Dear all,
>>>>
>>>> I see from some previous threads that support for 64-bit integers in R
>>>> may be an aim for future versions, but in the meantime I'm wondering
>>>> whether it is possible to read in integers of greater than 32 bits at
>>>> all. Judging from ?readBin, it should be possible to read 8-byte
>>>> integers to some degree, but it is clearly limited in practice by R's
>>>> internally 32-bit integer type:
>>>>
>>>> > x <- as.raw(c(0,0,0,0,1,0,0,0))
>>>> > (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>> [1] 16777216
>>>> > x <- as.raw(c(0,0,0,1,0,0,0,0))
>>>> > (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
>>>> [1] 0
>>>>
>>>> For values that fit into 32 bits it works fine, but for larger values
>>>> it fails. (I'm a bit surprised by the zero - should the value not be
>>>> NA if it is out of range?
>>>
>>> No, it's not out of range - int is only 4 bytes so only 4 first bytes (respecting endianness order, hence LSB) are used.
>>
>> The fact remains that I ask for the value of an 8-byte integer and
>> don't get it.
>
> I think you're misinterpreting the documentation:
>
>     If ‘size’ is specified and not the natural size of the object,
>     each element of the vector is coerced to an appropriate type
>     before being written or as it is read.
>
> The "integer" object type is defined as signed 32-bit in R, so if you ask for "8 bytes into object type integer", you get a coercion into that object type -- 32-bit signed integer -- as documented. I think the issue may come from the confusion of the object type "integer" with general "integer number" in mathematical sense that has no representation restrictions. (FWIW in C the "integer" type is "int" and it is 32-bit on all modern OSes regardless of platform - that's where the limitation comes from, it's not something R has made up).

OK, but it still seems like there is a case for raising a warning. As it is, there is no way to tell, when reading an 8-byte integer from a file, whether its value is really 0 or whether it merely has 0 in its least-significant 4 bytes. If 99% of such stored numbers are below 2^31, one is going to need some extra logic to catch the other 1%, where you (silently) get the wrong value. In essence, unless you're certain that you will never come across a number that actually uses the upper 4 bytes, you will always have to read it as two 4-byte numbers and check that the high-order one (whose position is endianness dependent, of course) is zero. A C-level sanity check seems more efficient and more helpful to me.
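For illustration, the R-level workaround described above might be sketched as follows. This assumes a single unsigned 8-byte value held in a raw vector; the helper name read_uint64_checked is hypothetical and not part of R. It reads the value as two 4-byte words and returns NA with a warning whenever the result would not fit in R's 32-bit signed integer type.

```r
# Hypothetical helper (not part of R): read one unsigned 8-byte integer from a
# raw vector as two 4-byte words, and refuse values that need more than 31 bits.
read_uint64_checked <- function(x, endian = .Platform$endian) {
  stopifnot(is.raw(x), length(x) == 8L)
  words <- readBin(x, "integer", n = 2L, size = 4L, endian = endian)
  # The high-order word comes first in big-endian data, last in little-endian.
  if (endian == "big") {
    high <- words[1L]
    low <- words[2L]
  } else {
    high <- words[2L]
    low <- words[1L]
  }
  # A word with bit pattern 0x80000000 reads as NA, and a low word with its
  # top bit set reads as negative; neither fits in a signed 32-bit integer.
  fits <- !is.na(high) && high == 0L && !is.na(low) && low >= 0L
  if (!fits) {
    warning("8-byte value does not fit in a 32-bit signed integer")
    return(NA_integer_)
  }
  low
}
```

With the two raw vectors from the original post, read_uint64_checked(as.raw(c(0,0,0,0,1,0,0,0)), "big") gives 16777216, while as.raw(c(0,0,0,1,0,0,0,0)) gives NA with a warning instead of a silent 0.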

>> Pretending that it's really only four bytes because of
>> the limits of R's integer type isn't all that helpful. Perhaps a
>> warning should be put out if the cast will affect the value of the
>> result? It looks like the relevant lines in src/main/connections.c are
>> 3689-3697 in the current alpha:
>>
>> #if SIZEOF_LONG == 8
>>     case sizeof(long):
>>         INTEGER(ans)[i] = (int)*((long *)buf);
>>         break;
>> #elif SIZEOF_LONG_LONG == 8
>>     case sizeof(_lli_t):
>>         INTEGER(ans)[i] = (int)*((_lli_t *)buf);
>>         break;
>> #endif
>>
>>>> ) The value can be represented as a double,
>>>> though:
>>>>
>>>> > 4294967296
>>>> [1] 4294967296
>>>>
>>>> I wouldn't expect readBin() to return a double if an integer was
>>>> requested, but is there any way to get the correct value out of it?
>>>
>>> Trivially (for your unsigned big-endian case):
>>>
>>> y <- readBin(x, "integer", n=length(x)/4L, endian="big")
>>> y <- ifelse(y < 0, 2^32 + y, y)
>>> i <- seq(1,length(y),2)
>>> y <- y[i] * 2^32 + y[i + 1L]
>>
>> Thanks for the code, but I'm not sure I would call that trivial,
>> especially if one needs to cater for little endian and signed cases as
>> well!
>
> I was saying for your case and it's trivial as in read as integers, convert to double precision and add.
>
>
>> This is what I meant by reconstructing the number manually...
>>
>
> You didn't say so - you were talking about reconstructing it from a raw vector which seems a lot more painful since you can't compute with enough precision on raw vectors.

True - I should have been more specific. Sorry.
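
For the little-endian and signed cases mentioned above, the "read as integers, convert to double precision and add" idea could be extended along these lines. This is only a sketch: the function name read_int64_as_double is hypothetical, and it departs from the quoted code by reading unsigned 2-byte words rather than 4-byte ones, on the assumption that this avoids the 4-byte bit pattern 0x80000000, which coincides with R's NA_integer_. Results are doubles, so they are exact only up to a magnitude of 2^53.

```r
# Hypothetical helper (not part of R): decode 8-byte integers stored in a raw
# vector into doubles, for either byte order and either signedness.
read_int64_as_double <- function(x, signed = TRUE, endian = .Platform$endian) {
  stopifnot(is.raw(x), length(x) %% 8L == 0L)
  # Four unsigned 16-bit words per value; signed = FALSE is valid for size 2.
  w <- readBin(x, "integer", n = length(x) %/% 2L, size = 2L,
               signed = FALSE, endian = endian)
  w <- matrix(w, nrow = 4L)
  # Little-endian data stores the least-significant word first, so reverse.
  if (endian == "little") w <- w[4:1, , drop = FALSE]
  top <- w[1L, ]
  # Two's complement: reinterpret the top word as signed before scaling, so
  # small negative values stay exactly representable as doubles.
  if (signed) top <- ifelse(top >= 2^15, top - 2^16, top)
  top * 2^48 + w[2L, ] * 2^32 + w[3L, ] * 2^16 + w[4L, ]
}
```

For the vector from the original post, read_int64_as_double(as.raw(c(0,0,0,1,0,0,0,0)), signed = FALSE, endian = "big") gives 4294967296.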
