Re: [Rd] Reading 64-bit integers

From: Jon Clayden <jon.clayden_at_gmail.com>
Date: Wed, 30 Mar 2011 12:40:01 +0100

On 30 March 2011 02:49, Simon Urbanek <simon.urbanek_at_r-project.org> wrote:
>
> On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
>
>> On 29/03/2011 7:01 PM, Jon Clayden wrote:
>>> Dear Simon,
>>>
>>> On 29 March 2011 22:40, Simon Urbanek <simon.urbanek_at_r-project.org> wrote:
>>>> Jon,
>>>>
>>>> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
>>>>
>>>>> Dear Simon,
>>>>>
>>>>> Thank you for the response.
>>>>>
>>>>> On 29 March 2011 15:06, Simon Urbanek <simon.urbanek_at_r-project.org> wrote:
>>>>>>
>>>>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I see from some previous threads that support for 64-bit integers in R
>>>>>>> may be an aim for future versions, but in the meantime I'm wondering
>>>>>>> whether it is possible to read in integers of greater than 32 bits at
>>>>>>> all. Judging from ?readBin, it should be possible to read 8-byte
>>>>>>> integers to some degree, but it is clearly limited in practice by R's
>>>>>>> internally 32-bit integer type:
>>>>>>>
>>>>>>> > x <- as.raw(c(0,0,0,0,1,0,0,0))
>>>>>>> > readBin(x, "integer", n=1, size=8, signed=FALSE, endian="big")
>>>>>>> [1] 16777216
>>>>>>> > x <- as.raw(c(0,0,0,1,0,0,0,0))
>>>>>>> > readBin(x, "integer", n=1, size=8, signed=FALSE, endian="big")
>>>>>>> [1] 0
>>>>>>>
>>>>>>> For values that fit into 32 bits it works fine, but for larger values
>>>>>>> it fails. (I'm a bit surprised by the zero; should the value not be
>>>>>>> NA if it is out of range?)
>>>>>>
>>>>>> No, it's not out of range: int is only 4 bytes, so only 4 of the 8 bytes are used, namely the least significant ones (which 4 those are depends on the byte order).
>>>>>
>>>>> The fact remains that I ask for the value of an 8-byte integer and
>>>>> don't get it.
>>>>
>>>> I think you're misinterpreting the documentation:
>>>>
>>>>     If ‘size’ is specified and not the natural size of the object,
>>>>     each element of the vector is coerced to an appropriate type
>>>>     before being written or as it is read.
>>>>
>>>> The "integer" object type is defined as signed 32-bit in R, so if you ask for "8 bytes into object type integer", you get a coercion into that object type (32-bit signed integer), as documented. I think the issue may come from confusing the object type "integer" with a general "integer number" in the mathematical sense, which has no representation restrictions. (FWIW, in C the "integer" type is "int", and it is 32-bit on all modern OSes regardless of platform; that's where the limitation comes from, it's not something R has made up.)
>>>
>>> OK, but it still seems like there is a case for raising a warning. As
>>> it is, there is no way to tell when reading an 8-byte integer from a
>>> file whether its value is really 0, or if it merely has 0 in its
>>> least-significant 4 bytes. If 99% of such stored numbers are below
>>> 2^31, one is going to need some extra logic to catch the other 1%
>>> where you (silently) get the wrong value. In essence, unless you're
>>> certain that you will never come across a number that actually uses
>>> the upper 4 bytes, you will always have to read it as two 4-byte
>>> numbers and check that the high-order one (which is endianness
>>> dependent, of course) is zero. A C-level sanity check seems more
>>> efficient and more helpful to me.
>>
>> Seems to me that the S-PLUS solution (output="double") would be a lot more useful.  I'd commit that if you write it; I don't think I'd commit the warning.
>>
>
> I was going to write something similar (idea = good, patch welcome ;)). My only worry is that the "output" argument is a bit misleading, in that one could expect to use any combination of "input"/"output", which may be a maintenance nightmare. If I understand it correctly, it's only a special case for integer input. I don't have S+, so I can't say how they deal with that.

I don't have S+ either, but I agree that this is a better solution, although I would guess it is more involved to implement. Depending on how important compatibility with S+ is, a more specific logical (TRUE/FALSE) option to "convert large integers to double" might be clearer than "output". I'm happy to try to draft a patch, but it may be a little while before I have the time.
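For illustration, a minimal R-level sketch of what a "convert large integers to double" behaviour could produce, assuming unsigned 8-byte values held in a raw vector. The helper `read_u64_as_double()` is hypothetical; it is not an existing `readBin()` argument, and a real patch would more likely do this in C. It weights each byte by its power of 256 and sums per value. Doubles represent every integer exactly only up to 2^53, so the sketch warns when a result reaches that range.

```r
# Hypothetical helper (not base R): interpret a raw vector as a sequence of
# unsigned 8-byte integers and return them as doubles.
read_u64_as_double <- function(bytes, endian = c("big", "little")) {
  endian <- match.arg(endian)
  stopifnot(is.raw(bytes), length(bytes) %% 8L == 0L)
  # One column per 8-byte value, one row per byte position.
  m <- matrix(as.integer(bytes), nrow = 8L)
  # Weight of each byte position: most significant byte first for big-endian.
  weights <- 256^(7:0)
  if (endian == "little") weights <- rev(weights)
  # Multiplying recycles 'weights' down each column, so row i gets weights[i].
  values <- colSums(m * weights)
  # At or above 2^53, neighbouring integers are no longer all representable.
  if (any(values >= 2^53)) {
    warning("some values are >= 2^53 and may have been rounded")
  }
  values
}
```

Applied to the second example from the thread, `read_u64_as_double(as.raw(c(0,0,0,1,0,0,0,0)))` gives 4294967296 (that is, 2^32) rather than 0. Returning doubles keeps the result in an existing R type at the cost of exactness above 2^53.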

All the best,
Jon



R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Wed 30 Mar 2011 - 11:53:41 GMT
