Re: [Rd] Reading 64-bit integers

From: William Dunlap <wdunlap_at_tibco.com>
Date: Wed, 30 Mar 2011 10:38:06 -0700

> -----Original Message-----
> From: r-devel-bounces_at_r-project.org
> [mailto:r-devel-bounces@r-project.org] On Behalf Of Simon Urbanek
> Sent: Tuesday, March 29, 2011 6:49 PM
> To: Duncan Murdoch
> Cc: r-devel_at_r-project.org
> Subject: Re: [Rd] Reading 64-bit integers
>
>
> On Mar 29, 2011, at 8:47 PM, Duncan Murdoch wrote:
>
> > On 29/03/2011 7:01 PM, Jon Clayden wrote:
> >> Dear Simon,
> >>
> >> On 29 March 2011 22:40, Simon Urbanek <simon.urbanek_at_r-project.org> wrote:
> >>> Jon,
> >>>
> >>> On Mar 29, 2011, at 1:33 PM, Jon Clayden wrote:
> >>>
> >>>> Dear Simon,
> >>>>
> >>>> Thank you for the response.
> >>>>
> >>>> On 29 March 2011 15:06, Simon Urbanek <simon.urbanek_at_r-project.org> wrote:
> >>>>>
> >>>>> On Mar 29, 2011, at 8:46 AM, Jon Clayden wrote:
> >>>>>
> >>>>>> Dear all,
> >>>>>>
> >>>>>> I see from some previous threads that support for 64-bit integers in R
> >>>>>> may be an aim for future versions, but in the meantime I'm wondering
> >>>>>> whether it is possible to read in integers of greater than 32 bits at
> >>>>>> all. Judging from ?readBin, it should be possible to read 8-byte
> >>>>>> integers to some degree, but it is clearly limited in practice by R's
> >>>>>> internally 32-bit integer type:
> >>>>>>
> >>>>>>> x <- as.raw(c(0,0,0,0,1,0,0,0))
> >>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
> >>>>>> [1] 16777216
> >>>>>>> x <- as.raw(c(0,0,0,1,0,0,0,0))
> >>>>>>> (readBin(x,"integer",n=1,size=8,signed=F,endian="big"))
> >>>>>> [1] 0
> >>>>>>
> >>>>>> For values that fit into 32 bits it works fine, but for larger values
> >>>>>> it fails. (I'm a bit surprised by the zero - should the value not be
> >>>>>> NA if it is out of range?
> >>>>>
> >>>>> No, it's not out of range - int is only 4 bytes, so only 4 of the
> >>>>> bytes are used: the least significant ones, located according to the
> >>>>> endianness.
> >>>>
> >>>> The fact remains that I ask for the value of an 8-byte integer and
> >>>> don't get it.
> >>>
> >>> I think you're misinterpreting the documentation:
> >>>
> >>> If 'size' is specified and not the natural size of the object,
> >>> each element of the vector is coerced to an appropriate type
> >>> before being written or as it is read.
> >>>
> >>> The "integer" object type is defined as signed 32-bit in R, so if you
> >>> ask for "8 bytes into object type integer", you get a coercion into
> >>> that object type -- 32-bit signed integer -- as documented. I think the
> >>> issue may come from confusing the object type "integer" with a general
> >>> "integer number" in the mathematical sense, which has no representation
> >>> restrictions. (FWIW, in C the "integer" type is "int", and it is 32-bit
> >>> on all modern OSes regardless of platform - that's where the limitation
> >>> comes from; it's not something R has made up.)
> >>
> >> OK, but it still seems like there is a case for raising a warning. As
> >> it is, there is no way to tell when reading an 8-byte integer from a
> >> file whether its value is really 0, or if it merely has 0 in its
> >> least-significant 4 bytes. If 99% of such stored numbers are below
> >> 2^31, one is going to need some extra logic to catch the other 1%
> >> where you (silently) get the wrong value. In essence, unless you're
> >> certain that you will never come across a number that actually uses
> >> the upper 4 bytes, you will always have to read it as two 4-byte
> >> numbers and check that the high-order one (which is endianness
> >> dependent, of course) is zero. A C-level sanity check seems more
> >> efficient and more helpful to me.
> >
> > Seems to me that the S-PLUS solution (output="double") would be a
> > lot more useful. I'd commit that if you write it; I don't think I'd
> > commit the warning.
> >
>
> I was going to write something similar (idea = good, patch welcome ;)).
> My only worry is that the "output" argument is a bit misleading, in that
> one could expect to use any combination of "input"/"output", which may be
> a maintenance nightmare. If I understand it correctly, it's only a special
> case for integer input. I don't have S+, so I can't say how they deal with
> that.

In S+'s readBin, the output argument can only be double() or single() when
what is double() or single() (S+ still has a true single-precision storage
mode), and it can be any numeric type or logical when what is integer().

The output=double() seemed like the only useful case.

It does not warn when precision is lost in the 8-byte integer to double conversion. Perhaps it should.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> Cheers,
> Simon
>
>
> >
> >>
> >>>> Pretending that it's really only four bytes because of the limits of
> >>>> R's integer type isn't all that helpful. Perhaps a warning should be
> >>>> issued if the cast will affect the value of the result? It looks like
> >>>> the relevant lines in src/main/connections.c are 3689-3697 in the
> >>>> current alpha:
> >>>>
> >>>> #if SIZEOF_LONG == 8
> >>>> case sizeof(long):
> >>>> INTEGER(ans)[i] = (int)*((long *)buf);
> >>>> break;
> >>>> #elif SIZEOF_LONG_LONG == 8
> >>>> case sizeof(_lli_t):
> >>>> INTEGER(ans)[i] = (int)*((_lli_t *)buf);
> >>>> break;
> >>>> #endif
> >>>>
> >>>>>> ) The value can be represented as a double, though:
> >>>>>>
> >>>>>>> 4294967296
> >>>>>> [1] 4294967296
> >>>>>>
> >>>>>> I wouldn't expect readBin() to return a double if an integer was
> >>>>>> requested, but is there any way to get the correct value out of it?
> >>>>>
> >>>>> Trivially (for your unsigned big-endian case):
> >>>>>
> >>>>> y <- readBin(x, "integer", n = length(x) / 4L, endian = "big")
> >>>>> # a 32-bit word of 0x80000000 is read as NA_integer_; map it to 2^31
> >>>>> y <- ifelse(is.na(y), 2^31, ifelse(y < 0, 2^32 + y, y))
> >>>>> i <- seq(1, length(y), 2)
> >>>>> y <- y[i] * 2^32 + y[i + 1L]
> >>>>
> >>>> Thanks for the code, but I'm not sure I would call that trivial,
> >>>> especially if one needs to cater for little-endian and signed cases as
> >>>> well!
> >>>
> >>> I meant for your case, and it's trivial in the sense of: read as
> >>> integers, convert to double precision, and add.
> >>>
> >>>
> >>>> This is what I meant by reconstructing the number manually...
> >>>>
> >>>
> >>> You didn't say so - you were talking about reconstructing it from a raw
> >>> vector which seems a lot more painful since you can't compute with
> >>> enough precision on raw vectors.
> >>
> >> True - I should have been more specific. Sorry.
> >>
> >> Jon
> >>
> >> ______________________________________________
> >> R-devel_at_r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
>
>



Received on Wed 30 Mar 2011 - 17:50:37 GMT