Re: [Rd] bug in sum() on integer vector

From: Hervé Pagès <hpages_at_fhcrc.org>
Date: Fri, 09 Dec 2011 13:41:59 -0800

ink1">ink1">Hi Duncan,

On 11-12-09 11:39 AM, Duncan Murdoch wrote:
> On 09/12/2011 1:40 PM, Hervé Pagès wrote:
>> Hi,
>>
>> x<- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
>>
>> This is correct:
>>
>> > sum(as.double(x))
>> [1] 0
>>
>> This is not:
>>
>> > sum(x)
>> [1] 4996000
>>
>> Returning NA (with a warning) would also be acceptable for the latter.
>> That would make it consistent with cumsum(x):
>>
>> > cumsum(x)[length(x)]
>> [1] NA
>> Warning message:
>> Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
>
> This is a 64 bit problem; in 32 bits things work out properly.
> I'd guess
> in 64 bit arithmetic we or the run-time are doing something to simulate
> 32 bit arithmetic (since integers are 32 bits), but it looks as though
> we're not quite getting it right.

It doesn't work properly for me on Leopard (32-bit mode):

   > x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))    > sum(as.double(x))
   [1] 0
   > sum(x)
   [1] 4996000
   > sessionInfo()
   R version 2.14.0 RC (2011-10-27 r57452)    Platform: i386-apple-darwin9.8.0/i386 (32-bit)

   locale:
   [1] C

   attached base packages:
   [1] stats graphics grDevices utils datasets methods base

It looks like the problem is that isum() (in src/main/summary.c) uses a 'double' internally to do the sum, whereas rsum() and csum() use a 'long double'.

Note that isum() seems to be assuming that NA_INTEGER and NA_LOGICAL will always be the same (probably fine) and that TRUE values in the input vector are always represented as a 1 (not so sure about this one).

A more fundamental question: is switching back and forth between 'int' and 'double' (or 'long double') the right thing to do for doing "safe" arithmetic on integers?

Thanks!
H.

>
> Duncan Murdoch
>
>> Thanks!
>> H.
>>
>> > sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
>> [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages_at_fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Fri 09 Dec 2011 - 22:05:14 GMT

This quarter's messages: by month, or sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

All messages

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 11 Dec 2011 - 21:30:17 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-devel. Please read the posting guide before posting to the list.

list of date sections of archive