Re: [R] bitwise addition

From: Charles C. Berry <>
Date: Wed 17 May 2006 - 01:55:47 EST

If you are SURE you need the exact distribution of the t-statistic (and asymptotics a la Pitman, 1938, just will not do), then you can get it by hacking the

         dec.index <- dec.index + <something.else>

line in the for loop and making several passes through the data to accumulate the necessary pieces to construct the sufficient statistics.

E.g., for the sums for one group, <something.else> is 'y[pos]':

> y <- rnorm(24) # simulate 24 observations
> hi.index <- 24
> n.index <- 12
> breaks <- lapply(1:n.index,function(x) choose(-1+1:hi.index,x))
> index <- seq(0,length=choose(hi.index,n.index))
> dec.index <- rep(as.double(0),choose(hi.index,n.index))
> system.time(
+             for (i in n.index:1){
+               pos <- findInterval(index,breaks[[i]])
+               index <- index - breaks[[i]][pos]
+               dec.index <- dec.index + y[pos]
+             }
+             )
[1] 5.13 0.62 5.78 NA NA
> # well-known asymptotics illustrated here:
> hist(dec.index - mean(y)*n.index )
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  181730  4.9     407500  10.9   350000   9.4
Vcells 6854944 52.3 17713407 135.2 17695082 135.1
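A scaled-down check of the same loop may help before running it at full size. The sketch below is not part of the session above: it assumes 8 observations split 4/4 (sizes picked only so that every subset can be enumerated with combn()), and it assumes the statistic of interest is the pooled-variance two-sample t. Under that assumption the total sum and total sum of squares do not change when group labels are permuted, so the group sum from a single pass is enough; other statistics may need further passes (for example, accumulating y[pos]^2). The names grp.sum, ref.sum and t.stat are illustrative.

```r
# Small case so every 4-subset of 8 values can be enumerated directly.
set.seed(1)
y <- rnorm(8)
hi.index <- 8
n.index <- 4
n.sets <- choose(hi.index, n.index)

# Same decoding loop as above, accumulating the sum for one group.
breaks <- lapply(1:n.index, function(x) choose(-1 + 1:hi.index, x))
index <- seq(0, length.out = n.sets)
grp.sum <- numeric(n.sets)
for (i in n.index:1) {
  pos <- findInterval(index, breaks[[i]])
  index <- index - breaks[[i]][pos]
  grp.sum <- grp.sum + y[pos]
}

# Brute-force reference: the sum over every 4-subset, via combn().
ref.sum <- combn(y, n.index, sum)

# Pooled two-sample t computed from the group sum alone
# (assumption: equal-variance t; total sum and sum of squares are fixed).
n1 <- n.index
n2 <- hi.index - n.index
other.sum <- sum(y) - grp.sum
within.ss <- sum(y^2) - grp.sum^2 / n1 - other.sum^2 / n2
t.stat <- (grp.sum / n1 - other.sum / n2) /
  sqrt(within.ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
```

The vector t.stat then holds one t value per subset, and hist(t.stat) gives the exact permutation distribution for this toy case.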

HTH, Chuck

On Tue, 16 May 2006, Nameeta Lobo wrote:

> Hi all
> I am actually trying to create a permutation matrix whose number of columns
> equals the number of subjects I have. This matrix is actually supposed to have
> -1s and 1s in a unique combination and hence the expand grid question before.
> The reason I need this was because I was going to multiply the conditions by
> this matrix and calculate a t-test or any other stat and I needed unique
> simulations (these simulations were being generated for further calculations to
> find significant data). As the number of subjects grew more, we decided to
> switch signs for just half the number of subjects so as to reduce the number of
> simulations and get more reliable output. Besides for e.g. 1 1 1 -1 and -1 -1 -1
> 1 would give me the same output being mirror images and so I could reduce the
> number of simulations even more.
> I had written all this in a for loop with the 1s and -1s and then I wasn't really
> sure if I tried bitwise addition whether it would be faster. I thought I could
> then just replace the 0s in the matrix by -1s. I knew that computer time as well
> as RAM required would be a major problem but was still unsure. I have 11
> subjects in one study and so I followed the 2^11 with the expand grid and with
> switching half the signs just change it for 6 of the 11 but just wanted to know
> how I would manage with a higher number of subjects.
> Sorry about all this. I am new to R and I was just trying to speed up the
> process without killing the machine and just wanted to be aware of what was out
> there.
> thanks a million for all your extremely prompt responses. This is really
> appreciated.
> Nameeta
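For a small number of subjects, the sign matrix described above can be sketched directly. This is only an illustration under assumptions: 4 subjects (so the full matrix has 2^4 rows), and the names signs, half and half.flipped are made up here. Mirror images differ only by an overall sign, so keeping the rows whose first entry is +1 retains exactly one row from each mirror pair.

```r
n.subj <- 4  # illustrative size, small enough to inspect by eye

# All 2^n.subj combinations of -1 and 1, one column per subject.
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n.subj)))

# Drop mirror images: a row and its negation give the same |t|,
# so keep only the rows that start with +1.
half <- signs[signs[, 1] == 1, , drop = FALSE]

# Further restrict to rows where exactly half of the signs are flipped.
half.flipped <- half[rowSums(half == -1) == n.subj / 2, , drop = FALSE]
```

Multiplying a vector of per-subject differences by each row of half (or half.flipped) then gives one relabelled data set per row.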
> Quoting Marc Schwartz <>:
>> On Tue, 2006-05-16 at 09:45 +0200, Uwe Ligges wrote:
>>> Nameeta Lobo wrote:
>>>> Hello all
>>>> thank you very much for all your suggestions. I actually need binary
>>>> representations. I tried all the methods that Marc, Jim and Charles have
>>>> suggested and they ran fine (thanks a lot). I tried doing it then with 26
>>>> and 13 and that's when the computer gave way. I just got a message with
>>>> all three methods that a vector of .....Kb cannot be allocated. guess I
>>>> will have to change the environment to allow for huge vector size
>>>> allocation. How do I do that?
>>> You should have *at least* 512Mb in your machine for the solution given
>>> by Charles C. Berry with the numbers given above, better a machine with
>>> 1Gb.
>>> Uwe Ligges
>> In addition to Uwe's comment, there are some practical issues that will
>> apply here shortly if Nameeta continues to increase the size of the
>> source vector:
>> 1. R has a limitation of 2^31 - 1 elements in a vector. This is the same
>> for both 32 and 64 bit platforms. Thus, if Nameeta is planning to
>> continue to expand the upper limit of the range, you will hit this
>> fairly quickly. You would then need to consider some form of a
>> partitioning approach if you go beyond that limit.
>> 2. The RAM requirements to simply apply Charles' solution will continue
>> to expand as the upper limit increases, so Uwe's figure is but one
>> number that solves the indicated example of 2^26, but will be
>> insufficient beyond that.
>> 3. This still does not address Nameeta's now explicitly stated desire
>> for the binary character representations, which requires additional
>> memory beyond that required for the initial step of identifying the
>> numbers that meet the 'bit requirements' alone.
>> From my prior post over the weekend, to store the character matrix of
>> binary representations for 2^25 with 9 bits, which contained 2,042,975
>> values, it required approximately 128 Mb for the final paste()'d
>> versions of the numbers.
>> That is AFTER doing the initial conversion using digitsBase(), which
>> required 400 Mb to store the intermediate integer matrix result. One
>> could certainly do that in a partitioned or loop based approach to
>> conserve memory, but it still will hit practical limits in short order.
>> Those figures too will expand dramatically as the upper limit increases.
>> For example, going from 2^24 with 12 bits to 2^26 with 13 bits, results
>> in going from 2,704,156 values in the result to 10,400,600 in the
>> result. That's a 3.8 fold increase in the result vector size. It does
>> not take long to figure out how much memory will be required for these
>> operations as the upper range increases.
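The counts quoted above are binomial coefficients, so the growth can be tabulated with choose(). The sketch below only reproduces that arithmetic and adds the storage needed for a plain double vector at 8 bytes per element; it makes no attempt to estimate the character or intermediate integer matrices, whose sizes depend on details not shown here. The variable names are illustrative.

```r
# Number of values below 2^n.bits with exactly k.bits bits set.
n.bits <- c(24, 25, 26)
k.bits <- c(12, 9, 13)
n.values <- choose(n.bits, k.bits)

# Storage for one double vector of that length (8 bytes per element).
sizes <- data.frame(n.bits, k.bits, n.values,
                    Mb.as.double = n.values * 8 / 2^20)
print(sizes)

# Growth factor from the (24, 12) case to the (26, 13) case.
growth <- n.values[3] / n.values[1]
```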
>> Depending upon what Nameeta is planning to do with the final resultant
>> character vectors, one could consider a loop based print method/function
>> that takes the values in the initial 'dec.index' vector and simply
>> cat()'s them to some output. However, you would not be able to actually
>> store them as a single matrix given the memory requirements.
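A loop-based writer along those lines might look like the sketch below. It is an illustration only: to.bits and write.bits are hypothetical helpers (not from sfsmisc or from the thread), the chunk size is arbitrary, and the conversion uses plain integer division rather than digitsBase(). Only one block of strings exists in memory at a time.

```r
# Hypothetical helper: bit strings (most significant bit first)
# for a block of non-negative whole numbers.
to.bits <- function(v, n.bits) {
  bits <- outer(v, 2^((n.bits - 1):0), function(a, b) (a %/% b) %% 2)
  apply(bits, 1, paste, collapse = "")
}

# Hypothetical helper: write the strings block by block.
# 'con' should be stdout() or an already opened connection, so that
# successive cat() calls append rather than overwrite.
write.bits <- function(values, n.bits, con = stdout(), chunk = 10000) {
  for (s in seq(1, length(values), by = chunk)) {
    block <- values[s:min(s + chunk - 1, length(values))]
    cat(to.bits(block, n.bits), sep = "\n", file = con)
  }
  invisible(length(values))
}

# Small illustration: all 4-bit values with exactly 2 bits set.
vals <- Filter(function(v) sum(as.integer(intToBits(v))[1:4]) == 2, 0:15)
write.bits(vals, 4, chunk = 4)
```

For the full-size problem, values would be the 'dec.index' vector and con a file connection opened with file(..., open = "w").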
>> Perhaps Nameeta can indicate what the primary problem is here, which
>> might in turn allow someone to offer an alternative approach that is
>> more resource sparing.
>> HTH,
>> Marc Schwartz
Received on Wed May 17 02:01:36 2006

