From: Charles C. Berry <cberry_at_tajo.ucsd.edu>
Date: Wed 17 May 2006 - 01:55:47 EST

If you are SURE you need the exact distribution of the t-statistic (and asymptotics à la Pitman, 1938, just will not do), then you can get it by hacking the

dec.index <- dec.index + <something.else>

line in the for loop and making several passes through the data to accumulate the pieces needed to construct the sufficient statistics.

e.g., to get the sums for one group, <something.else> is 'y[pos]':

```
> y <- rnorm(24) # simulate 24 observations
> hi.index <- 24
> n.index <- 12
> breaks <- lapply(1:n.index, function(x) choose(-1 + 1:hi.index, x))
> index <- seq(0, length = choose(hi.index, n.index))
> dec.index <- rep(as.double(0), choose(hi.index, n.index))
> system.time(
+   for (i in n.index:1) {
+     pos <- findInterval(index, breaks[[i]])
+     index <- index - breaks[[i]][pos]
+     dec.index <- dec.index + y[pos]
+   }
+ )
[1] 5.13 0.62 5.78 NA NA
> # well-known asymptotics illustrated here:
> hist(dec.index - mean(y) * n.index)
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  181730  4.9     407500  10.9   350000   9.4
Vcells 6854944 52.3   17713407 135.2 17695082 135.1
```
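As a follow-up sketch (this p-value construction is an editorial addition, not part of Chuck's post): because the combined sample's total sum and sum of squares are fixed under relabelling, the two-sample t statistic is a monotone function of the group-1 sum, so the exact two-sided p-value can be read directly off 'dec.index':

```r
## Sketch only (not from the original post): exact two-sided p-value for a
## two-sample permutation test, using the enumerated group sums. With the
## combined sample fixed, the t statistic is a monotone function of the
## group-1 sum, so comparing sums is equivalent to comparing t statistics.
set.seed(1)
y <- rnorm(24)
hi.index <- 24
n.index  <- 12
breaks    <- lapply(1:n.index, function(x) choose(-1 + 1:hi.index, x))
index     <- seq(0, length = choose(hi.index, n.index))
dec.index <- rep(as.double(0), choose(hi.index, n.index))
for (i in n.index:1) {              # decode each index into a 12-subset
  pos       <- findInterval(index, breaks[[i]])
  index     <- index - breaks[[i]][pos]
  dec.index <- dec.index + y[pos]   # accumulate that subset's sum
}
obs     <- sum(y[1:n.index])        # group-1 sum for the observed labels
center  <- mean(y) * n.index        # expected group sum under relabelling
p.exact <- mean(abs(dec.index - center) >= abs(obs - center))
```

Since the observed labelling is among the choose(24, 12) enumerated subsets, p.exact is bounded below by 1/choose(24, 12).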

HTH, Chuck
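On the sign-flipping scheme described in Nameeta's message below, a minimal sketch (variable names are illustrative, not from the thread): pinning the first subject's sign to +1 keeps exactly one member of each mirror-image pair, so only 2^(n-1) rows are needed instead of 2^n.

```r
## Sketch (names illustrative): a +1/-1 sign-flip matrix that keeps only
## one of each mirror-image pair by fixing the first subject's sign to +1.
n.subj <- 11
## expand.grid over the remaining subjects: 2^(n.subj - 1) = 1024 rows
flips  <- as.matrix(expand.grid(rep(list(c(-1, 1)), n.subj - 1)))
flips  <- cbind(1, flips)   # first column fixed at +1
## each row is a unique sign assignment; no row is the negation of another
```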

On Tue, 16 May 2006, Nameeta Lobo wrote:

> Hi all
>
> I am actually trying to create a permutation matrix whose number of columns
> equals the number of subjects I have. This matrix is supposed to have -1s
> and 1s in a unique combination, hence the expand.grid question before. The
> reason I need this is that I was going to multiply the conditions by this
> matrix and calculate a t-test or some other statistic, and I needed unique
> simulations (these simulations were being generated for further
> calculations to find significant data). As the number of subjects grew, we
> decided to switch signs for just half of the subjects so as to reduce the
> number of simulations and get more reliable output. Besides, for example,
> 1 1 1 -1 and -1 -1 -1 1 would give me the same output, being mirror images,
> so I could reduce the number of simulations even more.
>
> I had written all this in a for loop with the 1s and -1s, and then I wasn't
> really sure whether bitwise addition would be faster. I thought I could
> then just replace the 0s in the matrix with -1s. I knew that computer time
> as well as the RAM required would be a major problem, but was still unsure.
> I have 11 subjects in one study, so I followed the 2^11 approach with
> expand.grid and, switching half the signs, just changed it for 6 of the 11,
> but I wanted to know how I would manage with a higher number of subjects.
>
> Sorry about all this. I am new to R, and I was just trying to speed up the
> process without killing the machine, and wanted to be aware of what was out
> there.
>
> Thanks a million for all your extremely prompt responses. This is really
> appreciated.
>
> Nameeta
>
>
> Quoting Marc Schwartz <MSchwartz@mn.rr.com>:
>
>> On Tue, 2006-05-16 at 09:45 +0200, Uwe Ligges wrote:
>>> Nameeta Lobo wrote:
>>>
>>>> Hello all
>>>>
>>>> thank you very much for all your suggestions. I actually need binary
>>>> representations. I tried all the methods that Marc, Jim and Charles
>>>> have suggested and they ran fine (thanks a lot). I tried doing it then
>>>> with 26 and 13 and that's when the computer gave way. I just got a
>>>> message with all three methods that a vector of .....Kb cannot be
>>>> allocated. Guess I will have to change the environment to allow for
>>>> huge vector size allocation. How do I do that?
>>>
>>>
>>> You should have *at least* 512Mb in your machine for the solution given
>>> by Charles C. Berry with the numbers given above, better a machine with
>>> 1Gb.
>>>
>>> Uwe Ligges
>>
>> In addition to Uwe's comment, there are some practical issues that will
>> apply here shortly if Nameeta continues to increase the size of the
>> source vector:
>>
>> 1. R has a limitation of 2^31 - 1 elements in a vector. This is the same
>> for both 32- and 64-bit platforms. Thus, if Nameeta is planning to
>> continue to expand the upper limit of the range, you will hit this
>> fairly quickly. You would then need to consider some form of a
>> partitioning approach if you go beyond that limit.
>>
>> 2. The RAM requirements to simply apply Charles' solution will continue
>> to expand as the upper limit increases, so Uwe's figure is but one
>> number that solves the indicated example of 2^26, but will be
>> insufficient beyond that.
>>
>> 3. This still does not address Nameeta's now explicitly stated desire
>> for the binary character representations, which requires additional
>> memory beyond that required for the initial step of identifying the
>> numbers that meet the 'bit requirements' alone.
>>
>> From my prior post over the weekend, to store the character matrix of
>> binary representations for 2^25 with 9 bits, which contained 2,042,975
>> values, it required approximately 128 Mb for the final paste()'d
>> versions of the numbers.
>>
>> That is AFTER doing the initial conversion using digitsBase(), which
>> required 400 Mb to store the intermediate integer matrix result. One
>> could certainly do that in a partitioned or loop based approach to
>> conserve memory, but it still will hit practical limits in short order.
>>
>> Those figures too will expand dramatically as the upper limit increases.
>>
>> For example, going from 2^24 with 12 bits to 2^26 with 13 bits results
>> in going from 2,704,156 values in the result to 10,400,600. That's a
>> 3.8-fold increase in the result vector size. It does
>> not take long to figure out how much memory will be required for these
>> operations as the upper range increases.
>>
>> Depending upon what Nameeta is planning to do with the final resultant
>> character vectors, one could consider a loop based print method/function
>> that takes the values in the initial 'dec.index' vector and simply
>> cat()'s them to some output. However, you would not be able to actually
>> store them as a single matrix given the memory requirements.
>>
>> Perhaps Nameeta can indicate what the primary problem is here, which
>> might in turn allow someone to offer an alternative approach that is
>> more resource sparing.
>>
>> HTH,
>>
>> Marc Schwartz
>>
>

R-help@stat.math.ethz.ch mailing list